Similar Documents
20 similar documents found (search time: 78 ms)
1.
Continuous-media applications require efficient and flexible support from real-time threads, including dynamic management of thread attributes and support for multiple threading models. This paper describes the design and implementation of user-level real-time threads on the RT-Mach microkernel.
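The abstract does not show RT-Mach's actual thread interface, so the sketch below uses the closest POSIX analogue instead: creating a real-time thread with an explicit fixed-priority FIFO policy and then changing its priority at runtime, which is one form of the "dynamic management of thread attributes" the entry mentions. All identifiers are standard pthreads, not RT-Mach API.

```c
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *rt_worker(void *arg) {
    (void)arg;
    /* ... periodic real-time work would go here ... */
    return NULL;
}

int main(void) {
    pthread_attr_t attr;
    struct sched_param sp = { .sched_priority = 50 };
    pthread_t tid;

    /* Create the thread with an explicit SCHED_FIFO policy and priority
       (requires appropriate privileges on most systems). */
    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    pthread_attr_setschedparam(&attr, &sp);
    if (pthread_create(&tid, &attr, rt_worker, NULL) != 0)
        perror("pthread_create");

    /* Dynamic attribute management: raise the priority at runtime. */
    sp.sched_priority = 70;
    pthread_setschedparam(tid, SCHED_FIFO, &sp);

    pthread_join(tid, NULL);
    return 0;
}
```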

2.
WSH is a window managing command language interpreter (shell) for the UNIX operating system. This interface increases a user's power to communicate with UNIX by providing a process management environment based on an integrated virtual terminal-shell design. Unlike conventional shells, WSH affords its user the ability to monitor and communicate with multiple processes within a single display context. Windows represent virtual terminals in the WSH design, and as such are device-independent abstractions of real terminals. Since the design of the virtual terminal environment is based on UNIX's TERMCAP terminal database facility, WSH is portable across all versions of UNIX supporting this feature. WSH requires neither alteration of existing UNIX facilities nor special display devices. The current implementation runs on a Digital VAX 11/780 under the Berkeley UNIX 4.1 environment.

3.
An Analysis of the Mach 3.0 Kernel   (Cited by: 6; self-citations: 0; citations by others: 6)
Based on an analysis of the Mach 3.0 kernel, this paper examines the characteristics of Mach's design and implementation in depth, and briefly describes the structure of the Mach 3.0 kernel, its external interfaces, and kernel threads.

4.
We present a user-level thread scheduler for shared-memory multiprocessors, and we analyze its performance under multiprogramming. We model multiprogramming with two scheduling levels: our scheduler runs at user-level and schedules threads onto a fixed collection of processes, while below this level, the operating system kernel schedules processes onto a fixed collection of processors. We consider the kernel to be an adversary, and our goal is to schedule threads onto processes such that we make efficient use of whatever processor resources are provided by the kernel. Our thread scheduler is a non-blocking implementation of the work-stealing algorithm. For any multithreaded computation with work T_1 and critical-path length T_∞, and for any number P of processes, our scheduler executes the computation in expected time O(T_1/P_A + T_∞·P/P_A), where P_A is the average number of processors allocated to the computation by the kernel. This time bound is optimal to within a constant factor, and achieves linear speedup whenever P is small relative to the parallelism T_1/T_∞. Online publication February 26, 2001.
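The structure of this bound is worth spelling out. The following restates the abstract's formula in LaTeX and makes the one-step argument behind the linear-speedup claim explicit; it adds no analysis beyond what the abstract itself states.

```latex
% Expected execution time of the non-blocking work stealer, with
% work T_1, critical-path length T_\infty, P processes, and P_A
% the average number of processors granted by the kernel:
T_P = O\!\left(\frac{T_1}{P_A} + \frac{T_\infty\, P}{P_A}\right)

% Linear speedup: if P is small relative to the parallelism,
% i.e. P \ll T_1 / T_\infty, then T_\infty P \ll T_1, so the
% first term dominates and the bound collapses to
T_P = O\!\left(\frac{T_1}{P_A}\right)
```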

5.
IEEE Micro, 2004, 24(6): 74-82
Memory latency dominates the performance of many applications on modern processors, despite advances in caches and prefetching techniques. Numerous prefetching techniques, both in hardware and software, try to alleviate the memory bottleneck. One such technique, known as helper threading, improves single-thread performance on a simultaneous multithreaded (SMT) architecture, which shares processor resources, including caches, among logical threads. It uses otherwise idle hardware thread contexts to execute speculative threads on behalf of the main thread. Helper threading accelerates a program by exploiting a processor's multithreading capability to run assist threads. Based on the helper threading usage model, virtual multithreading (VMT), a form of switch-on-event user-level multithreading, can improve performance for real-world workloads with a wall-clock speedup of 5.0 to 38.5 percent.
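VMT itself is a hardware/firmware mechanism, but the core idea of helper threading can be sketched in plain software. The toy below is an assumption-laden analogue, not Intel's VMT: a spare thread prefetches an irregularly indexed array ahead of the main thread using GCC's __builtin_prefetch. The arrays, the DISTANCE constant, and the loose spin-based synchronization are all illustrative choices.

```c
#include <pthread.h>
#include <stddef.h>

#define N        (1 << 20)
#define DISTANCE 64            /* how far ahead the helper runs */

static int    data[N];
static size_t idx[N];          /* irregular indices (e.g. a random
                                  permutation) defeat hardware prefetch */
static volatile size_t main_pos;

/* Helper thread: touch cache lines the main thread will need soon.
   Prefetches are hints with no architectural side effects. */
static void *prefetch_helper(void *arg) {
    (void)arg;
    for (size_t i = 0; i < N; i++) {
        while (i > main_pos + DISTANCE)
            ;                  /* don't run too far ahead of main */
        __builtin_prefetch(&data[idx[i]], 0 /* read */, 1);
    }
    return NULL;
}

/* Main thread: the actual computation, ideally now hitting in cache. */
static long main_compute(void) {
    long sum = 0;
    for (size_t i = 0; i < N; i++) {
        main_pos = i;
        sum += data[idx[i]];
    }
    return sum;
}
```

Usage would be to pthread_create the helper first and then call main_compute on the main thread; the helper drains naturally once main_pos reaches the end of the array.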

6.
Elmwood is an object-oriented, multiprocessor operating system designed and implemented during a graduate seminar. It consists of a minimal kernel and a collection of user-implemented services. The kernel provides two major abstractions: objects, which consist of code and data, and processes, which represent asynchronous activity. Objects, like programs, are passive. To operate on an abstraction or to request a service, processes invoke an entry procedure defined by the corresponding object. Objects implement their own protection and synchronization policies using minimal kernel mechanisms. We describe the Elmwood kernel interface, an implementation on the BBN Butterfly parallel processor, and our experiences in developing a multiprocessor operating system under rigid time constraints. These experiences illustrate several general lessons regarding kernel design and trade-offs for implementation expedience.

7.
This paper describes the implementation of a communication software interface for a high-speed communication card based on the PCI-X bus. By virtualizing hardware resources, the interface supports protected user-level communication operations, provides two data transfer modes (message passing and RDMA), and achieves zero-copy data transfer between processes; IP packet transmission is also supported on top of the interface. In tests, the interface achieved a bandwidth of 501 MB/s over a single link and 1002 MB/s over dual links, and 384 MB/s in Socket-based tests.

8.
When a large number of clients send concurrent requests to a server-side application, the traditional approach of creating one thread per request severely degrades server performance and can even crash the server. Through an analysis of the JDK's Executor framework, this paper describes the thread pool model in detail, covering its working principles, the core thread pool objects, and its execution policies, and applies it to an online examination system with a three-tier C/S architecture, presenting the server-side design and implementation code. Simulation tests demonstrate the stability of the thread pool approach under heavy concurrent access.
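The paper builds on the JDK's Executor framework; to keep this listing's examples in one language, here is a minimal C analogue of the same idea (a fixed pool of workers consuming a mutex-protected task queue, replacing the thread-per-request model criticized above). This is a sketch with invented names, not the paper's code.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct task {
    void (*fn)(void *);
    void *arg;
    struct task *next;
} task_t;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    task_t *head, *tail;
    int shutdown;
} pool_t;

/* Worker: block until a task is queued, then run it. */
static void *worker(void *p) {
    pool_t *pool = p;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (!pool->head && !pool->shutdown)
            pthread_cond_wait(&pool->ready, &pool->lock);
        if (pool->shutdown && !pool->head) {
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        task_t *t = pool->head;
        pool->head = t->next;
        if (!pool->head) pool->tail = NULL;
        pthread_mutex_unlock(&pool->lock);
        t->fn(t->arg);         /* run outside the lock */
        free(t);
    }
}

void pool_init(pool_t *pool, int nthreads) {
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->ready, NULL);
    pool->head = pool->tail = NULL;
    pool->shutdown = 0;
    for (int i = 0; i < nthreads; i++) {
        pthread_t tid;
        pthread_create(&tid, NULL, worker, pool);
        pthread_detach(tid);
    }
}

/* Submit a request to the pool instead of spawning a fresh thread. */
void pool_submit(pool_t *pool, void (*fn)(void *), void *arg) {
    task_t *t = malloc(sizeof *t);
    t->fn = fn; t->arg = arg; t->next = NULL;
    pthread_mutex_lock(&pool->lock);
    if (pool->tail) pool->tail->next = t; else pool->head = t;
    pool->tail = t;
    pthread_cond_signal(&pool->ready);
    pthread_mutex_unlock(&pool->lock);
}
```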

9.
The Peregrine RPC system provides performance very close to the optimum allowed by the hardware limits, while still supporting the complete RPC model. Implemented on an Ethernet network of Sun-3/60 workstations, a null RPC between two user-level threads executing on separate machines requires 573 μs. This time compares well with the fastest network RPC times reported in the literature, ranging from about 1100 to 2600 μs, and is only 309 μs above the measured hardware latency for transmitting the call and result packets in our environment. For large multi-packet RPC calls, the Peregrine user-level data transfer rate reaches 8.9 Mbit/s, approaching the Ethernet's 10 Mbit/s network transmission rate. Between two user-level threads on the same machine, a null RPC requires 149 μs. This paper identifies some of the key performance optimizations used in Peregrine, and quantitatively assesses their benefits.

10.
Disintermediated active communication (DAC) is a new paradigm of communication in which a sending thread actively engages a receiving thread when sending it a message via shared memory. DAC differs from existing approaches that use passive communication through shared memory, based on intermittently checking for messages, or that use preemptive communication but must rely on intermediaries such as the operating system or dedicated interrupt channels. An implementation of DAC builds on existing cache coherency support and exploits lightweight user-level interrupts. Inter-thread communication occurs via monitored memory locations, where the receiver thread responds to invalidations of monitored addresses with a lightweight user-level software-defined handler. Address monitoring is supported by cache line user-bits, or CLUbits. CLUbits reside in the cache next to the coherence state, are private per thread, and maintain user-defined per-cache-line state. A lightweight software library can demultiplex asynchronous notifications and handle exceptional cases. In DAC-based programs, threads coordinate with one another by explicit signaling and implicit resource monitoring. With the simple and direct communication primitives of DAC, multi-threaded workloads synchronize at a finer granularity and more efficiently utilize the hardware of upcoming multi-core designs. This paper introduces DAC, presents several signaling models for DAC-based programs, and describes a simple memory-based framework that supports DAC by leveraging existing cache-coherency models. Our framework is general enough to support uses beyond DAC.

11.
Distributed concurrent computing based on lightweight processes can potentially address performance and functionality limits in heterogeneous systems. The TPVM framework, based on the notion of 'exportable services', is an extension to the PVM message-passing system, but uses threads as units of computing, scheduling, and parallelism. TPVM facilitates and supports three different distributed concurrent programming paradigms: (a) the traditional, task based, explicit message-passing model; (b) a data-driven instantiation model that enables straightforward specification of computation based on data dependencies; and (c) a partial shared-address space model via remote memory access, with naming and typing of distributed data areas. The latter models offer significantly different computing paradigms for network-based computing, while maintaining a close resemblance to, and building upon, the conventional PVM infrastructure in the interest of compatibility and ease of transition. The TPVM system comprises three basic modules: a library interface that provides access to thread-based distributed concurrent computing facilities, a portable thread interface module which abstracts the required thread-related services, and a thread server module which performs scheduling and system data management. System implementation as well as applications experiences have been very encouraging, indicating the viability of the proposed models, the feasibility of portable and efficient threads systems for distributed computing, and the performance improvements that result from multithreaded concurrent computing. © 1998 John Wiley & Sons, Ltd.

12.
Design and Implementation of User-Level Multithreading for a Java Virtual Machine   (Cited by: 5; self-citations: 0; citations by others: 5)
Ding Yuxin, Cheng Hu. Journal of Software, 2000, 11(5): 701-706
This paper describes in detail the design and implementation of multithreading in the Java virtual machine of a domestically developed open-system platform. For thread scheduling, it uses static-priority round-robin scheduling with an independent queue per priority level, which handles the scheduling of independent looping threads well. For thread synchronization, it adopts a hash-based hybrid lock design. Experimental results show that this lock has a small memory footprint and high execution efficiency.
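As a sketch of the scheduling scheme described (static priority, one independent FIFO runqueue per level, round-robin within a level), not the JVM's actual code:

```c
#include <stddef.h>

#define NUM_PRIORITIES 10

typedef struct thread {
    struct thread *next;
    int priority;      /* fixed ("static") for the thread's lifetime */
    /* ... saved context, stack pointer, etc. ... */
} thread_t;

/* One independent FIFO runqueue per priority level. */
static thread_t *queue_head[NUM_PRIORITIES];
static thread_t *queue_tail[NUM_PRIORITIES];

void enqueue(thread_t *t) {
    t->next = NULL;
    if (queue_tail[t->priority]) queue_tail[t->priority]->next = t;
    else queue_head[t->priority] = t;
    queue_tail[t->priority] = t;
}

/* Pick the next thread: scan from the highest priority down. Threads
   at the same level rotate round-robin: a thread is dequeued here and
   re-enqueued (at the tail) when its time slice expires. */
thread_t *pick_next(void) {
    for (int p = NUM_PRIORITIES - 1; p >= 0; p--) {
        thread_t *t = queue_head[p];
        if (t) {
            queue_head[p] = t->next;
            if (!queue_head[p]) queue_tail[p] = NULL;
            return t;
        }
    }
    return NULL;       /* nothing runnable: idle */
}
```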

13.
We present the design and implementation of Arachne, a threads system that can be interfaced with a communications library for multithreaded distributed computations. In particular, Arachne supports thread migration between heterogeneous platforms, dynamic stack size management, and recursive thread functions. Arachne is efficient, flexible, and portable; it is based entirely on C and C++. To facilitate heterogeneous thread operations, we have added three keywords to the C++ language. The Arachne preprocessor takes as input code written in that language and outputs C++ code suitable for compilation with a conventional C++ compiler. The Arachne runtime system manages all threads during program execution. We present some performance measurements on the costs of basic thread operations and thread migration in Arachne and compare these to the costs in other threads systems.

14.
Simultaneous multithreading (SMT) is a processor design method in which concurrent hardware threads share processor resources such as functional units and memory. The scheduling complexity and performance of an SMT processor depend on the topology used in the fetch and issue stages. In this paper, we propose a thread-sensitive issue policy for a partitioned SMT processor based on a per-thread metric: the number of ready-to-issue instructions of each thread serves as its priority. To evaluate our method, we have developed a reconfigurable SMT simulator on top of the SimpleScalar Toolset and simulated the modeled processor under several workloads composed of SPEC benchmarks. Experimental results show around 30% improvement compared to the conventional OLDEST_FIRST mixed-topology issue policy. Additionally, the hardware implementation of our architecture with this metric in the issue stage is quite simple.
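The priority metric itself is simple enough to state in code. This hedged sketch assumes the issue stage maintains a per-thread ready_count (hypothetical bookkeeping; the paper's implementation is hardware, not C):

```c
#define NUM_THREADS 4

/* Per-thread count of instructions ready to issue this cycle,
   as maintained by the issue stage (assumed bookkeeping). */
extern int ready_count[NUM_THREADS];

/* Thread-sensitive metric from the paper: favor the hardware
   thread with the most ready-to-issue instructions. */
int select_thread(void) {
    int best = 0;
    for (int t = 1; t < NUM_THREADS; t++)
        if (ready_count[t] > ready_count[best])
            best = t;
    return best;
}
```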

15.
Multiple-context processors provide register resources that allow rapid context switching between several threads as a means of tolerating long communication and synchronization latencies. When scheduling threads on such a processor, we must first decide which threads should have their state loaded into the multiple contexts, and second, which loaded thread is to execute instructions at any given time. In this paper we show that both decisions are important, and that incorrect choices can lead to serious performance degradation. We propose thread prioritization as a means of guiding both levels of scheduling. Each thread has a priority that can change dynamically, and that the scheduler uses to allocate as many computation resources as possible to critical threads. We briefly describe its implementation, and we show simulation performance results for a number of simple benchmarks in which synchronization performance is critical.

16.
With recent efforts to build foundational certified software systems, two different approaches have been proposed to certify thread context switching. One is to certify both threads and context switching in a single logic system, and the other certifies threads and context switching at different abstraction levels. The former requires heavyweight extensions in the logic system to support first-class code pointers and recursive specifications. Moreover, the specification for context switching is very complex. The latter supports simpler and more natural specifications, but it requires the contexts of threads to be abstracted away completely when threads are certified. As a result, the conventional implementation of context switching used in most systems needs to be revised to make the abstraction work. In this paper, we extend the second approach to certify the conventional implementation, where the clear abstraction for threads is unavailable since both threads and context switching hold pointers of thread contexts. To solve this problem, we allow the program specifications for threads to refer to pointers of thread contexts. Thread contexts are treated as opaque structures, whose contents are unspecified and should never be accessed by the code of threads. Therefore, the advantage of avoiding the direct support of first-class code pointers is still preserved in our method. In addition, our new approach is more lightweight. Instead of using two different logics to certify threads and context switching, we employ only one program logic with two different specifications for the context switching. One is used to certify the implementation itself, and the more abstract one is used as an interface between threads and context switching at a higher abstraction level. The consistency between the two specifications is enforced by the global program invariant.

17.
Distributed systems that consist of workstations connected by high performance interconnects offer computational power comparable to moderate size parallel machines. Middleware like distributed shared memory (DSM) or distributed shared objects (DSO) attempts to improve the programmability of such hardware by presenting to application programmers interfaces similar to those offered by shared memory machines. This paper presents the portable Indigo data sharing library which provides a small set of primitives with which arbitrary shared abstractions are easily and efficiently implemented across distributed hardware platforms. Sample shared abstractions implemented with Indigo include DSM as well as fragmented objects, where the object state is split across different machines and where interfragment communications may be customized to application-specific consistency needs. The Indigo library's design and implementation are evaluated on two different target platforms: a workstation cluster and an IBM SP2 machine. As part of this evaluation, a novel DSM system and consistency protocol are implemented and evaluated with several high performance applications. Application performance attained with the DSM system is compared to the performance experienced when utilizing the underlying basic message-passing facilities or when employing Indigo to construct customized fragmented objects implementing the application's shared state. Such experimentation results in insights concerning the efficient implementation of DSM systems (e.g. how to deal with false sharing). It also leads to the conclusion that Indigo provides a sufficiently rich set of abstractions for efficient implementation of the next generation of parallel programming models for high performance machines. © 1998 John Wiley & Sons, Ltd.

18.
This paper presents a UNIX-based distributed memory architecture that supports inter-machine thread migration in a distributed multicomputer system. Multiple threads within the same address space are implemented as lightweight processes that share that address space; the kernel treats these processes as ordinary UNIX processes. The paper focuses on the implementation mechanisms for distributed memory and thread migration.

19.
Although a nonuniform memory access (NUMA) architecture provides better scalability for multicore systems, cores accessing memory on remote nodes take longer than those accessing memory on local nodes. Remote memory access, accompanied by contention on the internode interconnect, degrades performance. Properly mapping threads to cores, and data to the nodes of the threads that access it, can substantially improve performance and energy efficiency. However, the operating system kernel's load-balancing activity may migrate threads across nodes, which disrupts the thread mapping. Moreover, subsequent data mapping pays the cost of page migration to reduce remote memory access, so once unsuitable threads are migrated, system performance suffers. This paper focuses on improving the kernel's internode load balancing on NUMA systems. We develop a memory-aware kernel mechanism and policies that reduce the remote memory access incurred by internode thread migration. The Linux kernel's load-balancing mechanism is modified to incorporate selection policies for internode thread migration, and the kernel is modified to track the amount of memory used by each thread on each node. With this information, well-designed policies can choose suitable threads for internode migration, avoiding the migration of threads that would incur relatively more remote memory access and page migration. The experimental results show that, with our mechanism and the proposed selection policies, system performance increases substantially compared with the unmodified Linux kernel, which does not consider memory usage and always migrates the first-fit thread in the runqueue that can be migrated to the target central processing unit.
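A selection policy of the kind described can be sketched as follows. The struct and the scoring rule (prefer the candidate with the least memory on the source node and the most on the destination node) are illustrative assumptions, not the authors' kernel code:

```c
#define MAX_NODES 8

struct thread_info {
    int id;
    /* Pages resident on each NUMA node, as tracked per thread
       by the modified kernel described in the abstract. */
    long pages_on_node[MAX_NODES];
};

/* When the load balancer must migrate one of n candidate threads
   from src_node to dst_node, pick the one whose memory footprint
   makes the migration cause the least remote access afterwards. */
int pick_migration_victim(const struct thread_info *cand, int n,
                          int src_node, int dst_node) {
    int best = 0;
    long best_score = cand[0].pages_on_node[dst_node]
                    - cand[0].pages_on_node[src_node];
    for (int i = 1; i < n; i++) {
        long score = cand[i].pages_on_node[dst_node]
                   - cand[i].pages_on_node[src_node];
        if (score > best_score) { best_score = score; best = i; }
    }
    return best;
}
```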

20.

