
1.

A parallel production system architecture

E. Bahr F. Barachini J. Doppelbauer H. Grbner F. Kasparec T. Mandl H. Mistelberger 《Journal of Parallel and Distributed Computing》1991,13(4)

The growing importance of expert systems in real-time applications reveals the necessity of reducing response times. Since monoprocessor optimizations of production systems have widely been explored, only multiple processor architectures appear to provide further performance gain. Efficient exploitation of the inherent parallelism of production systems, however, requires suitable algorithms for load balancing without simultaneously increasing organization or communication overhead. We present a novel parallel algorithm for PAMELA expert systems, based on dynamic distribution of data processing. The concept is supported by a transputer based architecture with an advanced interconnection structure.

2.

The cylindrical banyan multicomputer: A reconfigurable systolic architecture

Miroslaw Malek Eli Opper 《Parallel Computing》1989,10(3):319-327

A new reconfigurable systolic multicomputer architecture is presented. The proposed architecture, called the Cylindrical Banyan Multicomputer (CBM), is based on the structure of a modified banyan network where every node of the network graph is composed of an application processor, a local memory and a communication processor, and the network's inputs and outputs are merged (fused). The CBM has one of the lowest cost × delay products among known multicomputer architectures based on regular networks. It is shown that a variety of computation structures such as pipelines, rings, and trees may be constructed and reconfigured in an optimal or near-optimal way on the CBM architecture, and that various basic algorithms can be executed very efficiently in a systolic manner. It is also shown that the CBM is an easily diagnosable and fault-tolerant system.

3.

An efficient parallel prefix sums architecture with domino logic

Lin R. Nakano K. Olariu S. Zomaya A.Y. 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(9):922-931

The main contribution of this work is to propose an efficient parallel prefix sums architecture based on the recently-developed technique of shift switching with domino logic, where the charge/discharge signals propagate along the switch chain producing semaphores in a network that is fast and highly hardware-compact. The proposed architecture for computing the prefix sums of N-1 bits features a total delay of $(4 \log N + \sqrt{N} - 2)\,T_d$, where $T_d$ is the delay for charging or discharging a row of two prefix sum units of eight shift switches. Our simulation results show that, under 0.8-micron CMOS technology, the delay $T_d$ does not exceed 1 ns. As it turns out, our design is faster than any design known to us for values of N in the range $1 \le N \le 2^{10}$. Yet another important and novel feature of the proposed architecture is that it requires very simple controls, partially driven by the semaphores. This significantly reduces the hardware complexity of the design and fully utilizes the inherent speed of the process.
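
As a reminder of what the operation computes, here is a minimal software sketch of prefix sums over a bit vector, done in logarithmically many doubling rounds (a Hillis-Steele style scan). It is not the shift-switching domino-logic circuit described in the abstract, and the function name `prefix_sums` is an assumption made for illustration.

```python
def prefix_sums(bits):
    """Inclusive prefix sums via log-depth doubling rounds (Hillis-Steele scan)."""
    sums = list(bits)
    step = 1
    while step < len(sums):
        # In each round, element i adds the value `step` positions to its left.
        # A parallel machine would perform all additions of a round at once;
        # here the list comprehension reads only the previous round's values.
        sums = [sums[i] + (sums[i - step] if i >= step else 0)
                for i in range(len(sums))]
        step *= 2
    return sums
```

For example, `prefix_sums([1, 0, 1, 1, 0, 1, 1, 1])` returns `[1, 1, 2, 3, 3, 4, 5, 6]` after three rounds (steps 1, 2 and 4).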

4.

A wide instruction word architecture for parallel execution of logic programs coded in BSL

Kemal Ebcioĝlu Manoj Kumar 《New Generation Computing》1990,7(2-3):219-242

This paper begins by describing BSL, a new logic programming language fundamentally different from Prolog. BSL is a nondeterministic Algol-class language whose programs have a natural translation to first order logic; executing a BSL program without free variables amounts to proving the corresponding first order sentence. A new approach is proposed for parallel execution of logic programs coded in BSL, that relies on advanced compilation techniques for extracting fine grain parallelism from sequential code. We describe a new “Very Long Instruction Word” (VLIW) architecture for parallel execution of BSL programs. The architecture, now being designed at the IBM Thomas J. Watson Research Center, avoids the synchronization and communication delays (normally associated with parallel execution of logic programs on multiprocessors), by determining data dependences between operations at compile time, and by coupling the processing elements very tightly, via a single central shared register file. A simulator for the architecture has been implemented and some simulation results are reported in the paper, which are encouraging.

5.

MOS: A multicomputer distributed operating system

Amnon Barak Ami Litman 《Software》1985,15(8):725-737

This paper describes the goals and the internal structure of MOS, a Multicomputer distributed Operating System. MOS is a general-purpose time-sharing operating system which makes a cluster of loosely connected independent homogeneous computers behave as a single-machine UNIX system. The main goals of the system include network transparency, decentralized control, site autonomy and dynamic process migration. The main objective in the design of the system was to reduce the complexity of the system, while maintaining good performance. The internal structure of the system can be characterized by modularity, a high degree of information hiding, hierarchical organization and remote procedure calls.

6.

Realtime digital signal processing system using a parallel processing architecture

PC Ching SW Wu 《Microprocessors and Microsystems》1989,13(10):653-658

A low cost, high-speed, general-purpose digital signal processing system was constructed using the TMS32010 digital signal processor. The system was designed with simplicity, compactness, flexibility and expandability in mind. A parallel processing architecture was adopted to achieve realtime performance. Four processors were used in the prototype system, but this can be expanded easily. Interprocessor data transfer and communications with the host computer are facilitated via a single common bus and a bank of shared memory. A one-dimensional digital FIR filter and a realtime FFT program were used to evaluate the performance of the system. In addition, a realtime spectrogram was implemented as an application example.

7.

A universal parallel computer architecture

William J. Dally 《New Generation Computing》1993,11(3-4):227-249

Advances in interconnection network performance and interprocessor interaction mechanisms enable the construction of fine-grain parallel computers in which the nodes are physically small and have a small amount of memory. This class of machines has a much higher ratio of processor to memory area and hence provides greater processor throughput and memory bandwidth per unit cost relative to conventional memory-dominated machines. This paper describes the technology and architecture trends motivating fine-grain architecture and the enabling technologies of high-performance interconnection networks and low-overhead interaction mechanisms. We conclude with a discussion of our experiences with the J-Machine, a prototype fine-grain concurrent computer.

8.

Performance measurement and trace driven simulation of parallel CAD and numeric applications on a hypercube multicomputer

Hsu J.-M. Banerjee P. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(4):451-464

The performance evaluation, workload characterization, and trace-driven simulation of a hypercube multicomputer running realistic workloads are presented. Eleven representative parallel applications were selected as benchmarks. Software monitoring techniques were then used to collect execution traces. Based on the measurement results, both the computation and communication behavior of these parallel programs were investigated. The various time interval distributions were modeled by statistical functions which were verified by a nonlinear regression technique using the empirical data. The temporal and spatial localities of message destinations were also studied. A model for the temporal locality of message length was introduced and used to analyze the communication traces. A trace-driven simulation environment, which uses the communication patterns of the parallel programs as inputs, was developed to study the behavior of the communication hardware under real workloads. Simulation results on DMA and link utilizations are reported.

9.

Clustering on a hypercube multicomputer

Ranka S. Sahni S. 《Parallel and Distributed Systems, IEEE Transactions on》1991,2(2):129-137

Squared error clustering algorithms for single-instruction multiple-data (SIMD) hypercubes are presented. The algorithms are shown to be asymptotically faster than previously known algorithms and require less memory per processing element (PE). For a clustering problem with N patterns, M features per pattern, and K clusters, the algorithms complete in $O(K + \log NM)$ steps on NM-processor hypercubes. This is optimal up to a constant factor. These results are extended to the case in which NMK processors are available. Experimental results from a multiple-instruction, multiple-data (MIMD) medium-grain hypercube are also presented.
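
To make the underlying computation concrete, the following is a minimal sequential sketch of one pass of squared-error clustering over N patterns with M features and K centers. It is not the SIMD hypercube algorithm of the paper; the function name `squared_error_step` and the choice to keep the center of an empty cluster unchanged are assumptions made for illustration.

```python
def squared_error_step(patterns, centers):
    """One assignment-and-update pass; returns (new_centers, total_squared_error)."""
    members = [[] for _ in centers]
    error = 0.0
    for p in patterns:
        # Squared Euclidean distance from pattern p to every cluster center.
        dists = [sum((x - c) ** 2 for x, c in zip(p, center))
                 for center in centers]
        best = dists.index(min(dists))
        members[best].append(p)
        error += dists[best]
    new_centers = []
    for center, group in zip(centers, members):
        if group:
            # The new center is the feature-wise mean of the cluster's patterns.
            new_centers.append([sum(col) / len(group) for col in zip(*group)])
        else:
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(list(center))
    return new_centers, error
```

Repeating this pass until the assignments stop changing gives the familiar iterative procedure; the distance computations for different patterns are independent, which is the part a parallel machine can spread across processors.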

10.

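Design of a parallel management information system based on SOA architecture

崔峰 程长建 王飞跃 刘希未 李乐飞 邹余敏 何力健 《计算机与应用化学》2010,27(9)

By jointly considering the engineering complexity and the social complexity of enterprise production processes, parallel management offers a promising theoretical approach to integrated and predictive enterprise management; its software architecture, however, still needs further development. Based on SOA technology, this paper proposes the concept of a parallel management information system, its system architecture, and the way it interacts with existing information systems. Taking an ethylene production system as an example, the parallel management information system is divided into a collaboration layer, an application layer, a service layer, a component layer and a resource layer. An artificial ethylene system consistent with the real ethylene system is built in the service and component layers, while the computation and optimization of enterprise management are realized in the collaboration and application layers through process orchestration and service composition. Finally, the evaluation of employee behavior in ethylene production is used as an example to illustrate how the system is built. Unlike traditional management information systems, the parallel management information system interconnects the virtual system with the actual information system and takes social factors such as human behavior and management rules into account. The system is open and flexibly reconfigurable, responds quickly to changes in enterprise business and in experimental design requirements, and provides a feasible technical means of achieving multiple management goals for complex production systems, such as experimentation, training and optimization.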

11.

A model for an intelligent operating system for executing image understanding tasks on a reconfigurable parallel architecture

C. Henry Chu Edward J. Delp Leah H. Jamieson Howard Jay Siegel Francis J. Weil Andrew B. Whinston 《Journal of Parallel and Distributed Computing》1989,6(3)

Parallel processing is one approach to achieving the large computational processing capabilities required by many real-time computing tasks. One of the problems that must be addressed in the use of reconfigurable multiprocessor systems is matching the architecture configuration to the algorithms to be executed. This paper presents a conceptual model that explores the potential of artificial intelligence tools, specifically expert systems, to design an Intelligent Operating System for multiprocessor systems. The target task is the implementation of image understanding systems on multiprocessor architectures. PASM is used as an example multiprocessor. The Intelligent Operating System concepts developed here could also be used to address other problems requiring real-time processing. An example image understanding task is presented to illustrate the concept of intelligent scheduling by the Intelligent Operating System. Also considered is the use of the conceptual model when developing an image understanding system in order to test different strategies for choosing algorithms, imposing execution order constraints, and integrating results from various algorithms.

12.

A distributed load-balancing policy for a multicomputer

Amnon Barak Amnon Shiloh 《Software》1985,15(9):901-913

This paper deals with the organization of a distributed load-balancing policy for a multicomputer system which consists of a cluster of independent computers that are interconnected by a local area communication network. We introduce three algorithms necessary to maintain load balancing in this system: the local load algorithm, used by each processor to monitor its own load; the exchange algorithm, for exchanging load information between the processors; and the process migration algorithm, which uses this information to dynamically migrate processes from overloaded to underloaded processors. The policy that we present is distributed, i.e. each processor uses the same policy. It is both dynamic, responding to load changes without using a priori knowledge of the resources that each process requires, and stable, in that unnecessary overloading of a processor is minimized. We give the essential details of the implementation of the policy and initial results on its performance. Our results confirm the feasibility of building distributed systems that are based on network communication for uniform access, resource sharing and improved reliability, as well as the use of workstations without a secondary storage device.
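
The interplay of the three algorithms named in the abstract can be sketched as a toy simulation. This is an illustrative model only, not the policy implemented in the paper: the `Node` class, the rotating choice of exchange partner, the one-process-per-round migration and the `threshold` parameter are all assumptions, and at least two nodes are assumed.

```python
class Node:
    """One processor in the cluster (hypothetical model for illustration)."""

    def __init__(self, index, processes):
        self.index = index
        self.processes = processes   # number of runnable processes
        self.known_loads = {}        # partial, possibly stale view of other nodes

    def local_load(self):
        # Local load algorithm: a node measures only its own load.
        return self.processes

    def exchange(self, peer):
        # Exchange algorithm: send the current load to one peer, which records it.
        peer.known_loads[self.index] = self.local_load()

    def migrate(self, nodes, threshold=1):
        # Migration algorithm: move one process to the least-loaded known node,
        # but only if the gap exceeds `threshold` (assumed here to damp thrashing).
        if not self.known_loads:
            return False
        target = min(self.known_loads, key=self.known_loads.get)
        if self.local_load() - self.known_loads[target] > threshold:
            self.processes -= 1
            nodes[target].processes += 1
            self.known_loads[target] += 1
            return True
        return False


def simulate(loads, rounds=20):
    """Run every node through the same policy for a number of rounds."""
    nodes = [Node(i, load) for i, load in enumerate(loads)]
    n = len(nodes)
    for r in range(rounds):
        offset = 1 + r % (n - 1)     # rotate partners; a node never picks itself
        for i, node in enumerate(nodes):
            node.exchange(nodes[(i + offset) % n])
        for node in nodes:
            node.migrate(nodes)
    return [node.processes for node in nodes]
```

Calling `simulate([12, 0, 0, 0])` moves work away from the overloaded node while conserving the total number of processes. Because each node acts on a partial and possibly stale view, some migrations are suboptimal, which is the kind of instability a real policy has to limit.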

13.

A parallel execution model of logic programs

Chen A.C. Wu C.-I. 《Parallel and Distributed Systems, IEEE Transactions on》1991,2(1):79-92

A parallel-execution model that can concurrently exploit AND and OR parallelism in logic programs is presented. This model employs a combination of techniques in an approach to executing logic problems in parallel, making tradeoffs among number of processes, degree of parallelism, and combination bandwidth. For interpreting a nondeterministic logic program, this model (1) performs frame inheritance for newly created goals, (2) creates data-dependency graphs (DDGs) that represent relationships among the goals, and (3) constructs appropriate process structures based on the DDGs. (1) The use of frame inheritance serves to increase modularity. In contrast to most previous parallel models that have a large single process structure, frame inheritance facilitates the dynamic construction of multiple independent process structures, and thus permits further manipulation of each process structure. (2) The dynamic determination of data dependency serves to reduce computational complexity. In comparison to models that exploit brute-force parallelism and models that have fixed execution sequences, this model can reduce the number of unification and/or merging steps substantially. In comparison to models that exploit only AND parallelism, this model can selectively exploit demand-driven computation, according to the binding of the query and optional annotations. (3) The construction of appropriate process structures serves to reduce communication complexity. Unlike other methods that map DDGs directly onto process structures, this model can significantly reduce the amount of data sent to a process and/or the number of communication channels connected to a process.

14.

Performance of symbolic applications on a parallel architecture

Adolfo Guzman Edward J. Krall Patrick F. McGehearty Nader Bagherzadeh 《International journal of parallel programming》1987,16(3):183-214

The results of a study of a family of parallel symbolic architectures executing several parallel applications are presented. The class of architectures being simulated is characterized by a shared memory structure, by a hierarchical interconnect, and by clustered processors. Speedup measurements were obtained from six different application kernels. Measurements were also performed to assess the degradation of speedup as a function of the interconnection delays, and to study the effect of different scheduling algorithms. The results presented support the claim that the proposed architecture would be a powerful parallel symbolic computation system. The paper discusses processor starvation, fine grain parallelism, uneven loads, foreign reference, schedule and indeterminate computation with respect to the applications chosen. This work was completed within the Advanced Computer Architecture Program, Microelectronics and Computer Technology Corporation, Austin, Texas.

15.

Maze routing on a hypercube multicomputer

Youngju Won Sartaj Sahni 《The Journal of supercomputing》1988,2(1):55-79

The implementation of Lee's maze routing algorithm on an MIMD hypercube multiprocessor computer can follow several plausible mappings and synchronization strategies. These are evaluated experimentally on an NCUBE/7 hypercube computer with 64 processors. Different grid partitioning and mapping strategies result in a different balance between computation and communication time. The total routing time is significantly impacted by the synchronization and termination detection scheme used. Further, by rearranging the computation, it is possible to overlap much of the interprocessor communication with the computation and realize a significant reduction in the overall run time. By choosing the right partitioning and synchronization scheme and by overlapping computation and communication, a good speedup is obtained on large routing grids.
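
For reference, the sequential form of Lee's algorithm that such mappings start from can be sketched as a breadth-first wave expansion followed by a backtrace. This is a single-processor sketch, not the hypercube implementation evaluated in the paper; the function name `lee_route` and the grid encoding (0 for free cells, 1 for blocked cells) are assumptions, and the source cell is assumed to be free.

```python
from collections import deque


def lee_route(grid, source, target):
    """Shortest rectilinear path on a grid, or None if the target is unreachable."""
    rows, cols = len(grid), len(grid[0])
    dist = {source: 0}
    frontier = deque([source])
    # Wave expansion: label each reachable cell with its distance from the source.
    while frontier:
        r, c = frontier.popleft()
        if (r, c) == target:
            break
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                frontier.append((nr, nc))
    if target not in dist:
        return None
    # Backtrace: step from the target to any neighbor whose label is one smaller.
    path = [target]
    while path[-1] != source:
        r, c = path[-1]
        for prev in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if dist.get(prev) == dist[(r, c)] - 1:
                path.append(prev)
                break
    return path[::-1]
```

In a parallel version the grid is partitioned among processors, so the wavefront has to cross partition boundaries by message passing; that is where the synchronization and termination-detection choices discussed in the abstract arise.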

16.

P-Prolog: A parallel logic language based on exclusive relation

Rong Yang Hideo Aiso 《New Generation Computing》1987,5(1):79-95

This paper presents a parallel logic programming language named P-Prolog, which is being developed as a logic programming language featuring both and- and or-parallelism. Unlike other parallel logic programming languages, P-Prolog does not use syntactic constructs such as read-only annotation,⁶⁾ mode declaration²⁾ and communication constraints.⁷⁾ A new concept introduced in P-Prolog is the exclusive relation of guarded Horn clauses. Advances included in P-Prolog are:

The synchronization mechanism can determine the direction of data flow dynamically.
Guarded Horn clauses can be interpreted as either don’t-care nondeterminism or don’t-know nondeterminism.

A prototype interpreter of P-Prolog has been implemented in C-Prolog. We are now implementing a P-Prolog interpreter in the C language.

17.

An expert system on a programmable logic chip with an architecture accommodating a large number of inputs

James N. Siddall Peng Lu Jozef Verhaeghe 《Expert Systems》1992,9(2):67-77

18.

A parallel architecture for discrete relaxation algorithm

Gu J Wang W Henderson TC 《IEEE transactions on pattern analysis and machine intelligence》1987,(6):816-831

Discrete relaxation techniques have proven useful in solving a wide range of problems in digital signal and digital image processing, artificial intelligence, operations research, and machine vision. Much work has been devoted to finding efficient hardware architectures. This paper shows that a conventional hardware design for a Discrete Relaxation Algorithm (DRA) suffers from $O(n^2 m^3)$ time complexity and $O(n^2 m^2)$ space complexity. By reformulating DRA into a parallel computational tree and using a multiple tree-root pipelining scheme, time complexity is reduced to $O(nm)$, while the space complexity is reduced by a factor of 2. For certain relaxation processing, the space complexity can even be decreased to $O(nm)$. Furthermore, a technique for dynamically configuring an architectural wavefront is used, which leads to an $O(n)$ time, highly concurrent DRA3 architecture.

19.

A note on a signature system based on probabilistic logic

Ernst Leiss 《Information Processing Letters》1980,11(2):110-113

20.

A multicomputer garbage collector for a single-assignment language

Ian Foster 《International journal of parallel programming》1989,18(3):181-203

An asynchronous garbage collector for a message-passing multiprocessor (multicomputer) is described. This combines Weighted Reference Counting (WRC) interprocessor collection and tracing intraprocessor collection to permit individual processors to reclaim local storage independently. A novel feature is the integration of Weighted Reference Counting collection and the communication algorithms required to support a global address space in a single assignment language. This significantly reduces communication overhead and space requirements attributable to garbage collection. In addition, techniques are described that avoid the creation of cyclic structures that cannot be reclaimed using WRC. Experimental studies performed in a concurrent logic programming system that incorporates the collector confirm its efficiency and the benefits of integrating garbage collector and language implementation.
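
The basic bookkeeping behind Weighted Reference Counting can be sketched in a few lines. This is a single-process illustration of the general technique, not the collector described in the paper; the names `Obj`, `Ref`, `new_object`, `copy_ref`, `drop_ref` and the constant `INITIAL_WEIGHT` are assumptions, and the tracing intraprocessor collector is not modeled.

```python
INITIAL_WEIGHT = 64  # assumed power of two so that weights halve cleanly


class Obj:
    """A heap object that records the total weight of all references to it."""

    def __init__(self):
        self.total_weight = INITIAL_WEIGHT
        self.live = True


class Ref:
    """A reference that carries part of its target's weight."""

    def __init__(self, target, weight):
        self.target = target
        self.weight = weight


def new_object():
    obj = Obj()
    return obj, Ref(obj, INITIAL_WEIGHT)


def copy_ref(ref):
    # Copying splits the weight between old and new reference, so the
    # object's owner does not need to be told about the copy.
    if ref.weight < 2:
        raise ValueError("weight exhausted; real collectors add an indirection")
    half = ref.weight // 2
    ref.weight -= half
    return Ref(ref.target, half)


def drop_ref(ref):
    # Dropping models the single decrement message sent to the object's owner.
    obj = ref.target
    obj.total_weight -= ref.weight
    ref.weight = 0
    if obj.total_weight == 0:
        obj.live = False  # no references remain, so storage can be reclaimed
```

The invariant is that the weights of all outstanding references sum to the object's `total_weight`. Only dropping a reference needs a message to the owning processor, which is why the scheme suits message-passing machines.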