Similar literature
 20 similar articles found
1.
The growing importance of expert systems in real-time applications reveals the necessity of reducing response times. Since monoprocessor optimizations of production systems have been widely explored, only multiple-processor architectures appear to provide further performance gain. Efficient exploitation of the inherent parallelism of production systems, however, requires suitable algorithms for load balancing without simultaneously increasing organization or communication overhead. We present a novel parallel algorithm for PAMELA expert systems, based on dynamic distribution of data processing. The concept is supported by a transputer-based architecture with an advanced interconnection structure.

2.
A new reconfigurable systolic multicomputer architecture is presented. The proposed architecture, called the Cylindrical Banyan Multicomputer (CBM), is based on the structure of a modified banyan network where every node of the network graph is composed of an application processor, a local memory and a communication processor, and the network's inputs and outputs are merged (fused). The CBM has one of the lowest cost × delay products among known multicomputer architectures based on regular networks. It is shown that a variety of computation structures such as pipelines, rings, and trees may be constructed and reconfigured in an optimal or near-optimal way on the CBM architecture, and that various basic algorithms can be executed very efficiently in a systolic manner. It is also shown that the CBM is an easily diagnosable and fault-tolerant system.

3.
The main contribution of this work is to propose an efficient parallel prefix sums architecture based on the recently developed technique of shift switching with domino logic, where the charge/discharge signals propagate along the switch chain producing semaphores in a network that is fast and highly hardware-compact. The proposed architecture for computing the prefix sums of $N-1$ bits features a total delay of $(4 \log N + \sqrt{N} - 2) \cdot T_d$, where $T_d$ is the delay for charging or discharging a row of two prefix sum units of eight shift switches. Our simulation results show that, under 0.8-micron CMOS technology, the delay $T_d$ does not exceed 1 ns. As it turns out, our design is faster than any design known to us for values of $N$ in the range $1 \le N \le 2^{10}$. Yet another important and novel feature of the proposed architecture is that it requires very simple controls, partially driven by the semaphores. This significantly reduces the hardware complexity of the design and fully utilizes the inherent speed of the process.
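
As a minimal software illustration of what the architecture above computes (not of the shift-switching hardware itself), prefix sums of a bit vector can be sketched in Python. The function names `prefix_sums` and `delay_bound` are hypothetical. `delay_bound` only evaluates the delay expression quoted in the abstract, and it assumes a base-2 logarithm and a default T_d of 1 ns, neither of which the abstract states explicitly.

```python
import math
from itertools import accumulate


def prefix_sums(bits):
    """Return the running sums of a sequence of 0/1 values."""
    return list(accumulate(bits))


def delay_bound(n, t_d=1.0):
    """Evaluate (4 log N + sqrt(N) - 2) * T_d.

    Assumption: the logarithm is base 2; t_d is given in nanoseconds.
    """
    return (4 * math.log2(n) + math.sqrt(n) - 2) * t_d
```

For example, `prefix_sums([1, 0, 1, 1])` gives `[1, 1, 2, 3]`, and under the stated assumptions the bound for N = 1024 evaluates to 70 ns.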

4.
Amnon Barak  Ami Litman 《Software》1985,15(8):725-737
This paper describes the goals and the internal structure of MOS, a Multicomputer distributed Operating System. MOS is a general-purpose time-sharing operating system which makes a cluster of loosely connected independent homogeneous computers behave as a single-machine UNIX system. The main goals of the system include network transparency, decentralized control, site autonomy and dynamic process migration. The main objective in the design of the system was to reduce the complexity of the system, while maintaining good performance. The internal structure of the system can be characterized by modularity, a high degree of information hiding, hierarchical organization and remote procedure calls.

5.
This paper begins by describing BSL, a new logic programming language fundamentally different from Prolog. BSL is a nondeterministic Algol-class language whose programs have a natural translation to first-order logic; executing a BSL program without free variables amounts to proving the corresponding first-order sentence. A new approach is proposed for parallel execution of logic programs coded in BSL that relies on advanced compilation techniques for extracting fine-grain parallelism from sequential code. We describe a new “Very Long Instruction Word” (VLIW) architecture for parallel execution of BSL programs. The architecture, now being designed at the IBM Thomas J. Watson Research Center, avoids the synchronization and communication delays normally associated with parallel execution of logic programs on multiprocessors by determining data dependences between operations at compile time, and by coupling the processing elements very tightly via a single central shared register file. A simulator for the architecture has been implemented, and the paper reports some encouraging simulation results.

6.
A low-cost, high-speed, general-purpose digital signal processing system was constructed using the TMS32010 digital signal processor. The system was designed with simplicity, compactness, flexibility and expandability in mind. A parallel processing architecture was adopted to achieve real-time performance. Four processors were used in the prototype system, but this can be expanded easily. Interprocessor data transfer and communications with the host computer are facilitated via a single common bus and a bank of shared memory. A one-dimensional digital FIR filter and a real-time FFT program were used to evaluate the performance of the system. In addition, a real-time spectrogram was implemented as an application example.

7.
Advances in interconnection network performance and interprocessor interaction mechanisms enable the construction of fine-grain parallel computers in which the nodes are physically small and have a small amount of memory. This class of machines has a much higher ratio of processor to memory area and hence provides greater processor throughput and memory bandwidth per unit cost relative to conventional memory-dominated machines. This paper describes the technology and architecture trends motivating fine-grain architecture and the enabling technologies of high-performance interconnection networks and low-overhead interaction mechanisms. We conclude with a discussion of our experiences with the J-Machine, a prototype fine-grain concurrent computer.

8.
Squared error clustering algorithms for single-instruction multiple-data (SIMD) hypercubes are presented. The algorithms are shown to be asymptotically faster than previously known algorithms and require less memory per processing element (PE). For a clustering problem with N patterns, M features per pattern, and K clusters, the algorithms complete in O(K + log NM) steps on NM-processor hypercubes. This is optimal up to a constant factor. These results are extended to the case in which NMK processors are available. Experimental results from a multiple-instruction, multiple-data (MIMD) medium-grain hypercube are also presented.
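
To make the quantity being minimized concrete, the following is a minimal sequential sketch in Python, not the hypercube algorithm from the abstract. It assumes patterns and cluster centers are tuples of M numeric features; the names `assign` and `squared_error` are hypothetical.

```python
def _dist2(pattern, center):
    """Squared Euclidean distance between a pattern and a cluster center."""
    return sum((x - c) ** 2 for x, c in zip(pattern, center))


def assign(patterns, centers):
    """Assign each of the N patterns to the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda k: _dist2(p, centers[k]))
        for p in patterns
    ]


def squared_error(patterns, centers, assignment):
    """Sum of squared distances from each pattern to its assigned center."""
    return sum(_dist2(p, centers[k]) for p, k in zip(patterns, assignment))
```

With patterns `[(0, 0), (0, 2), (10, 10)]` and centers `[(0, 1), (10, 10)]`, the first two patterns go to cluster 0 and the third to cluster 1, for a total squared error of 2. The sequential assignment step costs on the order of N·M·K operations, which is the work the parallel algorithms distribute across processing elements.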

9.
The performance evaluation, workload characterization, and trace-driven simulation of a hypercube multicomputer running realistic workloads are presented. Eleven representative parallel applications were selected as benchmarks. Software monitoring techniques were then used to collect execution traces. Based on the measurement results, both the computation and communication behavior of these parallel programs were investigated. The various time interval distributions were modeled by statistical functions which were verified by a nonlinear regression technique using the empirical data. The temporal and spatial localities of message destinations were also studied. A model for the temporal locality of message length was introduced and used to analyze the communication traces. A trace-driven simulation environment, which uses the communication patterns of the parallel programs as inputs, was developed to study the behavior of the communication hardware under real workloads. Simulation results on DMA and link utilizations are reported.

10.
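By jointly considering the engineering complexity and social complexity of enterprise production processes, parallel management offers a promising theoretical approach to integrated and predictive enterprise management; however, the software architecture for parallel management still needs to be improved. Based on SOA technology, this paper proposes the concept of a parallel management information system, its system architecture, and the way it interacts with existing information systems. Taking an ethylene production system as an example, the parallel management information system is divided into a collaboration layer, an application layer, a service layer, a component layer, and a resource layer. An artificial ethylene system consistent with the real ethylene system is built in the service and component layers, and the computation and optimization of enterprise management are realized in the collaboration and application layers through process orchestration and service composition. Finally, the evaluation of ethylene plant employees' behavior is used as an example to illustrate how the system is built. Unlike traditional management information systems, the parallel management information system interconnects the virtual system with the actual information system and takes social factors such as human behavior and management rules into overall consideration. The system is open and flexibly reconfigurable, can respond quickly to changes in enterprise business and to experimental design requirements, and provides a feasible technical means of achieving multiple management goals for complex production systems, such as experimentation, training, and optimization.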

11.
Amnon Barak  Amnon Shiloh 《Software》1985,15(9):901-913
This paper deals with the organization of a distributed load-balancing policy for a multicomputer system which consists of a cluster of independent computers that are interconnected by a local area communication network. We introduce three algorithms necessary to maintain load balancing in this system: the local load algorithm, used by each processor to monitor its own load; the exchange algorithm, for exchanging load information between the processors; and the process migration algorithm, which uses this information to dynamically migrate processes from overloaded to underloaded processors. The policy that we present is distributed, i.e. each processor uses the same policy. It is both dynamic, responding to load changes without using a priori knowledge of the resources that each process requires, and stable, in that unnecessary overloading of a processor is minimized. We give the essential details of the implementation of the policy and initial results on its performance. Our results confirm the feasibility of building distributed systems that are based on network communication for uniform access, resource sharing and improved reliability, as well as the use of workstations without a secondary storage device.
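
The interplay of the three algorithms named above can be sketched in Python as follows. This is a toy model under loud assumptions, not the policy from the paper: load is taken to be the number of runnable processes, load information is pushed one message at a time, and a fixed `threshold` (a hypothetical parameter) stands in for the stability condition. All class and function names are illustrative.

```python
class Node:
    """One computer in the cluster (hypothetical model)."""

    def __init__(self, name, processes):
        self.name = name
        self.processes = processes  # assumed load metric: runnable processes
        self.known = {}             # last load value heard from each peer

    def local_load(self):
        """Local load algorithm: each node measures only its own load."""
        return self.processes


def exchange(sender, receiver):
    """Exchange algorithm: the sender reports its current load to one peer."""
    receiver.known[sender.name] = sender.local_load()


def migrate(node, cluster, threshold=2):
    """Migration algorithm: move one process to the least-loaded known peer.

    A migration happens only if the load gap is at least `threshold`,
    which avoids pointless back-and-forth moves between similar nodes.
    Returns the target node's name, or None if nothing was migrated.
    """
    if not node.known:
        return None
    target = min(node.known, key=node.known.get)
    if node.local_load() - node.known[target] >= threshold:
        node.processes -= 1
        cluster[target].processes += 1
        return target
    return None
```

Every node runs the same three routines, so there is no central coordinator; decisions rely only on the node's own load and on possibly stale values in `known`.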

12.
Parallel processing is one approach to achieving the large computational processing capabilities required by many real-time computing tasks. One of the problems that must be addressed in the use of reconfigurable multiprocessor systems is matching the architecture configuration to the algorithms to be executed. This paper presents a conceptual model that explores the potential of artificial intelligence tools, specifically expert systems, to design an Intelligent Operating System for multiprocessor systems. The target task is the implementation of image understanding systems on multiprocessor architectures. PASM is used as an example multiprocessor. The Intelligent Operating System concepts developed here could also be used to address other problems requiring real-time processing. An example image understanding task is presented to illustrate the concept of intelligent scheduling by the Intelligent Operating System. Also considered is the use of the conceptual model when developing an image understanding system in order to test different strategies for choosing algorithms, imposing execution order constraints, and integrating results from various algorithms.

13.
A parallel-execution model that can concurrently exploit AND and OR parallelism in logic programs is presented. This model employs a combination of techniques in an approach to executing logic problems in parallel, making tradeoffs among number of processes, degree of parallelism, and combination bandwidth. For interpreting a nondeterministic logic program, this model (1) performs frame inheritance for newly created goals, (2) creates data-dependency graphs (DDGs) that represent relationships among the goals, and (3) constructs appropriate process structures based on the DDGs. (1) The use of frame inheritance serves to increase modularity. In contrast to most previous parallel models that have a large single process structure, frame inheritance facilitates the dynamic construction of multiple independent process structures, and thus permits further manipulation of each process structure. (2) The dynamic determination of data dependency serves to reduce computational complexity. In comparison to models that exploit brute-force parallelism and models that have fixed execution sequences, this model can reduce the number of unification and/or merging steps substantially. In comparison to models that exploit only AND parallelism, this model can selectively exploit demand-driven computation, according to the binding of the query and optional annotations. (3) The construction of appropriate process structures serves to reduce communication complexity. Unlike other methods that map DDGs directly onto process structures, this model can significantly reduce the amount of data sent to a process and/or the number of communication channels connected to a process.

14.
The results of a study of a family of parallel symbolic architectures executing several parallel applications are presented. The class of architectures being simulated is characterized by a shared memory structure, by a hierarchical interconnect, and by clustered processors. Speedup measurements were obtained from six different application kernels. Measurements were also performed to assess the degradation of speedup as a function of the interconnection delays, and to study the effect of different scheduling algorithms. The results presented support the claim that the proposed architecture would be a powerful parallel symbolic computation system. The paper discusses processor starvation, fine-grain parallelism, uneven loads, foreign reference, schedule and indeterminate computation with respect to the applications chosen. This work was completed within the Advanced Computer Architecture Program, Microelectronics and Computer Technology Corporation, Austin, Texas.

15.
The implementation of Lee's maze routing algorithm on an MIMD hypercube multiprocessor computer can follow several plausible mappings and synchronization strategies. These are evaluated experimentally on an NCUBE/7 hypercube computer with 64 processors. Different grid partitioning and mapping strategies result in a different balance between computation and communication time. The total routing time is significantly impacted by the synchronization and termination detection scheme used. Further, by rearranging the computation, it is possible to overlap much of the interprocessor communication with the computation and realize a significant reduction in the overall run time. By choosing the right partitioning and synchronization scheme and by overlapping computation and communication, a good speedup is obtained on large routing grids.
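
For readers unfamiliar with Lee's algorithm, its wave-expansion phase is a breadth-first search over the routing grid. The sketch below is a minimal sequential Python version, not the hypercube implementation evaluated above. It assumes the grid is given as a list of equal-length strings with `#` marking blocked cells, returns only the path length, and omits the backtrace phase; the name `lee_route` is hypothetical.

```python
from collections import deque


def lee_route(grid, start, goal):
    """Return the length of a shortest path from start to goal, or None.

    grid is a list of equal-length strings; '#' marks a blocked cell.
    start and goal are (row, col) tuples on unblocked cells.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}            # wavefront labels: cell -> distance
    frontier = deque([start])
    while frontier:
        r, c = frontier.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                frontier.append((nr, nc))
    return None                  # wave died out before reaching the goal
```

In a parallel version, the grid is partitioned among processors and each wavefront step requires exchanging frontier cells across partition boundaries, which is where the synchronization and termination-detection choices discussed in the abstract come in.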

16.
17.
This paper presents a parallel logic programming language named P-Prolog which is being developed as a logic programming language featuring both and- and or-parallelism. Compared with the other parallel logic programming languages, syntactic constructs such as read-only annotation [6], mode declaration [2] and communication constraints [7] are not used in P-Prolog. A new concept introduced in P-Prolog is the exclusive relation of guarded Horn clauses. Advances included in P-Prolog are:
  1. The synchronization mechanism can determine the direction of data flow dynamically.
  2. Guarded Horn clauses can be interpreted as either don’t-care nondeterminism or don’t-know nondeterminism.
A prototype interpreter of P-Prolog has been implemented in C-Prolog. We are now implementing a P-Prolog interpreter in the C language.

18.
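Porting the traditional Apriori association-rule mining algorithm directly to a cloud computing platform improves mining efficiency by orders of magnitude, but the need to scan the transaction database frequently increases system I/O, memory, and communication overhead. A matrix-based parallel association-rule algorithm, Apriori_MMR, is proposed. The algorithm is parallelized using the idea of data partitioning, simplifies the join step that generates candidate itemsets, and needs to scan the transaction database only twice; it can also compress transactions during computation, which further improves performance. Experiments comparing the two algorithms on data sets of different sizes, and on the same data set with different numbers of nodes, together verify that Apriori_MMR is at least about twice as efficient as Apriori_MR, and that the smaller the support threshold, the more pronounced the advantage.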
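
The general idea behind matrix-based support counting can be sketched in Python as follows. This is an illustrative sequential sketch and not Apriori_MMR itself, whose details the abstract does not give: the transaction database is scanned once to build a 0/1 matrix, after which the support of any candidate itemset is computed from the matrix without rescanning the database. The names `build_matrix` and `support` are hypothetical.

```python
def build_matrix(transactions, items):
    """Scan the transactions once and build a 0/1 matrix.

    Row i describes transaction i; column j is 1 if items[j] occurs in it.
    """
    return [[1 if item in t else 0 for item in items] for t in transactions]


def support(matrix, items, itemset):
    """Count the rows in which every item of the itemset is present."""
    cols = [items.index(i) for i in itemset]
    return sum(all(row[c] for c in cols) for row in matrix)
```

For transactions `{a, b}`, `{a, c}` and `{a, b, c}`, the itemset `{a, b}` has support 2. In a data-partitioned setting, each node could hold a block of rows and report partial counts that are then summed.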

19.
Luo Dan  Zhou Bo 《计算机应用》(Journal of Computer Applications) 2011,31(2):562-564
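Service-oriented architecture (SOA) provides a solution for re-engineering legacy systems so that they can support distributed application environments. However, because of outdated technology and architectural limitations, problems such as the lack of support for multithreading and parallel processing, as well as memory leaks, still exist in some legacy systems and greatly restrict their use. To solve these problems, the communication mechanism of Windows Communication Foundation (WCF) was analyzed in depth and a parallel architecture is proposed that modifies the basic WCF architecture: a service controller layer is added to the default architecture to pass messages and select services between the client and the server. This solves the problems well, and the architecture has been applied in a large financial software system.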

20.
This paper presents a novel class of special purpose processors referred to as ASOCS (adaptive self-organizing concurrent systems). Intended applications include adaptive logic devices, robotics, process control, system malfunction management, and in general, applications of logic reasoning. ASOCS combines massive parallelism with self-organization to attain a distributed mechanism for adaptation. The ASOCS approach is based on an adaptive network composed of many simple computing elements (nodes) which operate in a combinational and asynchronous fashion. Problem specification (programming) is obtained by presenting to the system if-then rules expressed as Boolean conjunctions. New rules are added incrementally. In the current model, when conflicts occur, precedence is given to the most recent inputs. With each rule, desired network response is simply presented to the system, following which the network adjusts itself to maintain consistency and parsimony of representation. Data processing and adaptation form two separate phases of operation. During processing, the network acts as a parallel hardware circuit. Control of the adaptive process is distributed among the network nodes and efficiently exploits parallelism.
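
The rule semantics described above (if-then rules given as Boolean conjunctions, added incrementally, with the most recent rule winning on conflict) can be mimicked by a small sequential Python sketch. It models only the input/output behavior, not the adaptive network, its parallelism, or its parsimony-preserving adjustment; the class `RuleBase` and its methods are hypothetical names.

```python
class RuleBase:
    """Ordered list of (conjunction, output) rules; newest rule has priority."""

    def __init__(self):
        self.rules = []

    def add(self, condition, output):
        """Adaptation phase: add a rule.

        condition maps variable names to required Boolean values and is
        read as a conjunction, e.g. {"a": True, "b": False} means a AND NOT b.
        """
        self.rules.append((condition, output))

    def evaluate(self, inputs):
        """Processing phase: return the output of the newest matching rule."""
        for condition, output in reversed(self.rules):
            if all(inputs.get(var) == val for var, val in condition.items()):
                return output
        return None  # no rule covers this input
```

For instance, after adding "if a then 0" and then "if a and b then 1", the input a = b = true yields 1 because the later rule takes precedence, while a = true, b = false still yields 0.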
