1.

A hybrid closed queuing network model for multi-threaded dataflow architecture (cited by: 1; self-citations: 0; citations by others: 1)

Vidhyacharan 《Computers & Electrical Engineering》2005,31(8):556-571

In this paper, a closed queuing network model with both single and multiple servers is proposed to model dataflow in a multi-threaded architecture. Multi-threading reduces latency by switching among a set of threads in order to improve processor utilization. Two sets of processors exist: synchronization processors and execution processors. Synchronization processors handle load/store operations, and execution processors handle arithmetic/logic and control operations. A closed queuing network model is suitable for a large number of job arrivals. The normalization constant is derived using a recursive algorithm for the given model. State diagrams are drawn from the hybrid closed queuing network model, and the steady-state balance equations are derived from them. Performance measures such as average response times and average system throughput are derived and plotted against the total number of processors in the closed queuing network model. Other important performance measures, such as processor utilizations, average queue lengths, average waiting times, and relative utilizations, are also derived.
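
The abstract does not say which recursive algorithm is used to derive the normalization constant. As a minimal sketch, assuming Buzen's convolution algorithm and only single-server, fixed-rate stations (the multi-server stations of the hybrid model are not handled here), the constant G(N) and the resulting throughput could be computed as follows. The function names and the relative-utilization inputs are illustrative assumptions, not taken from the paper.

```python
def normalization_constants(relative_utilizations, num_jobs):
    """Return [G(0), ..., G(N)] using Buzen's convolution algorithm.

    Assumption: every station is a single-server, fixed-rate queue.
    relative_utilizations[m] is visit_ratio[m] / service_rate[m].
    """
    g = [1.0] + [0.0] * num_jobs
    # Fold in one station at a time: G_m(n) = G_{m-1}(n) + x_m * G_m(n - 1).
    for x in relative_utilizations:
        for n in range(1, num_jobs + 1):
            g[n] += x * g[n - 1]
    return g


def throughput(relative_utilizations, num_jobs):
    """Throughput at the reference station (visit ratio 1): G(N-1) / G(N)."""
    if num_jobs < 1:
        raise ValueError("num_jobs must be at least 1")
    g = normalization_constants(relative_utilizations, num_jobs)
    return g[num_jobs - 1] / g[num_jobs]
```

Under the same assumptions, the utilization of station m is its relative utilization multiplied by the value returned by `throughput`.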

2.

Scheduling divisible loads on heterogeneous linear daisy chain networks with arbitrary processor release times

Veeravalli B. Wong Han Min 《Parallel and Distributed Systems, IEEE Transactions on》2004,15(3):273-288

The problem of distributing and processing a divisible load in a heterogeneous linear network of processors with arbitrary processor release times is considered. A divisible load is very large in size and has computationally intensive CPU requirements. Further, it has the property that the load can be partitioned arbitrarily into any number of portions and can be scheduled onto processors independently for computation. The load is assumed to arrive at one of the farthest end processors, referred to as boundary processors, for processing. The processors in the network are assumed to have nonzero release times, i.e., the time instants from which the processors are available for processing the divisible load. Our objective is to design a load distribution strategy that takes the release times of the processors into account so that the total processing time of the load is minimized. We consider two generic cases: one in which all processors have identical release times and one in which all processors have arbitrary release times. We adopt both the single-installment and multi-installment strategies proposed in the divisible load scheduling literature in our design of load distribution strategies, wherever necessary, to achieve a minimum processing time. Finally, when optimal strategies cannot be realized, we propose two heuristic strategies, one for the identical release-time case and the other for the nonidentical release-time case. Several conditions are derived to determine whether or not an optimal load distribution exists, and illustrative examples are provided for ease of understanding.

3.

Design and Implementation of the Processing Unit in a Network Processor

李诚李华伟《计算机工程》2007,33(2):252-254

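With the rapid growth of network bandwidth and the continual emergence of new network applications, the original Internet architecture based on general-purpose processors and ASICs can no longer meet new demands. Network processors, which combine strong processing capability with flexible programmability and configurability, are gradually coming into wide use. High-performance network processors typically employ multiple concurrent processing units for fast data-plane processing, and these processing units occupy a central position in the network processor. This paper discusses the factors to be considered when designing the processing units of a network processor, designs a flexible and effective processing unit architecture, and verifies it with an FPGA prototype, confirming the feasibility of the architecture.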

4.

Fault-Tolerant Scheduling of Hybrid Real-Time Tasks on Multiprocessors (cited by: 13; self-citations: 1; citations by others: 13)

阳春华桂卫华计莉《计算机学报》2003,26(11):1479-1486

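A fault-tolerant scheduling algorithm for hybrid real-time tasks is proposed. The algorithm uses the Rate Monotonic (RM) algorithm for static scheduling of periodic tasks, uses a processor-time reservation method together with the Earliest Deadline First (EDF) algorithm to schedule aperiodic tasks dynamically, and uses a primary/backup copy technique to ensure the fault tolerance of the system. By making full use of the processor time left over by periodic tasks to schedule aperiodic tasks, and by combining active backup with passive backup, the number of processors required is effectively reduced. Simulation results demonstrate the effectiveness of the algorithm.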
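
The Rate Monotonic (RM) scheduling of periodic tasks named in the abstract above is commonly paired with a utilization-based admission test. The sketch below assumes the classic Liu and Layland sufficient bound for a single processor, with deadlines equal to periods. It is not the paper's fault-tolerant algorithm, and the task representation as `(execution_time, period)` pairs is a hypothetical choice made for illustration.

```python
def rm_utilization_bound(num_tasks):
    """Liu-Layland bound n * (2^(1/n) - 1) for n periodic tasks."""
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    return num_tasks * (2 ** (1.0 / num_tasks) - 1)


def passes_rm_bound(tasks):
    """Sufficient (not necessary) RM schedulability test on one processor.

    tasks: list of (execution_time, period) pairs, deadline == period.
    True means the set is schedulable under RM; False is inconclusive.
    """
    utilization = sum(c / t for c, t in tasks)
    return utilization <= rm_utilization_bound(len(tasks))
```

A task set that fails this bound may still be schedulable under RM; an exact answer needs a response-time analysis, which is outside this sketch.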

5.

Optimizing computing costs using divisible load analysis

Jeeho Sohn Robertazzi T.G. Luryi S. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(3):225-234

A bus-oriented network where there is a charge for the amount of divisible load processed on each processor is investigated. A cost-optimal processor sequencing result is found which involves assigning load to processors in nondecreasing order of the cost per load characteristic of each processor. More generally, one can trade cost against solution time. Algorithms are presented to minimize computing cost with an upper bound on solution time and to minimize solution time with an upper bound on cost. As an example of the use of this type of analysis, the effect of replacing one fast but expensive processor with a number of cheap but slow processors is also discussed. The types of questions investigated here are important for future computer utilities that perform distributed computation for some charge.

6.

Research on Transputer-Based Medium-Grain Multitask Management (cited by: 1; self-citations: 0; citations by others: 1)

陈勇刘心松《小型微型计算机系统》1995,16(10):6-11

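The Transputer is a processor chip particularly well suited to parallel processing, but the lack of system support software makes development for it inconvenient. MGPOS is a single-user, multitasking parallel-processing operating system based on a Transputer network. It supports the existing OCCAM programming model for the Transputer and allows tasks to be allocated at load time according to the available hardware resources. In addition, the memory management interface and task communication interface it provides offer a good foundation for application development. This paper focuses on the design ideas and key techniques of task management in this operating system, and also describes some of its distinctive features. The establishment of this platform provides strong support for further research on parallel processing technology.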

7.

PLUM: Parallel Load Balancing for Adaptive Unstructured Meshes

Leonid Oliker Rupak Biswas 《Journal of Parallel and Distributed Computing》1998,52(2):75

Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. We present a novel method called PLUM to dynamically balance the processor workloads with a global view. This paper describes the implementation and integration of all major components within our dynamic load balancing strategy for adaptive grid calculations. Mesh adaption, repartitioning, processor assignment, and remapping are critical components of the framework that must be accomplished rapidly and efficiently so as not to cause a significant overhead to the numerical simulation. A data redistribution model is also presented that predicts the remapping cost on the SP2. This model is required to determine whether the gain from a balanced workload distribution offsets the cost of data movement. Results presented in this paper demonstrate that PLUM is an effective dynamic load balancing strategy which remains viable on a large number of processors.

8.

Optimal sequencing and arrangement in distributed single-level tree networks with communication delays

Bharadwaj V. Ghose D. Mani V. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(9):968-976

The problem of obtaining optimal processing time in a distributed computing system consisting of (N+1) processors and N communication links, arranged in a single-level tree architecture, is considered. It is shown that optimality can be achieved through a hierarchy of steps involving optimal load distribution, load sequencing, and processor-link arrangement. Closed-form expressions for optimal processing time are derived for a general case of networks with different processor speeds and different communication link speeds. Using these closed-form expressions, the paper analytically proves a number of significant results that in earlier studies were only conjectured from computational results. In addition, it also extends these results to a more general framework. The above analysis is carried out for the cases in which the root processor may or may not be equipped with a front-end processor. Illustrative examples are given for all cases considered.
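
The abstract above does not reproduce the closed-form expressions. As a rough sketch of the load-distribution step only, assume a simplified model in which the root sends load fractions to its children one after another, performs no computation itself, and every child finishes at the same instant. Under these assumptions consecutive fractions satisfy α_i·w_i = α_{i+1}·(z_{i+1} + w_{i+1}), where z is the time to send a unit load over a link and w is the time to compute a unit load. The function name, parameter names, and the model itself are illustrative assumptions, not the paper's formulation.

```python
def star_load_fractions(link_times, compute_times):
    """Load fractions for children of a single-level tree, in send order.

    link_times[i]:    time to transmit a unit load to child i
    compute_times[i]: time for child i to process a unit load
    Assumptions: sequential distribution, root does not compute,
    all children stop computing at the same time.
    Returns (fractions, finish_time) for a total load of 1.
    """
    if not link_times or len(link_times) != len(compute_times):
        raise ValueError("need equal-length, non-empty parameter lists")
    weights = [1.0]  # unnormalized fraction of the first child
    for i in range(1, len(link_times)):
        # alpha_{i-1} * w_{i-1} = alpha_i * (z_i + w_i)
        ratio = compute_times[i - 1] / (link_times[i] + compute_times[i])
        weights.append(weights[-1] * ratio)
    total = sum(weights)
    fractions = [w / total for w in weights]
    finish_time = fractions[0] * (link_times[0] + compute_times[0])
    return fractions, finish_time
```

The sketch takes the child order as given; choosing the best sequence and processor-link arrangement, which is the subject of the paper, is not attempted here.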

9.

An incentive-based distributed mechanism for scheduling divisible loads in tree networks

T.E. Carroll D. Grosu 《Journal of Parallel and Distributed Computing》2012

The underlying assumption of Divisible Load Scheduling (DLS) theory is that the processors composing the network are obedient, i.e., they do not “cheat” the scheduling algorithm. This assumption is unrealistic if the processors are owned by autonomous, self-interested organizations that have no a priori motivation for cooperation and they will manipulate the algorithm if it is beneficial to do so. In this paper, we address this issue by designing a distributed mechanism for scheduling divisible loads in tree networks, called DLS-T, which provides incentives to processors for reporting their true processing capacity and executing their assigned load at full processing capacity. We prove that the DLS-T mechanism computes the optimal allocation in an ex post Nash equilibrium. Finally, we simulate and study the mechanism under various network structures and processor parameters.

10.

Parallelizing the Data Cube (cited by: 1; self-citations: 0; citations by others: 1)

Frank Dehne Todd Eavis Susanne Hambrusch Andrew Rau-Chaplin 《Distributed and Parallel Databases》2002,11(2):181-201

This paper presents a general methodology for the efficient parallelization of existing data cube construction algorithms. We describe two different partitioning strategies, one for top-down and one for bottom-up cube algorithms. Both partitioning strategies assign subcubes to individual processors in such a way that the loads assigned to the processors are balanced. Our methods reduce interprocessor communication overhead by partitioning the load in advance instead of computing each individual group-by in parallel. Our partitioning strategies create a small number of coarse tasks. This allows for sharing of prefixes and sort orders between different group-by computations. Our methods enable code reuse by permitting the use of existing sequential (external memory) data cube algorithms for the subcube computations on each processor. This supports the transfer of optimized sequential data cube code to a parallel setting. The bottom-up partitioning strategy balances the number of single attribute external memory sorts made by each processor. The top-down strategy partitions a weighted tree in which weights reflect algorithm specific cost measures like estimated group-by sizes. Both partitioning approaches can be implemented on any shared disk type parallel machine composed of p processors connected via an interconnection fabric and with access to a shared parallel disk array. We have implemented our parallel top-down data cube construction method in C++ with the MPI message passing library for communication and the LEDA library for the required graph algorithms. We tested our code on an eight processor cluster, using a variety of different data sets with a range of sizes, dimensions, density, and skew. Comparison tests were performed on a SunFire 6800. The tests show that our partitioning strategies generate a close to optimal load balance between processors. The actual run times observed show an optimal speedup of p.

11.

A Survey of Load-Balancing Algorithms for Network Processors

孔大伟李丹丹余建华 YU Jian-hua 《自动化技术与应用》2007,26(7):45-48

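This paper discusses load-balancing methods for the multiple processing units in a network processor. It describes several load-balancing methods in detail, presents the characteristics of and performance metrics for load balancing in network processors, and proposes directions and basic ideas for further research in this area. It should be of some help to similar research.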

12.

Performance characterization of multi-thread and multi-core processors based XML application oriented networking systems

Jason Jianxun Ding Abdul Waheed Jingnan Yao Laxmi N. Bhuyan 《Journal of Parallel and Distributed Computing》2010

There is a growing trend to insert application intelligence into network devices. Processors in this type of Application Oriented Networking (AON) device are required to handle both packet-level network I/O intensive operations and XML message-level CPU intensive operations. In this paper, we investigate the performance effect of symmetric multi-processing (SMP) via (1) hardware multi-threading, (2) uni-processor to dual-processor architectures, and (3) single to dual and quad core processing, on both packet-level and XML message-level traffic. We use AON systems based on Intel Xeon processors with hyperthreading, Pentium M based dual-core processors, and Intel’s dual quad-core Xeon E5335 processors. We analyze and cross-examine the SMP effect from both high-level performance and processor microarchitectural perspectives. The evaluation results will not only provide insight to microprocessor designers, but also help system architects of AON types of device to select the right processors.

13.

Applying distributed cognition theory to the redesign of the ‘Copy and Paste’ function in order to promote appropriate learning outcomes

Michael Morgan Gwyn Brickell Barry Harper 《Computers & Education》2008

This paper explores the application of distributed cognition theory to educational contexts by examining a common learning interaction, the ‘Copy and Paste’ function. After a discussion of distributed cognition and the role of mediating artefacts in real world cognitions, the ‘Copy and Paste’ function is redesigned to embed an effective interaction strategy, based on encoding strategies, into the interface. The current affordances of the ‘Copy and Paste’ interaction derived from its business heritage (speed and accuracy of reproduction) are contrasted with those needed for a learning interaction (the meaningful processing of content for understanding). An empirical study was conducted to test the efficacy of the redesigned function through an experimental treatment. The study examined the impact of an experimental treatment based on changes to the ‘Copy and Paste’ function in terms of:

(a): changes to interaction strategies employed by learners;

14.

Router Design Based on a Network Communication Processor (cited by: 1; self-citations: 0; citations by others: 1)

何亦征王晶岩谢文涛《计算机工程与科学》2001,23(4):76-77

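Network communication processors are special-purpose processors that emerged to improve packet-processing efficiency. This paper describes the basic architecture of network processors and, taking the Motorola MPC860 communication processor as an example, explains how a router can be implemented.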

15.

Parallel computers for region-level image processing

Azriel Rosenfeld Angela Y. Wu 《Pattern recognition》1982,15(1):41-50

It is well known that parallel computers can be used very effectively for image processing at the pixel level, by assigning a processor to each pixel or block of pixels, and passing information as necessary between processors whose blocks are adjacent. This paper discusses the use of parallel computers for processing images at the region level, assigning a processor to each region and passing information between processors whose regions are related. The basic difference between the pixel and region levels is that the regions (e.g. obtained by segmenting the given image) and relationships differ from image to image, and even for a given image, they do not remain fixed during processing. Thus, one cannot use the standard type of cellular parallelism, in which the set of processors and interprocessor connections remain fixed, for processing at the region level. Reconfigurable cellular computers, in which the set of processors that each processor can communicate with can change during a computation, are more appropriate. A class of such computers is described, and general examples are given illustrating how such a computer could initially configure itself to represent a given decomposition of an image into regions, and dynamically reconfigure itself, in parallel, as regions merge or split.

16.

A real-time vision system using an integrated memory array processor prototype

Yoshihiro Fujita Nobuyuki Yamashita Shin'ichiro Okazaki 《Machine Vision and Applications》1994,7(4):220-228

This paper describes the architecture and performance of a real-time vision system (RVS) and its use of an integrated memory array processor (IMAP) prototype. This prototype integrates eight 8-bit processors and a 144-kbit SRAM on a single chip. The RVS was developed with 64 IMAP prototypes connected in series in a 512-processor system configuration. A host workstation can access the memory on the IMAP prototypes directly through a random access port. Images are input and output at high speed through serial access ports. The RVS performance is shown in real-time road-image processing and in a neural network simulation, as well as in low-level image processing algorithms, such as filtering, histograms, discrete cosine transform (DCT), and rotation. The RVS image processing is shown to be much faster than the video rate.

17.

Remote Real-Time Processing of Geological Field Information

郑大军李之彦《计算机工程》2001,27(6):63-64

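This paper introduces a method in which geological field processors act as remote nodes and form a remote wireless network with an information processing center, enabling real-time, highly automated remote information processing for geological field operations. The geological field processor is equipped with a GPS positioning card, a dedicated GSM wireless modem, and related control devices. The system can control the quality of field operations and greatly shorten the information processing cycle.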

18.

Multiprocessor join scheduling

Murphy M.C. Rotem D. 《Knowledge and Data Engineering, IEEE Transactions on》1993,5(2):322-338

A practical join processing strategy that allows effective utilization of arbitrary degrees of parallelism in both the I/O subsystem and join processing subsystems is presented. Analytic bounds on the minimum execution time, minimum number of processors, and processor utilization are presented along with bounds on the execution time, given a fixed number of processors. These bounds assume that sufficient buffers are available. An analytic lower bound on buffer requirements as well as a practical heuristic for use in limited buffer environments are also presented. A sampling of corroborative simulation results is included.

19.

Performance Measures for Evaluating Algorithms for SIMD Machines

《IEEE transactions on pattern analysis and machine intelligence》1982,(4):319-331

This paper examines measures for evaluating the performance of algorithms for single instruction stream–multiple data stream (SIMD) machines. The SIMD mode of parallelism involves using a large number of processors synchronized together. All processors execute the same instruction at the same time; however, each processor operates on a different data item. The complexity of parallel algorithms is, in general, a function of the machine size (number of processors), problem size, and type of interconnection network used to provide communications among the processors. Measures which quantify the effect of changing the machine-size/problem-size/network-type relationships are therefore needed. A number of such measures are presented and are applied to an example SIMD algorithm from the image processing problem domain. The measures discussed and compared include execution time, speed, parallel efficiency, overhead ratio, processor utilization, redundancy, cost effectiveness, speed-up of the parallel algorithm over the corresponding serial algorithm, and an additive measure called "sprice" which assigns a weighted value to computations and processors.
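
Two of the measures listed above have widely used textbook definitions, sketched here for orientation. The paper's own definitions may differ, and the remaining measures (overhead ratio, redundancy, "sprice", and so on) are specific to the paper and are not attempted. The function and parameter names are illustrative assumptions.

```python
def speedup(serial_time, parallel_time):
    """Speed-up of the parallel algorithm over the serial one."""
    if parallel_time <= 0:
        raise ValueError("parallel_time must be positive")
    return serial_time / parallel_time


def parallel_efficiency(serial_time, parallel_time, num_processors):
    """Speed-up per processor; 1.0 means ideal linear scaling."""
    if num_processors < 1:
        raise ValueError("num_processors must be at least 1")
    return speedup(serial_time, parallel_time) / num_processors
```

For instance, a job that takes 100 time units serially and 25 units on 8 processors has a speed-up of 4 and an efficiency of 0.5 under these definitions.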

20.

Supporting load distribution strategies in message-passing multiprocessors: a case study

Alberto Bartoli Gianluca Dini 《Microprocessors and Microsystems》1991,15(10):549-558

This paper describes a software architecture designed as a support for tackling the load distribution problem when solving complex problems on concurrent processors. We have considered transputer-based MIMD multiprocessors as concurrent processors and a simulator for biologically inspired neural networks as a case study. Biologically inspired neural networks are characterized by having many thousands of neurons and synapses and topologically based connection schemes. It has been our main aim to give the user the possibility of simply defining and modifying widely differing load distribution strategies, in order to make it possible to deal with a broad range of neural network architectures and processor topologies. Furthermore we provide a real tool for hiding communication delays.