共查询到20条相似文献,搜索用时 375 毫秒
1.
《Journal of Parallel and Distributed Computing》1998,52(1):82-95
Adaptivevirtual cut-throughis considered as a viable alternative towormhole switchingfor fast and hardware-efficient interprocessor communication in multicomputers. Computer simulations are used to show that our implementation of a minimal-path fully-adaptive virtual cut-through algorithm outperforms both deterministic and adaptive wormhole switching methods under both uniform random message distributions and clustered distributions such as the matrix transpose. A hardware-efficient implementation of adaptive virtual cut-through has been implemented using a semi-custom-designed router chip that requires only 2.3% more area than a comparable deterministic wormhole router chip. A network interface controller chip, which is crucial to our adaptive virtual cut-through method, has also been designed and is under fabrication. 相似文献
2.
《Journal of Parallel and Distributed Computing》2001,61(2):224-253
Most multicomputer interconnection networks use wormhole switching, leading to fast and compact routers. Current routers incorporate virtual channels and even fully adaptive routing. Networks of workstations (NOWs) inherited multicomputer technology. Most commercial routers designed for NOWs implement wormhole switching. However, wormhole switching is not well suited for NOWs. The long wires required in this environment lead to large buffers to prevent buffer overflow during flow control signaling. Moreover, wire length is limited by buffer size. Virtual cut-through (VCT) achieves a higher throughput than wormhole switching. However, buffer requirements and packetizing overhead prevented its widespread use in multicomputers. Nevertheless, wormhole and VCT switching require similar buffer capacity in NOWs. Moreover, some messaging layers such as Illinois Fast Messages (FM) and BIP split messages into packets for increased performance. Therefore, the traditional disadvantages of VCT switching disappear in NOWs. In this paper, we show that VCT routers can be simpler than wormhole routers, while still achieving the advantages of using virtual channels and adaptive routing. We also propose a fully adaptive routing algorithm for VCT switching in a NOW environment. Moreover, we show that VCT routers outperform wormhole routers in a NOW environment at a lower cost. Also, VCT routers require buffer capacity independent of wire length, making them suitable for networks of workstations. 相似文献
3.
Faizal Arya SammanAuthor Vitae Thomas HollsteinAuthor VitaeManfred GlesnerAuthor Vitae 《Microprocessors and Microsystems》2011,35(3):343-358
A VLSI microrchitecture of a network-on-chip (NoC) router with a wormhole cut-through switching method is presented in this paper. The main feature of the NoC router is that, the wormhole messages can be interleaved (cut-through) at flit-level in the same buffer pool and share communication links. Each flit belonging to the same message can track its routing paths correctly because a local identity-tag (ID-tag) is attached on each flit that varies over communication resources to support the wire-sharing message transportation. Flits belonging to the same message will have the same local ID-tag on each communication channel. The concept, on-chip microarchitecture, performance characteristics and interesting transient behaviors of the proposed NoC router that uses the wormhole cut-through switching method are presented in this paper. Routing engine module in the NoC architecture is an exchangeable module and must be designed in accordance with user specification i.e., static or adaptive routing algorithm. For quality of service purpose, inter-switch data transfers are controlled by using link-level overflow control to avoid drops of data. 相似文献
4.
Yancang ChenAuthor Vitae Zhonghai LuAuthor Vitae 《Computers & Electrical Engineering》2012,38(4):906-916
We present a single-cycle output buffered router based on layered switching for networks on chips (NoCs). Different from state-of-the-art NoC routers, the router has three important characteristics: (1) It employs layered switching, which implements wormhole on top of virtual cut-through (VCT) switching; (2) In contrast to input buffered architectures, it adopts an output buffered architecture; (3) It is single cycle, meaning that the router pipeline takes only one cycle for all flits. Experimental results show that the router achieves up to 80% of ideal network throughput under uniform random traffic pattern. Compared with wormhole switching, layered switching achieves up to 36.9% latency reduction for 12-flit packets under uniform random traffic with an injection rate of 0.5 flit/cycle/node. Under 65 nm technology synthesized results show that its critical path has only 20 logic gates, and it reduces 11% area compared to the input virtual-channel router with the same buffer capacity. 相似文献
5.
Puente V. Gregorio J.-A. Beivide R. Izu C. 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(5):487-501
This work presents the design and evaluation of an adaptive packet router aimed at supporting CC-NUMA traffic. We exploit a simple and efficient packet injection mechanism to avoid deadlock, which leads to a fully, adaptive routing by employing only three virtual channels. In addition, we selectively use output buffers for implementing the most utilized virtual paths in order to reduce head-of-line blocking. The careful implementation of these features has resulted in a good trade-off between the network performance and hardware cost. The outcome of this research is a high-performance adaptive router (HPAR), which adequately balances the needs of parallel applications: minimal network latency at low loads and high throughput at heavy loads. The paper includes an evaluation process in which HPAR is compared with other adaptive routers using FIFO input bufferring, with or without additional virtual channels to reduce head-of-line blocking. This evaluation contemplates both the VLSI costs of each router and their performance under synthetic and real application workloads. To make the comparison fair, all the routers use the same efficient deadlock avoidance mechanism. In all the experiments, HPAR exhibited the best response among all the routers tested. Moreover, the observed packet latencies were comparable to those exhibited by simpler routers. Therefore, HPAR can be considered as a suitable candidate to implement packet interchange in next generations of CC-NUMA multiprocessors. 相似文献
6.
Duato J. Pinkston T.M. 《Parallel and Distributed Systems, IEEE Transactions on》2001,12(12):1219-1235
This paper presents a theoretical framework for the design of deadlock-free fully adaptive routing algorithms for a general class of network topologies and switching techniques in a single, unified theory. A general theory is proposed that allows the design of deadlock avoidance-based as well as deadlock recovery-based wormhole and virtual cut-through adaptive routing algorithms that use a homogeneous or a heterogeneous (mixed) set of resources. The theory also allows channel queues to be allocated nonatomically, utilizing resources efficiently. A general methodology for the design of fully adaptive routing algorithms applicable to arbitrary network topologies is also proposed. The proposed theory and methodology allow the design of efficient network routers that require minimal resources for handling infrequent deadlocks 相似文献
7.
A delay model for router microarchitectures 总被引:1,自引:0,他引:1
This article introduces a router delay model that takes into account the pipelined nature of contemporary routers and proposes pipelines matched to the specific flow control method employed. Given the type of flow control and router parameters, the model returns router latency in technology-independent units and the number of pipeline stages as a function of cycle time. We apply this model to derive realistic pipelines for wormhole and virtual-channel routers and compare their performance. Contrary to the conclusions of previous models, our results show that the latency of a virtual channel router doesn't increase as we scale the number of virtual channels up to 8 per physical channel. Our simulation results also show that a virtual-channel router gains throughput of up to 40 % over a wormhole router 相似文献
8.
网络虚拟化技术的提出,为解决互联网"僵化"问题找到了新的思路,受到广泛的关注。在虚拟路由器平台中,若干台互联的网络服务器资源组成了底层物理网络,通过虚拟网络映射技术,将物理网络资源有效地映射到虚拟网络设备上,组成多个虚拟网络,满足用户对网络的多样化需求。虚拟路由器资源映射问题是虚拟网络映射问题的基础,虚拟路由器实例与物理资源的映射方法决定了虚拟网络平台资源的利用率和虚拟网络系统的性能。针对虚拟路由器平台资源分配的问题,提出了物理网络资源模型和虚拟路由器资源请求模型,设计了一种启发式虚拟路由资源分配算法,并对算法的复杂性和优化目标进行了分析。 相似文献
9.
10.
《Parallel and Distributed Systems, IEEE Transactions on》1999,10(9):904-921
A spate of deadlock avoidance-based and deadlock recovery-based routing algorithms have been proposed in recent years without full understanding of the likelihood and characteristics of actual deadlocks in interconnection networks. This work models the interrelationships between routing freedom, message blocking, correlated resource dependencies, and deadlock formation. It is empirically shown that increasing routing freedom, as achieved by allowing unrestricted routing over multiple physical and virtual channels, reduces the probability of deadlocks and the likelihood of other types of correlated message blocking that can degrade performance. Moreover, when true fully adaptive routing is used in k-ary n-cube networks with two or more virtual channels (wormhole OF virtual cut-through switched), it is empirically shown that deadlocks are virtually eliminated in networks with n⩾2. These results indicate that deadlocks are very infrequent when the network and routing algorithm inherently provide sufficient routing freedom, thus increasing the viability of deadlock recovery routing strategies 相似文献
11.
This paper develops the theoretical background for the design of deadlock-free adaptive routing algorithms for virtual cut-through and store-and-forward switching. This theory is valid for networks using either central buffers or edge buffers. Some basic definitions and three theorems are proposed, developing conditions to verify that an adaptive algorithm is deadlock-free, even when there are cyclic dependencies between routing resources. Moreover, we propose a necessary and sufficient condition for deadlock-free routing. Also, a design methodology is proposed. It supplies fully adaptive, minimal and non-minimal routing algorithms, guaranteeing that they are deadlock-free. The theory proposed in this paper extends the necessary and sufficient condition for wormhole switching previously proposed by us. The resulting routing algorithms are more flexible than the ones for wormhole switching. Also, the design methodology is much easier to apply because it automatically supplies deadlock-free routing algorithms 相似文献
12.
Kim J.H. Ziqiang Liu Chien A.A. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(3):229-244
Compressionless routing (CR) is an adaptive routing framework which provides a unified framework for efficient deadlock free adaptive routing and fault tolerance. CR exploits the tight coupling between wormhole routers for flow control to detect and recover from potential deadlock situations. Fault tolerant compressionless routing (FCR) extends CR to support end to end fault tolerant delivery. Detailed routing algorithms, implementation complexity, and performance simulation results for CR and FCR are presented. These results show that the hardware for CR and FCR networks is modest. Further, CR and FCR networks can achieve superior performance to alternatives such as dimension order routing. Compressionless routing has several key advantages: deadlock free adaptive routing in toroidal networks with no virtual channels, simple router designs, order preserving message transmission, applicability to a wide variety of network topologies, and elimination of the need for buffer allocation messages. Fault tolerant compressionless routing has several additional advantages: data integrity in the presence of transient faults (nonstop fault tolerance), permanent fault tolerance, and elimination of the need for software buffering and retry for reliability. The advantages of CR and FCR not only simplify hardware support for adaptive routing and fault tolerance, they also can simplify software communication layers 相似文献
13.
自适应路由可以有效地提高片上网络性能,却导致网络中的数据传输乱序.设计了一个两级流水虚拟通道虫孔交换路由器,通过修改数据包的标记位和路由计算单元,使路由器支持确定性和自适应路由算法,简化了数据传输乱序问题;同时,将流经路由器的数据流分为东西和南北2个部分;在此基础上从经典的部分自适应路由出发,增加虚拟通道允许原本禁止的转向,实现了无死锁的自适应路由,并降低了网络延迟与硬件开销. 相似文献
14.
《Journal of Systems Architecture》2013,59(7):482-491
Network-on-Chip (NoC) is widely used as a communication scheme in modern many-core systems. To guarantee the reliability of communication, effective fault tolerant techniques are critical for an NoC. In this paper, a novel fault tolerant architecture employing redundant routers is proposed to maintain the functionality of a network in the presence of failures. This architecture consists of a mesh of 2 × 2 router blocks with a spare router placed in the center of each block. This spare router provides a viable alternative when a router fails in a block. The proposed fault-tolerant architecture is therefore referred to as a quad-spare mesh. The quad-spare mesh can be dynamically reconfigured by changing control signals without altering the underlying topology. This dynamic reconfiguration and its corresponding routing algorithm are demonstrated in detail. Since the topology after reconfiguration is consistent with the original error-free 2D mesh, the proposed design is transparent to operating systems and application software. Experimental results show that the proposed design achieves significant improvements on reliability compared with those reported in the literature. Comparing the error-free system with a single router failure case, the throughput only decreases by 5.19% and latency increases by 2.40%, with about 45.9% hardware redundancy. 相似文献
15.
Several recent studies have shown that adaptive routing algorithms based on deadlock recovery have superior performance characteristics than those based on deadlock avoidance. Most of these studies, however, have relied on software simulation due to the lack of analytical modelling tools. In an effort towards filling this gap, this paper presents a new analytical model of compressionless routing in wormhole-routed hypercubes. This routing algorithm exploits the tight coupling between wormhole routers for flow control to detect and recover from potential deadlock situations. The advantages of compressionless routing include deadlock-free adaptive routing with no extra virtual channels, simple router design, and order-preserving message transmission. The proposed analytical model computes message latency by determining the message transmission time, blocking delay at each router, multiplexing delay at each network channel, and waiting time in the source before entering the network. The validity of the model is demonstrated by comparing analytical results with those obtained through simulation experiments. 相似文献
16.
Ki Hwan Yum Eun Jung Kim Das C.R. Vaidya A.S. 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(12):1261-1274
With the increasing use of clusters in real-time applications, it has become essential to design high-performance networks with quality-of-service (QoS) guarantees. We explore the feasibility of providing QoS in wormhole switched routers, which are widely used in designing scalable, high-performance cluster interconnects. In particular, we are interested in supporting multimedia video streams with CBR and VBR traffic, in addition to the conventional best-effort traffic. The proposed MediaWorm router uses a rate-based bandwidth allocation mechanism, called Fine-Grained VirtualClock (FGVC), to schedule network resources for different traffic classes. Our simulation results on an 8-port router indicate that it is possible to provide jitter-free delivery to VBR/CBR traffic up to an input load of 70-80 percent of link bandwidth and the presence of best-effort traffic has no adverse effect on real-time traffic. Although the MediaWorm router shows a slightly lower performance than a pipelined circuit switched (PCS) router, commercial success of wormhole switching, coupled with simpler and cheaper design, makes it an attractive alternative. Simulation of a (2/spl times/2) fat-mesh using this router shows performance comparable to that of a single switch and suggests that clusters designed with appropriate bandwidth balance between links can provide required performance for different types of traffic. 相似文献
17.
18.
19.
Asynchronous quasi-delay-insensitive (QDI) NoCs have several advantages over their clocked counterparts. Virtual channel (VC) is the most utilized flow control method in asynchronous routers but spatial division multiplexing (SDM) achieves better throughput performance for best-effort traffic than VC. A novel asynchronous SDM router architecture is presented. Area and latency models are provided to analyse the network performance of all router architectures including wormhole, virtual channel and SDM. Performance comparisons have been made with different configurations of payload size, communication distance, buffer size, port bandwidth, network size and number of VCs/virtual circuits. Compared with VC, SDM achieves higher throughput with lower area overhead. 相似文献
20.
All-to-all (ATA) reliable broadcast is the problem of reliably distributing information from every node to every other node in point-to-point interconnection networks. A good solution to this problem is essential for clock synchronization, distributed agreement, etc. We propose a novel solution in which the reliable broadcasts from individual nodes are interleaved in such a manner that no two packets contend for the same link at any given time-this type of method is particularly suited for systems which use virtual cut-through or wormhole routing for fast communication between nodes. Our solution, called the IHC Algorithm, can be used on a large class of regular interconnection networks including regular meshes and hypercubes. By adjusting a parameter η referred to as the interleaving distance, we can flexibly decrease the link utilization of the IHC algorithm (for normal traffic) at the expense of an increase in the time required for ATA reliable broadcast. We compare the IHC algorithm to several other possible virtual cut-through solutions and a store-and-forward solution. The IHC algorithm with the minimum value of η is shown to be optimal in minimizing the execution time of ATA reliable broadcast when used in a dedicated mode (with no other network traffic) 相似文献