首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 375 毫秒
1.
Adaptivevirtual cut-throughis considered as a viable alternative towormhole switchingfor fast and hardware-efficient interprocessor communication in multicomputers. Computer simulations are used to show that our implementation of a minimal-path fully-adaptive virtual cut-through algorithm outperforms both deterministic and adaptive wormhole switching methods under both uniform random message distributions and clustered distributions such as the matrix transpose. A hardware-efficient implementation of adaptive virtual cut-through has been implemented using a semi-custom-designed router chip that requires only 2.3% more area than a comparable deterministic wormhole router chip. A network interface controller chip, which is crucial to our adaptive virtual cut-through method, has also been designed and is under fabrication.  相似文献   

2.
Most multicomputer interconnection networks use wormhole switching, leading to fast and compact routers. Current routers incorporate virtual channels and even fully adaptive routing. Networks of workstations (NOWs) inherited multicomputer technology. Most commercial routers designed for NOWs implement wormhole switching. However, wormhole switching is not well suited for NOWs. The long wires required in this environment lead to large buffers to prevent buffer overflow during flow control signaling. Moreover, wire length is limited by buffer size. Virtual cut-through (VCT) achieves a higher throughput than wormhole switching. However, buffer requirements and packetizing overhead prevented its widespread use in multicomputers. Nevertheless, wormhole and VCT switching require similar buffer capacity in NOWs. Moreover, some messaging layers such as Illinois Fast Messages (FM) and BIP split messages into packets for increased performance. Therefore, the traditional disadvantages of VCT switching disappear in NOWs. In this paper, we show that VCT routers can be simpler than wormhole routers, while still achieving the advantages of using virtual channels and adaptive routing. We also propose a fully adaptive routing algorithm for VCT switching in a NOW environment. Moreover, we show that VCT routers outperform wormhole routers in a NOW environment at a lower cost. Also, VCT routers require buffer capacity independent of wire length, making them suitable for networks of workstations.  相似文献   

3.
A VLSI microrchitecture of a network-on-chip (NoC) router with a wormhole cut-through switching method is presented in this paper. The main feature of the NoC router is that, the wormhole messages can be interleaved (cut-through) at flit-level in the same buffer pool and share communication links. Each flit belonging to the same message can track its routing paths correctly because a local identity-tag (ID-tag) is attached on each flit that varies over communication resources to support the wire-sharing message transportation. Flits belonging to the same message will have the same local ID-tag on each communication channel. The concept, on-chip microarchitecture, performance characteristics and interesting transient behaviors of the proposed NoC router that uses the wormhole cut-through switching method are presented in this paper. Routing engine module in the NoC architecture is an exchangeable module and must be designed in accordance with user specification i.e., static or adaptive routing algorithm. For quality of service purpose, inter-switch data transfers are controlled by using link-level overflow control to avoid drops of data.  相似文献   

4.
We present a single-cycle output buffered router based on layered switching for networks on chips (NoCs). Different from state-of-the-art NoC routers, the router has three important characteristics: (1) It employs layered switching, which implements wormhole on top of virtual cut-through (VCT) switching; (2) In contrast to input buffered architectures, it adopts an output buffered architecture; (3) It is single cycle, meaning that the router pipeline takes only one cycle for all flits. Experimental results show that the router achieves up to 80% of ideal network throughput under uniform random traffic pattern. Compared with wormhole switching, layered switching achieves up to 36.9% latency reduction for 12-flit packets under uniform random traffic with an injection rate of 0.5 flit/cycle/node. Under 65 nm technology synthesized results show that its critical path has only 20 logic gates, and it reduces 11% area compared to the input virtual-channel router with the same buffer capacity.  相似文献   

5.
This work presents the design and evaluation of an adaptive packet router aimed at supporting CC-NUMA traffic. We exploit a simple and efficient packet injection mechanism to avoid deadlock, which leads to a fully, adaptive routing by employing only three virtual channels. In addition, we selectively use output buffers for implementing the most utilized virtual paths in order to reduce head-of-line blocking. The careful implementation of these features has resulted in a good trade-off between the network performance and hardware cost. The outcome of this research is a high-performance adaptive router (HPAR), which adequately balances the needs of parallel applications: minimal network latency at low loads and high throughput at heavy loads. The paper includes an evaluation process in which HPAR is compared with other adaptive routers using FIFO input bufferring, with or without additional virtual channels to reduce head-of-line blocking. This evaluation contemplates both the VLSI costs of each router and their performance under synthetic and real application workloads. To make the comparison fair, all the routers use the same efficient deadlock avoidance mechanism. In all the experiments, HPAR exhibited the best response among all the routers tested. Moreover, the observed packet latencies were comparable to those exhibited by simpler routers. Therefore, HPAR can be considered as a suitable candidate to implement packet interchange in next generations of CC-NUMA multiprocessors.  相似文献   

6.
This paper presents a theoretical framework for the design of deadlock-free fully adaptive routing algorithms for a general class of network topologies and switching techniques in a single, unified theory. A general theory is proposed that allows the design of deadlock avoidance-based as well as deadlock recovery-based wormhole and virtual cut-through adaptive routing algorithms that use a homogeneous or a heterogeneous (mixed) set of resources. The theory also allows channel queues to be allocated nonatomically, utilizing resources efficiently. A general methodology for the design of fully adaptive routing algorithms applicable to arbitrary network topologies is also proposed. The proposed theory and methodology allow the design of efficient network routers that require minimal resources for handling infrequent deadlocks  相似文献   

7.
A delay model for router microarchitectures   总被引:1,自引:0,他引:1  
This article introduces a router delay model that takes into account the pipelined nature of contemporary routers and proposes pipelines matched to the specific flow control method employed. Given the type of flow control and router parameters, the model returns router latency in technology-independent units and the number of pipeline stages as a function of cycle time. We apply this model to derive realistic pipelines for wormhole and virtual-channel routers and compare their performance. Contrary to the conclusions of previous models, our results show that the latency of a virtual channel router doesn't increase as we scale the number of virtual channels up to 8 per physical channel. Our simulation results also show that a virtual-channel router gains throughput of up to 40 % over a wormhole router  相似文献   

8.
网络虚拟化技术的提出,为解决互联网"僵化"问题找到了新的思路,受到广泛的关注。在虚拟路由器平台中,若干台互联的网络服务器资源组成了底层物理网络,通过虚拟网络映射技术,将物理网络资源有效地映射到虚拟网络设备上,组成多个虚拟网络,满足用户对网络的多样化需求。虚拟路由器资源映射问题是虚拟网络映射问题的基础,虚拟路由器实例与物理资源的映射方法决定了虚拟网络平台资源的利用率和虚拟网络系统的性能。针对虚拟路由器平台资源分配的问题,提出了物理网络资源模型和虚拟路由器资源请求模型,设计了一种启发式虚拟路由资源分配算法,并对算法的复杂性和优化目标进行了分析。  相似文献   

9.
互连网络路由器是MPP系统的关键部件,其性能优劣直接影响系统性能。路由器根据其所采用的路由算法可分为确定性和自适应路由器两种,其中自适应路由器有灵活性好,网络的通道利用率高和网络容错能力强等优点,正逐步为新一代的MPP系统所采用,但其工程实现难度较大。本文在mesh结构上,采用虫孔路由切换技术,给出了一个可扩展性好,自适应性强的基于平面的完全自适应路由算法PBFAA,并采用基于虚通道的综合流控策略  相似文献   

10.
A spate of deadlock avoidance-based and deadlock recovery-based routing algorithms have been proposed in recent years without full understanding of the likelihood and characteristics of actual deadlocks in interconnection networks. This work models the interrelationships between routing freedom, message blocking, correlated resource dependencies, and deadlock formation. It is empirically shown that increasing routing freedom, as achieved by allowing unrestricted routing over multiple physical and virtual channels, reduces the probability of deadlocks and the likelihood of other types of correlated message blocking that can degrade performance. Moreover, when true fully adaptive routing is used in k-ary n-cube networks with two or more virtual channels (wormhole OF virtual cut-through switched), it is empirically shown that deadlocks are virtually eliminated in networks with n⩾2. These results indicate that deadlocks are very infrequent when the network and routing algorithm inherently provide sufficient routing freedom, thus increasing the viability of deadlock recovery routing strategies  相似文献   

11.
This paper develops the theoretical background for the design of deadlock-free adaptive routing algorithms for virtual cut-through and store-and-forward switching. This theory is valid for networks using either central buffers or edge buffers. Some basic definitions and three theorems are proposed, developing conditions to verify that an adaptive algorithm is deadlock-free, even when there are cyclic dependencies between routing resources. Moreover, we propose a necessary and sufficient condition for deadlock-free routing. Also, a design methodology is proposed. It supplies fully adaptive, minimal and non-minimal routing algorithms, guaranteeing that they are deadlock-free. The theory proposed in this paper extends the necessary and sufficient condition for wormhole switching previously proposed by us. The resulting routing algorithms are more flexible than the ones for wormhole switching. Also, the design methodology is much easier to apply because it automatically supplies deadlock-free routing algorithms  相似文献   

12.
Compressionless routing (CR) is an adaptive routing framework which provides a unified framework for efficient deadlock free adaptive routing and fault tolerance. CR exploits the tight coupling between wormhole routers for flow control to detect and recover from potential deadlock situations. Fault tolerant compressionless routing (FCR) extends CR to support end to end fault tolerant delivery. Detailed routing algorithms, implementation complexity, and performance simulation results for CR and FCR are presented. These results show that the hardware for CR and FCR networks is modest. Further, CR and FCR networks can achieve superior performance to alternatives such as dimension order routing. Compressionless routing has several key advantages: deadlock free adaptive routing in toroidal networks with no virtual channels, simple router designs, order preserving message transmission, applicability to a wide variety of network topologies, and elimination of the need for buffer allocation messages. Fault tolerant compressionless routing has several additional advantages: data integrity in the presence of transient faults (nonstop fault tolerance), permanent fault tolerance, and elimination of the need for software buffering and retry for reliability. The advantages of CR and FCR not only simplify hardware support for adaptive routing and fault tolerance, they also can simplify software communication layers  相似文献   

13.
自适应路由可以有效地提高片上网络性能,却导致网络中的数据传输乱序.设计了一个两级流水虚拟通道虫孔交换路由器,通过修改数据包的标记位和路由计算单元,使路由器支持确定性和自适应路由算法,简化了数据传输乱序问题;同时,将流经路由器的数据流分为东西和南北2个部分;在此基础上从经典的部分自适应路由出发,增加虚拟通道允许原本禁止的转向,实现了无死锁的自适应路由,并降低了网络延迟与硬件开销.  相似文献   

14.
Network-on-Chip (NoC) is widely used as a communication scheme in modern many-core systems. To guarantee the reliability of communication, effective fault tolerant techniques are critical for an NoC. In this paper, a novel fault tolerant architecture employing redundant routers is proposed to maintain the functionality of a network in the presence of failures. This architecture consists of a mesh of 2 × 2 router blocks with a spare router placed in the center of each block. This spare router provides a viable alternative when a router fails in a block. The proposed fault-tolerant architecture is therefore referred to as a quad-spare mesh. The quad-spare mesh can be dynamically reconfigured by changing control signals without altering the underlying topology. This dynamic reconfiguration and its corresponding routing algorithm are demonstrated in detail. Since the topology after reconfiguration is consistent with the original error-free 2D mesh, the proposed design is transparent to operating systems and application software. Experimental results show that the proposed design achieves significant improvements on reliability compared with those reported in the literature. Comparing the error-free system with a single router failure case, the throughput only decreases by 5.19% and latency increases by 2.40%, with about 45.9% hardware redundancy.  相似文献   

15.
Several recent studies have shown that adaptive routing algorithms based on deadlock recovery have superior performance characteristics than those based on deadlock avoidance. Most of these studies, however, have relied on software simulation due to the lack of analytical modelling tools. In an effort towards filling this gap, this paper presents a new analytical model of compressionless routing in wormhole-routed hypercubes. This routing algorithm exploits the tight coupling between wormhole routers for flow control to detect and recover from potential deadlock situations. The advantages of compressionless routing include deadlock-free adaptive routing with no extra virtual channels, simple router design, and order-preserving message transmission. The proposed analytical model computes message latency by determining the message transmission time, blocking delay at each router, multiplexing delay at each network channel, and waiting time in the source before entering the network. The validity of the model is demonstrated by comparing analytical results with those obtained through simulation experiments.  相似文献   

16.
With the increasing use of clusters in real-time applications, it has become essential to design high-performance networks with quality-of-service (QoS) guarantees. We explore the feasibility of providing QoS in wormhole switched routers, which are widely used in designing scalable, high-performance cluster interconnects. In particular, we are interested in supporting multimedia video streams with CBR and VBR traffic, in addition to the conventional best-effort traffic. The proposed MediaWorm router uses a rate-based bandwidth allocation mechanism, called Fine-Grained VirtualClock (FGVC), to schedule network resources for different traffic classes. Our simulation results on an 8-port router indicate that it is possible to provide jitter-free delivery to VBR/CBR traffic up to an input load of 70-80 percent of link bandwidth and the presence of best-effort traffic has no adverse effect on real-time traffic. Although the MediaWorm router shows a slightly lower performance than a pipelined circuit switched (PCS) router, commercial success of wormhole switching, coupled with simpler and cheaper design, makes it an attractive alternative. Simulation of a (2/spl times/2) fat-mesh using this router shows performance comparable to that of a single switch and suggests that clusters designed with appropriate bandwidth balance between links can provide required performance for different types of traffic.  相似文献   

17.
18.
虚网叠加构造自适应路由算法的有效框架   总被引:2,自引:0,他引:2  
大规模并行处理机系统中路由算法对互联网络通信性能和系统性起着重要作用。  相似文献   

19.
Asynchronous quasi-delay-insensitive (QDI) NoCs have several advantages over their clocked counterparts. Virtual channel (VC) is the most utilized flow control method in asynchronous routers but spatial division multiplexing (SDM) achieves better throughput performance for best-effort traffic than VC. A novel asynchronous SDM router architecture is presented. Area and latency models are provided to analyse the network performance of all router architectures including wormhole, virtual channel and SDM. Performance comparisons have been made with different configurations of payload size, communication distance, buffer size, port bandwidth, network size and number of VCs/virtual circuits. Compared with VC, SDM achieves higher throughput with lower area overhead.  相似文献   

20.
All-to-all (ATA) reliable broadcast is the problem of reliably distributing information from every node to every other node in point-to-point interconnection networks. A good solution to this problem is essential for clock synchronization, distributed agreement, etc. We propose a novel solution in which the reliable broadcasts from individual nodes are interleaved in such a manner that no two packets contend for the same link at any given time-this type of method is particularly suited for systems which use virtual cut-through or wormhole routing for fast communication between nodes. Our solution, called the IHC Algorithm, can be used on a large class of regular interconnection networks including regular meshes and hypercubes. By adjusting a parameter η referred to as the interleaving distance, we can flexibly decrease the link utilization of the IHC algorithm (for normal traffic) at the expense of an increase in the time required for ATA reliable broadcast. We compare the IHC algorithm to several other possible virtual cut-through solutions and a store-and-forward solution. The IHC algorithm with the minimum value of η is shown to be optimal in minimizing the execution time of ATA reliable broadcast when used in a dedicated mode (with no other network traffic)  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号