Found 19 similar documents (search time: 374 ms)
1.
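As high-performance networks grow in scale, high-radix router architecture has become a major focus of high-performance computing research. High-radix routers allow a network to achieve lower packet latency, power consumption, and construction cost, and they can also improve network reliability. As router radix keeps rising, simply scaling up a single-stage crossbar makes the router's internal wiring grow rapidly and the implementation cost of the crossbar becomes unacceptable, so high-radix routers need new switch architectures. The past decade has produced classic structured designs, represented by YARC, and new design approaches such as "network within a network"; future research will focus on the buffering, arbitration, and scalability problems that arise in high-radix router design. To this end, a multi-stage bufferless high-radix router is implemented. Internally it is a multi-stage Clos network in which each stage has an arbitration module that schedules requests. Packets are buffered at the input and output ports, and apart from these buffers the network is bufferless. Extensive performance tests with the BookSim simulator show that the router works correctly and performs well.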
2.
3.
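In non-fully-connected three-dimensional network-on-chip (3D NoC) architectures, the through-silicon via (TSV) table stores only TSV address information, which leads to network congestion. To address this, a record table structure is proposed. The table stores not only the addresses of the four TSVs nearest to a router but also the input-buffer occupancy and fault information of the corresponding routers. Building on this table, an adaptive unicast routing algorithm with shortest transmission paths is proposed. First, the coordinates of the current node and the destination node are used to determine how the packet is to be transmitted. Second, the transmission path is checked for faults and the port buffer occupancy is obtained. Finally, the best output port is chosen and the packet is forwarded to the neighboring router. Experiments at two network sizes show that, compared with the Elevator-First algorithm, the proposed algorithm has clear advantages in average latency and throughput, and that at a network fault rate of 50% its packet loss rates under the Random and Shuffle traffic patterns are 25.5% and 29.5%, respectively.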
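The record-table idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm: the `TsvRecord` fields, the `pick_tsv` name, and the rule "shortest Manhattan detour first, then lowest buffer occupancy" are assumptions made only to show how address, occupancy, and fault information might be combined when a packet has to change layers.

```python
from dataclasses import dataclass

@dataclass
class TsvRecord:
    # Hypothetical record-table entry: TSV position plus router status.
    x: int
    y: int
    occupancy: int   # occupied input-buffer slots at the TSV router
    faulty: bool

def pick_tsv(cur, dst, table):
    """Pick a TSV for an inter-layer packet; return None if every entry is faulty.

    cur and dst are (x, y) positions within their layers. The selection rule
    is an assumption: shortest in-plane path through the TSV, ties broken by
    the lower buffer occupancy.
    """
    def path_len(r):
        to_tsv = abs(cur[0] - r.x) + abs(cur[1] - r.y)
        from_tsv = abs(dst[0] - r.x) + abs(dst[1] - r.y)
        return to_tsv + from_tsv

    usable = [r for r in table if not r.faulty]
    if not usable:
        return None
    return min(usable, key=lambda r: (path_len(r), r.occupancy))
```

A real router would make this decision in hardware per port; the sketch only shows the filtering and ranking order.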
4.
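To address the reliability and network congestion problems caused by faults in a router's input buffers and crossbar, a fault-aware RVOQ fault-tolerant architecture is proposed. First, redundant virtual channels are added at the input ports to tolerate input-buffer faults, and fault feedback together with an arbitration algorithm lets data choose a valid path. Then the crossbar is modified by adding multiplexers and the corresponding control modules: input data prefers its local data link and, when a fault occurs, is sent over a redundant path. Experimental results show that with three faults the scheme reduces latency by 11% to 53.1% compared with existing methods, and that it still ensures high reliability and transmission performance when the network has multiple faults and is heavily loaded.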
5.
6.
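To implement the network-layer functions of the SpaceFibre standard, a router design built around an FPGA is proposed. Following the protocol specification, a five-port router is designed with four normal ports and one configuration port; each normal port has four virtual channels. Because virtual-channel routing can still block data, a round-robin arbitration routing algorithm is added to the crossbar switch matrix (CrossBar). The router is implemented in Verilog and simulated in Modelsim for an XC6SLX9 FPGA, which verifies the correctness and effectiveness of the design.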
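The round-robin arbitration mentioned in the abstract above can be sketched in software as follows. This is a generic behavioral model, not the paper's Verilog design; the class name `RoundRobinArbiter` and its interface are assumptions for illustration.

```python
class RoundRobinArbiter:
    """Behavioral model of a round-robin arbiter for one crossbar output."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.pointer = 0  # input that has highest priority in the next round

    def grant(self, requests):
        """requests: list of bools, one per input. Returns granted index or None."""
        for offset in range(self.num_inputs):
            idx = (self.pointer + offset) % self.num_inputs
            if requests[idx]:
                # The winner gets lowest priority next time, which prevents
                # one input from starving the others.
                self.pointer = (idx + 1) % self.num_inputs
                return idx
        return None
```

With inputs 0 and 2 requesting continuously, successive calls to `grant` alternate between them, so neither input is blocked indefinitely.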
7.
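Input-buffered switch fabrics in switch control circuits are studied. A request-shifting method is proposed to handle the arbitration delay between the input buffers and the central arbiter, and the input-buffered switch fabric is implemented as a pipeline to reduce the delay between a data-send request and its acknowledgment.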
8.
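Predicting the performance of a MIN with an analytical model can achieve good accuracy and is far more efficient than using a simulation model. This paper analyzes in depth the transitions among the input-port states of MIN switching elements and the correlation between the states of adjacent ports, redefines the input-port states, and uses probability theory and queueing theory to build an accurate MIN performance analysis model that supports multicast. Conditional probabilities are used in the port-state transitions, which reflects more accurately how a MIN actually performs packet switching. Experimental results show that the model keeps the latency error within 10%, which is a high level of accuracy.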
9.
10.
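Research on a Flow-Mapping-Based Load-Balancing Scheduling Algorithm (total citations: 1, self-citations: 0, citations by others: 1)
Network administrators need high-performance router architectures that offer scalability, throughput guarantees, and packet ordering. Current centralized router architectures based on a crossbar struggle to scale in performance and size, and load-balanced switch fabrics based on a two-stage mesh network have become an effective way to expand Internet router capacity. Load-balanced routers, however, suffer from severe packet reordering, and resequencing packets at the outputs has complexity O(N^2). This paper proposes a region-equalized load-balanced switch fabric in which every k consecutive intermediate-stage input ports form a region. The inputs use a flow-mapping-based load distribution algorithm, UFFS-k (Uniform Fine-grain Frame Spreading, where k is the aggregation granularity), which in k consecutive external time slots dispatches the k cells of the same flow, in a fine-grained manner, to a fixed mapped region. It is proved theoretically that this scheduling policy achieves 100% throughput and preserves packet order. To avoid traffic concentrating in one region, the flow-to-region mappings of different input ports are built using dual rotation. To balance load across the intermediate-stage input ports, every input port maintains a traffic distribution matrix with a globally consistent view, and UFFS-k schedules unit frames according to this matrix. It can be proved that, for any output port j, the OQj queues within one region have equal length and the OQj queues of different regions differ in length by at most 1, so 100% load balance is achieved. UFFS-k runs independently at each input port in a distributed manner and dispatches cells according to the flow-to-region mapping and the load distribution state. Simulation results show that with aggregation granularity k = 2, UFFS-k has the best delay performance among comparable order-preserving algorithms.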
11.
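An On-Chip Router with Dynamically Allocated Virtual Output Queues (total citations: 1, self-citations: 0, citations by others: 1)
On-chip routers with conventional virtual-channel flow control add virtual channels to mitigate the link-throughput loss and network congestion caused by head-of-line blocking, but they suffer from low buffer utilization and high arbitration overhead. On-chip routers with dynamic virtual-channel flow control can raise buffer utilization and link throughput by managing buffer slots dynamically, but the complexity and overhead of their flow-control and arbitration logic inevitably grow quickly. To improve link throughput and buffer utilization with a better trade-off between performance and overhead, this paper proposes DAVOQ, an on-chip router with dynamically allocated virtual output queues. The design organizes virtual output queues dynamically with fast linked lists and uses look-ahead routing to simplify the arbitration logic and optimize the pipeline. Simulation and synthesis results show that, compared with a conventional virtual-channel router, DAVOQ improves packet latency and throughput while saving 15.1% of standard-cell area and 12.9% of leakage power in a 0.13 μm CMOS process. Compared with a dynamic virtual-channel router, DAVOQ achieves a considerable latency improvement for a small throughput loss while saving 15.6% of standard-cell area and 20.5% of leakage power.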
12.
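As high-performance networks grow in scale, high-radix router architecture has become a major focus of high-performance computing research. High-radix routers allow a network to achieve lower packet latency, construction cost, and power consumption, and they can also improve network reliability. The past decade saw the fastest development of high-radix routers. This paper surveys recent research on high-radix routers and forecasts future trends, covering mainly the classic structured designs represented by YARC and newer design approaches such as "network within a network". Future research will focus on solving the buffering and arbitration problems encountered in high-radix router design and on using technologies such as optical interconnects to design better-performing architectures.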
13.
This paper proposes a novel QoS-aware and congestion-aware Network-on-Chip architecture that not only enables quality-oriented network transmission and maintains a feasible implementation cost but also balances traffic load inside the network to enhance overall throughput. By differentiating application traffic into different service classes, bandwidth allocation is managed accordingly to fulfill QoS requirements. Combined with a congestion control scheme that consists of dynamic arbitration and adaptive routing path selection, high-priority traffic is directed to less congested areas and is given preference for available resources. Simulation results show that the average latency of high-priority and overall traffic is improved dramatically for various traffic patterns. Cost evaluation results also show that the proposed router architecture requires negligible cost overhead but provides better performance for both advanced mesh NoC platforms.
14.
15.
16.
Performance and scalability are critically important for on-chip interconnect in many-core chip-multiprocessor systems. Packet-switched interconnect fabric, widely viewed as the de facto on-chip data communication backplane in the many-core era, offers high throughput and excellent scalability. However, these benefits come at the price of router latency due to run-time multi-hop data buffering and resource arbitration. The network accounts for a majority of on-chip data transaction latency. In this work, we propose dynamic in-network resource reservation techniques to optimize run-time on-chip data transactions. This idea is motivated by the need to preserve existing abstraction and general-purpose network performance while optimizing for frequently occurring network events such as data transactions. Experimental studies using multithreaded benchmarks demonstrate that the proposed techniques can reduce on-chip data access latency by 28.4% on average in a 16-node system and 29.2% on average in a 36-node system.
17.
Mingche Lai, Lei Gao, Sheng Ma, Xiao Nong, Zhiying Wang 《Microprocessors and Microsystems》2011, 35(2): 98-109
With an increasing number of cores, the communication latency of Network-on-Chip becomes a dominant problem due to complex operations per node. In this paper, we try to reduce communication latency by proposing a single-cycle router architecture with a wing channel, which forwards incoming packets to free ports immediately by inspecting the switch allocation results. Also, incoming packets granted the wing channel can fill the time slots of the crossbar switch and reduce contention with subsequent ones, thereby improving throughput effectively. We design the proposed router using a 65 nm CMOS process, and the results show that it supports different routing schemes and outperforms express virtual channel, prediction, and Kumar’s single-cycle routers in terms of latency and throughput. Compared to the speculative router, it provides a 45.7% latency reduction and a 14.0% throughput improvement. Moreover, we show that the proposed design incurs a modest area overhead of 8.1%, while power consumption is reduced by 7.8% due to less arbitration activity.
18.
PowerTrust: A Robust and Scalable Reputation System for Trusted Peer-to-Peer Computing (total citations: 12, self-citations: 0, citations by others: 12)
We propose a new fair scheduling technique, called OCGRR (output controlled grant-based round robin), for the support of DiffServ traffic in a core router. We define a stream to be the same-class packets from a given immediate upstream router destined to an output port of the core router. At each output port, streams may be isolated in separate buffers before being scheduled in a frame. The sequence of traffic transmission in a frame starts from higher-priority traffic and goes down to lower-priority traffic. A frame may have a number of small rounds for each class. Each stream within a class can transmit a number of packets in the frame based on its available grant, but only one packet per small round, thus reducing the intertransmission time from the same stream and achieving a smaller jitter and startup latency. The grant can be adjusted in a way to prevent the starvation of lower priority classes. We also verify and demonstrate the good performance of our scheduler by simulation and comparison with other algorithms in terms of queuing delay, jitter, and start-up latency.
19.
Rahbar Akbar Ghaffar Pour, Yang Oliver 《Parallel and Distributed Systems, IEEE Transactions on》2007, 18(5): 697-710
We propose a new fair scheduling technique, called OCGRR (Output Controlled Grant-based Round Robin), for the support of DiffServ traffic in a core router. We define a stream to be the same-class packets from a given immediate upstream router destined to an output port of the core router. At each output port, streams may be isolated in separate buffers before being scheduled in a frame. The sequence of traffic transmission in a frame starts from higher-priority traffic and goes down to lower-priority traffic. A frame may have a number of small rounds for each class. Each stream within a class can transmit a number of packets in the frame based on its available grant, but only one packet per small round, thus reducing the intertransmission time from the same stream and achieving a smaller jitter and startup latency. The grant can be adjusted in a way to prevent the starvation of lower priority classes. We also verify and demonstrate the good performance of our scheduler by simulation and comparison with other algorithms in terms of queuing delay, jitter, and start-up latency.
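The frame structure described in the OCGRR abstract above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' scheduler: the function name `schedule_frame`, the input layout, and the treatment of a grant as a plain per-frame packet count are all hypothetical, and grant adjustment between frames is left out.

```python
from collections import deque

def schedule_frame(classes):
    """Return the packet transmission order for one frame.

    classes: list ordered from highest to lowest priority. Each item maps a
    stream id to a (queue, grant) pair, where queue is a deque of packets and
    grant is assumed to be the number of packets the stream may send per frame.
    """
    order = []
    for streams in classes:                      # higher-priority classes first
        remaining = {sid: grant for sid, (_, grant) in streams.items()}
        while True:                              # one iteration = one small round
            sent = False
            for sid, (queue, _) in streams.items():
                if remaining[sid] > 0 and queue:
                    order.append(queue.popleft())  # one packet per small round
                    remaining[sid] -= 1
                    sent = True
            if not sent:                         # no stream could send: class done
                break
    return order
```

Interleaving one packet per stream in each small round is what spreads a stream's transmissions across the frame, which is the property the abstract credits for lower jitter and startup latency.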