
1.

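杨文祥, 董德尊, 李存禄, 雷斐, 孙凯旋, 吴际 《计算机工程与科学》2017,39(2):245-251

As high-performance networks grow in scale, high-radix router architecture has become a key focus of high-performance computing research. With high-radix routers, a network can achieve lower packet latency, lower power consumption, and lower construction cost, and it can also be more reliable. As router radix keeps increasing, simply scaling up a single-stage crossbar makes the router's internal wiring grow rapidly until the crossbar becomes unacceptably expensive to implement, so high-radix routers need new switch architectures. The past decade has produced classic structured designs, represented by YARC, as well as newer approaches such as "network within a network"; future work will concentrate on the buffering, arbitration, and scalability problems that arise in high-radix router design. In view of this, the paper implements a multi-stage bufferless high-radix router. Internally the router is a multi-stage Clos network in which each stage has an arbitration module that schedules requests. Packets are buffered only at the input and output ports; apart from these buffers, the network is bufferless. Extensive performance tests with the BookSim simulator show that the router works correctly and performs well.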

2.

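A dynamic buffer allocation mechanism for networks-on-chip based on data flow direction

谢同飞, 韩国栋, 肖庆辉 《计算机应用研究》2011,28(11):4251-4255

To use buffer resources efficiently, the paper proposes a method for dynamically allocating the port buffers of a network-on-chip router. Data arriving at an input port is divided into groups by transmission direction, with each group corresponding to one output port, and is stored group by group; a control unit dynamically allocates buffer resources to each group according to the amount of data it holds. Compared with dynamic buffer allocation based on virtual channels, the method reduces control and arbitration complexity. Simulation results show that the method effectively reduces the buffering needed to reach the same performance.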
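The grouping idea in this abstract can be illustrated with a minimal sketch. It assumes one shared pool of slots per input port and one FIFO group per output port; the class name `DirectionGroupedBuffer` and its methods are hypothetical and are not taken from the paper, whose control unit may allocate slots differently.

```python
from collections import deque


class DirectionGroupedBuffer:
    """Input-port buffer whose slots are shared by per-output groups (illustrative)."""

    def __init__(self, capacity, outputs):
        self.capacity = capacity                      # total slots in this input port
        self.groups = {o: deque() for o in outputs}   # one FIFO group per output port

    def occupancy(self):
        """Number of slots currently in use across all groups."""
        return sum(len(q) for q in self.groups.values())

    def push(self, flit, output):
        """Store a flit in the group for its output; return False if the pool is full."""
        if self.occupancy() >= self.capacity:
            return False
        self.groups[output].append(flit)
        return True

    def pop(self, output):
        """Remove the oldest flit bound for `output`, or return None if the group is empty."""
        queue = self.groups[output]
        return queue.popleft() if queue else None
```

Because the slots are not partitioned in advance, a single busy direction can use the whole pool while the other groups are idle, which is the kind of saving the abstract attributes to dynamic allocation.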

3.

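An adaptive unicast routing algorithm for non-fully-interconnected 3D NoC

孙美东, 刘勤让, 刘冬培, 燕昺昊 《计算机应用》2018,38(5):1470-1475

In non-fully-interconnected three-dimensional network-on-chip (3D NoC) architectures, the through-silicon via (TSV) table stores only TSV addresses, which leads to network congestion. To address this, the paper proposes a record table structure that stores not only the addresses of the four TSVs nearest to a router but also the input-buffer occupancy and fault information of the corresponding routers. On this basis it further proposes an adaptive unicast routing algorithm that follows the shortest transmission path. First, the coordinates of the current node and the destination node are compared to determine how the packet is to be transmitted; second, the transmission path is checked for faults while the port buffer occupancy is obtained; finally, the best output port is chosen and the packet is forwarded to the neighboring router. Experiments at two network sizes show that the proposed algorithm clearly outperforms the Elevator-First algorithm in average latency and throughput, and that at a network fault rate of 50% the packet loss rates under the Random and Shuffle traffic patterns are 25.5% and 29.5%, respectively.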
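As a rough illustration of how a record table with occupancy and fault fields could drive TSV selection, the sketch below skips faulty entries, prefers the TSV that gives the shortest in-layer path, and breaks ties by buffer occupancy. The cost function, the `TsvEntry` fields, and `choose_tsv` are assumptions made for this example; the abstract does not give the paper's actual selection rule.

```python
from dataclasses import dataclass


@dataclass
class TsvEntry:
    x: int            # in-layer coordinates of the router that owns the TSV
    y: int
    occupancy: int    # occupied input-buffer slots at that router
    faulty: bool      # True if the TSV or its router is marked faulty


def choose_tsv(current, dest, table):
    """Pick a TSV for a packet at `current` heading to in-layer position `dest`.

    `current` and `dest` are (x, y) tuples. Returns a TsvEntry, or None when
    every entry in the table is faulty.
    """
    def cost(entry):
        # Manhattan hops to reach the TSV plus hops from the TSV to the destination.
        hops = (abs(current[0] - entry.x) + abs(current[1] - entry.y)
                + abs(entry.x - dest[0]) + abs(entry.y - dest[1]))
        return (hops, entry.occupancy)   # shortest path first, then least occupied

    healthy = [entry for entry in table if not entry.faulty]
    return min(healthy, key=cost) if healthy else None
```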

4.

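A fault-aware RVOQ fault-tolerant architecture for 3D NoC

欧阳一鸣, 何敏, 梁华国, 汪秀敏, 常郝 《计算机辅助设计与图形学学报》2015,(1)

To address the reliability and network congestion problems caused by faults in a router's input buffers and crossbar, the paper proposes a fault-aware RVOQ fault-tolerant architecture. First, redundant virtual channels are added at the input ports to tolerate input-buffer faults, and fault-information feedback together with an arbitration algorithm lets data choose a working path. Then the crossbar is modified by adding multiplexers and the corresponding control modules, so that input data prefers its local data link and switches to a redundant path when a fault occurs. Experimental results show that with three faults the scheme reduces latency by 11% to 53.1% compared with existing methods, and that it still maintains high reliability and transmission performance when the network has multiple faults and is heavily loaded.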

5.

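ISP: an efficient crossbar arbitration algorithm  Total citations: 12, self-citations: 0, citations by others: 12

孙志刚, 苏金树, 卢锡城 《计算机学报》2000,23(10):1078-1082

The switch fabric is the core of a high-performance router, and today's high-performance backbone routers generally use an input-queued crossbar switch. An efficient crossbar arbitration algorithm is therefore very important to router design. The paper proposes ISP (Input Serial Polling), an arbitration algorithm that combines polling with round robin. Under light load ISP performs comparably to iSLIP; under heavy load it outperforms iSLIP in bandwidth utilization, average cell delay, and fairness. ISP is also simple to implement.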

6.

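Design of a SpaceFibre bus router

赵允齐, 安军社, 郑静雅, 祝平, 臧文博 《计算机工程与设计》2021,42(4):1195-1200, inside back cover

To implement the network-layer functions of the SpaceFibre standard, the paper proposes an FPGA-based router design. Following the protocol specification, it designs a five-port router with four normal ports and one configuration port, where each normal port has four virtual channels. Because virtual-channel routing can still block data, a round-robin arbitration routing algorithm is added to the crossbar switch matrix (CrossBar) structure. The router is implemented in Verilog and simulated in Modelsim for an XC6SLX9 FPGA, which verifies the correctness and effectiveness of the design.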
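The abstract does not describe the arbiter in detail, so the following is only a generic round-robin arbiter for one crossbar output, written in Python rather than the paper's Verilog. The class name `RoundRobinArbiter` and the rule that the pointer moves to just past the last winner are assumptions of this sketch.

```python
class RoundRobinArbiter:
    """Generic round-robin arbiter for one crossbar output (illustrative)."""

    def __init__(self, n):
        self.n = n          # number of requesters (input ports or virtual channels)
        self.pointer = 0    # requester with the highest priority in the next round

    def grant(self, requests):
        """Return the index of the granted requester, or None if nobody requests.

        `requests` is a sequence of n booleans. The search starts at the pointer
        and wraps around, so a requester that just won has the lowest priority
        in the next arbitration.
        """
        for offset in range(self.n):
            i = (self.pointer + offset) % self.n
            if requests[i]:
                self.pointer = (i + 1) % self.n
                return i
        return None
```

With two persistent requesters the grants alternate, so neither can block the other indefinitely.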

7.

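Research on input-buffered switch architectures

石霄, 余长征, 王玉艳, 章建雄 《计算机工程》2005,31(20):231-232,F0003

The paper studies input-buffered switch architectures in switch control circuits. It proposes a request-shifting method to handle the arbitration delay between the input buffers and the central arbiter, and it implements the input-buffered switch as a pipeline to reduce the delay between a data-send request and its acknowledgment.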

8.

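An accurate performance analysis model for MINs with multicast support

孙全宝, 张民选, 肖立权 《计算机工程与科学》2008,30(8):4-7

Predicting MIN performance with an analytical model can achieve good accuracy and is far more efficient than simulation. The paper analyzes in depth the transitions between input-port states of a MIN switching element and the correlation between the states of adjacent ports, redefines the input-port states, and uses probability theory and queueing theory to build an accurate MIN performance analysis model that supports multicast. By using conditional probabilities in the port-state transitions, the model reflects more faithfully how a MIN actually switches packets. Experimental results show that the model keeps the latency error within 10%, which is a high level of accuracy.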

9.

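A weight-free high-performance scheduling algorithm for the CICQ architecture

王晓亮, 杨君刚, 邱智亮, 李然 《计算机工程》2006,32(15):123-125

The combined input-crosspoint queued (CICQ) architecture, which adds a small amount of buffering at each crosspoint of a crossbar switch, offers simple scheduling algorithms and good performance and is well suited to high-speed, high-capacity routers. Building on existing results, the paper proposes an asynchronous-pointer round-robin algorithm that avoids synchronization of the arbitration pointers. The algorithm sets the pointers of all input and output arbiters to be asynchronous and statically updates every arbiter's pointer in each time slot, thereby desynchronizing the pointers across the switch. Simulation results show that the algorithm keeps the simplicity of weight-free algorithms while clearly improving delay and throughput under different traffic patterns.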
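One way to read the pointer desynchronization described here is sketched below. It assumes that arbiter i starts with its pointer at position i and that every pointer advances by one position per time slot regardless of which requests were granted. Both choices, and the function names, are assumptions of this example, since the abstract does not state the initial values or the update rule.

```python
def initial_pointers(n):
    """Give each of the n arbiters a distinct starting pointer (assumed rule)."""
    return list(range(n))


def advance(pointers, n):
    """Static per-slot update: every pointer moves one position, grants are ignored."""
    return [(p + 1) % n for p in pointers]


def pick(pointer, eligible):
    """Round-robin choice: first eligible index at or after `pointer`, else None."""
    n = len(eligible)
    for offset in range(n):
        k = (pointer + offset) % n
        if eligible[k]:
            return k
    return None
```

Because all pointers move in lockstep from distinct starting values, no two arbiters ever point at the same position in the same slot. This is the desynchronized state that request-driven pointer updates can lose under some traffic patterns.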

10.

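Research on a load-balancing scheduling algorithm based on flow mapping  Total citations: 1, self-citations: 0, citations by others: 1

戴艺, 苏金树, 孙志刚 《计算机学报》2012,35(2):2218-2228

Network operators need high-performance router architectures that offer scalability, throughput guarantees, and packet ordering. Current centralized crossbar-based router architectures are hard to scale in both performance and size, and load-balanced switch architectures based on a two-stage mesh network have become an effective way to expand the capacity of Internet routers. Load-balanced routers, however, suffer from severe packet reordering, and resequencing packets at the output has complexity O(N^2). The paper proposes a region-equalized load-balanced switch architecture in which every k consecutive intermediate-stage input ports form a region. The inputs use UFFS-k (Uniform Fine-grain Frame Spreading, where k is the aggregation granularity), a flow-mapping-based load distribution algorithm that, over k consecutive external time slots, dispatches the k cells of one flow in a fine-grained manner to a fixed mapped region. The paper proves theoretically that this scheduling policy achieves 100% throughput and preserves packet order. To keep traffic from concentrating in one region, the flow-to-region mappings of different input ports are built in a dual-rotation manner. To balance load across the intermediate-stage input ports, every input port maintains a traffic distribution matrix that gives a globally consistent view, and UFFS-k schedules unit frames according to this matrix. It can be proved that, for any output port j, the OQj queues within a region have equal length and the OQj queue lengths in different regions differ by at most 1, which achieves a 100% degree of load balancing. UFFS-k runs independently at each input port and dispatches cells according to the flow-to-region mapping and the load distribution state. Simulation results show that with aggregation granularity k=2, UFFS-k has the best delay performance among comparable order-preserving algorithms.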

11.

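An on-chip router with a dynamically allocated virtual output queue structure  Total citations: 1, self-citations: 0, citations by others: 1

朱红雷, 彭元喜, 尹亚明, 陈胜刚 《计算机研究与发展》2012,49(1):183-192

On-chip routers with conventional virtual-channel flow control add virtual channels to relieve the drop in link throughput and the network congestion caused by head-of-line blocking, but they suffer from low buffer utilization and high arbitration overhead. Routers with dynamic virtual-channel flow control improve buffer utilization and link throughput by managing buffer slots dynamically, but the complexity and overhead of their flow-control and arbitration logic inevitably grow quickly. To raise link throughput and buffer utilization while striking a good balance between performance and overhead, the paper proposes DAVOQ, an on-chip router with a dynamically allocated virtual output queue structure. DAVOQ organizes the virtual output queues dynamically with fast linked lists and uses look-ahead routing to simplify the arbitration logic and optimize the pipeline. Simulation and synthesis results show that, compared with a conventional virtual-channel router, DAVOQ improves packet latency and throughput while saving 15.1% of standard-cell area and 12.9% of leakage power in a 0.13 μm CMOS process. Compared with a dynamic virtual-channel router, it obtains a considerable latency improvement at a small loss in throughput while saving 15.6% of standard-cell area and 20.5% of leakage power.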
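The linked-list organization of virtual output queues can be pictured with a small software model. The sketch assumes a single shared slot array, a free list threaded through a `next` array, and a head and tail index per output. The class `LinkedListVOQ` is hypothetical and does not reproduce DAVOQ's hardware "fast linked list" or its look-ahead routing.

```python
class LinkedListVOQ:
    """Virtual output queues threaded through one shared slot array (illustrative)."""

    def __init__(self, slots, outputs):
        self.data = [None] * slots
        self.next = list(range(1, slots)) + [-1]   # initially every slot is on the free list
        self.free = 0 if slots else -1             # index of the first free slot, -1 if none
        self.head = [-1] * outputs                 # first slot of each output's queue
        self.tail = [-1] * outputs                 # last slot of each output's queue

    def enqueue(self, output, flit):
        """Append a flit to the queue for `output`; return False if no slot is free."""
        if self.free == -1:
            return False
        slot = self.free
        self.free = self.next[slot]
        self.data[slot] = flit
        self.next[slot] = -1
        if self.head[output] == -1:
            self.head[output] = slot
        else:
            self.next[self.tail[output]] = slot
        self.tail[output] = slot
        return True

    def dequeue(self, output):
        """Remove and return the oldest flit for `output`, or None if its queue is empty."""
        slot = self.head[output]
        if slot == -1:
            return None
        flit = self.data[slot]
        self.head[output] = self.next[slot]
        if self.head[output] == -1:
            self.tail[output] = -1
        self.data[slot] = None
        self.next[slot] = self.free    # return the slot to the free list
        self.free = slot
        return flit
```

Any queue may grow into any free slot, so buffer space follows the traffic instead of being fixed per queue.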

12.

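A survey of high-radix router architectures

杨文祥, 董德尊, 雷斐, 李存禄, 吴际, 孙凯旋 《计算机工程与科学》2016,38(8):1517-1523

As high-performance networks grow in scale, high-radix router architecture has become a key focus of research in high-performance computing. With high-radix routers, a network can achieve lower packet latency, construction cost, and power consumption, and it can also be more reliable. The past decade has seen the fastest development of high-radix routers. The paper surveys recent research on high-radix routers and predicts future trends, covering mainly the classic structured designs represented by YARC and newer design approaches that have emerged in recent years, such as "network within a network". Future research will focus on solving the buffering and arbitration problems encountered in high-radix router design and on using technologies such as optical interconnects to design better-performing architectures.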

13.

Design and evaluation of a high throughput QoS-aware and congestion-aware router architecture for Network-on-Chip

Chifeng Wang, Nader Bagherzadeh 《Microprocessors and Microsystems》2014

This paper proposes a novel QoS-aware and congestion-aware Network-on-Chip architecture that not only enables quality-oriented network transmission and maintains a feasible implementation cost but also balances the traffic load inside the network to enhance overall throughput. By differentiating application traffic into different service classes, bandwidth allocation is managed accordingly to fulfill QoS requirements. Combined with a congestion control scheme that consists of dynamic arbitration and adaptive routing path selection, high-priority traffic is directed to less congested areas and is given preference for available resources. Simulation results show that the average latency of high-priority and overall traffic is improved dramatically for various traffic patterns. Cost evaluation results also show that the proposed router architecture requires negligible cost overhead but provides better performance for both advanced mesh NoC platforms.

14.

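An adaptive arbitration method for NoC based on congestion prediction

杨盛光, 李丽, 徐懿, 张宇昂, 娄孝祥, 高明伦 《计算机应用研究》2009,26(2):652-654

Arbitration methods traditionally used in bus systems or the Internet no longer suit the NoC environment well. Focusing on congestion state, the key factor affecting NoC system performance, the paper proposes GLCA, an arbitration strategy based on global and local congestion prediction, to improve NoC network latency. Experimental results show that, relative to the RR method, the new arbitration algorithm improves average packet latency by up to 20.5% and average throughput by up to 8%, and keeps its advantage under different load conditions. Synthesis results show that, compared with RR, the GLCA router adds only a small amount of combinational logic (25.7%).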

15.

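An improved active queue management algorithm

王新生, 袁小波 《计算机工程》2011,37(10):79-80

Starting from the question of whether to maintain per-flow state, the paper proposes SF-AQM, an improved active queue management algorithm. SF-AQM maintains state only for flows with high sending rates, which reduces router overhead. It measures flow arrival rates by comparing the packet inter-arrival times of different flows, identifies non-adaptive flows to improve fairness, and keeps the queue length near a target value to ensure stability. Simulation results show that SF-AQM has good fairness and stability and is clearly effective at suppressing network congestion.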

16.

Transaction-Aware Network-on-Chip Resource Reservation

Li Zheng, Zhu Changyun, Shang Li, Dick Robert, Sun Yihe 《Computer Architecture Letters》2008,7(2):53-56

Performance and scalability are critically important for on-chip interconnect in many-core chip-multiprocessor systems. Packet-switched interconnect fabric, widely viewed as the de facto on-chip data communication backplane in the many-core era, offers high throughput and excellent scalability. However, these benefits come at the price of router latency due to run-time multi-hop data buffering and resource arbitration. The network accounts for a majority of on-chip data transaction latency. In this work, we propose dynamic in-network resource reservation techniques to optimize run-time on-chip data transactions. This idea is motivated by the need to preserve existing abstraction and general-purpose network performance while optimizing for frequently occurring network events such as data transactions. Experimental studies using multithreaded benchmarks demonstrate that the proposed techniques can reduce on-chip data access latency by 28.4% on average in a 16-node system and 29.2% on average in a 36-node system.

17.

A practical low-latency router architecture with wing channel for on-chip network

Mingche Lai, Lei Gao, Sheng Ma, Xiao Nong, Zhiying Wang 《Microprocessors and Microsystems》2011,35(2):98-109

With an increasing number of cores, the communication latency of Network-on-Chip becomes a dominant problem due to complex operations per node. In this paper, we try to reduce communication latency by proposing a single-cycle router architecture with a wing channel, which forwards incoming packets to free ports immediately after inspecting the switch allocation results. Also, incoming packets granted the wing channel can fill in the time slots of the crossbar switch and reduce contention with subsequent ones, thereby pushing throughput effectively. We design the proposed router using a 65 nm CMOS process, and the results show that it supports different routing schemes and outperforms express virtual channel, prediction and Kumar’s single-cycle routers in terms of latency and throughput. When compared to the speculative router, it provides 45.7% latency reduction and 14.0% throughput improvement. Moreover, we show that the proposed design incurs a modest area overhead of 8.1% but saves 7.8% of power consumption due to fewer arbitration activities.

18.

PowerTrust: A Robust and Scalable Reputation System for Trusted Peer-to-Peer Computing  Total citations: 12, self-citations: 0, citations by others: 12

Runfang Zhou, Kai Hwang 《Parallel and Distributed Systems, IEEE Transactions on》2007,18(4):460-473


19.

OCGRR: A New Scheduling Algorithm for Differentiated Services Networks  Total citations: 1, self-citations: 0, citations by others: 1

Rahbar Akbar Ghaffar Pour, Yang Oliver 《Parallel and Distributed Systems, IEEE Transactions on》2007,18(5):697-710

We propose a new fair scheduling technique, called OCGRR (Output Controlled Grant-based Round Robin), for the support of DiffServ traffic in a core router. We define a stream to be the same-class packets from a given immediate upstream router destined to an output port of the core router. At each output port, streams may be isolated in separate buffers before being scheduled in a frame. The sequence of traffic transmission in a frame starts from higher-priority traffic and goes down to lower-priority traffic. A frame may have a number of small rounds for each class. Each stream within a class can transmit a number of packets in the frame based on its available grant, but only one packet per small round, thus reducing the intertransmission time from the same stream and achieving a smaller jitter and startup latency. The grant can be adjusted in a way to prevent the starvation of lower-priority classes. We also verify and demonstrate the good performance of our scheduler by simulation and comparison with other algorithms in terms of queuing delay, jitter, and start-up latency.
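The frame structure described in this abstract (classes served from high to low priority, several small rounds per class, at most one packet per stream per small round, and a per-stream grant per frame) can be sketched as follows. The function `schedule_frame` and its data layout are assumptions of this example. It counts grants in packets and omits the grant adjustment that the abstract mentions for preventing starvation.

```python
from collections import deque


def schedule_frame(classes):
    """Return the packet transmission order for one frame (illustrative).

    `classes` is ordered from highest to lowest priority. Each class is a list
    of (queue, grant) pairs, one per stream, where `queue` is a deque of packets
    and `grant` is the number of packets the stream may send in this frame.
    """
    order = []
    for streams in classes:
        remaining = [grant for _, grant in streams]
        sent = True
        while sent:                       # each pass of this loop is one small round
            sent = False
            for idx, (queue, _) in enumerate(streams):
                if remaining[idx] > 0 and queue:
                    order.append(queue.popleft())   # one packet per stream per round
                    remaining[idx] -= 1
                    sent = True
    return order
```

Interleaving the streams round by round, rather than draining each stream's whole grant at once, is what shortens the gap between consecutive transmissions of one stream.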