Similar literature
 Found 20 similar documents; search took 31 ms
1.
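In network-on-chip (NoC) design, the quality of the on-chip router directly affects the performance of the whole NoC system. Weighing the advantages and disadvantages of packet switching and circuit switching, this paper adopts a "packet-circuit switching" scheme and designs and implements a high-performance "packet-circuit switching" on-chip router with low hardware resource consumption.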

2.
Network‐on‐chip (NoC) architecture provides a high‐performance communication infrastructure for system‐on‐chip designs. Circuit‐switched networks guarantee transmission latency and throughput; hence, they are suitable for NoC architecture with real‐time traffic. In this paper, we propose an efficient integrated scheme which automatically maps application tasks onto NoC tiles, establishes communication circuits, and allocates a proper bandwidth for each circuit. Simulation results show that the average waiting times of packets in a switch in 6×6, 8×8, and 10×10 mesh NoC networks are 0.59, 0.62, and 0.61, respectively. The latency of circuits is significantly decreased. Furthermore, the buffer of a switch in NoC only needs to accommodate the data of one time slot. The cost of the switch in the circuit‐switched network can be reduced using our scheme. Our design provides an effective solution for a critical step in NoC design.

3.
周小锋  朱樟明  周端 《半导体学报》2016,37(7):075002-8
The bufferless router emerges as an interesting, cost-efficient option in network-on-chip (NoC) design. However, the bufferless router only works well under low network load because deflection occurs more easily as the injection rate increases. In this paper, we propose a load balancing bufferless deflection router (LBBDR) for NoC that relieves the effect of deflection in bufferless NoC. The proposed LBBDR employs a balance toggle identifier in the source router to control the initial routing direction of X or Y for a flit in the network. Based on this mechanism, the flit is routed according to XY or YX routing in the network afterward. When two or more flits contend for the same desired output port, a priority policy called nearer-first is used to resolve the output port allocation contention. Simulation results show that the proposed LBBDR yields an improvement of routing performance over the reported bufferless routing in the flit deflection rate, average packet latency and throughput by up to 13%, 10% and 6%, respectively. The layout area and power consumption are 12% and 7% lower, respectively, than those of the reported schemes.
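
The toggle-controlled choice between XY and YX routing described in this abstract can be illustrated with a minimal Python sketch. It is not the LBBDR design itself: the abstract does not say how the balance toggle identifier is updated, so the sketch assumes it simply alternates on every injected flit, and it models neither deflection nor the nearer-first priority policy. The names `SourceRouter`, `next_hop`, and `route` are hypothetical.

```python
from typing import List, Optional, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in a 2D mesh


def next_hop(cur: Coord, dst: Coord, x_first: bool) -> Coord:
    """Next router on a dimension-ordered path: XY if x_first, otherwise YX."""
    (cx, cy), (dx, dy) = cur, dst
    step_x: Optional[Coord] = (cx + (1 if dx > cx else -1), cy) if dx != cx else None
    step_y: Optional[Coord] = (cx, cy + (1 if dy > cy else -1)) if dy != cy else None
    first, second = (step_x, step_y) if x_first else (step_y, step_x)
    # Move in the preferred dimension if any distance remains there,
    # otherwise in the other dimension, otherwise stay (already at dst).
    return first or second or cur


def route(src: Coord, dst: Coord, x_first: bool) -> List[Coord]:
    """Full contention-free path from src to dst."""
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst, x_first))
    return path


class SourceRouter:
    """Assumed toggle behaviour: alternate XY and YX on successive injections."""

    def __init__(self) -> None:
        self.x_first = True

    def inject(self) -> bool:
        order = self.x_first
        self.x_first = not self.x_first
        return order


if __name__ == "__main__":
    router = SourceRouter()
    for _ in range(2):
        print(route((0, 0), (2, 1), router.inject()))
```

Alternating the first dimension spreads consecutive flits of the same source-destination pair over two disjoint minimal paths, which is one plausible reading of how the toggle balances load.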

4.
Network-on-chip (NoC) has rapidly become a promising alternative for complex system-on-chip architectures including recent multicore architectures. Additionally, optimizing NoC architectures with respect to different design objectives that are suitable for a particular application domain is crucial for achieving high-performance and energy-efficient customized solutions. Although many studies have provided various solutions for different aspects of NoC design, a comprehensive NoC system solution has not emerged yet. This paper presents a novel methodology to provide a solution for complex on-chip communication problems to reduce power, latency and area overhead. Our proposed NoC communication architecture is based on setting up virtual source–destination paths between selected pairs of NoC cores so that the packets belonging to distant nodes in the network can bypass intermediate routers while traveling through these virtual paths. In this scheme, the paths are constructed for an application based on its task graph at design time. After that, a run-time scheduling mechanism is applied to improve the buffer management, virtual channel and switch allocation schemes; hence, the constructed paths are optimized dynamically. Moreover, in our design the router complexity and its overheads are reduced. Additionally, the suggested router has been implemented on the Xilinx Virtex-5 FPGA family. The evaluation results obtained with the SPLASH-2 benchmark suite reveal that, in comparison with the conventional NoC router, the proposed router achieves reductions of 25% and 53% in latency and energy, respectively, at the cost of a 3.5% area overhead. Indeed, our experimental results demonstrate a significant reduction in the average packet latency and total power consumption with negligible area overhead.

5.
3-D Topologies for Networks-on-Chip   Cited by: 2 (self-citations: 0, citations by others: 2)
Several interesting topologies emerge by incorporating the third dimension in networks-on-chip (NoC). The speed and power consumption of 3D NoC are compared to those of 2D NoC. Physical constraints, such as the maximum number of planes that can be vertically stacked and the asymmetry between the horizontal and vertical communication channels of the network, are included in speed and power consumption models of these novel 3D structures. An analytic model for the zero-load latency of each network that considers the effects of the topology on the performance of a 3D NoC is developed. Tradeoffs between the number of nodes utilized in the third dimension, which reduces the average number of hops traversed by a packet, and the number of physical planes used to integrate the functional blocks of the network, which decreases the length of the communication channel, are evaluated for both the latency and power consumption of a network. A performance improvement of 40% and 36% and a decrease of 62% and 58% in power consumption are demonstrated for 3D NoC as compared to a traditional 2D NoC topology for a network size of N = 128 and N = 256 nodes, respectively.
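
The claim that using the third dimension reduces the average number of hops can be checked with a small Python sketch. This is not the paper's analytic zero-load latency model: it only counts hops under uniform random traffic (source and destination drawn independently, possibly equal) on a mesh with dimension-ordered minimal routing, and it ignores the asymmetry between horizontal and vertical channels. The mesh shapes used for N = 128 and N = 256 are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product
from typing import Sequence


def avg_hops_1d(n: int) -> Fraction:
    """Mean |i - j| for i, j uniform on 0..n-1, which equals (n^2 - 1) / (3n)."""
    return Fraction(n * n - 1, 3 * n)


def avg_hops(dims: Sequence[int]) -> Fraction:
    """Mean Manhattan distance in a mesh; the dimensions contribute independently."""
    return sum((avg_hops_1d(n) for n in dims), Fraction(0))


def avg_hops_bruteforce(dims: Sequence[int]) -> Fraction:
    """Same quantity by enumerating all node pairs (for cross-checking small meshes)."""
    nodes = list(product(*(range(n) for n in dims)))
    total = sum(sum(abs(a - b) for a, b in zip(u, v)) for u in nodes for v in nodes)
    return Fraction(total, len(nodes) ** 2)


if __name__ == "__main__":
    # Assumed shapes: 16x8 vs 8x4x4 for N = 128, 16x16 vs 8x8x4 for N = 256.
    for shape_2d, shape_3d in (((16, 8), (8, 4, 4)), ((16, 16), (8, 8, 4))):
        print(shape_2d, float(avg_hops(shape_2d)), shape_3d, float(avg_hops(shape_3d)))
```

Hop count is only one term in the tradeoff the abstract describes; the length of each channel and the limit on stacked planes are outside this sketch.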

6.
The issues of applying the code-division multiple access (CDMA) technique to an on-chip packet switched communication network are discussed in this paper. A packet switched network-on-chip (NoC) that applies the CDMA technique is realized in register-transfer level (RTL) using VHDL. The realized CDMA NoC supports the globally-asynchronous locally-synchronous (GALS) communication scheme by applying both synchronous and asynchronous designs. In a packet switched NoC which applies a point-to-point connection scheme, e.g., a ring topology NoC, data transfer latency varies widely if the packets are transferred to different destinations or to the same destination through different routes in the network. The CDMA NoC can eliminate the data transfer latency variations by sharing the data communication media among multiple users concurrently. A six-node GALS CDMA on-chip network is modeled and simulated. The characteristics of the CDMA NoC are examined by comparing them with the characteristics of an on-chip bidirectional ring topology network. The simulation results reveal that the data transfer latency in the CDMA NoC is a constant value for a certain length of packet and is equivalent to the best case data transfer latency in the bidirectional ring network when data path width is set to 32 bits.
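
How several senders can share one communication medium concurrently under CDMA can be sketched in a few lines of Python. The abstract does not state which spreading codes the CDMA NoC uses; the sketch assumes orthogonal Walsh codes built by the Sylvester construction, sends one bit per sender, and models the shared medium as a chip-wise sum. All function names are hypothetical.

```python
from typing import List


def walsh_codes(n: int) -> List[List[int]]:
    """Rows of a Hadamard matrix with at least n rows (row count is a power of two)."""
    codes = [[1]]
    while len(codes) < n:
        codes = [row + row for row in codes] + [row + [-c for c in row] for row in codes]
    return codes


def encode(bit: int, code: List[int]) -> List[int]:
    """Spread one data bit over the chips of a code (1 -> +code, 0 -> -code)."""
    sign = 1 if bit else -1
    return [sign * chip for chip in code]


def channel(signals: List[List[int]]) -> List[int]:
    """Shared medium: the chip-wise sum of all concurrently transmitted signals."""
    return [sum(chips) for chips in zip(*signals)]


def decode(mixed: List[int], code: List[int]) -> bool:
    """Correlate with one code; orthogonality cancels every other sender."""
    return sum(m * c for m, c in zip(mixed, code)) > 0


if __name__ == "__main__":
    codes = walsh_codes(4)
    bits = [1, 0, 1, 1]
    mixed = channel([encode(b, c) for b, c in zip(bits, codes)])
    print([decode(mixed, c) for c in codes])
```

Because every receiver recovers its bit from the same summed signal in the same number of chips, the transfer time does not depend on the route, which matches the constant-latency behaviour the abstract reports.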

7.
周小锋  刘露  朱樟明  周端 《半导体学报》2016,37(11):115003-7
The design of a router in a network-on-chip (NoC) system has an important impact on some performance criteria. In this paper, we propose a low overhead load balancing router (LOLBR) for 2D mesh NoC to enhance routing performance criteria with low hardware overhead. The proposed LOLBR employs a balance toggle identifier to control the initial routing direction of X or Y for flit injection. Simplified demultiplexers and multiplexers are used to handle output port allocation and contention, which guarantees deadlock avoidance. Simulation results show that the proposed LOLBR yields an improvement of routing performance over the reported routing schemes in average packet latency by 26.5%. The layout area and power consumption of the network are 15.3% and 11.6% lower, respectively, than those of the reported routing schemes.

8.
We present a novel Partial Virtual channel Sharing (PVS) NoC architecture which reduces the impact of faults on performance and also tolerates faults within the routing logic. Without PVS, failure of a component impairs the fault-free connected components, which leads to considerable performance degradation. Improving resource utilization is key in enhancing or sustaining performance with minimal overhead when faults or overload occur. In the proposed architecture, autonomic virtual-channel buffer sharing is implemented with a novel algorithm that determines the sharing of buffers among a set of ports. The runtime allocation of the buffers depends on incoming load and fault occurrence. In addition, we propose an efficient technique for maintaining the accessibility of a processing element (PE) to the network even if its router is faulty. Our techniques can be used in any NoC topology and for both 2D and 3D NoCs. The synthesis results for an integrated video conference application demonstrate a 22% reduction in average packet latency compared to state-of-the-art virtual channel (VC) based NoC architecture. Extensive quantitative simulation has been carried out with synthetic benchmarks. Simulation results reveal that the PVS architecture improves the performance significantly in the presence of faults, compared to other VC-based NoC architectures.

9.
In complex embedded applications, optimisation and adaptation of both dynamic and leakage power have become an issue at SoC grain. A fully power-aware globally-asynchronous locally-synchronous network-on-chip (NoC) circuit is presented in this paper. Network-on-chip architecture combined with a globally-asynchronous locally-synchronous paradigm is a natural enabler for DVFS mechanisms. The circuit is arranged around an asynchronous network-on-chip providing scalable communication and a 17 Gb/s throughput while automatically reducing its power consumption by activity detection. Both dynamic and static power consumptions are globally reduced using adaptive design techniques applied locally for each synchronous NoC unit. No fine control software is required during voltage and frequency scaling. Power control is localized and a minimal latency cost is observed.

10.
The current network-on-chip (NoC) topology cannot predict subsequent switch node status promptly. Switch nodes have to perform various functions such as routing decision, data forwarding, packet buffering, congestion control and other properties of an NoC system, which makes the switch architecture far more complex. This article puts forward a separating on-chip network architecture based on Mesh (S-Mesh). S-Mesh is an on-chip network that separates routing decision flow from the switches. It consists of two types of networks: datapath network (DN) and control network (CN). The CN establishes data paths for data transfer in the DN. Meanwhile, the CN also transfers instructions between different resources. This property makes the switch architecture simple, and eliminates conflicts in network interface units between the resource and switch. Compared with 2D-Mesh, Torus Mesh, Fat-tree and Butterfly, the average packet latency in S-Mesh is the shortest when the packet length is more than 53 B. Compared with 2D-Mesh, the area savings of S-Mesh are about 3%-7%, and the power dissipation is decreased by approximately 2%.

11.
Due to the globalized semiconductor business model, malicious hardware modifications, known as hardware Trojans (HTs), have emerged as a major concern for chip security. HT detection and mitigation methods for general integrated circuits have been investigated in the past decade. However, the majority of the existing efforts are not customized for HTs in Networks-on-Chip (NoCs). To complement the firmware and software level methods for rogue NoC detection, we propose countermeasures to harden the NoC hardware design against tampering. More specifically, we propose a collaborative dynamic permutation and flit integrity check method to mitigate the potential inside-router HTs inserted by a disloyal member in the NoC design house or the 3rd-party system integration company. Our method improves the number of received packets by up to 70.1% over the other methods if the HT controls the NoC packet destination address. The average link availability of our method is 43.7% higher than that of the existing methods. Our method increases the effective average latency by up to 63.4%, 68.2%, and 98.9% for the single HT in the destination, header, and tail fields, respectively, over the existing methods.

12.
Address driver circuits are responsible for a significant portion of power consumption by a plasma display panel television. In the latest effort for the reduction of the power consumption by address driver circuits, the power-efficient operation mode is selected by controlling the voltage of the capacitor that affects the $LC$ resonance. This paper presents an analysis of power consumption by the address circuits and observes that the operation mode change in a previous work is too slow to achieve an efficient power reduction when the number of data switchings of each scan line changes abruptly in the input image. This paper proposes a new power reduction method with a fast selection of the operation mode based on the analysis of the power consumption by each address operation mode. The fast mode selection enables rapid change of the address operation mode and consequently leads to a reduction of power consumption by 22.2% when compared with the previous method.

13.
Network‐on‐chip (NoC) is an emerging design paradigm intended to cope with future systems‐on‐chips (SoCs) containing numerous built‐in cores. Since NoCs have some outstanding features regarding design complexity, timing, scalability, power dissipation and so on, widespread interest in this novel paradigm is likely to grow. The test strategy is a significant factor in the practicality and feasibility of NoC‐based SoCs. Among the existing test issues for NoC‐based SoCs, test access mechanism architecture and test scheduling particularly dominate the overall test performance. In this paper, we propose an efficient NoC‐based SoC test scheduling algorithm based on a rectangle packing approach used for current SoC tests. In order to adopt the rectangle packing solution, we designed specific methods and configurations for testing NoC‐based SoCs, such as test packet routing, test pattern generation, and absorption. Furthermore, we extended and improved the proposed algorithm using multiple test clocks. Experimental results using some ITC’02 benchmark circuits show that the proposed algorithm can reduce the overall test time by up to 55%, and 20% on average compared with previous works. In addition, the computation time of the algorithm is less than one second in most cases. Consequently, we expect the proposed scheduling algorithm to be a promising and competitive method for testing NoC‐based SoCs.

14.
《Microelectronics Journal》2015,46(11):1002-1011
In many-core systems, network-on-chip (NoC) serves as an efficient and scalable architecture to connect numerous on-chip resources, but it faces increasing leakage power as technology continues to scale down. Power-gating, a representative low-power technique, can be utilized to mitigate the increasing leakage power, but the disconnection problem suffered in the conventional power-gated NoC may severely affect network performance. In this paper, we propose a novel partial power-gating approach to avoid the performance loss caused by the disconnection. Firstly, we utilize an asymmetrical bit-slicing scheme to split the router into two slices. After the bit-slicing of the router datapath, the wide slices can be switched off to save some leakage power by using partial power-gating, but all narrow slices should be kept in an ever-active state to avoid the disconnection. Next, owing to the slicing of the router datapath, we redefine the packet format for packet slicing and transfer, and present two essential conversion modules to achieve packet slicing and reassembly. In the synthetic traffic simulation, our design gains considerable power savings at low load and exhibits better performance than the conventional power-gated design. The application simulation shows that our design saves 27.5% of total power on average compared with the baseline design, and reduces packet latency by 45.0% on average compared with the conventional power-gated design. On balance, the bit-sliced NoC with partial power-gating has a better tradeoff between performance and power efficiency.

15.
Network-on-Chip (NoC) has been recognized as the new paradigm to interconnect and organize a high number of cores. NoCs address global communication issues in Systems-on-Chip (SoCs) through communication-centric design and the implementation of scalable communication structures, making application-specific NoC design a key challenge in modern SoC design. In this paper we present a SystemC customization framework and methodology for automatic design and evaluation of regular and irregular NoC architectures. The presented framework also supports application-specific optimization techniques such as priority assignment, node clustering and buffer sizing. Experimental results show that generated regular NoC architectures achieve an average of 5.5% lower communication cost compared to other regular NoC designs, while irregular NoCs achieve on average 4.5× higher throughput and a 40% network delay reduction compared to regular mesh topologies. In addition, by employing a buffer sizing algorithm we achieve a reduction in the network’s power consumption by an average of 45% for both the regular and irregular NoC design flows.

16.
《Microelectronics Journal》2002,33(5-6):403-407
Two adiabatic circuits with complementary structure and operation are proposed in this paper. They employ a two-phase sinusoidal power clock. The power consumption of the proposed circuits is comparable to that of some previously reported circuits. The problem of floating output nodes is solved by connecting two MOS transistors to the power clock. In particular, using the proposed architecture, more than one stage of gates can be computed simultaneously within a single clock phase, whereas most other adiabatic logic families compute only one stage per phase. With this feature, the latency of a complex logic circuit is greatly improved and the number of buffers required for a pipelined circuit is also reduced. In this paper, a 2:1 multiplexer and a full adder are illustrated and simulated. The PSPICE simulation results validate the effectiveness of the proposed approach and the low-power characteristics of the designed circuits.

17.
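For the NoC architecture, this paper proposes two data compression techniques, referred to as cache compression and compression within the network interface controller (NIC). Measured performance results indicate that compression gives NoC designs advantages in terms of lower network latency, lower power consumption, and improved application performance.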

18.
The programmable logic array (PLA) is a basic and important building circuit for VLSI chips. Operating behaviors of several conventional PLAs are analyzed first to find out their speed and power bottlenecks. Then, new circuit design techniques for the CMOS PLA are proposed in the hopes of fulfilling the requirements of high speed and low power at the same time. Finally, high speed is achieved through the combined effect of utilization of a fast pseudofootless dynamic circuit and a reduced interplane clock delay. On the other hand, low power is achieved because the power consumption from the three main sources, i.e., the AND-plane circuits, the interplane buffers, and the OR-plane circuits, can be reduced significantly and simultaneously. The delay time and the power consumption of the critical path of a PLA are taken as the performance evaluation parameters. When the 50×50×64 PLAs are designed in a 0.35-μm 1P4M CMOS technology, the maximum operating frequency of the proposed PLA is 1.61 times higher than that of the fastest conventional PLA. Furthermore, power reduction can be as high as 18% and 43% when the operating frequencies are set to be 100 MHz and 50 MHz, respectively, as compared to the most power-efficient conventional PLA.

19.
Internet routers conduct routing table (RT) lookup based on the destination IP address of the incoming packet to decide which output port to forward the packet to. Ternary content-addressable memory (TCAM) uses parallelism to achieve lookup in a single cycle. One of the major drawbacks of TCAM is its high power consumption. Trie-based architecture has been proposed to reduce TCAM power consumption. The idea is to use an index TCAM to select one of many data TCAM blocks for lookup. However, power reduction is limited by the size of the index TCAM, which is always enabled for search. In this paper we develop a simple but effective trie-partitioning algorithm to reduce the index TCAM size, which achieves better reduction in power consumption, and at the same time guarantees full TCAM space utilization. We compared our algorithm (LogSplit) with PostOrderSplit (IEEE INFOCOM, 2003). For two real-world RTs (AADS and PAIX), the size of the index TCAM generated by LogSplit is 55–70% of that generated by PostOrderSplit; the largest power reduction factor of LogSplit is 41 for AADS and 68 for PAIX, while the largest power reduction factor of PostOrderSplit is 33 for AADS and 52 for PAIX. The improvement is even more significant in the worst case: the size of the index TCAM generated by LogSplit is 18–30% of that generated by PostOrderSplit for IPv4, and less than 1% of that generated by PostOrderSplit for IPv6; the largest power reduction factor of LogSplit is 173 for both IPv4 and IPv6, while the largest power reduction factor of PostOrderSplit is only 82 for IPv4 and 41 for IPv6. Copyright © 2007 John Wiley & Sons, Ltd.
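
The two-stage lookup this abstract describes, where an index TCAM selects one data TCAM block and only that block is searched, can be sketched in Python as follows. This is neither LogSplit nor PostOrderSplit: the partition below is built by hand, prefixes are modelled as bit strings, and the parallel TCAM match is replaced by a software longest-prefix search. The sketch also assumes that each block carries a copy of the covering prefix from above its subtree so that lookups stay correct. The names `lpm`, `two_stage_lookup`, `index`, `blocks`, and `flat` are hypothetical.

```python
from typing import Dict, Optional


def lpm(table: Dict[str, object], addr_bits: str) -> Optional[object]:
    """Longest-prefix match over bit-string prefixes; None if nothing matches."""
    best_prefix: Optional[str] = None
    for prefix in table:
        if addr_bits.startswith(prefix):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    return None if best_prefix is None else table[best_prefix]


def two_stage_lookup(index: Dict[str, int],
                     blocks: Dict[int, Dict[str, str]],
                     addr_bits: str) -> Optional[str]:
    """Stage 1 picks a block via the index; stage 2 searches only that block."""
    block_id = lpm(index, addr_bits)
    if block_id is None:
        return None
    return lpm(blocks[block_id], addr_bits)


# Flat routing table and a hand-made partition of it (illustrative only).
flat = {"": "port0", "0": "port1", "101": "port3"}
index = {"": 0, "10": 1}            # subtree rooted at "10" goes to block 1
blocks = {
    0: {"": "port0", "0": "port1"},
    1: {"": "port0", "101": "port3"},  # "" is the covering prefix copied in
}

if __name__ == "__main__":
    for addr in ("0110", "1000", "1011", "1100"):
        print(addr, two_stage_lookup(index, blocks, addr), lpm(flat, addr))
```

In hardware the saving comes from enabling only the selected data block, so the always-on index becomes the dominant cost, which is why the abstract focuses on shrinking the index TCAM.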

20.
A Low-Latency On-Chip Router Architecture Based on Path Pre-Allocation   Cited by: 1 (self-citations: 0, citations by others: 1)
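This paper proposes a low-latency on-chip router architecture based on path pre-allocation (PAPR) for networks-on-chip. The new router uses look-ahead routing computation and path pre-allocation to shorten the router pipeline depth. Look-ahead routing computation provides a reliable basis for virtual-channel pre-allocation: even if pre-allocation of a virtual-channel path fails, the transmission latency of packets in the network is not affected. The paper proposes a buffer-status-based arbitration algorithm, BSTS (Buffer Status), which takes into account the buffer information of both the current node and the downstream node, reducing not only the packet waiting latency but also the probability that buffers sit idle. Simulation results show that the new router markedly improves network latency and throughput. Compared with the classic virtual-channel router using the sliding iterative round-robin arbitration algorithm iSLIP (iterative Round-Robin Matching with SLIP (Serial Line Interface Protocol)), the average end-to-end network latency is reduced by 24.5% and throughput is increased by 27.5%; compared with the classic virtual-channel router using the iterative round-robin matching (RRM, Round-Robin Matching) algorithm, the average end-to-end latency is reduced by 39.2% and throughput is increased by 47.2%. The router's hardware overhead and average power consumption increase by only 8.9% and 5.9%, respectively.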
