1.

Hybrid Multi-FPGA Board Evaluation by Permitting Limited Multi-Hop Routing

Sushil Chandra Jain Anshul Kumar Shashi Kumar 《Design Automation for Embedded Systems》2003,8(4):309-326

Multi-FPGA Boards (MFBs) have been in use for more than a decade for implementing systems requiring high performance and for emulation/prototyping of multimillion gate chips. It is important to develop an MFB architecture which can be used for emulation or prototyping of a large number of circuits. A key feature of an MFB is its routing architecture, defined by its inter-Field-Programmable Gate Array (FPGA) connections. There are two types of inter-FPGA connections, namely fixed connections (FCs), which connect a pair of FPGAs through dedicated wires, and programmable connections (PCs), which connect a pair of FPGAs through a programmable switch. An architecture which has a mix of both these types of connections is called a hybrid routing architecture. It has been shown in the literature [7] that a hybrid MFB architecture is more efficient for emulation than an architecture with only one type of connection. The cost of an MFB and the delay of the emulated circuit on it depend on the number of PCs used for emulation. An objective of a designer of an MFB for circuit emulation is to minimize the required number of PCs. In this paper, we describe algorithms to evaluate the requirement of PCs for many hybrid routing architectures. The requirement of PCs can be reduced if some programmable connections are replaced by a connection using only FCs by routing through FPGAs. Such routing is called multi-hop routing. We present an optimal and a heuristic algorithm for estimation of PCs when a limited number of hops through FPGAs is permitted. The unique feature of our evaluation scheme is that it is generic and treats the routing architecture as a parameter. We have used benchmark circuits as well as synthetic cloned circuits for testing our algorithms. Our heuristic algorithm is very fast and gives optimal results most of the time. Our algorithms can be used for actual routing during circuit emulation.
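
The idea of trading PCs for multi-hop routes over FCs can be illustrated with a minimal sketch. This is not the optimal or heuristic algorithm from the paper, whose details the abstract does not give. It is a simple greedy pass, assuming two-terminal inter-FPGA nets and a hypothetical `fc_capacity` map of free fixed wires per FPGA pair. The names `count_pcs` and `_shortest_fc_path` are illustrative only.

```python
from collections import deque


def _shortest_fc_path(num_fpgas, free, src, dst, max_hops):
    """Breadth-first search over FPGAs using only FC edges with spare wires.

    A path may pass through at most `max_hops` intermediate FPGAs,
    i.e. it may use at most `max_hops + 1` fixed connections.
    """
    max_edges = max_hops + 1
    parent = {src: None}
    depth = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        if depth[u] == max_edges:
            continue  # hop budget exhausted on this branch
        for v in range(num_fpgas):
            if v not in parent and free.get(frozenset((u, v)), 0) > 0:
                parent[v] = u
                depth[v] = depth[u] + 1
                queue.append(v)
    return None


def count_pcs(num_fpgas, fc_capacity, nets, max_hops):
    """Greedy estimate of the number of PCs needed (assumed heuristic).

    fc_capacity: dict mapping frozenset({a, b}) -> free fixed wires
    nets: list of (src, dst) two-terminal inter-FPGA nets
    """
    free = dict(fc_capacity)  # work on a copy; the caller's map is untouched
    pcs = 0
    for src, dst in nets:
        path = _shortest_fc_path(num_fpgas, free, src, dst, max_hops)
        if path is None:
            pcs += 1  # no FC route within the hop limit: use a PC
        else:
            for a, b in zip(path, path[1:]):
                free[frozenset((a, b))] -= 1  # consume one wire per FC used
    return pcs
```

With one fixed wire on each of the pairs 0-1 and 1-2 and two nets from FPGA 0 to FPGA 2, both nets need a PC when no hops are allowed, and only one does when a single hop through FPGA 1 is permitted. Because the pass is greedy, the result depends on net order, so it should be read as an upper bound rather than an optimum.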

2.

Application of VPR in FPGA architecture design (total citations: 1, self-citations: 0, citations by others: 1)

李兴政 杨海钢 钟华 《电子器件》2007,30(5):1874-1877

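The field-programmable gate array (FPGA) is a very widely used electronic device with a highly regular structure. It consists of identical basic circuit units arranged according to fixed rules, and its performance depends to a large extent on the settings of a few key architectural parameters. By placing and routing different logic circuits on a typical FPGA chip, the authors obtain the corresponding area and delay information, use it to study and analyze the relationship between the FPGA's architectural parameters and chip performance, and, on the basis of these experiments, derive optimized value ranges for some of the architectural parameters.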

3.

An efficient logic emulation system

Varghese J. Butts M. Batcheller J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1993,1(2):171-174

The Realizer, a logic emulation system that automatically configures a network of field-programmable gate arrays (FPGAs) to implement large digital logic designs, is presented. Logic and interconnect are separated to achieve optimum FPGA utilization. Its interconnection architecture, called the partial crossbar, greatly reduces system-level placement and routing complexity, achieves bounded interconnect delay, scales linearly with pin count, and allows hierarchical expansion to systems with hundreds of thousands of FPGA devices in a fast and uniform way. An actual multiboard system has been built, using 42 Xilinx XC3090 FPGAs for logic. Several designs, including a 32-b CPU datapath, have been automatically realized and operated at speed. They demonstrate very good FPGA utilization. The Realizer has applications in logic verification and prototyping, simulation, architecture development, and special-purpose execution.

4.

The Triptych FPGA architecture

Borriello G. Ebeling C. Hauck S.A. Burns S. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1995,3(4):491-501

Field-programmable gate arrays (FPGAs) are an important implementation medium for digital logic. Unfortunately, they currently suffer from poor silicon area utilization due to routing constraints. In this paper we present Triptych, an FPGA architecture designed to achieve improved logic density with competitive performance. This is done by allowing a per-mapping tradeoff between logic and routing resources, and with a routing scheme designed to match the structure of typical circuits. We show that, using manual placement, this architecture yields a logic density improvement of up to a factor of 3.5 over commercial FPGAs, with comparable performance. We also describe Montage, the first FPGA architecture to fully support asynchronous and synchronous interface circuits.

5.

PITIA: an FPGA for throughput-intensive applications

Singh A. Mukherjee A. Macchiarulo L. Marek-Sadowska M. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2003,11(3):354-363

In this paper, we present a novel, high-throughput field-programmable gate array (FPGA) architecture, PITIA, which combines the high performance of application-specific integrated circuits (ASICs) and the flexibility afforded by the reconfigurability of FPGAs. The new architecture, which targets datapath circuits, uses the concepts of wave steering and pipelined interconnects. We discuss the FPGA architecture and show results for performance, power consumption, clock network performance, and routability. Results for some commonly used datapath designs are encouraging, with throughputs in the neighborhood of 625 MHz in 0.25-μm 2.5-V CMOS technology. Results for random benchmark circuits are also shown. We characterize designs according to their Rent's exponents and argue that designs with predominantly local interconnects are the best fit in PITIA. We also show that as technology scales down toward deep submicron, PITIA shows an increasing throughput performance.
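
For readers unfamiliar with the Rent's exponent mentioned above, Rent's rule is usually written as follows. The symbols are the conventional ones, not notation taken from the paper.

```latex
% Rent's rule: external terminals of a block as a function of its size.
%   T : number of external terminals (pins) of a block
%   G : number of gates (logic cells) in the block
%   t : average number of terminals per gate
%   p : Rent's exponent, typically 0 < p \le 1
T = t \, G^{p}
```

A smaller exponent p means that the number of external terminals grows slowly with block size, which corresponds to the predominantly local interconnect that the abstract describes as the best fit for PITIA.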

6.

Application of nanojunction-based RRAM to reconfigurable IC

Liu M. Wang W. 《Micro & Nano Letters, IET》2008,3(3):101-105

A novel reconfigurable architecture, rFPGA, is developed by utilising high-density resistive memory (RRAM) circuits as FPGA components. Different from the existing CMOS-nano hybrid FPGAs that use crossbars, the rFPGA mainly consists of 1T1R RRAM structures (one CMOS transistor is integrated with a two-terminal resistive nanojunction) that can be fabricated using an efficient CMOS-compatible process. These 1T1R structures can significantly improve the FPGA memory and routing circuits, and enable the rFPGA to achieve at least a 2x density enhancement along with a 10% reduction of delay and power, compared with the corresponding CMOS FPGA.

7.

The design of an SRAM-based field-programmable gate array-Part II: Circuit design and layout

Chow P. Soon Ong Seo Rose J. Chung K. Paez-Monzon G. Rahardja I. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1999,7(3):321-330

8.

The design of an SRAM-based field-programmable gate array. I. Architecture

Chow P. Soon Ong Seo Rose J. Chung K. Paez-Monzon G. Rahardja I. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1999,7(2):191-197

9.

An FPGA compilation system strategy based on datapath structure

费小泂 郭斌林 童家榕 《微电子学》2001,31(5):347-350

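In FPGA-based circuit design flows, exploiting circuit regularity leads to improvements in system performance and layout efficiency. Existing FPGA design software gives insufficient consideration to circuit properties, especially regularity and hierarchy, which results in poor performance when implementing datapath circuits. To address this, the paper proposes an FPGA compilation system framework suited to circuits with a large number of regular units. This CAD system architecture takes full account of the special characteristics of regularly structured circuit units: starting from the input stage of the compilation system, it distinguishes ordinary logic from regular units as far as possible and treats them differently, so as to optimize the performance of the regular units and hence of the whole circuit system. Finally, a simulated-annealing floorplanning strategy is used to complete the iterative optimization of the layout.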

10.

The effect of logic block architecture on FPGA performance

Singh S. Rose J. Chow P. Lewis D. 《Solid-State Circuits, IEEE Journal of》1992,27(3):281-287

The authors explore the effect of logic block architecture on the speed of a field-programmable gate array (FPGA). Four classes of logic block architecture are investigated: NAND gates, multiplexer configurations, lookup tables, and wide-input AND-OR gates. An experimental approach is taken, in which each of a set of benchmark logic circuits is synthesized into FPGAs that use different logic blocks. The speed of the resulting FPGA implementations using each logic block is measured. While the results depend on the delay of the programmable routing, experiments indicate that five- and six-input lookup tables and certain multiplexer configurations produce the lowest total delay over realistic values of routing delay. The fine-grain blocks, such as the two-input NAND gate, exhibit poor performance because these gates require many levels of logic blocks to implement the circuits and hence incur a large routing delay.

11.

FlexCore: Utilizing Exposed Datapath Control for Efficient Computing

Martin Thuresson Magnus Själander Magnus Björk Lars Svensson Per Larsson-Edefors Per Stenstrom 《Journal of Signal Processing Systems》2009,57(1):5-19

We introduce FlexCore, the first exemplar of an architecture based on the FlexSoC framework. Comprising the same datapath units found in a conventional five-stage pipeline, the FlexCore has an exposed datapath control and a flexible interconnect to allow the datapath to be dynamically reconfigured as a consequence of code generation. Additionally, the FlexCore allows specialized datapath units to be inserted and utilized within the same architecture and compilation framework. This study shows that, in comparison to a conventional five-stage general-purpose processor, the FlexCore is up to 40% more efficient in terms of cycle count on a set of benchmarks from the embedded application domain. We show that both the fine-grained control and the flexible interconnect contribute to the speedup. Furthermore, according to our VLSI implementation study, the FlexCore architecture offers both time and energy savings. The exposed FlexCore datapath requires a wide control word. The conducted evaluation confirms that this increases the instruction bandwidth and memory footprint. This calls for efficient instruction decoding as proposed in the FlexSoC framework.

12.

A routing algorithm for FPGAs with time-multiplexed interconnects

Ruiqi Luo Xiaolei Chen Yajun Ha 《半导体学报》2020,(2):73-82

Previous studies show that interconnects occupy a large portion of the timing budget and area in FPGAs. In this work, we propose a time-multiplexing technique on FPGA interconnects. In order to fully exploit this interconnect architecture, we propose a time-multiplexed routing algorithm that can actively identify qualified nets and schedule them to multiplexable wires. We validate the algorithm by using the router to implement 20 benchmark circuits on time-multiplexed FPGAs. We achieve a 38% smaller minimum channel width and a 3.8% smaller circuit critical path delay compared with the state-of-the-art architecture router when a wire can be time-multiplexed six times in a cycle.

13.

Design and analysis of a dynamically reconfigurable three-dimensional FPGA

Chiricescu S. Leeser M. Vai M.M. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2001,9(1):186-196

This paper presents the design and analysis of a dynamically reconfigurable field-programmable gate array (FPGA) that consists of three physical layers: a routing and logic block layer, a routing layer, and a memory layer. The architecture was developed using a methodology that examines different architectural parameters and how they affect different performance criteria such as speed, area, and reconfiguration time. The resulting architecture has high performance while satisfying the requirement of balancing the areas of its constituent layers.

14.

SAT based solutions for detailed routing of island style FPGA architectures

《Microelectronics Journal》2015,46(8):706-715

Detailed routing solutions for island style FPGA architectures using Boolean satisfiability (SAT) based formulations are proposed in this paper. Due to the decreasing size of ICs and hence the increasing complexity of the routing resource constraints, routing has been a big challenge in the electronic design automation field. Our proposed techniques work on multi-pin net routing, where all nets are considered for routing in their intact form, whereas most of the existing routing solutions decompose multi-pin nets into two-pin nets for detailed routing to ease the problem. However, this approach, apart from increasing the number of nets in the circuits, may also introduce pin doglegging which, when not permitted by the architecture of the FPGA, would require extra constraints to eliminate. Many detailed routers adopt sequential detailed routing approaches, which are vulnerable to the net ordering problem and may therefore erroneously classify a routable circuit as unroutable. Our proposed techniques avoid these pitfalls by keeping the multi-pin nets intact and solving all nets simultaneously using SAT. The SAT-based multi-pin net dogleg-free formulations presented here achieve significant improvement over existing SAT-based solutions with respect to the number of variables and clauses used, thereby achieving greater scalability, and also display comparable and sometimes better routability results on benchmark circuits when compared with other detailed routing solutions. Detailed routing is also significantly affected by the architecture of the switching blocks. This paper proposes SAT-based formulations for three different switch box architectures, i.e., Subset, Wilton, and Universal switches. Our experiments clearly demonstrate how routing solutions for a circuit can differ significantly for different types of switch boxes.
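
The general flavour of a SAT formulation for detailed routing can be shown with a deliberately simplified sketch. It is not the multi-pin, switch-box-aware formulation of the paper. It assumes a textbook track-assignment model in which each net must take exactly one track and two nets that share a channel segment (a hypothetical `conflicts` list) may not use the same track. The exhaustive `solve` function stands in for a real SAT solver and is only practical for tiny instances.

```python
from itertools import product


def build_cnf(num_nets, num_tracks, conflicts):
    """Encode track assignment as CNF clauses over DIMACS-style literals."""

    def var(net, track):
        return net * num_tracks + track + 1  # 1-based variable id

    clauses = []
    for n in range(num_nets):
        # Liveness: each net uses at least one track.
        clauses.append([var(n, t) for t in range(num_tracks)])
        # Each net uses at most one track.
        for t1 in range(num_tracks):
            for t2 in range(t1 + 1, num_tracks):
                clauses.append([-var(n, t1), -var(n, t2)])
    # Exclusivity: conflicting nets never share a track.
    for a, b in conflicts:
        for t in range(num_tracks):
            clauses.append([-var(a, t), -var(b, t)])
    return clauses


def solve(clauses, num_vars):
    """Brute-force satisfiability check (placeholder for a SAT solver)."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None


def route(num_nets, num_tracks, conflicts):
    """Return a track index per net, or None if the instance is unroutable."""
    clauses = build_cnf(num_nets, num_tracks, conflicts)
    model = solve(clauses, num_nets * num_tracks)
    if model is None:
        return None
    return [
        next(t for t in range(num_tracks) if model[n * num_tracks + t])
        for n in range(num_nets)
    ]
```

Because all nets are constrained in one formula, an unsatisfiable result proves that no assignment exists for the given track count. This is the property that sequential routers, which are sensitive to net ordering, cannot offer.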

15.

Rapid Synthesis and Simulation of Computational Circuits in an MPPA

David Grant Graeme Smecher Guy G. F. Lemieux Rosemary Francis 《Journal of Signal Processing Systems》2012,67(1):47-63

A computational circuit is custom-designed hardware which promises to offer maximum speedup of computationally intensive software algorithms. However, the practical need to manage development cost and many low-level physical design details erodes much of the potential speedup by distracting attention away from high-level architectural design. Instead, designers need an inexpensive, processor-like platform where computational circuits can be rapidly synthesized and simulated. This enables rapid architectural evolution and mitigates the risk of producing custom hardware. In this paper we present a tool flow (RVETool) for compiling computational circuits into a massively parallel processor array (MPPA). We demonstrate that the CAD runtime is on average 70× faster than FPGA tools, with a circuit speed 5.8× slower than FPGA devices. Unlike the fixed logic capacity of FPGAs, RVETool can trade area for simulation performance by targeting a wide range in the number of processor cores. We also demonstrate tool scalability to very large circuits, synthesizing, placing, and routing a ≈1.6 million gate random circuit in 54 min.

16.

A novel and efficient routing architecture for multi-FPGA systems

Khalid M.A.S. Rose J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2000,8(1):30-39

Multi-FPGA systems (MFSs) are used as custom computing machines, logic emulators and rapid prototyping vehicles. A key aspect of these systems is their programmable routing architecture, which is the manner in which wires, FPGAs and field-programmable interconnect devices (FPIDs) are connected. Several routing architectures for MFSs have been proposed, and previous research has shown that the partial crossbar is one of the best existing architectures. In this paper, we propose a new routing architecture, called the hybrid complete-graph and partial-crossbar (HCGP), which has superior speed and cost compared to a partial crossbar. The new architecture uses both hard-wired and programmable connections between the FPGAs. We compare the performance and cost of the HCGP and partial crossbar architectures experimentally, by mapping a set of 15 large benchmark circuits into each architecture. A customized set of partitioning and interchip routing tools was developed, with particular attention paid to architecture-appropriate interchip routing algorithms. We show that the cost of the partial crossbar (as measured by the number of pins on all FPGAs and FPIDs required to fit a design) is on average 20% more than that of the new HCGP architecture and as much as 25% more. Furthermore, the critical path delay for designs implemented on the partial crossbar was on average 20% more than on the HCGP architecture and up to 43% more. Using our experimental approach, we also explore a key architecture parameter associated with the HCGP architecture (the proportion of hard-wired connections versus programmable connections) to determine its best value.

17.

Routability of Network Topologies in FPGAs

Saldana M. Shannon L. Jia Shuo Yue Sikang Bian Craig J. Chow P. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2007,15(8):948-951

A fundamental difference between application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs) is that the wires in ASICs are designed to match the requirements of a particular design. Conversely, in an FPGA, the area is fixed and the routing resources exist whether or not they are used. In this paper, we investigate how well several common network topologies map onto a modern FPGA routing fabric. Different multiprocessor network topologies with between 8 and 64 nodes are mapped to a single large FPGA. Except for the fully-connected networks, it is observed that the difference in logic resources used and routing overhead among these topologies is insignificant for the systems tested. Fully-connected networks up to about 22 nodes are also feasible on the same FPGA although the logic and routing utilization clearly grows much faster. The conclusion is that a modern FPGA fabric is very rich in resources and capable of supporting highly interconnected topologies. For systems with a modest number of nodes implemented on current large FPGAs, it is not necessary to use the connectivity-limited topologies typically used for networks-on-chip. Rather, direct point-to-point connections between all communicating nodes can be considered.

19.

A hybrid FPGA architecture based on AND-LUT

陈利光 来金梅 童家榕 《半导体学报》2007,28(3):398-403

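A new hybrid FPGA architecture, a novel AND-LUT array structure, is proposed. Its innovation is that a programmable logic tile, composed of a programmable logic cluster and its associated connection box (CB), can be flexibly configured as either a PLA or a LUT according to the needs of the application; the former suits high fan-in logic and the latter low fan-in logic. By combining the advantages of both, the AND-LUT array structure maintains good logic utilization when implementing user logic with any number of inputs. Test results on MCNC circuits further show that implementing the same logic circuit in the proposed hybrid FPGA architecture saves 46% of the area on average compared with implementing it in a LUT-based symmetric FPGA architecture, which greatly improves the logic utilization of the FPGA device.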

20.

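A study of the effect of FPGA routing channel distribution on area efficiency (total citations: 2, self-citations: 0, citations by others: 2)

徐新民 王倩 严晓浪 《电子与信息学报》2006,28(10):1959-1962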

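This paper examines the effect of a non-uniform distribution of field-programmable gate array (FPGA) routing channels on chip area. Several typical mathematical distribution functions (Gaussian, sine, and triangular) are introduced to realize new FPGA architectures in which channel capacity varies according to the function. These architectures are compared with a conventional FPGA whose routing channels are uniformly distributed, and the results show that channel distributions that follow the mathematical functions achieve higher area efficiency than the uniform distribution. That is, the channel distribution peaks at the center of the chip, where channel capacity is greatest, and decreases gradually from the center toward the edges following the function.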
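
A center-peaked channel capacity profile of the kind described above can be sketched as follows. The function name `channel_widths`, the parameters `w_max` and `w_min`, the Gaussian width `sigma = 0.5`, and the rounding to whole tracks are all assumptions made for illustration. The abstract does not give the exact functions or parameters used in the paper.

```python
import math


def channel_widths(num_channels, w_max, w_min, shape="gaussian"):
    """Return per-channel track counts that peak at the chip center.

    x is the normalized distance from the center: 0 at the center
    channel and 1 at the outermost channels.
    """
    center = (num_channels - 1) / 2
    widths = []
    for i in range(num_channels):
        x = abs(i - center) / center if center > 0 else 0.0
        if shape == "gaussian":
            sigma = 0.5  # assumed spread, not taken from the paper
            f = math.exp(-(x ** 2) / (2 * sigma ** 2))
        elif shape == "sine":
            f = math.cos(math.pi * x / 2)  # sine arch: 1 at center, 0 at edge
        elif shape == "triangular":
            f = 1.0 - x
        elif shape == "uniform":
            f = 1.0  # conventional baseline with equal channel capacity
        else:
            raise ValueError(f"unknown shape: {shape}")
        widths.append(w_min + round((w_max - w_min) * f))
    return widths
```

For example, `channel_widths(5, 12, 4, "triangular")` gives a profile that rises linearly from 4 tracks at the edges to 12 at the center, whereas the `"uniform"` shape assigns 12 tracks to every channel. Comparing the total track count of the two profiles gives a rough sense of where area savings could come from.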