1.

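韩小炜 吴利华 赵岩 李艳 张倩莉 陈亮 张国权 李建忠 杨波 高见头 王剑 李明 刘贵宅 张峰 郭旭峰 陈陵都 刘忠立 于芳 赵凯 《半导体学报》2011,32(7):075012-6

This paper presents the architecture and design implementation of the radiation-hardened VS1000 FPGA chip. An improved multi-mode logic cell based on 3-input look-up tables raises logic utilization by about 12% compared with the conventional 4-input look-up-table design. Each logic block consists of two logic cells and can be configured in two operating modes: LUT mode and distributed RAM mode. Novel hierarchical routing channel blocks and switch blocks greatly improve the routability of the routing resources. The VS1000 chip contains 392 programmable logic cells, 112 user I/Os, and IEEE 1149.1-compatible boundary-scan logic; it was designed full-custom and taped out in a 0.5 μm partially depleted silicon-on-insulator CMOS process. Functional tests show that the chip's hardware and software work together successfully to implement user-specified functions. Radiation tests show a total-dose tolerance above 100 krad(Si), a transient dose-rate survivability above 1.5×10¹¹ rad(Si)/s, and a neutron fluence tolerance of 1×10¹⁴ n/cm².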

2.

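Design and implementation of the FDP FPGA chip

陈利光 王亚斌 吴芳 来金梅 童家榕 张火文 屠睿 王建 王元 申秋实 余慧 黄均鼐 卢海舟 潘光华 《半导体学报》2008,29(4)

This paper studies the circuit structure and design implementation of the new FDP FPGA. A novel programmable cell structure based on 3-input look-up tables raises logic utilization by about 11% compared with the conventional 4-input look-up-table design. A distinctive hierarchical, segmented programmable interconnect structure and an efficient switch-box design allow different interconnect resources to be connected quickly and directly, greatly improving the efficiency of the programmable routing resources. The FDP chip contains 1600 programmable logic cells, 160 usable I/Os, and 16k of embedded dual-port block RAM; it was designed full-custom and taped out in the SMIC 0.18 μm CMOS process, with a bare-die area of 6.104 mm × 6.620 mm. Final hardware and software tests show that the chip's programmable resources work efficiently with its software to implement user circuit functions correctly.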

3.

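An improved look-up table circuit structure with high resource utilization and its optimization method

高丽江 杨海钢 李威 郝亚男 刘长龙 石彩霞 《电子与信息学报》2019,41(10):2382-2388

This paper studies the circuit structure and optimized design of the basic logic element (BLE), the core module of an FPGA chip. To address the low resource utilization of the conventional 4-input look-up table (LUT) when performing logic and arithmetic operations, it proposes an improved LUT structure that incorporates multiplexers and achieves higher area utilization. It also proposes an evaluation and optimization method based on statistics gathered from the post-mapping netlist, which regroups the netlist after synthesis and mapping and produces an optimized netlist through pre-packing. Finally, the proposed structure is evaluated and verified experimentally. The results show that, compared with Intel's Stratix-series FPGAs, the proposed optimized structure improves resource utilization by an average of 10.428% on the MCNC circuit set and 10.433% on the VTR circuit set, effectively improving the logic efficiency of the FPGA.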

4.

Circuit design of a novel FPGA chip FDP2008

Wu Fang Wang Yabin Chen Liguang Wang Jian Lai Jinmei Wang Yuan Tong Jiarong 《半导体学报》2009,30(11)

A novel FPGA chip FDP2008 (Fudan Programmable Logic) has been designed and implemented with the SMIC 0.18μm CMOS logic 1P6M process. The new design method means that the configurable logic block can be configured as distributed RAM and a shift register. A universal programmable routing circuit is also presented; by adopting offset lines, complementary hanged end-lines and MUX + Buffer routing switches, the whole FPGA chip is highly repeatable, and the signal delay is uniform and predictable over the total chip. A standard configuration interface SPI is added in the configuration circuit, and a group of highly sensitive amplifiers is used to magnify the read back data. FDP2008 contains 20 × 30 logic TILEs, 200 programmable IOBs and 10 × 4 kbit dual port block RAMs. The hardware software cooperation test shows that FDP2008 works correctly and efficiently.

5.

A resource-efficient and scalable wireless mesh routing protocol (Cited by: 3; self-citations: 0; citations by others: 3)

Jianliang Myung J. 《Ad hoc Networks》2007,5(6):704-718

By binding logic addresses to the network topology, routing can be carried out without going through route discovery. This eliminates the initial route discovery latency, saves storage space otherwise needed for routing table, and reduces the communication overhead and energy consumption. In this paper, an adaptive block addressing (ABA) scheme is first introduced for logic address assignment as well as network auto-configuration purpose. The scheme takes into account the actual network topology and thus is fully topology-adaptive. Then a distributed link state (DLS) scheme is further proposed and put on top of the block addressing scheme to improve the quality of routes, in terms of hop count or other routing cost metrics used, robustness, and load balancing. The network topology reflected in logic addresses is used as a guideline to tell towards which direction (rather than next hop) a packet should be relayed. The next hop is derived from each relaying node’s local link state table. The routing scheme, named as topology-guided DLS (TDLS) as a whole, scales well with regard to various performance metrics. The ability of TDLS to provide multiple paths also precludes the need for explicit route repair, which is the most complicated part in many wireless routing protocols. While this paper targets low rate wireless mesh personal area networks (LR-WMPANs), including wireless mesh sensor networks (WMSNs), the TDLS itself is a general scheme and can be applied to other non-mobile wireless mesh networks.

6.

Design and analysis of a dynamically reconfigurable three-dimensional FPGA

Chiricescu S. Leeser M. Vai M.M. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2001,9(1):186-196

This paper presents the design and analysis of a dynamically reconfigurable field programmable gate array (FPGA) that consists of three physical layers: routing and logic block layer, routing layer, and memory layer. The architecture was developed using a methodology that examines different architectural parameters and how they affect different performance criteria such as speed, area, and reconfiguration time. The resulting architecture has high performance while the requirement of balancing the areas of its constituent layers is satisfied.

7.

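PVHArray: a pipelined, scalable, hierarchical reconfigurable cryptographic logic array architecture (Cited by: 1; self-citations: 0; citations by others: 1)

杜怡然 李伟 戴紫彬 《电子学报》2020,48(4):781-789

To implement cryptographic algorithms with high efficiency, this paper proposes PVHArray, a dataflow-based coarse-grained reconfigurable cryptographic logic array architecture. By studying the operation and control-structure characteristics of cryptographic algorithms, and building on reconfigurable array design methods, it proposes a coarse-grained reconfigurable cryptographic logic array architecture, together with a parameterized model, whose main components are pipelined, scalable coarse-grained reconfigurable processing elements, a hierarchical interconnect network, and a cycle-level distributed control network. To improve the array's efficiency in implementing algorithms, the model parameters are determined from algorithm-mapping results, yielding a high-efficiency 4×4 PVHArray. The design was taped out for verification in a 55 nm CMOS process with a chip area of 12.25 mm², and cryptographic algorithms were mapped onto the array chip. Experimental results show that the proposed PVHArray effectively supports the mapping of block ciphers, stream ciphers, and hash algorithms; in cipher block chaining (CBC) mode, its performance per unit area is about 12.9% higher and its performance per unit power about 13.9% higher than those of the reconfigurable cryptographic logic array REMUS_LPP.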

8.

A reconfigurable 8 GOP ASIC architecture for high-speed data communications

Grayver E. Daneshrad B. 《Selected Areas in Communications, IEEE Journal on》2000,18(11):2161-2171

A flexible and reconfigurable signal processing ASIC architecture has been developed, simulated, and synthesized. The proposed architecture compares favorably to classical DSP and FPGA solutions. It differs from general-purpose reconfigurable computing (RC) platforms by emphasizing high-speed application-specific computations over general-purpose flexibility. The proposed architecture can be used to realize any one of several functional blocks needed for the physical layer implementation of data communication systems operating at symbol rates in excess of 125 Msymbols/s. Multiple instances of a chip based on this architecture, each operating in a different mode, can be used to realize the entire physical layer of high-speed data communication systems. The architecture features the following modes (functions): real and complex FIR/IIR filtering, least mean square (LMS)-based adaptive filtering, discrete Fourier transforms (DFT), and direct digital frequency synthesis (DDFS) at up to 125 Msamples/s. All of the modes are mapped onto a common, regular data path with minimal configuration logic and routing. Multiple chips operating in the same mode can be cascaded to allow for larger blocks.

9.

The effect of LUT and cluster size on deep-submicron FPGA performance and density (Cited by: 2; self-citations: 0; citations by others: 2)

Ahmed E. Rose J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2004,12(3):288-298

In this paper, we revisit the field-programmable gate-array (FPGA) architectural issue of the effect of logic block functionality on FPGA performance and density. In particular, in the context of lookup table, cluster-based island-style FPGAs (Betz et al. 1997) we look at the effect of lookup table (LUT) size and cluster size (number of LUTs per cluster) on the speed and logic density of an FPGA. We use a fully timing-driven experimental flow (Betz et al. 1997), (Marquardt, 1999) in which a set of benchmark circuits are synthesized into different cluster-based (Betz and Rose, 1997, 1998) and (Marquardt, 1999) logic block architectures, which contain groups of LUTs and flip-flops. Across all architectures with LUT sizes in the range of 2 to 7 inputs, and cluster size from 1 to 10 LUTs, we have experimentally determined the relationship between the number of inputs required for a cluster as a function of the LUT size (K) and cluster size (N). Second, contrary to previous results, we have shown that clustering small LUTs (sizes 2 and 3) produces better area results than what was presented in the past. However, our results also show that the performance of FPGAs with these small LUT sizes is significantly worse (by almost a factor of 2) than larger LUTs. Hence, as measured by area-delay product, or by performance, these would be a bad choice. Also, we have discovered that LUT sizes of 5 and 6 produce much better area results than were previously believed. Finally, our results show that a LUT size of 4 to 6 and a cluster size between 3 and 10 provide the best area-delay product for an FPGA.

10.

The effect of logic block architecture on FPGA performance

Singh S. Rose J. Chow P. Lewis D. 《Solid-State Circuits, IEEE Journal of》1992,27(3):281-287

The authors explore the effect of logic block architecture on the speed of a field-programmable gate array (FPGA). Four classes of logic block architecture are investigated: NAND gates, multiplexer configurations, lookup tables, and wide-input AND-OR gates. An experimental approach is taken, in which each of a set of benchmark logic circuits is synthesized into FPGAs that use different logic blocks. The speed of the resulting FPGA implementations using each logic block is measured. While the results depend on the delay of the programmable routing, experiments indicate that five- and six-input lookup tables and certain multiplexer configurations produce the lowest total delay over realistic values of routing delay. The fine grain blocks, such as the two-input NAND gate, exhibit poor performance because these gates require many levels of logic block to implement the circuits and hence require a large routing delay.

11.

Sharing of SRAM Tables Among NPN-Equivalent LUTs in SRAM-Based FPGAs

Meyer J. Kocan F. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2007,15(2):182-195

This article introduces a novel lookup table (LUT) and its usage in the configurable logic block (CLB) architectures for SRAM-based field-programmable gate array (FPGA) architectures. The proposed CLB allows sharing of SRAM tables of LUTs among NPN-equivalent functions to reduce the size of memories used for storing the functions and also reduces the number of configuration bits required. We measured many different characteristics of FPGAs using our new CLB architecture, including area, delay, routing, and power requirements. We experimentally found that for many different FPGA architectures, CLBs can share one-fourth of their SRAM tables between two basic logic elements (BLEs), which reduced both power consumption and area without negatively affecting routing or wirelength, and there was only a negligible increase in critical path delay of 0.27%. Specifically, we find that FPGAs consisting of CLBs with 16 BLEs and 34 inputs can be implemented with eight normal SRAMs and four SRAMs shared between two BLEs, for an overall reduction of four out of sixteen SRAM tables per CLB. With this new CLB architecture, we measured an approximate reduction in overall power consumption of 2% and an estimated reduction in area of 3%.
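To make the notion of NPN equivalence in this abstract concrete, here is a minimal brute-force sketch in Python. Two Boolean functions are NPN-equivalent when one can be turned into the other by negating inputs (N), permuting inputs (P), and optionally negating the output (N). The helper names `truth_table` and `npn_canonical` are hypothetical and are not taken from the article; the exhaustive search is only practical for small input counts and is not the authors' hardware mechanism.

```python
from itertools import permutations, product


def truth_table(func, k):
    """Truth table of a k-input function as a tuple; input 0 is the LSB of the index."""
    return tuple(func(*[(i >> j) & 1 for j in range(k)]) & 1 for i in range(2 ** k))


def npn_canonical(table, k):
    """Smallest truth table reachable by input permutation, input negation, output negation."""
    best = None
    for perm in permutations(range(k)):
        for neg in product((0, 1), repeat=k):
            for out_neg in (0, 1):
                candidate = []
                for i in range(2 ** k):
                    bits = [(i >> j) & 1 for j in range(k)]
                    # Input j of the original function is driven by variable
                    # perm[j], optionally inverted.
                    src = 0
                    for j in range(k):
                        src |= (bits[perm[j]] ^ neg[j]) << j
                    candidate.append(table[src] ^ out_neg)
                candidate = tuple(candidate)
                if best is None or candidate < best:
                    best = candidate
    return best


and2 = truth_table(lambda a, b: a & b, 2)
or2 = truth_table(lambda a, b: a | b, 2)
xor2 = truth_table(lambda a, b: a ^ b, 2)

# AND and OR fall into one NPN class (De Morgan), XOR does not.
same_class = npn_canonical(and2, 2) == npn_canonical(or2, 2)
different_class = npn_canonical(and2, 2) != npn_canonical(xor2, 2)
print(same_class, different_class)
```

Functions in the same class can in principle be served by one stored table plus inverters and input reordering, which is the property the abstract exploits. The table count it reports follows directly: eight unshared tables plus four tables each serving two BLEs cover 8 + 4 × 2 = 16 BLEs with 12 tables.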

12.

GlitchLess: Dynamic Power Minimization in FPGAs Through Edge Alignment and Glitch Filtering

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(11):1521-1534

This paper describes GlitchLess, a circuit-level technique for reducing power in field-programmable gate arrays (FPGAs) by eliminating unnecessary logic transitions called glitches. This is done by adding programmable delay elements to the logic blocks of the FPGA. After routing a circuit and performing static timing analysis, these delay elements are programmed to align the arrival times of the inputs of each lookup table (LUT), thereby preventing new glitches from being generated. Moreover, the delay elements also behave as filters that eliminate other glitches generated by upstream logic or off-chip circuitry. On average, the proposed implementation eliminates 87% of the glitching, which reduces overall FPGA power by 17%. The added circuitry increases the overall FPGA area by 6% and critical-path delay by less than 1%. Furthermore, since it is applied after routing, the proposed technique requires little or no modifications to the routing architecture or computer-aided design (CAD) flow.
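The alignment step described in this abstract can be illustrated with a small Python sketch. It assumes idealised delay elements that can add any non-negative delay; a real device would have finite resolution and range. The function name `alignment_delays`, the pin names, and the arrival times are hypothetical and are not taken from the paper.

```python
def alignment_delays(arrival_ps):
    """Extra delay per LUT input so that every input arrives with the latest one.

    arrival_ps maps an input pin name to its arrival time in picoseconds,
    as reported by static timing analysis after routing.
    """
    latest = max(arrival_ps.values())
    # The latest input gets zero added delay, so in this idealised model
    # the path through the slowest input is not lengthened.
    return {pin: latest - t for pin, t in arrival_ps.items()}


arrivals = {"in0": 1200, "in1": 1850, "in2": 900}  # hypothetical timing data
delays = alignment_delays(arrivals)
aligned = {pin: arrivals[pin] + delays[pin] for pin in arrivals}
print(delays)
print(aligned)
```

After the delays are applied, all three inputs switch at 1850 ps, so the LUT output sees a single transition window instead of several staggered ones.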

13.

Design and implementation of a programming circuit in radiation-hardened FPGA

吴利华 韩小炜 赵岩 刘忠立 于芳 陈陵都 《半导体学报》2011,32(8):132-137

We present a novel programming circuit used in our radiation-hardened field programmable gate array (FPGA) chip. This circuit provides the ability to write user-defined configuration data into an FPGA and then read it back. The proposed circuit adopts the direct-access programming point scheme instead of the typical long token shift register chain. It not only saves area but also provides more flexible configuration operations. By configuring the proposed partial configuration control register, our smallest configuration section can be conveniently configured as a single data and a flexible partial configuration can be easily implemented. The hierarchical simulation scheme, optimization of the critical path and the elaborate layout plan make this circuit work well. Also, the radiation hardened by design programming point is introduced. This circuit has been implemented in a static random access memory (SRAM)-based FPGA fabricated by a 0.5 μm partial-depletion silicon-on-insulator CMOS process. The function test results of the fabricated chip indicate that this programming circuit successfully realizes the desired functions in the configuration and read-back. Moreover, the radiation test results indicate that the programming circuit has total dose tolerance of 1×10⁵ rad(Si), dose rate survivability of 1.5×10¹¹ rad(Si)/s and neutron fluence immunity of 1×10¹⁴ n/cm².

14.

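A reconfigurable heterogeneous multi-core parallel processing architecture for block ciphers

冯晓 李伟 戴紫彬 马超 李功丽 《电子学报》2017,45(6):1311-1320

In existing reconfigurable block-cipher implementations, application-specific instruction processors have low throughput, while array structures suffer from low resource utilization and a complex algorithm-mapping process. To address this, the paper designs RAMCA (Reconfigurable Asymmetrical Multi-Core Architecture), a reconfigurable heterogeneous multi-core parallel processing architecture for block ciphers, and analyzes how typical SP (AES-128), Feistel (SMS4), L-M (IDEA), and MISTY (KASUMI) structure algorithms map onto RAMCA. Logic synthesis and functional simulation were completed in a 65 nm CMOS process. Experiments show that RAMCA can operate at 1 GHz with an area of about 1.13 mm²; after normalizing for process differences, its speed on each block cipher exceeds that of existing application-specific instruction processors and of array-based cryptographic processing systems such as Celator, RCPA, and BCORE.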

15.

A nonvolatile programmable solid-electrolyte nanometer switch (Cited by: 1; self-citations: 0; citations by others: 1)

Kaeriyama S. Sakamoto T. Sunamura H. Mizuno M. Kawaura H. Hasegawa T. Terabe K. Nakayama T. Aono M. 《Solid-State Circuits, IEEE Journal of》2005,40(1):168-176

A reconfigurable LSI employing a nonvolatile nanometer-scale switch, NanoBridge, is proposed, and its basic operations are demonstrated. The switch, composed of solid electrolyte copper sulfide, has a <30-nm contact diameter and <100-Ω on-resistance. Because of its small size, it can be used to create extremely dense field-programmable logic arrays. A 4 × 4 crossbar switch and a 2-input look-up-table circuit are fabricated with 0.18-μm CMOS technology, and operational tests with them have confirmed the switch's potential for use in programmable logic arrays. A 1-kb nonvolatile memory is also presented, and its potential for use as a low-voltage memory device is demonstrated.

16.

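Design and implementation of a high-performance look-up table for FPGAs

张惠国 唐玉兰 于宗光 陶宇峰 《固体电子学研究与进展》2009,29(4)

The implementation principle of the look-up table (LUT) is examined from the circuit perspective. Based on a two-phase non-overlapping clock, a LUT is designed and implemented that efficiently extends its function to serve as a shift register and as RAM. The corresponding layout was optimized for the SMIC 0.25 μm CMOS process, and HSPICE simulation results are given. The circuit structure enhances the performance of the logic block, improves the overall efficiency and flexibility of the FPGA, and has been applied in FPGA design.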

17.

High-speed IP routing with binary decision diagrams based hardware address lookup engine (Cited by: 3; self-citations: 0; citations by others: 3)

Sangireddy R. Somani A.K. 《Selected Areas in Communications, IEEE Journal on》2003,21(4):513-521

With a rapid increase in the data transmission link rates and an immense continuous growth in the Internet traffic, the demand for routers that perform Internet protocol packet forwarding at high speed and throughput is ever increasing. The key issue in the router performance is the IP address lookup mechanism based on the longest prefix matching scheme. Earlier work on fast Internet protocol version 4 (IPv4) routing table lookup includes software mechanisms based on tree traversal or binary search methods, and hardware schemes based on content addressable memory (CAM), memory lookups and the CPU caching. These schemes depend on the memory access technology which limits their performance. The paper presents a binary decision diagrams (BDDs) based optimized combinational logic for an efficient implementation of a fast address lookup scheme in reconfigurable hardware. The results show that the BDD hardware engine gives a throughput of up to 175.7 million lookups per second (Ml/s) for a large AADS routing table with 33 796 prefixes, a throughput of up to 168.6 Ml/s for an MAE-West routing table with 29 487 prefixes, and a throughput of up to 229.3 Ml/s for the Pacbell routing table with 6822 prefixes. Besides the performance of the scheme, routing table update and the scalability to Internet protocol version 6 (IPv6) issues are discussed.
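For readers unfamiliar with longest prefix matching, the lookup rule this abstract refers to, the following Python sketch shows the rule itself as a plain linear scan using the standard `ipaddress` module. It is a software reference for the matching semantics only, not the BDD-based hardware engine of the paper; the function name `longest_prefix_match`, the routing table, and the next-hop labels are hypothetical.

```python
import ipaddress


def longest_prefix_match(table, address):
    """Return the next hop of the most specific prefix containing address, or None."""
    addr = ipaddress.ip_address(address)
    best_net, best_hop = None, None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        # Among all prefixes that contain the address, keep the longest one.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop


routes = {  # hypothetical routing table
    "0.0.0.0/0": "default",
    "10.0.0.0/8": "port-A",
    "10.1.0.0/16": "port-B",
}

hop_specific = longest_prefix_match(routes, "10.1.2.3")
hop_general = longest_prefix_match(routes, "10.2.0.1")
hop_default = longest_prefix_match(routes, "192.168.1.1")
print(hop_specific, hop_general, hop_default)
```

The address 10.1.2.3 matches all three prefixes, and the /16 entry wins because it is the longest. A scan like this costs one comparison per table entry, which is why the schemes surveyed in the abstract replace it with trees, CAMs, or combinational logic.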

18.

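Design of a novel logic cell structure for FPGAs

李丽 刘桥 《现代电子技术》2005,28(22):79-82

A new structure for the FPGA programmable logic cell is proposed. The structure has relatively many inputs and outputs and adds dedicated fast-carry logic, dedicated cascade chains, and other features, so that it can implement any 4-input logic function and certain functions of up to 11 input variables. It can also simultaneously implement two arbitrary 3-input logic functions, or certain functions of up to 5 inputs, and supports fast carry computation and high-fan-in logic operations. Comparison with the logic structures of some current commercial FPGAs shows that the proposed cell structure not only has higher resource utilization but also holds considerable advantages in performance and function-implementation capability.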

19.

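Ionizing radiation effects in medium-scale CMOS circuits

任迪远 米拉提汗 张玲珊 陆妩 严荣良 《微电子学》1988,(6)

This paper reports a study of total-dose effects in domestically produced medium-scale bulk-silicon CMOS circuits irradiated with ⁶⁰Co γ rays and 1.5 MeV electrons. The tests show that soft failure (parameter degradation reaching a given damage threshold) typically occurs at about 400 Gy(Si), whereas loss of logic function occurs only beyond 1000 Gy(Si). The occurrence of soft failure shows no clear dependence on the irradiation bias conditions, but which parameter fails softly depends on the bias conditions and on differences between the manufacturers' MOS processes.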
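This entry quotes doses in gray, while other entries in this listing use rad. As a convenience, the short Python sketch below converts between the two using the standard relation 1 Gy = 100 rad; the function name `gray_to_krad` is hypothetical.

```python
RAD_PER_GRAY = 100  # 1 Gy = 100 rad by definition


def gray_to_krad(dose_gy):
    """Convert an absorbed dose from gray to kilorad."""
    return dose_gy * RAD_PER_GRAY / 1000


soft_failure_krad = gray_to_krad(400)    # soft-failure threshold quoted above
logic_failure_krad = gray_to_krad(1000)  # logic-failure threshold quoted above
print(soft_failure_krad, logic_failure_krad)
```

On this scale the soft-failure threshold is 40 krad(Si) and the logic-failure threshold is 100 krad(Si).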

20.

Architecture of field-programmable gate arrays: the effect of logic block functionality on area efficiency

Rose J. Francis R.J. Lewis D. Chow P. 《Solid-State Circuits, IEEE Journal of》1990,25(5):1217-1225

The relationship between the functionality of a field-programmable gate array (FPGA) logic block and the area required to implement digital circuits using that logic block is examined. The investigation is done experimentally by implementing a set of industrial circuits as FPGAs using CAD (computer-aided design) tools for technology mapping, placement, and routing. A range of programming technologies (the method of FPGA customization) is explored using a simple model of the interconnection and logic block area. The experiments are based on logic blocks that use lookup tables for implementing combinational logic. Results indicate that the best number of inputs to use (a measure of the block's functionality) is between three and four, and that a D flip-flop should be included in the logic block. The results are largely independent of the programming technology. More generally, it was observed that the area efficiency of a logic block depends not only on its functionality but also on the average number of pins connected per logic block.
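Since several entries in this listing compare lookup tables by input count, a minimal Python model of a LUT may help. A K-input LUT stores one configuration bit per input combination, so it needs 2^K bits and can implement any K-input function. The helper names `make_lut` and `eval_lut` are hypothetical and are not taken from any of the listed papers.

```python
def make_lut(func, k):
    """Return the 2**k configuration bits of a k-input LUT that implements func."""
    bits = []
    for index in range(2 ** k):
        # Bit i of the index is the value driven onto LUT input i.
        inputs = [(index >> i) & 1 for i in range(k)]
        bits.append(func(*inputs) & 1)
    return bits


def eval_lut(bits, inputs):
    """Read the stored bit addressed by the input values (input 0 is the LSB)."""
    index = sum(value << i for i, value in enumerate(inputs))
    return bits[index]


xor3 = make_lut(lambda a, b, c: a ^ b ^ c, 3)         # 8 configuration bits
and4 = make_lut(lambda a, b, c, d: a & b & c & d, 4)  # 16 configuration bits

print(len(xor3), len(and4))
print(eval_lut(xor3, [1, 0, 1]), eval_lut(and4, [1, 1, 1, 1]))
```

Each added input doubles the storage, which is one side of the area trade-off the abstract examines: larger LUTs absorb more logic per block but cost more memory and more pins to route.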