1.

Zhu Mingtao, Wang Youren, Sun Chuan, Wang Lantao. Microelectronics (《微电子学》), 2010, 40(4)

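Conventional reconfigurable circuits are built mainly from fine-grained data processing units, but the arithmetic functions they implement are limited and their routing is complex, which restricts the generality and flexibility of reconfigurable SoC circuits. To address these problems, and based on the computational characteristics of baseband signal processing in communications, a new reconfigurable array circuit is designed that can be embedded in a reconfigurable SoC as a computing module. The array consists of interconnected cells built from coarse-grained data processing units. Because baseband signals come in a variety of data bit widths, each cell can be reconfigured to implement multiple operators. An independent configuration memory is embedded in every cell of the array and configuration data are loaded in parallel, which reduces the reconfiguration time overhead and enables fast reconfiguration of the whole array. The designed circuit is simulated using pseudo-code acquisition as an example. The results show that the proposed structure has a simple routing scheme and strong generality and flexibility.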

2.

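PVHArray: A Pipelined, Scalable, Hierarchical Reconfigurable Cryptographic Logic Array Architecture (Cited by: 1; self-citations: 0; citations by others: 1)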

Du Yiran, Li Wei, Dai Zibin. Acta Electronica Sinica (《电子学报》), 2020, 48(4): 781-789

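To implement cryptographic algorithms with high efficiency, this paper proposes PVHArray, a dataflow-based coarse-grained reconfigurable cryptographic logic array architecture. By studying the computation and control-structure characteristics of cryptographic algorithms, and following reconfigurable array design methods, it proposes a coarse-grained reconfigurable cryptographic logic array architecture, together with its parameterized model, built around pipelined, scalable coarse-grained reconfigurable processing units, a hierarchical interconnection network, and a cycle-level distributed control network. To improve the algorithm implementation efficiency of the array, the model parameters are determined from cryptographic algorithm mapping results, and a high-efficiency 4×4 PVHArray is constructed. The design was taped out in a 55 nm CMOS process for verification, with a chip area of 12.25 mm², and cryptographic algorithms were mapped onto the array chip. Experimental results show that the proposed PVHArray effectively supports the mapping of block ciphers, stream ciphers, and hash algorithms. In cipher block chaining (CBC) mode, compared with the reconfigurable cryptographic logic array REMUS_LPP, its performance per unit area improves by about 12.9% and its performance per unit power by about 13.9%.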

3.

A Fast Search Method for the Design Space of Reconfigurable Architectures (Cited by: 1; self-citations: 0; citations by others: 1)

Ji Aiming, Shen Haibin, Yan Xiaolang. Journal of Electronics & Information Technology (《电子与信息学报》), 2006, 28(9): 1744-1747

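Based on an evaluation model for reconfigurable architectures, this paper studies methods for estimating the area, performance, and power of a reconfigurable architecture at the algorithm level. Using area, performance, and power as criteria, the design space is searched in two steps. First, the minimum cost of each architecture in the architecture domain when implementing all algorithms is found. Second, the architecture design space is searched for the optimal architecture. The method does not depend on any specific architecture, evaluates reconfigurable architectures comprehensively, and quickly obtains a globally optimal search result. An application example shows that the method can effectively guide the design of reconfigurable architectures at an early design stage.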
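The two-step search described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the cost table, the architecture and algorithm names, and the weighted sum used to combine area, delay, and power are all hypothetical assumptions introduced here for illustration.

```python
def weighted_cost(estimate, weights=(1.0, 1.0, 1.0)):
    """Combine an (area, delay, power) estimate into one scalar (assumed weighted sum)."""
    return sum(w * x for w, x in zip(weights, estimate))


def min_cost_per_architecture(candidates, weights=(1.0, 1.0, 1.0)):
    """Step 1: for each architecture, add up the cheapest implementation of every algorithm."""
    totals = {}
    for arch, algorithms in candidates.items():
        totals[arch] = sum(
            min(weighted_cost(est, weights) for est in implementations)
            for implementations in algorithms.values()
        )
    return totals


def best_architecture(candidates, weights=(1.0, 1.0, 1.0)):
    """Step 2: pick the architecture with the lowest total cost from step 1."""
    totals = min_cost_per_architecture(candidates, weights)
    return min(totals, key=totals.get), totals


# Hypothetical estimates: candidates[architecture][algorithm] is a list of
# (area, delay, power) tuples, one per candidate implementation.
candidates = {
    "arch_A": {"fir": [(4, 10, 2), (6, 6, 3)], "fft": [(8, 12, 4)]},
    "arch_B": {"fir": [(5, 7, 2)], "fft": [(7, 9, 3), (9, 8, 5)]},
}

best, totals = best_architecture(candidates)
print(best, totals)
```

With these made-up numbers the search selects `arch_B`. Changing the `weights` tuple shifts the trade-off between area, delay, and power.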

4.

Research on an 8192-Point FFT Parallel Algorithm Based on RCSIMD

Zhou Guochang, Zhang Lixin. Microelectronics & Computer (《微电子学与计算机》), 2011, 28(4)

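This paper proposes a parallel algorithm for the 8192-point FFT based on the RCSIMD architecture. The algorithm divides the 8192 data points into 64 consecutive blocks of 128 consecutive points each (stored in the local memories of the reconfigurable processing elements). The RCSIMD reconfigurable processing array performs the bit-reversal permutation across blocks, while within each block the bit reversal is only logical (the bit-reversal process is implicit in the configuration data). This data storage and bit-reversal scheme makes full use of the processing array's communication network and the capability of its processing elements.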
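The split of bit reversal into a block-level part and a within-block part rests on a simple index identity, sketched below. This only checks the arithmetic for 8192 = 64 × 128 points. The function names are illustrative, and the sketch says nothing about how the RCSIMD array or its configuration data actually carry out the permutation.

```python
N_BLOCKS = 64      # 2**6 blocks
BLOCK_SIZE = 128   # 2**7 points per block


def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def split_bit_reverse(index):
    """13-bit reversal computed from a 6-bit block number and a 7-bit offset.

    index = block * 128 + offset. Reversing all 13 bits puts the reversed
    offset in the upper 7 bits and the reversed block number in the lower 6.
    """
    block, offset = divmod(index, BLOCK_SIZE)
    return bit_reverse(offset, 7) * N_BLOCKS + bit_reverse(block, 6)


# The decomposition matches a direct 13-bit reversal for every index.
assert all(split_bit_reverse(i) == bit_reverse(i, 13) for i in range(8192))
```

The identity shows why the block number and the in-block offset can be reversed independently, which is what lets one part of the permutation be handled by data movement between blocks and the other by addressing.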

5.

Research on System-Level Modeling of Reconfigurable Processor Arrays (Cited by: 1; self-citations: 1; citations by others: 0)

Pan Peng, Wang Peng, Lin Shuisheng. Microelectronics & Computer (《微电子学与计算机》), 2011, 28(11): 85-88, 93

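Because the design space of coarse-grained reconfigurable architectures is complex, designing a CGRA that meets application requirements calls for a system-level simulation model for performance evaluation. This paper proposes a system-level model of a reconfigurable processor array, implemented at the transaction level in SystemC. The model uses a multi-layer interconnection network to support communication between any two processors, and processor resources can be configured quickly through parameters. Simulation experiments show that the model is suitable for simulating the mapping of application algorithms onto coarse-grained reconfigurable architectures.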

6.

Design of a Coarse-Grained Reconfigurable Computing Array Circuit Based on Pipeline Mapping

Qiang Qian, Zhang Jiachen. Information Technology (《信息技术》), 2010, (6): 83-86

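The inherent parallelism of media processing algorithms has driven media processors toward computing-array architectures. After analyzing how algorithm mapping affects circuit execution, this paper combines computing-array design with algorithm mapping, proposes a pipeline mapping scheme for using the array effectively, and analyzes the effect of this mapping method on system performance. On this basis, a pipeline model is extracted using the IDCT algorithm in H.264 as an example, and a coarse-grained reconfigurable array is designed from that model. Experimental results show that the array has clear advantages in power consumption, speed, and device utilization, and has good application value.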

7.

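Research on an Intelligent Mapping Algorithm for Coarse-Grained Reconfigurable Cryptographic Logic Arrays (Cited by: 1; self-citations: 0; citations by others: 1)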

Du Yiran, Yang Xuan, Dai Zibin, Nan Longmei, Li Wei. Acta Electronica Sinica (《电子学报》), 2020, 48(1): 101-109

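To address the long mapping time and limited performance of cryptographic algorithm mapping on coarse-grained reconfigurable cryptographic logic arrays, this paper builds a parameterized model of the array and, taking mapping time and implementation performance as objectives and drawing on the structural characteristics of the array, proposes a partitioning algorithm for algorithm dataflow graphs. Nodes in a cryptographic algorithm's dataflow graph are gathered into clusters, and the cluster is used as the smallest mapping granularity, which reduces mapping complexity. Borrowing from the machine learning process, the paper builds an intelligent ant colony model with learning capability and proposes an intelligent ant colony optimization algorithm. By learning from the mappings of training samples, it continually refines the initial pheromone concentration matrix, speeds up mapping convergence, and uses the mappings of known algorithms to guide the mapping of unknown ones, making cryptographic algorithm mapping intelligent. Experimental results show that the proposed method reduces compilation time by 37.9% on average and achieves the highest mapping performance. In addition, it takes the algorithm dataflow graph as the mapping input and automatically generates the cryptographic algorithm mapping flow, which makes mapping more intuitive and convenient.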

8.

A Loop Mapping Algorithm for Coarse-Grained Reconfigurable Architectures Based on Memory Partitioning and Path Reuse

Zhang Xingming, Yuan Kaijian, Gao Yanzhao. Journal of Electronics & Information Technology (《电子与信息学报》), 2018, 40(6): 1520-1524

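Current research on loop mapping for coarse-grained reconfigurable architectures focuses mainly on operation placement and temporary data routing, and rarely considers data mapping. This paper proposes a modulo scheduling mapping flow based on memory partitioning and path reuse. First, fine-grained memory partitioning finds a suitable data mapping and improves the parallelism of data accesses. Modulo scheduling then finds the operation placement and temporary data routing. Finally, a routing cost model balances the use of memory routing and processing-element routing, and a path reuse strategy optimizes routing resources. Experimental results show that the method performs well in terms of loop initiation interval, instructions per cycle, and execution latency.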
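For readers unfamiliar with the initiation interval reported in this abstract, the following minimal sketch computes the textbook lower bound that modulo schedulers usually start from: the larger of a resource bound and a recurrence bound. It is a generic illustration, not the mapping flow of the paper. The function names and the simplification that every operation occupies one processing element for one cycle are assumptions made here.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive b."""
    return -(-a // b)


def resource_mii(num_ops, num_pes):
    """Resource bound: each PE is assumed to issue one operation per cycle."""
    if num_ops < 0 or num_pes <= 0:
        raise ValueError("need num_ops >= 0 and num_pes > 0")
    return max(1, ceil_div(num_ops, num_pes))


def recurrence_mii(recurrences):
    """Recurrence bound over dependence cycles given as (latency, distance) pairs.

    latency is the total delay around the cycle; distance is the number of
    loop iterations the cycle spans (assumed to be at least 1).
    """
    return max((ceil_div(lat, dist) for lat, dist in recurrences), default=1)


def minimum_ii(num_ops, num_pes, recurrences=()):
    """Lower bound on the initiation interval: max of the two bounds above."""
    return max(resource_mii(num_ops, num_pes), recurrence_mii(recurrences))


# Hypothetical loop body: 18 operations on a 4x4 array, one recurrence of
# latency 6 spanning 2 iterations.
print(minimum_ii(18, 16, [(6, 2)]))
```

A scheduler typically tries this bound first and increases the interval until placement and routing succeed, so routing pressure of the kind the abstract discusses shows up as a gap between the bound and the achieved interval.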

9.

An Edge-Centric Energy-Efficient Mapping Algorithm for Cryptographic Logic Arrays

Xu Jinfu, Zhang Yulei, Li Wei, Chen Tao. Journal of Electronics & Information Technology (《电子与信息学报》), 2022, 43(6): 1587-1595

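To address the low mapping performance and long compilation time of cryptographic algorithms on coarse-grained reconfigurable cryptographic logic arrays (CRCLA), this paper proposes a description format for cryptographic algorithms and hardware resources that shows the occupancy of each resource more intuitively during mapping. By analyzing the intrinsic relationship between the computational characteristics of cryptographic algorithms and the CRCLA hardware structure, and with the goal of reducing critical-path delay, it proposes an edge-centric energy-efficient mapping algorithm for cryptographic logic arrays (ECLMap). Edge mapping guides node mapping, and, combined with related mapping strategies, a backtracking mechanism is introduced to raise the mapping success rate. Experiments on a variety of cryptographic algorithms on a simulation platform show that, compared with other general-purpose mapping algorithms, the proposed algorithm achieves the best mapping performance, improving algorithm energy efficiency by about 20% on average and compilation time by about 25% on average, thereby achieving energy-efficient algorithm mapping.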

10.

Mapping Fractional Motion Estimation onto the Reconfigurable Processor ReMAP

Zhang Wenliang, Yong Shanshan, Dai Peng, Wang Xin'an, Xie Zheng. Microelectronics (《微电子学》), 2011, 41(1)

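This paper describes a method for implementing fractional motion estimation on the reconfigurable media processor ReMAP. ReMAP consists of cascaded reconfigurable arrays of arithmetic units and interconnect units, and offers a high degree of reconfigurability and strong parallel computing capability. The 1/2 interpolation, 1/4 interpolation, and search algorithms of fractional motion estimation are mapped onto ReMAP. Simulation and verification of the algorithms show that ReMAP supports high-performance fractional motion estimation, reaching or approaching ASIC performance, while retaining good flexibility, which makes it suitable for media processing applications.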

11.

CGADL: An Architecture Description Language for Coarse-Grained Reconfigurable Arrays

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(9):1247-1259

12.

Speedups and Energy Reductions From Mapping DSP Applications on an Embedded Reconfigurable System (Cited by: 1; self-citations: 0; citations by others: 1)

Galanis M.D., Dimitroulakos G., Goutis C.E. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2007,15(12):1362-1366

This paper presents performance improvements and energy savings from mapping real-world benchmarks on an embedded single-chip platform that combines coarse-grained reconfigurable logic with a microprocessor. The reconfigurable hardware is a 2-D array of processing elements connected by a mesh-like network. Analytical results are presented from mapping seven real-life digital signal processing applications, with the aid of an automated design flow, on six different instances of the system architecture. Significant overall application speedups relative to an all-software solution, ranging from 1.81 to 3.99, are reported, and these are close to the theoretical speedup bounds. Additionally, the energy savings range from 43% to 71%. Finally, a comparison with a system coupling a microprocessor with a very long instruction word core shows that the microprocessor/coarse-grained reconfigurable array platform is more efficient in terms of performance and energy consumption.
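The abstract does not say how its theoretical speedup bounds are derived. One common way to reason about such a bound when only part of an application is moved to an accelerator is Amdahl's law, sketched below as an assumption rather than as the authors' analysis. The fraction and kernel speedup in the example are made-up numbers.

```python
def amdahl_speedup(accelerated_fraction, kernel_speedup):
    """Overall speedup when a fraction of the run time is sped up by a given factor."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / kernel_speedup)


def amdahl_bound(accelerated_fraction):
    """Limit of the overall speedup as the kernel speedup grows without bound."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if accelerated_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - accelerated_fraction)


# Hypothetical example: 75% of the software run time is in kernels mapped to
# the reconfigurable array, and those kernels run 3x faster there.
print(amdahl_speedup(0.75, 3.0), amdahl_bound(0.75))
```

Under this model the part left on the microprocessor caps the achievable speedup, which is one reason measured application speedups can sit close to a bound even when individual kernels accelerate much more.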

13.

Dynamic Context Compression for Low-Power Coarse-Grained Reconfigurable Architecture

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2010,18(1):15-28

Most coarse-grained reconfigurable architectures (CGRAs) are composed of reconfigurable ALU arrays and a configuration cache (or context memory) to achieve high performance and flexibility. In particular, the configuration cache is the main component of a CGRA that provides its distinctive feature, dynamic reconfiguration in every cycle. However, the frequent memory-read operations needed for dynamic reconfiguration consume considerable power. Reducing power in the configuration cache has therefore become critical for CGRAs to be more competitive and reliable in embedded systems. In this paper, we propose a dynamically compressible context architecture for power saving in the configuration cache. This power-efficient context architecture works without degrading the performance and flexibility of the CGRA. Experimental results show that the proposed approach saves up to 39.72% of the power in the configuration cache with negligible area overhead (2.16%).

14.

Design of a coarse-grained reconfigurable architecture with floating-point support and comparative study

Manhwee Jo, Dongwook Lee, Kyuseung Han, Kiyoung Choi 《Integration, the VLSI Journal》2014

With a huge increase in demand for various kinds of compute-intensive applications in electronic systems, researchers have focused on coarse-grained reconfigurable architectures because of their advantages: high performance and flexibility. This paper presents FloRA, a coarse-grained reconfigurable architecture with floating-point support. A two-dimensional array of integer processing elements in FloRA is configured at run-time to perform floating-point operations as well as integer operations. The chip is fabricated in a 130 nm process, and the total area overhead of the additional hardware for floating-point operations is about 7.4% compared to the previous architecture, which does not support floating-point operations. The fabricated chip runs at a 125 MHz clock frequency with a 1.2 V power supply. Experiments show an 11.6× speedup on average compared to an ARM9 with a vector floating-point unit, for integer-only benchmark programs as well as programs containing floating-point operations. Compared with other similar approaches, including XPP and Butter, the proposed architecture shows much higher performance for integer applications, while maintaining about half the performance of Butter for floating-point applications.

15.

Floating-Point FPGA: Architecture and Modeling

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(12):1709-1718

This paper presents an architecture for a reconfigurable device that is specifically optimized for floating-point applications. Fine-grained units are used for implementing control logic and bit-oriented operations, while parameterized and reconfigurable word-based coarse-grained units incorporating word-oriented lookup tables and floating-point operations are used to implement datapaths. In order to facilitate comparison with existing FPGA devices, the virtual embedded block scheme is proposed to model embedded blocks using existing field-programmable gate array (FPGA) tools. This methodology involves adopting existing FPGA resources to model the size, position, and delay of the embedded elements. The standard design flow offered by FPGA and computer-aided design vendors is then applied, and static timing analysis can be used to estimate the performance of the FPGA with the embedded blocks. On selected floating-point benchmark circuits, our results indicate that the proposed architecture can achieve a fourfold improvement in speed and a 25-fold reduction in area compared with a traditional FPGA device.

16.

A Dynamic Optically Reconfigurable Gate Array—Perfect Emulation

Seto D., Watanabe M. 《Quantum Electronics, IEEE Journal of》2008,44(5):493-500

This paper presents a perfect dynamic optically reconfigurable gate array (DORGA) architecture emulation using a holographic memory and a conventional ORGA-VLSI. In ORGAs, although a large virtual gate count can be realized by exploiting the large-capacity storage capability of a holographic memory, the actual gate count, which is the gate count of a programmable gate array VLSI, is important to increase the instantaneous performance. Nevertheless, in previously proposed ORGA-VLSIs, the static configuration memory to store a single configuration context consumed a large implementation area of the ORGA-VLSIs and prevented the realization of large-gate-count ORGA-VLSIs. Therefore, a DORGA architecture has been proposed in order to increase the gate density. It uses the junction capacitance of photodiodes as dynamic memory, thereby obviating the static configuration memory. However, to date, demonstration of a perfect optically reconfigurable architecture for DORGA-VLSIs has never been presented. Therefore, in this study, the DORGA architecture was perfectly emulated, and the performance, particularly the reconfiguration context retention time, was measured experimentally. The advantages of this architecture are discussed in relation to the results.

17.

Design and analysis of a dynamically reconfigurable three-dimensional FPGA

Chiricescu S., Leeser M., Vai M.M. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2001,9(1):186-196

This paper presents the design and analysis of a dynamically reconfigurable field programmable gate array (FPGA) that consists of three physical layers: a routing and logic block layer, a routing layer, and a memory layer. The architecture was developed using a methodology that examines different architectural parameters and how they affect performance criteria such as speed, area, and reconfiguration time. The resulting architecture has high performance while satisfying the requirement of balancing the areas of its constituent layers.

18.

RDMM: Runtime dynamic migration mechanism of distributed cache for reconfigurable array processor

《Integration, the VLSI Journal》2020

Reconfigurable array processors have emerged as a powerful solution for speeding up computationally intensive applications. However, they may suffer from a data access bottleneck as the frequency of memory access rises. At present, the distributed cache design in reconfigurable array processors has a high cache failure rate, and frequent access to external memory leads to long memory access delays. To mitigate this problem, we present a Runtime Dynamic Migration Mechanism (RDMM) of distributed cache for reconfigurable array processors, based on the pronounced locality and high parallelism of their data accesses. The mechanism dynamically schedules temporary and static data, migrating data with a high access frequency from the remote cache to the processor's local migration storage table according to how often the reconfigurable array processors access the remote cache. A data search strategy based on the migration storage tables retrieves data over the shortest path, which effectively reduces the access delay of the entire system and increases the memory bandwidth of the reconfigurable array processor. We use the hardware platform of a reconfigurable array processor to test the proposed mechanism. The experimental results show that RDMM reduces access delay by up to 35.24% compared with the traditional distributed cache at the highest conflict rate. Compared with Refs. [19], [20], [21], and [23], the working frequency can be increased by 15%, the hit rate can be increased by 6.1%, and the peak bandwidth can be increased by about 3×.