
1.

Partitioning Methodology for Heterogeneous Reconfigurable Functional Units (total citations: 1; self-citations: 0; citations by others: 1)

Michalis D. Galanis Gregory Dimitroulakos Costas E. Goutis 《The Journal of supercomputing》2006,38(1):17-34

A methodology is presented for partitioning applications between reconfigurable hardware blocks of different granularity that are embedded in a generic heterogeneous architecture. The fine-grain reconfigurable logic is realized by an FPGA unit, and the coarse-grain reconfigurable hardware by a two-dimensional array of processing elements. Critical parts of the code, called kernels, are mapped onto the coarse-grain reconfigurable logic to improve performance. The partitioning method consists of three main steps: analysis of the input code, mapping onto the Coarse-Grain Reconfigurable Array, and mapping onto the FPGA. The partitioning flow is implemented in a prototype software framework. Analytical partitioning experiments with five real-world applications show execution-time speedups of 1.4 to 5.0 relative to an all-FPGA solution.
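
The three steps named in the abstract above (analysis, mapping kernels onto the coarse-grain array, mapping the remainder onto the FPGA) can be illustrated with a minimal Python sketch. This is not the authors' framework: the `Block` type, the `partition` function, the block names and all cycle counts are hypothetical, and the selection rule (take the most expensive blocks first) is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    fpga_cycles: int   # hypothetical cost if the block runs on the fine-grain FPGA
    cgra_cycles: int   # hypothetical cost if the block runs on the coarse-grain array


def partition(blocks, max_kernels):
    """Move the most time-consuming blocks (kernels) to the coarse-grain array."""
    # Step 1: analysis - rank blocks by their all-FPGA cost.
    ranked = sorted(blocks, key=lambda b: b.fpga_cycles, reverse=True)
    # Step 2: map up to max_kernels blocks onto the array, only where that is faster.
    kernels = [b for b in ranked[:max_kernels] if b.cgra_cycles < b.fpga_cycles]
    kernel_names = {b.name for b in kernels}
    # Step 3: every remaining block stays on the FPGA.
    rest = [b for b in blocks if b.name not in kernel_names]
    all_fpga = sum(b.fpga_cycles for b in blocks)
    mixed = sum(b.cgra_cycles for b in kernels) + sum(b.fpga_cycles for b in rest)
    return kernel_names, all_fpga / mixed


# Made-up example data, not taken from the paper.
blocks = [
    Block("dct", 6000, 1500),
    Block("quant", 3000, 1000),
    Block("control", 1000, 1200),
]
kernels, speedup = partition(blocks, max_kernels=2)
print(sorted(kernels), round(speedup, 2))
```

The reported speedup is simply the all-FPGA cycle count divided by the cycle count of the mixed mapping, which matches how the abstract states its results relative to an all-FPGA solution.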

2.

Software development for high-performance, reconfigurable, embedded multimedia systems (total citations: 1; self-citations: 0; citations by others: 1)

La Rosa A. Lavagno L. Passerone C. 《Design & Test of Computers, IEEE》2005,22(1):28-38

Reconfigurable platforms can be very effective for lowering production costs because they allow the reuse of architecture resources across a variety of applications. We show how to program a reduced-instruction-set-computing (RISC) microprocessor with a reconfigurable functional unit, focusing on DSP applications and using the example of a turbo decoder. We have developed a complete design flow, including a methodology and compilation tool chain, to address the instruction set hardware-software codesign problem for a processor with a runtime reconfigurable unit. The flow starts from a system-level specification (usually a software program) of the application and partitions it into software and hardware domains to achieve the best speed, power, and area performance, while satisfying resource constraints imposed by the target platform architecture. We describe a methodology and a set of tools that allow extensive design exploration for hardware-software codesign with the goal of improving the overall utilization of reconfigurable multimedia platforms.

3.

Extracting and Improving Microarchitecture Performance on Reconfigurable Architectures (total citations: 2; self-citations: 0; citations by others: 2)

Shobana Padmanabhan Phillip Jones David V. Schuehler Scott J. Friedman Praveen Krishnamurthy Huakai Zhang Roger Chamberlain Ron K. Cytron Jason Fritts John W. Lockwood 《International journal of parallel programming》2005,33(2-3):115-136

Applications for constrained embedded systems require careful attention to the match between the application and the support offered by an architecture, at the ISA and microarchitecture levels. Generic processors, such as ARM and PowerPC, are inexpensive, but with respect to a given application, they often overprovision in areas that are unimportant for the application's performance. Moreover, while application-specific, customized logic could dramatically improve the performance of an application, that approach is typically too expensive to justify for most applications. In this paper, we describe our experience using reconfigurable architectures to develop an understanding of an application's performance and to enhance its performance with respect to customized, constrained logic. We begin with a standard ISA currently in use for embedded systems. We modify its core to measure performance characteristics, obtaining a system that provides cycle-accurate timings and presents results in the style of gprof, but with absolutely no software overhead. We then provide cache-behavior statistics that are typically unavailable in a generic processor. In contrast with simulation, our approach executes the program at full speed and delivers statistics based on the actual behavior of the cache subsystem. Finally, in response to the performance profile developed on our platform, we evaluate various uses of the FPGA-realized instruction and data caches in terms of the application's performance.

4.

ASCRA: an application-oriented reconfigurable compiler (in English) (total citations: 1; self-citations: 0; citations by others: 1)

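Wu Yanxia, Gu Guochang, Sun Yanteng, Yang Min, Yang Jie, Niu Xiaoxia, Sun Lin 《Journal of Frontiers of Computer Science and Technology》2011,5(3):267-279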

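Reconfigurable computing has been studied in many application domains, but because high-level design tools are lacking, designers need considerable software and hardware expertise to develop programs for the GPP/RAU architecture, which has hindered large-scale adoption. This paper proposes the initial architecture of ASCRA, an application-oriented reconfigurable compiler that automatically maps C code to VHDL and thereby addresses the bottleneck of automatic compilation tools in reconfigurable computing. The ASCRA compiler focuses on hardware/software partitioning and on hardware-oriented optimizations such as systolic arrays and loop pipelining. A verification platform for the ASCRA compiler was designed and implemented on the ML505 development platform, and experiments report the synthesis results of the VHDL code generated for kernel program segments.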

5.

An architecture framework for an adaptive extensible processor

Hamid Noori Farhad Mehdipour Kazuaki Murakami Koji Inoue Morteza Saheb Zamani 《The Journal of supercomputing》2008,45(3):313-340

To improve the performance of embedded processors, an effective technique is collapsing critical computation subgraphs as application-specific instruction set extensions and executing them on custom functional units. The problem with this approach is the immense cost and the long times required to design a new processor for each application. As a solution to this issue, we propose an adaptive extensible processor in which custom instructions (CIs) are generated and added after chip-fabrication. To support this feature, custom functional units are replaced by a reconfigurable matrix of functional units (FUs). A systematic quantitative approach is used for determining the appropriate structure of the reconfigurable functional unit (RFU). We also introduce an integrated framework for generating mappable CIs on the RFU. Using this architecture, performance is improved by up to 1.33, with an average improvement of 1.16, compared to a 4-issue in-order RISC processor. By partitioning the configuration memory, detecting similar/subset CIs and merging small CIs, the size of the configuration memory is reduced by 40%.

6.

HW/SW co-design of reconfigurable hardware-based genetic algorithm in FPGAs applicable to a variety of problems

Vishnu P. Nambiar Sathivellu Balakrishnan Mohamed Khalil-Hani M. N. Marsono 《Computing》2013,95(9):863-896

This paper describes the implementation of a reconfigurable hardware-based genetic algorithm (HGA) accelerator using the hardware-software (HW/SW) co-design methodology. This HGA is coupled with a unique TRNG that extracts random jitter from a phase-locked loop (PLL) to ensure proper GA operation. It is then applied and benchmarked with several case studies, which include the optimization of a simple fitness function, a constrained Michalewicz function, and the tuning of parameters in finger-vein biometrics. An HGA solution is necessary in systems that demand high performance during the optimization process. However, implementations that are designed completely in hardware result in a very rigid architecture, making it difficult to reconfigure the system for use in different applications. This paper aims to solve this issue by proposing an HGA design that provides reconfigurability and flexibility by moving problem-dependent processes into software. The prototyping platform used is an Altera Stratix II EP2S60 FPGA prototyping board with a clock frequency of 50 MHz. The HW/SW co-design technique is applied, and system partitioning is done based on aspects such as system constraints, operational intensity, process sequencing, hardware logic utilization, and reconfigurability. Experimental results show that the proposed HGA outperforms equivalent software implementations, built with an open-source C++ GA component library (GAlib) and running on the same prototyping platform, by up to 102 times. In the final case study, applying the proposed HGA to tunable parameter optimization in finger-vein biometrics improved the matching rate, reducing the equal error rate (EER) from 1.004% to 0.101%.
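
The split described in the abstract above, with generic GA machinery on one side and problem-dependent processes on the other, can be sketched in plain Python. This is only an illustration under stated assumptions: the `run_ga` function, its parameters, and the choice of tournament selection, single-point crossover and bit-flip mutation are hypothetical and are not taken from the paper. The seeded `random.Random` stands in for the hardware TRNG, and `one_max` is a toy fitness function.

```python
import random


def run_ga(fitness, n_bits=16, pop_size=20, generations=40, seed=0):
    """Generic GA loop; `fitness` is the only problem-dependent part."""
    rng = random.Random(seed)  # software stand-in for a hardware random source
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry over the best individual found so far
        while len(nxt) < pop_size:
            # Tournament selection of two parents (assumed operator).
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_bits)   # single-point crossover
            child = p1[:cut] + p2[cut:]
            i = rng.randrange(n_bits)
            if rng.random() < 0.1:           # occasional bit-flip mutation
                child[i] ^= 1
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best


def one_max(bits):
    """Toy problem-dependent fitness: count the bits set to 1."""
    return sum(bits)


best = run_ga(one_max)
print(one_max(best), "of", len(best))
```

Swapping `one_max` for another callable changes the problem without touching the loop, which is the kind of flexibility the abstract attributes to keeping problem-dependent processes in software.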

7.

Real-time embedded systems powered by FPGA dynamic partial self-reconfiguration: a case study oriented to biometric recognition applications

Francisco Fons Mariano Fons Enrique Cantó Mariano López 《Journal of Real-Time Image Processing》2013,8(3):229-251

This work aims to pave the way for an efficient open system architecture for embedded electronic applications that can process computationally complex algorithms in real time and at low cost. The goal is to define a standard architecture that improves on the performance-cost trade-off offered by alternatives currently on the market, such as general-purpose multi-core processors. Our approach, based on hardware/software (HW/SW) co-design and run-time reconfigurable computing, is synthesizable in SRAM-based programmable logic. As a proof of concept, a run-time partially reconfigurable field-programmable gate array (FPGA) is used to implement a computationally demanding application, an automatic fingerprint authentication system (AFAS). Biometric personal recognition is a good example of a compute-intensive algorithm composed of a series of image processing tasks executed in sequential order. In our pioneering design, these tasks are first partitioned and synthesized into a series of coprocessors, which are then instantiated and executed, multiplexed in time, on a partially reconfigurable region of the FPGA. Benchmarking the AFAS as a pure software implementation on a PC platform with a dual-core processor (Intel Core 2 Duo T5600 at 1.83 GHz) against a reconfigurable FPGA co-design (the identical algorithm partitioned into HW/SW tasks operating at 50 or 100 MHz on the second smallest device of the Xilinx Virtex-4 LX family) shows a speed-up of one order of magnitude in favor of the FPGA alternative. These results point to biometric recognition as a sensible killer application for run-time reconfigurable computing, mainly in terms of efficiently balancing computational power, functional flexibility and cost. Such features, achieved through partial reconfiguration, are readily portable today to a broad range of embedded applications with identical system architecture.

8.

Parallel application performance on shared high performance reconfigurable computing resources

Melissa C. Gregory D. 《Performance Evaluation》2005,60(1-4):107-125

The use of a network of shared, heterogeneous workstations, each harboring a reconfigurable computing (RC) system, offers high performance users an inexpensive platform for a wide range of computationally demanding problems. However, effectively using the full potential of these systems can be challenging without knowledge of the system's performance characteristics. While some performance models exist for shared, heterogeneous workstations, none thus far accounts for the addition of RC systems. Our analytic performance model includes the effects of the reconfigurable device, application load imbalance, background user load, basic message passing communication, and processor heterogeneity. The methodology proves to be accurate in characterizing these effects for applications running on shared, homogeneous, and heterogeneous HPRC resources. The model error in all cases was found to be less than 5% for application runtimes greater than 30 s, and less than 15% for runtimes less than 30 s.

9.

Energy-aware run-time task partition and allocation in dynamic partial reconfigurable systems

《Journal of Systems Architecture》2017

Dynamic partial reconfigurable systems with a processor and a field-programmable gate array are promising innovations for meeting the requirement of mobile embedded devices. These systems demonstrate low power consumption and high performance. However, with limited battery life and chip size, energy saving and area allocation are critical concerns for such systems. In this study, an energy-aware hardware/software partition is presented to minimize the system energy consumption, and a contention-aware task allocation is presented to minimize the area requirement and response time. The energy efficiency and schedulability of the proposed methodology were evaluated using a series of workloads, and impressive results were obtained.
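
One way to picture an energy-aware hardware/software partition under an area limit is as a 0/1 knapsack problem: each task either stays in software or moves to the FPGA, and the goal is to minimize total energy without exceeding the available area. The Python sketch below is a static illustration under that assumption only. The abstract does not describe the paper's run-time algorithm, and `partition_tasks`, the task names, and the energy and area figures are all hypothetical.

```python
def partition_tasks(tasks, area_budget):
    """Choose which tasks run in hardware to minimize total energy.

    tasks: list of (name, sw_energy, hw_energy, hw_area) with integer areas.
    Returns (set of hardware task names, total energy).
    """
    # best[a] = (energy saved, tasks placed in hardware) using at most a area units
    best = [(0, frozenset())] * (area_budget + 1)
    for name, sw_e, hw_e, area in tasks:
        saving = sw_e - hw_e
        if saving <= 0:
            continue  # hardware does not help this task; leave it in software
        # Iterate downwards so each task is placed at most once (0/1 knapsack).
        for a in range(area_budget, area - 1, -1):
            cand = (best[a - area][0] + saving, best[a - area][1] | {name})
            if cand[0] > best[a][0]:
                best[a] = cand
    saved, hw_tasks = best[area_budget]
    total = sum(sw_e for _, sw_e, _, _ in tasks) - saved
    return set(hw_tasks), total


# Made-up workload: (name, software energy, hardware energy, hardware area).
tasks = [("fft", 50, 10, 4), ("filter", 30, 12, 3), ("crc", 20, 15, 2), ("ui", 5, 9, 1)]
hw, total = partition_tasks(tasks, area_budget=6)
print(sorted(hw), total)
```

With an area budget of 6 units, `fft` and `filter` cannot both fit, so the sketch places `fft` and `crc` in hardware and leaves the rest in software. A run-time scheme such as the one the abstract describes would also have to account for reconfiguration cost and contention, which this sketch ignores.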

10.

Reconfigurable media processing

《Parallel Computing》2002,28(7-8):1111-1139

Multimedia processing is becoming increasingly important, with a wide variety of applications ranging from multimedia cell phones to high definition interactive television. Media processing techniques typically involve the capture, storage, manipulation and transmission of multimedia objects such as text, handwritten data, audio objects, still images, 2D/3D graphics, animation and full-motion video. A number of implementation strategies have been proposed for processing multimedia data. These approaches can be broadly classified into two major categories, namely (i) general purpose processors with programmable media processing capabilities, and (ii) dedicated implementations (ASICs). We have performed a detailed complexity analysis of the recent multimedia standard (MPEG-4), which has shown the potential of reconfigurable computing, an approach that adapts the underlying hardware dynamically in response to changes in the input data or processing environment. We therefore propose a methodology for designing a reconfigurable media processor. This involves hardware-software co-design implemented in the form of a parser, a profiler, a recurring pattern analyzer, and spatial and temporal partitioners. The proposed methodology enables efficient partitioning of resources for complex and time-critical multimedia applications.