
1.

Embedded Processor Validation Environment Using a Cycle-Accurate Retargetable Instruction-Set Simulator

Hoonmo Yang, Moonkey Lee. The Journal of Supercomputing, 2005, 33(1): 19-32

2.

ASIP Design Guided by a Configuration-Flow-Driven Computing Architecture (total citations: 1; self-citations: 0; citations by others: 1)

Li Yong, Wang Zhiying, Zhao Xuemi, Yue Hong. Journal of Computer Research and Development, 2007, 44(4): 714-721

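To balance flexibility and efficiency in embedded processor design, a configuration-flow-driven computing architecture is proposed. In this architecture the hardware/software interface is moved downward, so that the interconnection network between functional units is visible to the compiler, which performs transport routing; this makes it possible to support more complex but more efficient interconnection networks. Guided by this architecture, a design method for application-specific instruction-set processors (ASIPs) that supports segmented reconfigurable interconnection networks is proposed. Applying the method to the design of three classes of ASIPs for the cryptography domain shows that, compared with a simple bus interconnect, it saves on average 53% of interconnect power and 38.7% of the buses without affecting performance, thereby reducing both the bus count and interconnect power.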

3.

Automatic Customization of Application-Specific Instruction Sets Based on a Clustered Integer Linear Programming Model

Zhao Kang, Bian Jinian, Dong Sheqin. Journal of Computer-Aided Design & Computer Graphics, 2007, 19(10): 1229-1234

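A clustered integer linear programming formulation is proposed that exploits functional dependencies between instructions to address the exponential search space of automatic instruction-set customization for application-specific instruction-set processors. On this basis, corresponding instruction customization strategies are proposed for the front end and the back end. Experimental results show that the method effectively automates the design of application-specific instruction sets and improves the computational performance of the resulting processor.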

4.

Automatic instruction-set architecture synthesis for VLIW processor cores in the ASAM project

Microprocessors and Microsystems, 2017

5.

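Design and Implementation of High-Speed Embedded Processors Based on Memory Techniques (total citations: 1; self-citations: 0; citations by others: 1)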

Zhang Qin, Han Chengde. Chinese Journal of Computers, 2007, 30(5): 831-837

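SoPC (System on a Programmable Chip) is widely used in embedded systems and is usually implemented on an FPGA (Field Programmable Gate Array). A class of embedded processors, such as wavelet transform processors, compression/decompression processors, and FFT processors, can be designed with a memory-based approach. Because on-chip memory resources in an FPGA are relatively scarce, how to use them effectively to build high-speed embedded processors is an open research problem. Using an FFT processor as an example, this paper demonstrates the effectiveness of the approach: by adopting an address-mapping scheduling strategy and two conflict-free operand address-mapping schemes, it reduces the FPGA on-chip memory used and increases processing speed. The FFT processor has played a key role in a deployed system.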
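
The abstract does not describe its two conflict-free address-mapping schemes. As a hedged illustration of what "conflict-free" means for a memory-based radix-2 FFT, the sketch below uses a well-known parity-based bank assignment, which is a swapped-in technique and not necessarily the authors' scheme. In every radix-2 stage the two butterfly operands have addresses that differ in exactly one bit, so assigning each word to bank parity(address) puts the two operands in different banks, letting both be read in the same cycle. The function names are hypothetical.

```python
def bank_of(addr: int) -> int:
    """Two-bank assignment: the bank is the parity (XOR) of all address bits."""
    return bin(addr).count("1") & 1


def butterfly_pairs(n: int, stage: int):
    """Yield the (i, j) operand addresses of one radix-2 stage of an n-point FFT.

    In this stage the two operands of each butterfly differ only in bit `stage`.
    """
    span = 1 << stage
    for i in range(n):
        if i & span == 0:
            yield i, i | span


def is_conflict_free(n: int) -> bool:
    """True if no butterfly in any stage reads both operands from the same bank."""
    stages = n.bit_length() - 1  # log2(n), assuming n is a power of two
    return all(
        bank_of(i) != bank_of(j)
        for s in range(stages)
        for i, j in butterfly_pairs(n, s)
    )


print(is_conflict_free(1024))  # True
```

By contrast, a naive mapping such as bank = addr & 1 places both operands in the same bank from the second stage onward (for example, addresses 0 and 2).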

6.

FlexWare: a retargetable, embedded-software development environment

IEEE Design & Test of Computers, 2002, 19(4): 59-69

Effective embedded software development tools are essential to better exploit the inherent capabilities of embedded processors. We developed the FlexWare embedded software development environment in response to this need, focusing essentially on the performance and retargetability of our tools. Our benchmarks demonstrate that, despite the wide range of processors we cover, we have achieved state-of-the-art embedded software-tool performance and functionality. Moreover, we demonstrate their wide range of retargetability, from simple microcontrollers to complex multimedia DSPs and network processors.

7.

Computation in the Context of Transport Triggered Architectures (total citations: 1; self-citations: 0; citations by others: 1)

Henk Corporaal, Johan Janssen, Marnix Arnold. International Journal of Parallel Programming, 2000, 28(4): 401-427

Processors used in embedded systems have specific requirements that are not always met by off-the-shelf processors. A templated processor architecture, which can easily be tuned towards a certain application (domain), offers a solution. The transport triggered architecture (TTA) template presented in this paper has a number of properties that make it very suitable for embedded system design. Key to its success is giving the compiler more control: it has to schedule all data transports within the processor. This paper highlights two important TTA-related issues. First, a new code generation method for TTAs is discussed; it integrates scheduling and register allocation, thereby avoiding the notorious phase-ordering problem between these two steps. Second, we discuss how to tune the instruction repertoire for an embedded processor. A tool is described that automatically detects frequent patterns of operations. These patterns can then be implemented on special function units.
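
The abstract mentions a tool that automatically detects frequent patterns of operations for implementation on special function units. A minimal sketch of the simplest form of such detection, assuming a hypothetical representation in which each operation lists the indices of the operations producing its inputs, is to count producer/consumer opcode pairs along data dependences. The authors' tool may detect larger or differently defined patterns; `Op`, `frequent_pairs`, and the example block are illustrative only.

```python
from collections import Counter
from typing import NamedTuple


class Op(NamedTuple):
    opcode: str
    srcs: tuple  # indices of the operations (in the same block) that produce the inputs


def frequent_pairs(block, min_count=2):
    """Count (producer opcode, consumer opcode) pairs along data dependences
    and return those occurring at least `min_count` times, most frequent first."""
    counts = Counter()
    for op in block:
        for src in op.srcs:
            counts[(block[src].opcode, op.opcode)] += 1
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


# Hypothetical dataflow of an unrolled multiply-accumulate loop body.
example = [
    Op("load", ()), Op("load", ()), Op("mul", (0, 1)), Op("add", (2,)),
    Op("load", ()), Op("load", ()), Op("mul", (4, 5)), Op("add", (6, 3)),
]
print(frequent_pairs(example))  # [(('load', 'mul'), 4), (('mul', 'add'), 2)]
```

The recurring mul-then-add pair is the kind of pattern that could be mapped onto a fused multiply-add special function unit.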

8.

Flexible VLIW processor based on FPGA for efficient embedded real-time image processing

Vincent Brost, Fan Yang, Charles Meunier. Journal of Real-Time Image Processing, 2014, 9(1): 47-59

Modern field programmable gate array (FPGA) chips, with their larger memory capacity and reconfigurability potential, are opening new frontiers in rapid prototyping of embedded systems. With the advent of high-density FPGAs, it is now possible to implement a high-performance VLIW (very long instruction word) processor core in an FPGA. In a VLIW architecture, processor effectiveness depends on the ability of compilers to extract sufficient ILP (instruction-level parallelism) from program code. This paper describes research results on enabling the VLIW processor model for real-time processing applications by exploiting FPGA technology. Our goals are to keep the flexibility of processors to shorten the development cycle, and to use the powerful FPGA resources to increase real-time performance. We present a flexible VLIW VHDL processor model with a variable instruction set and a customizable architecture, which allows the intrinsic parallelism of a target application to be exploited using advanced compiler technology and implemented optimally on an FPGA. Several common image processing algorithms were tested and validated using the proposed development cycle. We also carried out rapid prototyping of embedded contactless palmprint extraction on a Virtex-6 FPGA-based board for a biometric application and obtained a processing time of 145.6 ms per image. Our approach meets several criteria for co-design tools: flexibility, modularity, performance, and reusability.

9.

Using task migration to improve non-contiguous processor allocation in NoC-based CMPs

Journal of Systems Architecture, 2013, 59(7): 468-481

In this paper, a processor allocation mechanism for NoC-based chip multiprocessors is presented. Processor allocation is a well-known problem in parallel computer systems that aims to allocate the processing nodes of a multiprocessor to the different tasks of an input application at run time. The proposed mechanism targets optimizing on-chip communication power and latency, and relies on two procedures: processor allocation and task migration. Allocation uses a fast heuristic algorithm that assigns free processors to the tasks of an incoming application when it begins execution. The task-migration algorithm is activated when an application completes execution and frees its allocated resources. Task migration uses the recently deallocated processors and tries to rearrange the current tasks to find a better mapping for them. The proposed method can also capture the dynamic traffic pattern of the network and perform task migration based on the current communication demands of the tasks. Consequently, task migration adapts the task mapping to the current network status. We adopt a non-contiguous processor allocation strategy in which the tasks of the input application may be mapped onto disjoint regions (groups of processors) of the network. We then use virtual point-to-point circuits, a state-of-the-art fast on-chip connection designed for networks-on-chip, to virtually connect the disjoint regions and bring communication latency and power closer to the values offered by contiguous allocation schemes. The experimental results show considerable improvement over existing allocation mechanisms.
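
The abstract does not give the cost model behind its allocation and migration heuristics. The sketch below assumes a common proxy for on-chip communication latency and power, traffic volume multiplied by Manhattan hop distance on a 2D mesh, and shows a simple greedy migration step that moves tasks onto recently freed processors while the cost drops. It does not model the paper's virtual point-to-point circuits, and `comm_cost` and `migrate` are hypothetical names.

```python
def comm_cost(mapping, traffic):
    """Sum of traffic volume x Manhattan hop distance over communicating task pairs.

    mapping: task -> (x, y) mesh coordinate; traffic: (task_a, task_b) -> volume.
    """
    total = 0.0
    for (a, b), volume in traffic.items():
        (xa, ya), (xb, yb) = mapping[a], mapping[b]
        total += volume * (abs(xa - xb) + abs(ya - yb))
    return total


def migrate(mapping, traffic, freed):
    """Greedily move one task at a time onto a free processor while the cost drops."""
    mapping, free = dict(mapping), set(freed)
    while True:
        best_cost, best_move = comm_cost(mapping, traffic), None
        for task, old in list(mapping.items()):
            for node in free:
                mapping[task] = node  # try the move
                cost = comm_cost(mapping, traffic)
                if cost < best_cost:
                    best_cost, best_move = cost, (task, node)
            mapping[task] = old  # undo the trial move
        if best_move is None:
            return mapping  # no single move improves the cost
        task, node = best_move
        free.discard(node)
        free.add(mapping[task])  # the task's old processor becomes free
        mapping[task] = node


initial = {"A": (0, 0), "B": (3, 0)}
traffic = {("A", "B"): 10.0}
result = migrate(initial, traffic, freed=[(1, 0), (2, 0)])
print(comm_cost(initial, traffic), comm_cost(result, traffic))  # 30.0 10.0
```

Each accepted move strictly lowers the cost, so the loop terminates; a real implementation would also account for migration overhead, which this sketch ignores.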

10.

Research on the Construction and Translation of a Virtual Instruction Set

Xie Yaobin, Jiang Liehui, Yin Qing, Zhang Yuanyuan, Zhu Jie. Computer Engineering and Design, 2007, 28(14): 3489-3491, 3538

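Using virtual instructions as an intermediate language to build a reusable instruction-set simulator is an important technique for making simulators reusable. This paper introduces the working principle of a reusable instruction-set simulator, proposes principles and methods for constructing virtual instructions, and describes semantic rules for assembly instructions. Finally, an example shows how to start from the semantic description rules of assembly instructions and generate virtual instructions that are semantically equivalent to the target instructions.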
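
To make the idea of generating semantically equivalent virtual instructions from per-instruction semantic rules concrete, here is a toy translator. The two MIPS-like source instructions, the `V*` virtual opcodes, the temporary register `t0`, and the rule table are all hypothetical; the abstract does not describe the paper's actual virtual instruction set or rule notation.

```python
import re


def _rule_add(rd, rs, rt):
    # add rd, rs, rt  =>  rd <- rs + rt
    return [("VADD", rd, rs, rt)]


def _rule_lw(rt, mem):
    # lw rt, off(base)  =>  t0 <- base + off ; rt <- MEM[t0]
    off, base = re.fullmatch(r"(-?\d+)\((\w+)\)", mem).groups()
    return [("VADDI", "t0", base, int(off)), ("VLOAD", rt, "t0")]


# Semantic rule table: source mnemonic -> function emitting virtual instructions.
RULES = {"add": _rule_add, "lw": _rule_lw}


def translate(line):
    """Translate one assembly instruction into a list of virtual instructions."""
    mnemonic, rest = line.split(None, 1)
    operands = [o.strip() for o in rest.split(",")]
    return RULES[mnemonic](*operands)


for insn in ["add r1, r2, r3", "lw r4, 8(r5)"]:
    print(insn, "->", translate(insn))
```

In this sketch the simulator core only has to execute the small virtual set, so supporting another source ISA would mean writing a new rule table rather than a new simulator.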

11.

Instruction scheduling and transformation for a VLIW unified reduced instruction set computer/digital signal processor processor with shared register architecture

Cheng-Yu Lee, Min-Chin Hung, Rong-Guey Chang. Concurrency and Computation, 2014, 26(1): 134-151

The popularity of multimedia applications has made them a major theme in embedded systems. The key component for supporting multimedia applications well is the embedded processor. We have therefore designed and implemented an embedded processor, called the UniDual processor, to achieve this objective. Its key features are the integration of the instructions of reduced instruction set computers (RISCs) and digital signal processors (DSPs), together with support for a special instruction set and a shared-based clustered register architecture. However, an important open issue for UniDual is how to allocate registers efficiently. In this paper, we present a scheduling and instruction transformation approach to resolve this issue. The proposed approach schedules instructions and then transforms overlapped instructions into RISC and DSP instructions, taking communication overhead and hardware limitations into account. The evaluation shows that, compared with a greedy approach, our approach is effective in improving performance and reducing code size. Copyright © 2012 John Wiley & Sons, Ltd.

12.

Fault-Tolerant Scheduling for Real-Time Embedded Control Systems (total citations: 8; self-citations: 0; citations by others: 8)

Chun-Hua Yang, Geert Deconinck, Wei-Hua Gui. Journal of Computer Science and Technology, 2004, 19(2)

With the increasing complexity of industrial applications, an embedded control system (ECS) must process a number of hard real-time tasks and needs fault tolerance to assure high reliability. Considering the characteristics of real-time tasks in an ECS, an integrated algorithm is proposed that schedules real-time tasks and guarantees that all of them complete before their deadlines even in the presence of faults. Based on the nonpreemptive critical-section protocol (NCSP), this paper analyzes the blocking time introduced by resource conflicts among related tasks in fault-tolerant multiprocessor systems. An extended schedulability condition is presented to check whether assigning a given task to a processor is feasible. A primary/backup approach and on-line replacement of failed processors are used to tolerate processor failures. The analysis reveals that the integrated algorithm bounds the blocking time, requires limited overhead in the number of processors, and still assures good processor utilization. Simulation results confirm this. Both analysis and simulation show the effectiveness of the proposed algorithm in an ECS.
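
The paper's extended schedulability condition under NCSP is not given in the abstract. As a stand-in, the sketch below uses the classic sufficient rate-monotonic utilization test extended with a blocking term (tasks sorted by period, deadlines equal to periods): for every priority level i, the sum of C_j/T_j over the i highest-priority tasks plus B_i/T_i must not exceed i(2^(1/i) - 1). A first-fit loop then uses it to check whether a task can be assigned to a processor. This is a swapped-in textbook test, not the authors' condition, and it omits their primary/backup replication; all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    wcet: float      # C: worst-case execution time
    period: float    # T: period (deadline assumed equal to period)
    blocking: float  # B: worst-case blocking from shared-resource critical sections


def rm_schedulable(tasks):
    """Sufficient rate-monotonic test with blocking (Liu-Layland bound per level)."""
    ordered = sorted(tasks, key=lambda t: t.period)  # shorter period = higher priority
    utilization = 0.0
    for i, t in enumerate(ordered, start=1):
        utilization += t.wcet / t.period
        if utilization + t.blocking / t.period > i * (2 ** (1 / i) - 1):
            return False
    return True


def first_fit(tasks, n_procs):
    """Assign each task to the first processor that stays schedulable; None if impossible."""
    procs = [[] for _ in range(n_procs)]
    for t in tasks:
        for assigned in procs:
            if rm_schedulable(assigned + [t]):
                assigned.append(t)
                break
        else:
            return None  # no processor can accept this task
    return procs


light = [Task("t1", 1, 4, 0), Task("t2", 1, 5, 0), Task("t3", 2, 10, 0)]
heavy = [Task("h1", 3, 4, 0), Task("h2", 3, 4, 0)]
print(len(first_fit(light, 1)[0]), first_fit(heavy, 1))
```

Because the test is only sufficient, a task set it rejects may still be schedulable; exact response-time analysis would admit more task sets at a higher checking cost.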

13.

VMW: a visualization-based microarchitecture workbench

T. A. Diep, J. P. Shen. Computer, 1995, 28(12): 57-64

Superscalar processor design requires increasingly sophisticated software tools. The visualization-based microarchitecture workbench (VMW) described in this paper addresses weaknesses common to most performance simulators: lack of retargetability, visualization support, and interactive control. VMW provides a multifunction workbench for aiding designers of modern superscalar processors. It facilitates rigorous machine specification by providing specification templates at both the architecture and microarchitecture levels.

14.

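Research on the Architecture Design of Heterogeneous Multi-Core Processors (total citations: 2; self-citations: 0; citations by others: 2)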

Chen Fangyuan, Zhang Dongsong, Wang Zhiying. Computer Engineering & Science, 2011, 33(12): 27-36

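Multi-core technology has become an important direction in processor development. Heterogeneous multi-core processors, which can distribute different types of computing tasks to different types of processor cores for parallel processing and thus offer more flexible and efficient processing for applications with different requirements, have become a research hotspot. From an architectural perspective, this paper discusses the key issues in heterogeneous multi-core processor design, analyzing core structure, interconnect, memory system, operating system support, testing and verification, dynamic voltage scaling, and other aspects...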

15.

Research on Multi-OS Cooperation Techniques for Heterogeneous Processors

Feng Ruiqing, Zhang Ji, Zhao Juncai. Computer Systems & Applications, 2018, 27(12): 90-95

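As the application scenarios of embedded devices grow increasingly complex, heterogeneous multi-core architectures are gradually becoming the mainstream architecture for embedded processors. Multi-core processors currently rely mainly on a single-operating-system model, which has many limitations in practice. To fully exploit the multi-core characteristics of heterogeneous processors, it is essential to deploy an appropriate operating system on each kind of core and to implement cooperative processing among the multiple operating systems. This paper studies operating systems for a heterogeneous multi-core processor (ARM+DSP) and successfully ports embedded Linux and ReWorks, a Chinese-developed real-time operating system for the DSP, to the heterogeneous multi-core platform. To enable cooperative processing between ReWorks and Linux, the paper analyzes the key techniques of inter-core communication and, taking TI's AM5718 as an example, designs a set of heterogeneous multi-core communication components. Tests show that these components support dynamic loading of the ReWorks operating system and its applications onto the DSP core from the ARM core, inter-core message passing between Linux and ReWorks, and cooperative computation between Linux and ReWorks.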

16.

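A Parallel Configurable Application-Specific Instruction Coprocessor for ECC (total citations: 2; self-citations: 1; citations by others: 1)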

Zhong Xianhai, Xu Jinfu, Yan Yingjian. Computer Engineering, 2009, 35(5): 153-155

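Using a hardware/software co-design approach, a VLIW-based, parallel, configurable application-specific instruction coprocessor architecture for elliptic curve cryptography (ECC) is presented. The coprocessor uses a parallel scheduling algorithm for point addition and point doubling, and its functional-unit microarchitecture follows a reconfigurable design, giving high flexibility and relatively high computation speed. It supports Weierstrass curves with variable parameters over the finite fields GF(p) and GF(2^m) with scalable field width, and its signature and authentication algorithms can be upgraded. Experimental results show that a 192-bit ECC point multiplication over GF(p) takes only 0.32 ms, 116% to 350% faster than other chips of the same type.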
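
To show where the point addition and point doubling mentioned above occur, here is a minimal software sketch of left-to-right double-and-add scalar multiplication on a Weierstrass curve y^2 = x^3 + ax + b over GF(p). It is a generic textbook algorithm, not the coprocessor's parallel schedule, and the toy curve (p = 97, a = 2, b = 3, base point (3, 6)) is an assumption chosen for illustration and is far too small to be secure.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]  # None represents the point at infinity


def ec_add(P: Point, Q: Point, a: int, p: int) -> Point:
    """Add two points on y^2 = x^3 + a*x + b over GF(p) (b is not needed)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P) = infinity; also covers doubling a point with y = 0
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p  # point doubling
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p  # point addition
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)


def scalar_mul(k: int, P: Point, a: int, p: int) -> Point:
    """Left-to-right double-and-add: one doubling per bit, one addition per 1 bit."""
    R: Point = None
    for bit in bin(k)[2:]:
        R = ec_add(R, R, a, p)  # double
        if bit == "1":
            R = ec_add(R, P, a, p)  # add
    return R


A, B, P_MOD, G = 2, 3, 97, (3, 6)  # toy curve y^2 = x^3 + 2x + 3 over GF(97)
print(scalar_mul(5, G, A, P_MOD))
```

`pow(x, -1, p)` computes a modular inverse and requires Python 3.8 or later. In hardware, independent field operations inside these formulas (or across an addition and a doubling) are what a parallel scheduler can overlap.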

17.

An early-stage statement-level metric for energy characterization of embedded processors

Microprocessors and Microsystems, 2020

This work presents an early-stage, statement-level metric for the energy characterization of embedded processors. The definition of the metric and a framework for evaluating it are provided. In particular, the metric is based on an existing assembly-level analysis and on profiling activities performed on a given C benchmark, and it represents the average energy consumption of a generic C statement on a given target processor. It is evaluated with a one-time effort and, once available, can be used to rapidly estimate the energy consumption of a given C function on all the considered processors. Two reference embedded processors are then considered to show an example of how the proposed metric and framework are used.
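
A deliberately simplified sketch of how such a metric could be used: characterize each processor once as measured energy divided by the number of C statements executed in a benchmark, then estimate a function's energy as its executed-statement count times that per-statement energy. The real metric also draws on an assembly-level analysis that this sketch omits, and the processor names and figures below are hypothetical.

```python
# Hypothetical one-time characterization data per processor:
# (energy measured while running the benchmark in joules, C statements executed)
PROFILES = {
    "proc_a": (2.0e-3, 1_000_000),
    "proc_b": (4.5e-3, 1_500_000),
}


def statement_metric(profiles):
    """Average energy of a generic C statement on each processor (J/statement)."""
    return {cpu: energy / count for cpu, (energy, count) in profiles.items()}


def estimate_function_energy(statements, metric):
    """Early-stage estimate: executed C statements x per-statement energy."""
    return {cpu: statements * e_stmt for cpu, e_stmt in metric.items()}


metric = statement_metric(PROFILES)
print(estimate_function_energy(5_000, metric))
```

The characterization step runs once per processor; after that, comparing candidate processors for a function costs only a multiplication per processor.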

18.

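A Survey of Research Progress on Task Scheduling for Multiprocessor Systems-on-Chip (total citations: 9; self-citations: 0; citations by others: 9)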

Li Renfa, Liu Yan, Xu Cheng. Journal of Computer Research and Development, 2008, 45(9)

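A multiprocessor system-on-chip (MPSoC) integrates processors with a variety of instruction sets on a single chip, can implement complex and complete functions, and has broad prospects in application domains such as image processing, networked multimedia, and embedded systems. Task mapping and scheduling is one of the key problems in MPSoC design. This paper introduces the basic structure of MPSoCs and the challenges they face, reviews recent research progress on MPSoC task scheduling in China and abroad from two perspectives (scheduling algorithm analysis and implementation frameworks), and analyzes urgent open problems and the main directions for future research, providing a reference for related MPSoC research.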

19.

A holistic approach for tightly coupled reconfigurable parallel processors

Hritam Dutta, Dmitrij Kissler, Frank Hannig, Alexey Kupriyanov, Jürgen Teich, Bernard Pottier. Microprocessors and Microsystems, 2009, 33(1): 53-62

New standards in signal, multimedia, and network processing for embedded electronics are characterized by computationally intensive algorithms and by the need for high flexibility due to swiftly changing specifications. To meet the demanding challenges of increasing computational requirements and stringent area and power constraints in embedded engineering, there is a gradual trend towards coarse-grained parallel embedded processors. Furthermore, such processors offer dynamic reconfiguration features to support time- and space-multiplexed execution of algorithms. However, the formidable problem of efficiently mapping applications (mostly loop algorithms) onto such architectures has hindered their mass acceptance. In this paper we present (a) a highly parameterizable, tightly coupled, and reconfigurable parallel processor architecture together with the corresponding power breakdown and reconfiguration time analysis of a case-study application, (b) a retargetable methodology for mapping loop algorithms, (c) a co-design framework for modeling, simulation, and programming of such architectures, and (d) loosely coupled communication with the host processor.

20.

Hybrid functional- and instruction-level power modeling for embedded and heterogeneous processor architectures

Journal of Systems Architecture, 2007, 53(10): 689-702

In this contribution, the concept of functional-level power analysis (FLPA) for power estimation of programmable processors is extended to model embedded as well as heterogeneous processor architectures featuring different embedded processor cores. The basic FLPA approach separates the processor architecture into functional blocks such as the processing unit, clock network, and internal memory. The power consumption of these blocks is described by parameterized arithmetic models. Using a parser-based automated analysis of assembler code, the input parameters of the arithmetic functions, such as the achieved degree of parallelism or the kind and number of memory accesses, can be computed. To model an embedded general-purpose processor (here, an ARM940T) with good accuracy, the basic FLPA modeling concept had to be extended to a so-called hybrid functional-level and instruction-level (FLPA/ILPA) model. To show the applicability of this approach, even a heterogeneous processor architecture (OMAP5912) featuring an ARM926EJ-S core and a C55x DSP core has been modeled using the hybrid FLPA/ILPA technique. The approach is demonstrated and evaluated on a variety of digital signal processing tasks, ranging from basic filters to complete audio decoders and classical benchmark suites. Estimated power figures for these tasks are compared with physically measured values for both processor architectures. The resulting maximum estimation error is 9% for the ARM940T and less than 4% for the OMAP5912.
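
A minimal sketch of the basic FLPA flow described above: parse assembler code to obtain a model parameter (here only the fraction of memory-access instructions), evaluate one parameterized arithmetic model per functional block, sum the blocks, and compare the estimate with a measured value. The linear block models, their coefficients, and the mnemonic set are hypothetical, and the instruction-level (ILPA) extension is not shown.

```python
# Hypothetical ARM-style load/store mnemonics counted as memory accesses.
MEM_OPS = {"ldr", "str", "ldm", "stm"}


def parse_parameters(asm_lines):
    """Derive model inputs from assembler code (one instruction per line, no labels)."""
    mnemonics = [line.split()[0].lower() for line in asm_lines if line.strip()]
    memory_accesses = sum(m in MEM_OPS for m in mnemonics)
    return {"mem_rate": memory_accesses / len(mnemonics)}


# Hypothetical per-block arithmetic models: power_mW = base + slope * parameter.
BLOCK_MODELS = {
    "clock_network": (20.0, 0.0, None),
    "processing_unit": (35.0, 0.0, None),
    "internal_memory": (10.0, 40.0, "mem_rate"),
}


def estimate_power_mw(params):
    """Sum the functional-block models evaluated at the parsed parameters."""
    total = 0.0
    for base, slope, param in BLOCK_MODELS.values():
        total += base + (slope * params[param] if param else 0.0)
    return total


def relative_error(estimated, measured):
    """Estimation error relative to a physically measured value."""
    return abs(estimated - measured) / measured


asm = ["ldr r0, [r1]", "add r0, r0, #1", "str r0, [r1]", "subs r2, r2, #1"]
print(estimate_power_mw(parse_parameters(asm)))  # 85.0
```

A real FLPA model would use parameters such as the degree of parallelism as well, with coefficients fitted to measurements of each target processor.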