
1.

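王海晨 赵祥模 《微电子学与计算机》2012,29(7):1-3,7

This paper proposes a dynamic translation technique based on a hardware abstract machine, which can be used to implement Java processors. The technique adopts the "fuzzy execution" method of the hardware abstract machine (HAM): by analyzing the dependences in Java programs, it dynamically translates Java bytecode into tag-based RISC-like instructions. Stack folding is then combined with dynamic translation to optimize the instructions further. Using this technique, an instruction-level-parallel Java processor was designed and then extended to support Java multithreading.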
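To make the idea of turning stack bytecode into register-style instructions more concrete, here is a minimal sketch that tracks operands on a symbolic stack. It is not the paper's HAM method: the opcode subset, the `translate` function, and the `t0`, `t1` temporaries are all assumptions chosen for illustration.

```python
def translate(bytecode):
    """Translate a tiny stack-bytecode subset into three-address instructions.

    Hypothetical sketch: operands are tracked on a symbolic stack, so loads
    emit nothing and only arithmetic and stores produce output.
    """
    stack, out, temp = [], [], 0
    binops = {"iadd": "add", "isub": "sub", "imul": "mul"}
    for op, *args in bytecode:
        if op == "iload":            # push the name of a local variable
            stack.append(args[0])
        elif op == "iconst":         # push an immediate operand
            stack.append(f"#{args[0]}")
        elif op in binops:           # pop two operands, write a temporary
            rhs, lhs = stack.pop(), stack.pop()
            dst = f"t{temp}"
            temp += 1
            out.append((binops[op], dst, lhs, rhs))
            stack.append(dst)
        elif op == "istore":         # pop the result into a local variable
            out.append(("mov", args[0], stack.pop()))
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return out
```

Under these assumptions, the six bytecodes for `c = (a + b) * 2` become three register-style instructions, because the loads and the constant are absorbed into the operands of the arithmetic instructions.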

2.

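Research and Design of an FPGA-Based Java Instruction Folder

张金钟 胡平 《微电子学与计算机》2011,28(5)

The Java virtual machine (JVM) is stack-based, and its performance is limited by data dependences. To improve JVM performance, Sun proposed a stack operation folding mechanism and used it in the picoJava I and II processors, where it folds 42.3% of stack operations. By comparing consecutive bytecodes against predefined patterns in the instruction decoder, the number of push and pop operations can be reduced. This paper designs a simple instruction folder for a Java processor and implements it on an FPGA, which greatly improves JVM performance.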
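The pattern-matching step described above can be sketched roughly as follows. The bytecode classes (producer `P`, operator `O`, consumer `C`) and the pattern list are assumptions for illustration; they are not the actual picoJava folding groups or the design from this paper.

```python
# Hypothetical classification: producers push a value, operators combine
# stack values, consumers pop a value into a local variable.
KIND = {"iload": "P", "iconst": "P", "iadd": "O", "isub": "O", "istore": "C"}
PATTERNS = ["PPOC", "POC", "PPO", "PC"]   # tried in this order, longest first

def fold(bytecode):
    """Group bytecodes into folded instructions; unmatched ones pass through."""
    folded, i = [], 0
    while i < len(bytecode):
        for pat in PATTERNS:
            window = bytecode[i:i + len(pat)]
            kinds = "".join(KIND.get(op, "?") for op, *_ in window)
            if kinds == pat:             # the window matches a foldable pattern
                folded.append(tuple(window))
                i += len(pat)
                break
        else:                            # no pattern matched at this position
            folded.append((bytecode[i],))
            i += 1
    return folded
```

In this sketch a sequence such as `iload a; iload b; iadd; istore c` is issued as one folded group, so its intermediate pushes and pops never touch the operand stack.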

3.

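Research and Design of a Java Processor Based on a RISC Architecture

张金钟 胡平 《微电子学与计算机》2011,28(7):61-64

Combining the strengths of classic Java processors such as PicoJava and JOP, this paper designs a Java processor based on a RISC architecture. It takes full advantage of Java instruction folding and of reduced-instruction-set processors, which not only lowers design complexity but also substantially improves Java processor performance.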

4.

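High-Performance Code Generation Techniques for VLIW Architectures (total citations: 1; self-citations: 1; citations by others: 0)

王红梅 王敏 张铁军 单睿 侯朝焕 《微电子学与计算机》2010,27(2)

DSP processors achieve high performance by adopting VLIW architectures, but this also makes it harder for the compiler to generate assembly code for them. As the compiler's code generation component, the code generator is key to realizing the performance of a VLIW architecture. This paper therefore proposes and implements a code generator based on a retargetable compilation framework. The code generator fully exploits the architectural features of VLIW, supports SIMD instructions and predicated execution, and can generate assembly code with a high degree of instruction-level parallelism, significantly improving application execution performance.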

5.

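Research on a Dedicated CPU Architecture for Java IC Cards

王涛 毛志刚 叶以正 《电子学报》2000,28(11):77-80

This paper proposes a hardware architecture for the Java Card virtual machine. The Java Card virtual machine is the interface between the Java language and the smart card, and developing smart cards that support Java is very important for the evolution of IC cards. Java Card virtual machines implemented on existing smart cards perform poorly. This paper presents a dedicated CPU architecture based on the Java Card virtual machine that executes Java bytecode directly and adopts a mechanism for merging two instructions. The proposed hardware architecture is described in detail, including the choice of instruction set, the module partitioning, and the structure of each functional module, with emphasis on how the simultaneous execution of two instructions is implemented.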

6.

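Design of an FPGA-Based 16-Bit Stack Processor

储昭贤 施慧彬 《微电子学与计算机》2012,29(2):22-26

A 16-bit stack processor for embedded control applications is designed. The processor contains two stacks: a data stack for evaluating mathematical expressions and a return stack that supports subroutine calls. Its instruction set comprises 35 stack instructions. The architecture and design method of the stack processor are described in detail: a simple and effective instruction encoding reduces code size, and a solution is given for operating on multiple stack elements in a single cycle. The processor is implemented on an FPGA and reaches a maximum clock frequency of 146.7 MHz on an XC5VLX110T chip. Finally, the software simulation and hardware synthesis results of the design are presented.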
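The division of labor between the two stacks can be illustrated with a toy interpreter. This is only a sketch under assumed instruction names (`push`, `add`, `mul`, `dup`, `call`, `ret`, `halt`); it does not reproduce the 35-instruction set or the encoding of the processor described above.

```python
def run(program):
    """Run a toy dual-stack machine; instruction names are illustrative only."""
    data, ret, pc = [], [], 0            # data stack, return stack, program counter
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "push":
            data.append(args[0])
        elif op == "add":
            b, a = data.pop(), data.pop()
            data.append(a + b)
        elif op == "mul":
            b, a = data.pop(), data.pop()
            data.append(a * b)
        elif op == "dup":
            data.append(data[-1])
        elif op == "call":               # save the return address, then jump
            ret.append(pc)
            pc = args[0]
        elif op == "ret":                # resume after the matching call
            pc = ret.pop()
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
    return data
```

Keeping return addresses on their own stack means a subroutine can consume and produce data-stack values freely without disturbing the call linkage.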

7.

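An Assembly-Level Translation Method for DSPs with a VLIW Architecture

应欢 王雷欧 薛志远 王东辉 侯朝焕 《微电子学与计算机》2014,(9)

To simplify code migration between different architectures, an assembly-level translation method is proposed for digital signal processors with a very long instruction word (VLIW) architecture. Front-end analysis maps the instruction information and semantics in the assembly code to a machine-independent intermediate representation. A path-probing method removes branch delay slots to build the instruction flow graph, and the control flow graph of the source program is reconstructed. Based on timestamp assignment for each instruction and analysis of the data dependences between instructions, code is moved and timestamps are modified to linearize the parallel code. Experiments show that the method translates assembly programs correctly.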

8.

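Design of a 51 Core with a 3-Stage Instruction Pipeline (total citations: 1; self-citations: 1; citations by others: 0)

黄敏敏 林媛 徐中佑 《现代电子技术》2005,28(20):83-85

Pipelining is a powerful implementation technique for increasing system bandwidth, and it does not require much additional hardware. Adopting pipelining in microprocessor design is a highly effective way to improve microprocessor performance. This paper describes the design and implementation of a self-developed 51 core with a 3-stage instruction pipeline. It covers the partitioning of the 3-stage instruction pipeline and the corresponding system architecture, the execution of each type of instruction in the 51 instruction set, the implementation of indirect addressing, and solutions to pipeline data hazards, and finally discusses the FPGA implementation of the design.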
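As a back-of-the-envelope illustration of why pipelining raises throughput, the sketch below counts cycles under an idealized model in which every stage takes one cycle and each hazard simply adds stall cycles. The function name and the `stall_cycles` parameter are assumptions; the paper's actual timing is not given in the abstract.

```python
def pipeline_cycles(n_instructions, stages=3, stall_cycles=0):
    """Cycles to finish n instructions on an idealized k-stage pipeline.

    The first instruction needs `stages` cycles to drain through the pipe;
    each later instruction completes one cycle after its predecessor, plus
    any stall cycles caused by hazards.
    """
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1) + stall_cycles

def unpipelined_cycles(n_instructions, stages=3):
    """Cycles when each instruction must finish all stages before the next starts."""
    return stages * n_instructions
```

For long instruction streams with few stalls, the ratio of the two counts approaches the number of stages, which is the ideal speedup.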

9.

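High-Performance DSPs at Low Clock Rates

钟 《世界电子元器件》2002,(6):19-20

There are many ways to make a DSP processor achieve high performance. However, traditional DSP performance has almost always been measured in MIPS. A traditional DSP usually completes only one operation per clock cycle, so its MIPS rating corresponds directly to the processor frequency in MHz. With the advent of instruction-level parallelism (ILP) techniques, these metrics have become less meaningful. ILP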

10.

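Research and Design of a Parallel Processor Architecture for Elliptic Curve Cryptography

杨晓辉 戴紫彬 李淼 张永福 《通信学报》2011,32(5):70-77

Based on a study of the processing characteristics of elliptic curve cryptographic algorithms and of parallel scheduling algorithms at the finite-field layer, a parallel processor architecture model for elliptic curve cryptography is proposed using instruction-level and data-level parallelism, and the storage structure of the model is analyzed. A verification prototype based on the model was implemented, successfully verified and tested on an FPGA, and then synthesized, placed, and routed with a 0.18 µm CMOS standard-cell library. Experiments show that the proposed parallel processor architecture both preserves the flexibility of elliptic curve cryptography applications and achieves high performance.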

11.

A RISC architecture extended by an efficient tightly coupled reconfigurable unit

N. Vassiliadis N. Kavvadias G. Theodoridis S. Nikolaidis 《International Journal of Electronics》2013,100(6):421-438

In this paper, the architecture of an embedded processor extended with a tightly-coupled coarse-grain reconfigurable functional unit (RFU) is proposed. The efficient integration of the RFU with the control unit and the datapath of the processor eliminates the communication overhead between them. To speed up execution, the RFU exploits instruction level parallelism (ILP) and spatial computation. Also, the proposed integration of the RFU efficiently exploits the pipeline structure of the processor, leading to further performance improvements. Furthermore, a development framework for the introduced architecture is presented. The framework is fully automated, hiding all reconfigurable hardware related issues from the user. The hardware model of the architecture was synthesized in a 0.13 µm process and all information regarding area and delay was estimated and presented. A set of benchmarks is used to evaluate the architecture and the development framework. Experimental results prove performance improvements in addition to potential energy reduction.

12.

Error detection by duplicated instructions in super-scalar processors

Oh N. Shirvani P.P. McCluskey E.J. 《Reliability, IEEE Transactions on》2002,51(1):63-75

This paper proposes a pure software technique "error detection by duplicated instructions" (EDDI), for detecting errors during usual system operation. Compared to other error-detection techniques that use hardware redundancy, EDDI does not require any hardware modifications to add error detection capability to the original system. EDDI duplicates instructions during compilation and uses different registers and variables for the new instructions. Especially for the fault in the code segment of memory, formulas are derived to estimate the error-detection coverage of EDDI using probabilistic methods. These formulas use statistics of the program, which are collected during compilation. EDDI was applied to eight benchmark programs and the error-detection coverage was estimated. Then, the estimates were verified by simulation, in which a fault injector forced a bit-flip in the code segment of executable machine codes. The simulation results validated the estimated fault coverage and show that approximately 1.5% of injected faults produced incorrect results in eight benchmark programs with EDDI, while on average, 20% of injected faults produced undetected incorrect results in the programs without EDDI. Based on the theoretical estimates and actual fault-injection experiments, EDDI can provide over 98% fault-coverage without any extra hardware for error detection. This pure software technique is especially useful when designers cannot change the hardware, but they need dependability in the computer system. To reduce the performance overhead, EDDI schedules the instructions that are added for detecting errors such that "instruction-level parallelism" (ILP) is maximized. Performance overhead can be reduced by increasing ILP within a single super-scalar processor. The execution time overhead in a 4-way super-scalar processor is less than the execution time overhead in the processors that can issue two instructions in one cycle.
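The duplication step can be sketched as a small compiler pass over register instructions. This is a simplified illustration, not the EDDI implementation: the `shadow` naming scheme, the `check` pseudo-instruction, and the choice to compare only before stores are assumptions, and loads, branches, and immediates are left out.

```python
def shadow(reg):
    """Map a register to its hypothetical shadow copy, e.g. r1 -> r1s."""
    return reg + "s"

def eddi(instrs):
    """Duplicate register computations and compare both copies before stores."""
    out = []
    for op, *args in instrs:
        if op == "store":                              # store src -> [addr]
            src, addr = args
            out.append(("check", src, shadow(src)))    # value must agree
            out.append(("check", addr, shadow(addr)))  # address must agree
            out.append((op, src, addr))
        else:                                          # e.g. add dst, a, b
            out.append((op, *args))                    # master instruction
            out.append((op, *map(shadow, args)))       # shadow instruction
    return out
```

Because the master and shadow instructions use disjoint registers, they carry no dependences on each other, which is what lets a super-scalar processor overlap them and hide part of the overhead.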

13.

Scalable processor architecture for Java with explicit thread support

Buchenrieder K. Kress R. Pyttel A. Sedlmeier A. Veith C. 《Electronics letters》1997,33(18):1532-1534

A scalable processor architecture for multi-threaded Java™ applications is presented. The proposed architecture consists of multiple application-specific processing elements, each able to execute a single thread at one time. The architecture is evaluated by implementing a portable and scalable Java machine on an FPGA board for demonstration.

14.

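Design and Implementation of a BIOS (total citations: 5; self-citations: 1; citations by others: 4)

韩山秀 樊晓桠 张盛兵 周昔平 《微电子学与计算机》2005,22(11):113-115,120

This article describes the basic framework of a BIOS in detail and proposes a BIOS power-on self-test flow suited to checking industrial PC hardware. It analyzes several key design issues: correctness, compatibility, portability, and the compression algorithm. Finally, the complete BIOS was rigorously verified on the 龙腾 S1 system platform (PC104-compatible), which was developed independently by the Aviation Microelectronics Center of Northwestern Polytechnical University.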

15.

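Research and Implementation of Software-Emulated Instructions for a Java Smart Card Microprocessor

李飞卉 张建杰 葛元庆 岳震伍 周润德 《微电子学》2002,32(5):325-329

JCP (Java Card Processor) is a 16-bit RISC microprocessor that directly supports running the Java Card virtual machine. However, the object-oriented bytecode instructions of the Java Card virtual machine are functionally complex; implementing them directly in hardware would consume substantial resources, which is unsuitable for systems with limited hardware resources such as smart cards. JCP provides a hardware trap mechanism: when such an instruction is executed, control switches to the corresponding trap handler, which emulates the instruction's function in software. This article discusses the characteristics of Java Card virtual machine binary files, as well as the object-oriented functions of the software-emulated instructions and their implementation. Simulation of a JCP-based Java Card operating system and applications verified that the software-emulated instructions are implemented correctly.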
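The trap mechanism can be pictured as a dispatcher that runs simple opcodes directly and hands complex ones to software handlers. The sketch below is only an illustration: the opcode names, the handler signatures, and the dictionary-based heap are assumptions and do not describe JCP's actual trap interface.

```python
def execute(program, trap_handlers):
    """Run simple opcodes directly; hand complex ones to software handlers."""
    stack, heap, trapped = [], {}, []
    for op, *args in program:
        if op == "sconst":                   # simple: executed "in hardware"
            stack.append(args[0])
        elif op == "sadd":
            stack.append(stack.pop() + stack.pop())
        elif op in trap_handlers:            # complex: trap to software
            trapped.append(op)
            trap_handlers[op](stack, heap, *args)
        else:
            raise ValueError(f"no hardware or trap support for {op}")
    return stack, heap, trapped

def trap_new(stack, heap, class_name):
    """Software emulation of object creation: allocate and push a reference."""
    ref = len(heap)
    heap[ref] = {"class": class_name, "fields": {}}
    stack.append(ref)

def trap_putfield(stack, heap, field):
    """Software emulation of a field write: pops the value, then the reference."""
    value, ref = stack.pop(), stack.pop()
    heap[ref]["fields"][field] = value
```

A caller would pass `{"new": trap_new, "putfield": trap_putfield}` as `trap_handlers`; the returned `trapped` list shows which instructions took the software path.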

16.

An Iterative Algorithm for Hardware-Software Partitioning, Hardware Design Space Exploration and Scheduling

Karam S. Chatha Ranga Vemuri 《Design Automation for Embedded Systems》2000,5(3-4):281-293

The paper proposes a novel heuristic technique for integrated hardware-software partitioning, hardware design space exploration and scheduling. The technique maps an application specified as a task graph on a heterogeneous architecture with an objective to minimize the latency of the task graph subject to the area constraint on the hardware coprocessor. The technique uses an iterative approach where the partitioner decides the processor mapping and HW design points of some tasks. The scheduler then simultaneously decides the processor mapping, HW design point and schedule time of the remaining tasks. There exists a tight coupling between the two design stages allowing them to produce superior quality designs in fewer iterations. The technique accounts for the time overheads due to inter-processor/intra-processor communication and shared memory access conflicts. It can therefore be used for both communication intensive and computation intensive applications. The technique also considers dynamic reconfiguration capability of the hardware coprocessor. The technique performs tradeoff analysis and maps hardware tasks to mutually exclusive temporal segments if this results in lower latency. The effectiveness of the technique is demonstrated by a case study of the JPEG image compression algorithm, comparison with an optimal ILP based approach and experimentation with synthetic graphs.

17.

Hardware Realization of a Java Virtual Machine for High Performance Multimedia Applications

Mladen Berekovic Helge Kloos Peter Pirsch 《The Journal of VLSI Signal Processing》1999,22(1):31-43

This paper describes a new architecture for JAVA-based, interactive multimedia applications. A hardware implementation of a Java Virtual Machine (JVM) is proposed, which allows the direct execution of Java bytecode. In a single clock cycle, up to 3 bytecode instructions can be decoded and executed in parallel using a RISC pipeline. A splittable 64-bit ALU implementation addresses demanding processing requirements of typical multimedia signal processing schemes. The on-chip caches are adapted to the specific data structures of the JVM. The proposed architecture supports execution of multiple Java threads in parallel. An implementation of basic building blocks of the processor with a standard-cell library provides an estimate of 150 MHz clock-speed for a 0.35 µm 3-metal-layer CMOS process. With a size of less than 10 mm² needed for the core logic, it is possible to integrate multiple JVMs together with larger cache memories on a single chip. Based on these results, we discuss various performance aspects of JAVA for use in future multimedia terminals.

18.

Active memory processor: a hardware garbage collector for real-time Java embedded devices

Srisa-an W. Lo C.-T.D. Chang J.-M. 《Mobile Computing, IEEE Transactions on》2003,2(2):89-101

Java possesses many advantages for embedded system development, including fast product deployment, portability, security, and a small memory footprint. As Java makes inroads into the market for embedded systems, much effort is being invested in designing real-time garbage collectors. The proposed garbage-collected memory module, a bitmap-based processor with standard DRAM cells, is introduced to improve the performance and predictability of dynamic memory management functions that include allocation, reference counting, and garbage collection. As a result, memory allocation can be done in constant time and sweeping can be performed in parallel by multiple modules. Thus, constant time sweeping is also achieved regardless of heap size. This is a major departure from the software counterparts where sweeping time depends largely on the size of the heap. In addition, the proposed design also supports limited-field reference counting, which has the advantage of distributing the processing cost throughout the execution. However, this cost can be quite large and results in higher power consumption due to frequent memory accesses and the complexity of the main processor. By doing reference counting operations in a coprocessor, the processing is done outside of the main processor. Moreover, the hardware cost of the proposed design is very modest (about 8000 gates). Our study has shown that 3-bit reference counting can eliminate the need to invoke the garbage collector in all tested applications. Moreover, it also reduces the amount of memory usage by 77 percent.
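Limited-field reference counting can be sketched in software as a counter that saturates at the largest value its bit field can hold. The class below is an illustration only: the "sticky" saturation policy and all names are assumptions, since the abstract does not describe how the hardware handles overflow.

```python
class LimitedRefCount:
    """Reference counts stored in a small bit field that sticks at its maximum."""

    def __init__(self, bits=3):
        self.max = (1 << bits) - 1        # 7 for a 3-bit field
        self.counts = {}

    def allocate(self, obj):
        self.counts[obj] = 1              # one reference from the allocator's caller

    def incref(self, obj):
        if self.counts[obj] < self.max:
            self.counts[obj] += 1         # saturates: at max it stays "stuck"

    def decref(self, obj):
        """Return True if the object was reclaimed immediately."""
        if self.counts[obj] == self.max:
            return False                  # true count unknown; leave to the collector
        self.counts[obj] -= 1
        if self.counts[obj] == 0:
            del self.counts[obj]          # reclaimed without a collection cycle
            return True
        return False
```

Objects whose counts never reach the maximum are reclaimed as soon as their last reference goes away, which spreads the reclamation cost over the run; only objects with stuck counts need a tracing collector.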