1.

邓林张瑶罗家豪《计算机科学》2023,(11):15-22

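Faced with increasingly complex processor designs and limited design cycles, every processor design team must find a way to evaluate performance quickly and effectively. A complete performance test suite takes a long time to run, and in the pre-silicon verification stage in particular, the high time cost prevents design teams from using the full suite for performance evaluation and analysis. This paper presents Fast-Eval, a fast performance evaluation method for general-purpose processors. Fast-Eval builds on SimPoint and uses the FastParallel-BBV method, optimal simulation-point selection, and hot migration of simulation points to significantly shorten both BBV generation time and performance testing time. Experimental results show that, compared with performance data obtained by running the SPEC CPU 2006 programs in full with the REF data size, the proposed method on an ARM64 processor reduces BBV generation time to 16.88% of the original and performance evaluation time to 1.26% of the original, with an average relative error of 0.53% in the evaluation results. On an FPGA development board, the average relative error over the test suite reaches 0.40%, and the running time is only 0.93% of the full run.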
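
As a rough illustration of the SimPoint-style estimate that this abstract relies on, the sketch below combines per-simulation-point measurements into a whole-program estimate and computes its relative error against a full run. The function names, weights, and CPI values are hypothetical and are not taken from the paper; Fast-Eval's own point selection and hot migration steps are not modeled here.

```python
def weighted_cpi(points):
    """Estimate whole-program CPI from (weight, cpi) pairs.

    Each weight is the fraction of execution intervals represented by
    that simulation point's cluster; weights are expected to sum to 1.
    """
    total_weight = sum(w for w, _ in points)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("simulation-point weights must sum to 1")
    return sum(w * cpi for w, cpi in points)


def relative_error(estimate, reference):
    """Relative error of an estimate against a full-run reference value."""
    return abs(estimate - reference) / abs(reference)


# Hypothetical data: three simulation points with made-up weights and CPIs.
points = [(0.5, 1.20), (0.3, 0.80), (0.2, 2.00)]
estimated = weighted_cpi(points)      # 0.5*1.20 + 0.3*0.80 + 0.2*2.00
full_run_cpi = 1.25                   # made-up full-run measurement
error = relative_error(estimated, full_run_cpi)
print(f"estimated CPI = {estimated:.3f}, relative error = {error:.2%}")
```

Only the selected simulation points need to be executed, which is why the evaluation time can be a small fraction of the full run while the weighted estimate stays close to the reference.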

2.

Design and Evaluation of an Embedded Ethernet System Based on the LAN91C111 Total citations: 2, self-citations: 1, citations by others: 1

李润超张钦宇曾伟吴绍华《计算机工程与设计》2010,31(17)

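To realize a high-speed, high-performance embedded Ethernet system, this paper proposes an Ethernet transmission system design based on the LAN91C111 Ethernet controller and the 32-bit high-performance MN103E processor, together with an on-board test and evaluation scheme for checking the system's functionality and transmission performance. In view of the characteristics of the evaluation test program, the paper describes in detail its composition, the driver design method, and the design of the specific evaluation tests. Experiments verify the correctness and completeness of the test program and confirm that the Ethernet platform design is correct and efficient. The platform design is a useful reference for designing and implementing other embedded Ethernet systems, and the on-board evaluation method is generally applicable to other embedded Ethernet systems.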

3.

A System-Level Hardware/Software Platform for RISC-V Processors Based on SoC-FPGA

齐乐常轶松陈欲晓张旭陈明宇包云岗张科《计算机研究与发展》2023,(6):1204-1215

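Building a system-level hardware/software prototype platform is an indispensable part of pre-silicon testing in processor design. To meet the needs of open-source processor design based on the open RISC-V instruction set and to simplify existing methods of building FPGA-based system-level processor prototype platforms, this paper proposes an agile hardware/software prototype platform for processors based on SoC-FPGA, enabling rapid deployment of target hardware and software designs and efficient system-level prototype evaluation. To this end, the work exploits the potential of tightly coupled SoC-FPGA devices and builds an information exchange mechanism between the RISC-V soft core and the ARM hard core (on the SoC side). Through shared memory and virtual inter-processor interrupts, the target RISC-V processor can flexibly use the platform's rich I/O peripherals and draw on the compute capacity of the hard ARM processor to cooperatively run complex software systems. In addition, to improve the platform's agility, a flexible, configurable cloud-based automated development framework is built. Analysis and evaluation of various aspects of the target RISC-V soft-core processor on the platform verify that the platform effectively shortens the iteration cycle of system-level testing and improves the efficiency of hardware/software prototype evaluation for RISC-V processors.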

4.

A Verification-Library-Based Method for Microprocessor Instruction Set Verification

龚令侃王玉艳章建雄《计算机工程》2009,35(3):86-88

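As the boundary between microprocessor software and hardware, the instruction set occupies an important place in computer architecture. Automatic test program generation (RTPG) is one of the main methods for verifying microprocessor instruction sets. This paper compares current mainstream RTPG techniques and verification strategies and proposes a verification-library-based random test program generation tool. The verification library and test program templates are developed in a general-purpose scripting language, and high-quality test programs are generated for different verification stages. Test results show that the method is simple to implement and achieves good verification results.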

5.

The Ruler

《每周电脑报》1999,(7)

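Today's microprocessors and PCs run more and more programs and process more and more data, and no two processors perform exactly alike across different applications. A benchmark program is designed specifically to evaluate the performance of a processor and system when running integer, multimedia, floating-point, and other applications, and the benchmark itself should be able to examine each aspect of processor or system performance. Some Internet sites have already published preliminary reviews of the Pentium III processor, concluding that it offers little improvement over the Pentium II. Intel's view is that only with a Pentium III processor, hardware driver ...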

6.

A PowerPC-Based SoC Hardware/Software Co-Verification Platform

许珂桑胜田喻明艳《微处理机》2009,30(2)

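This paper builds an SoC hardware/software co-verification platform based on the PowerPC405 processor. The platform uses a hierarchical design method and, within a unified platform architecture, supports virtual prototype simulation at two abstraction levels, RTL and TLM, balancing simulation accuracy and speed. It provides complete development tools and infrastructure and supports an automated verification flow that takes C test programs as input, effectively improving verification efficiency.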

7.

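DOOC: A Software/Hardware Cooperatively Managed Cache That Effectively Eliminates Thrashing Total citations: 3, self-citations: 0, citations by others: 3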

吴俊杰杨学军曾坤张百达冯权友刘光辉唐玉华《计算机研究与发展》2008,45(12)

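As the bridge that compensates for the huge speed gap between processor and main memory, the cache has become an indispensable part of modern processors. Research shows that a traditional cache, managed by hardware alone with a fixed cache policy and coherence protocol, adapts poorly to the diversity of data access patterns in programs and is prone to cache thrashing, which hurts performance. This paper proposes a new software/hardware cooperatively managed cache, the data-object oriented cache (DOOC). DOOC dynamically allocates cache segments to the data objects in a program and dynamically varies segment capacity, intra-segment associativity, block size, and coherence protocol to adapt to diverse data access patterns. The paper also introduces the compilation method for DOOC software management and a data-object oriented prefetching mechanism. The hardware overhead of DOOC is evaluated using both CACTI and an experimental platform based on the LEON3 processor, verifying that DOOC is implementable in hardware. Software simulation is used to test DOOC's performance on single-core and multi-core processor platforms. On a single-core processor, results for 15 benchmark programs show that, compared with a traditional cache, DOOC reduces the miss rate by 44.98% on average (93.02% at most), with an average speedup of 1.20 (2.36 at most). When the OpenMP version of the NPB benchmarks is run on a 4-core processor platform, the miss rate falls by 49.69% on average (73.99% at most).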

8.

Execution Path Analysis of the SPEC CPU2000 Benchmark Programs

葛仁北《计算机工程》2007,33(7):38-40

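The SPEC CPU2000 benchmark programs are widely used for performance evaluation in processor design. When evaluating a microprocessor at the RTL system level, performance evaluation code must be run to complete the evaluation; because running an entire benchmark is very costly, performance is instead obtained by running only part of the code. This paper extracts micro-programs from the frequently executed functions of the benchmarks and uses them for RTL-level system evaluation of microprocessors. During function extraction, it studies the most frequently used paths inside the most frequent functions in order to understand the benchmarks' run-time behavior, providing benchmark-like programs for early-stage processor research and fast evaluation of early processor performance.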

9.

Hardware/Software Co-Verification for Application-Specific Instruction Set Processor Design

严迎建杨志峰任方《计算机工程》2010,36(6):241-243

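To improve verification efficiency and coverage in application-specific instruction set processor design, this paper combines register-transfer-level design verification of the processor with testing of software development tools such as the assembler and instruction set simulator, and proposes a hardware/software co-verification method. The method uses software to generate test programs and data automatically according to coverage requirements, feeds the machine instructions produced by the assembler into both the instruction set simulator and the hardware simulation tool for software and hardware simulation respectively, and obtains the joint verification result by automatically comparing the two simulation results. Practice shows that the method effectively improves verification efficiency and coverage and shortens the verification cycle.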

10.

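Research and Design of a Reconfigurable Block Cipher Processing Architecture Model Total citations: 2, self-citations: 0, citations by others: 2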

杨晓辉戴紫彬张永福《计算机研究与发展》2009,46(6)

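With the development of information technology and the continuing growth of networks, applications such as network communication place higher demands on data encryption and decryption. Reconfigurable computing is a computing system that combines reconfigurable hardware processing units with a software-programmable processor. Using reconfigurable computing to design a cryptographic processing system therefore lets the same hardware support multiple algorithms in the cryptographic application domain efficiently and flexibly, meeting the performance and flexibility requirements of cryptographic processing while improving the security of the cryptosystem. Based on an analysis of the processing structure of block cipher algorithms and drawing on reconfigurable architecture design ideas and methods, this paper proposes a reconfigurable cipher processing architecture model, RCPA, and implements a verification prototype based on it. The prototype was successfully verified on an FPGA, and logic synthesis and place-and-route were carried out with a 0.18 μm CMOS standard cell library. Experimental results show that the block cipher algorithms executed on the RCPA prototype all achieve high performance: cipher processing performance improves by 10 to 20 times over general-purpose high-performance microprocessors and by 1.1 to 5.1 times over some other dedicated reconfigurable cipher processing architectures. The results indicate that the RCPA model both preserves the flexibility of block cipher applications and achieves high performance.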

11.

Precis: a usercentric word-length optimization tool

Chang M.L. Hauck S. 《Design & Test of Computers, IEEE》2005,22(4):349-361

Translating an algorithm designed for a general-purpose processor into an algorithm optimized for custom logic requires extensive knowledge of the algorithm and the target hardware. Precis lets designers analyze the precision requirements of algorithms specified in Matlab. The design-time tool combines simulation, user input, and program analysis to help designers focus their manual precision optimization efforts.

12.

A study of solution algorithms for shape design sensitivity analysis on a supermini computer with an attached array processor

Bernhard Dopker Kyung K. Choi 《Engineering with Computers》1987,3(2):111-119

This paper presents a study and comparison of shape design sensitivity analysis algorithms that are based on the continuum adjoint variable method, the continuum direct differentiation method, and the finite difference method, implemented on a supermini computer with an attached array processor. The basic algorithms and their differences in evaluating shape design sensitivity coefficients are outlined. A solution method for solving a system of equations, using a general sparse storage technique, is used for numerical implementation of shape design sensitivity analysis. It is found that computing shape design sensitivity coefficients using the direct differentiation method is significantly more efficient than using the adjoint variable method or the finite difference method. A detailed performance evaluation of the methods, using an attached array processor, is presented. The performance of the attached array processor, compared to a supermini computer, is shown to depend strongly on the type of computations to be carried out. When only parts of a program are running on an attached array processor, the CPU time distribution among the different subroutines of the program can change significantly, compared to using the host processor only.

13.

Design of a Hybrid Superscalar and Very Long Instruction Word Architecture Processor Based on the MIPS Instruction Set

李源马海林何虎《计算机应用研究》2016,33(6)

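To address the increasingly strong demand for high performance and low power in processors for embedded and mobile devices, this paper proposes a processor design with a hybrid in-order superscalar and very long instruction word (VLIW) architecture based on the MIPS instruction set. The design makes it easier to replace the heterogeneous CPU-plus-DSP structure now common in industry with a homogeneous multi-core architecture, reducing power and area, while the VLIW mode delivers good DSP performance. A cycle-accurate software simulator of the processor is built in the LISA language on the PD (Processor Designer) platform. General-purpose performance is verified with the dhrystone and coremark benchmarks, and DSP performance with the EEMBC telecom benchmarks. Test results show that the hybrid architecture achieves high digital signal processing performance at low hardware cost and is well suited to high-performance, low-power processor applications.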

14.

Realtime digital signal processing system using a parallel processing architecture

PC Ching SW Wu 《Microprocessors and Microsystems》1989,13(10):653-658

A low-cost, high-speed, general-purpose digital signal processing system was constructed using the TMS32010 digital signal processor. The system was designed with simplicity, compactness, flexibility and expandability in mind. A parallel processing architecture was adopted to achieve realtime performance. Four processors were used in the prototype system, but this can be expanded easily. Interprocessor data transfer and communications with the host computer are facilitated via a single common bus and a bank of shared memory. A one-dimensional digital FIR filter and a realtime FFT program were used to evaluate the performance of the system. In addition, a realtime spectrogram was implemented as an application example.
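
For readers unfamiliar with the kind of workload used as a benchmark here, the following is a minimal direct-form sketch of a one-dimensional FIR filter. It is a plain sequential illustration with made-up coefficients and input, not the TMS32010 implementation or the parallel partitioning described in the paper.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k].

    Samples before the start of the input (n - k < 0) are treated as zero.
    """
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, coeff in enumerate(taps):
            if n - k >= 0:
                acc += coeff * samples[n - k]
        output.append(acc)
    return output


# Hypothetical example: a 4-tap moving-average filter on a short signal.
taps = [0.25, 0.25, 0.25, 0.25]
signal = [4.0, 8.0, 4.0, 0.0, 0.0]
filtered = fir_filter(signal, taps)
print(filtered)
```

Each output sample depends only on a fixed window of inputs, so blocks of output can be computed independently, which is one reason FIR filtering suits multi-processor evaluation.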

15.

GPGPU computation and visualization of three-dimensional cellular automata

Stéphane Gobron Arzu Çöltekin Hervé Bonafos Daniel Thalmann 《The Visual computer》2011,27(1):67-81

This paper presents a general-purpose simulation approach integrating a set of technological developments and algorithmic methods in the cellular automata (CA) domain. The approach provides a general-purpose computing on graphics processor units (GPGPU) implementation for computing and multiple rendering of any direct-neighbor three-dimensional (3D) CA. The major contributions of this paper are: the CA processing and the visualization of large 3D matrices computed in real time; the proposal of an original method to encode and transmit large CA functions to the graphics processor units in real time; and clarification of the notion of top-down and bottom-up approaches to CA that non-CA experts often confuse. Additionally, a practical technique to simplify the finding of CA functions is implemented using a 3D symmetric configuration on an interactive user interface with simultaneous inside and surface visualizations. The interactive user interface allows for testing the system with different project ideas and serves as a test bed for performance evaluation. To illustrate the flexibility of the proposed method, visual outputs from diverse areas are demonstrated. Computational performance data are also provided to demonstrate the method’s efficiency. Results indicate that when large matrices are processed, computations using GPU are two to three hundred times faster than the identical algorithms using CPU.
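
To make "direct-neighbor 3D CA" concrete, here is a small CPU reference sketch of one update step in which each cell looks only at its six face-adjacent neighbors. The periodic boundary, the `exactly_one_rule` transition function, and the grid size are assumptions chosen for illustration; the paper's GPU encoding of CA functions is not reproduced.

```python
# Offsets of the six face-adjacent (direct) neighbors in 3D.
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def step(grid, rule):
    """Apply one synchronous CA update to a cubic grid of 0/1 cells.

    `rule(state, live_neighbors)` returns the next state of a cell.
    Boundaries wrap around (periodic), which is an assumption of this sketch.
    """
    n = len(grid)
    new_grid = [[[0] * n for _ in range(n)] for _ in range(n)]
    for x in range(n):
        for y in range(n):
            for z in range(n):
                live = sum(
                    grid[(x + dx) % n][(y + dy) % n][(z + dz) % n]
                    for dx, dy, dz in OFFSETS
                )
                new_grid[x][y][z] = rule(grid[x][y][z], live)
    return new_grid


def exactly_one_rule(state, live_neighbors):
    """Hypothetical rule: a cell is alive next step iff exactly one neighbor is alive."""
    return 1 if live_neighbors == 1 else 0


# A 5x5x5 grid with a single live cell in the center.
size = 5
grid = [[[0] * size for _ in range(size)] for _ in range(size)]
grid[2][2][2] = 1
next_grid = step(grid, exactly_one_rule)
alive = sum(cell for plane in next_grid for row in plane for cell in row)
print("live cells after one step:", alive)
```

Every cell's next state depends only on the previous grid, so all cells can be updated independently; that independence is what makes this class of CA a natural fit for GPU execution.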

16.

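Research on Design Methods for Reducing Cache Miss Penalty in Embedded Processors Total citations: 2, self-citations: 0, citations by others: 2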

黄海林许彤范东睿唐志敏《小型微型计算机系统》2006,27(11):2077-2081

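Taking the Loongson-1 (龙芯1号) processor as the object of study, this paper explores design methods for reducing cache miss penalty in embedded processors. By analyzing the processor's structural characteristics, it implements a non-blocking data cache with hit-under-one-miss on top of critical-word-first, which improves average processor performance by 3.9%. Exploiting the principle of locality, and building on the critical-word-first non-blocking data cache, it also proposes a quasi-non-blocking instruction cache design that reduces the instruction cache miss penalty and, at small implementation cost, further improves average processor performance by 7.7%. Together, these techniques reduce the miss penalty of both the instruction cache and the data cache, improving average processor performance by 11.6%.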

17.

The design and testing of a force feedback dental simulator Total citations: 6, self-citations: 0, citations by others: 6

Thomas G Johnson L Dow S Stanford C 《Computer methods and programs in biomedicine》2001,64(1):53-64

The Iowa Dental Surgical Simulator is a haptic simulator to train dental students in the haptic skills of dentistry. The initial design emphasizes the detection of carious lesions. This work describes the software and implementation of the prototype system, the design tradeoffs, and the technical issues associated with haptic and graphics subsystems. The work also describes the current system performance, including a formal evaluation by practicing dentists and performance measures. A discussion of the limitations of the current system is followed by an analysis of opportunities to improve the quality of the simulator. The results should be of interest to designers of medical haptic simulation systems and other simulation designers.

18.

Real-time image processing on a custom computing platform

Athanas P.M. Abbott A.L. 《Computer》1995,28(2):16-25

The authors explore the utility of custom computing machinery for accelerating the development, testing, and prototyping of a diverse set of image processing applications. We chose an experimental custom computing platform called Splash-2 to investigate this approach to prototyping real-time image processing designs. Custom computing platforms are emerging as a class of computers that can provide near application-specific computational performance. We developed a real-time image processing system called VTSplash, based on the Splash-2 general-purpose platform. Splash-2 is an attached processor featuring programmable processing elements (PEs) and communication paths. The Splash-2 system uses arrays of RAM-based field programmable gate arrays (FPGAs), crossbar networks, and distributed memory to accomplish the needed flexibility and performance tasks. Such platforms let designers customize specific operations for function and size, and data paths for individual applications.

19.

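Software Architecture and Implementation of a General-Purpose Automatic Test System for Circuit Boards Total citations: 1, self-citations: 1, citations by others: 0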

杜舒明《计算机测量与控制》2008,16(8):1192-1194

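The trend in automatic test systems (ATS) is toward generality, and the key to generality is a system design that adopts relevant test standards and an open software architecture. To meet the functional requirements of general-purpose circuit board test software, this paper proposes the software architecture and implementation of a general development environment and execution environment for test program sets (TPS), and introduces a fault diagnosis method based on a dependency model and the concept of a test method library. The main features of the software are: 1) an open software architecture; 2) a general test method library and a concise test-tree development interface that improve TPS development efficiency; 3) multimedia information queries that make fault detection and fault isolation easier. The software has been applied successfully in several automatic test systems, where it both significantly shortened system development time and reduced software development cost.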

20.

Accelerating next-generation public-key cryptosystems on general-purpose CPUs

Eberle H. Shantz S. Gupta V. Gura N. Rarick L. Spracklen L. 《Micro, IEEE》2005,25(2):52-59

This article describes low-cost techniques for accelerating the ECC and RSA public-key cryptosystems on general-purpose processor architectures. We focus on hardware acceleration of public-key cryptosystems on 64-bit server machines. A prototype based on a Sparc CPU data path shows a clear performance advantage of ECC over RSA.