Similar literature
1.
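To address the lack of a floating-point multiply instruction on the TEC-XP16 teaching computer, a fast method is proposed for designing a 32-bit floating-point multiply instruction in the TEC-XP16 microprogrammed controller. Because manually converting each assembly instruction into one or more microinstructions is slow and error-prone, a method is proposed that automatically generates the microprogram from an assembly program. Because manually modifying the controller's ABEL source program is likewise slow and error-prone, a method is proposed that modifies the ABEL source automatically. Experimental results show that the designed 32-bit floating-point multiply instruction functions correctly. On average it takes only 1.9 s to generate the microprogram table from the assembly program table, and only 0.7 s to modify and generate the controller's ABEL source from the microprogram table and related inputs, which greatly speeds up the design of the floating-point multiply instruction. The method can also be extended to the design of other complex instructions.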

2.
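The GMAS general-purpose microprogram assembly system proposed in this paper can describe microprograms for a fairly wide range of horizontally microprogrammed computers and other digital systems, automatically generate the target microcode, and check its correctness. It consists of the GMAL microassembly language, a microassembler, and a simulator. The paper focuses on the GMAL language and the microassembler.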

3.
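Designing new instructions in a controller is one of the harder topics in teaching controllers. To address the lack of multiply and divide instructions on the TEC-XP16 teaching computer, this paper introduces the basic composition and working principle of a microprogrammed controller, analyzes the main steps of extending its instruction set, and proposes a design for MUL and DIV instructions in the TEC-XP16 microprogrammed controller. Taking the MUL instruction as an example, it describes in detail the complete process of designing and implementing an extended instruction in the TEC-XP16 microprogrammed controller, and verifies by experiment the functions of the designed multiply and divide instructions. The proposed design method offers some guidance for teaching and laboratory work on microprogrammed controllers.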

4.
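In dynamic binary translation, when the target platform has no floating-point unit and does not support floating-point operations, floating-point instructions can only be executed by interpretive emulation, which sharply reduces the efficiency of the translation system. By converting floating-point operations into fixed-point operations, this work solves the translation of floating-point instructions for such a target platform and opens a new route for translating them. Experiments in a dynamic binary translation system verify the feasibility of the method. They show a clear performance improvement: the higher the proportion of floating-point instructions, the higher the speedup the algorithm achieves, reaching a speedup of 1.55 for programs containing 25% floating-point instructions.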
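
As a rough illustration of carrying out a floating-point operation with integer operations only, the following sketch multiplies two IEEE-754 single-precision values using shifts, masks, and integer multiplication. It is not the algorithm from the abstract above, whose details are not given here. The function names are hypothetical, and the sketch assumes normal (non-subnormal, finite) operands and results, with round-to-nearest-even.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Interpret a 32-bit unsigned integer as an IEEE-754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def soft_mul_f32(a_bits: int, b_bits: int) -> int:
    """Multiply two binary32 bit patterns using integer operations only.

    Assumptions of this sketch: operands and result are normal numbers;
    zero/subnormal operands are flushed to zero; infinities, NaNs,
    overflow, and underflow are not handled.
    """
    sign = (a_bits ^ b_bits) & 0x80000000
    exp_a = (a_bits >> 23) & 0xFF
    exp_b = (b_bits >> 23) & 0xFF
    if exp_a == 0 or exp_b == 0:
        return sign  # signed zero

    # Restore the hidden leading 1 to get 24-bit significands.
    man_a = (a_bits & 0x7FFFFF) | 0x800000
    man_b = (b_bits & 0x7FFFFF) | 0x800000

    exp = exp_a + exp_b - 127      # add exponents, remove one bias
    prod = man_a * man_b           # 47- or 48-bit integer product

    # Normalize: the product of two values in [1, 2) lies in [1, 4).
    if prod & (1 << 47):
        exp += 1
        shift = 24
    else:
        shift = 23

    man = prod >> shift                      # keep 24 significant bits
    rem = prod & ((1 << shift) - 1)          # discarded bits
    half = 1 << (shift - 1)
    if rem > half or (rem == half and (man & 1)):
        man += 1                             # round to nearest, ties to even
        if man == (1 << 24):                 # rounding carried out of 24 bits
            man >>= 1
            exp += 1

    return sign | (exp << 23) | (man & 0x7FFFFF)
```

For example, `bits_to_float(soft_mul_f32(float_to_bits(1.5), float_to_bits(2.0)))` yields `3.0`. A production soft-float routine would also need to cover the special cases excluded above.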

5.
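Introduction. DSP architectures can be divided into fixed-point and floating-point types. Fixed-point DSPs can perform integer, fractional, and certain exponent operations, and offer high speed, low resource usage, and low cost; using a fixed-point DSP flexibly for floating-point computation can improve computational efficiency. Demand for floating-point support on fixed-point DSP architectures is growing, mainly because algorithm code is often written in C/C++: if it contains standard floating-point data processing and a fixed-point DSP device must be used, the floating-point algorithm has to be converted to fixed-point format. Floating-point computation on a fixed-point DSP is also quite feasible: C is highly portable and assembly language is highly efficient, so combining the two through mixed-language programming on a fixed-point DSP greatly improves both programming flexibility and computation speed.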
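
To make the idea of converting floating-point data to a fixed-point format concrete, here is a minimal sketch of Q15 arithmetic, a format commonly used on 16-bit fixed-point DSPs. The choice of Q15 and the helper names are assumptions for illustration; the abstract does not say which format it uses.

```python
Q = 15                 # number of fractional bits (Q15 format)
SCALE = 1 << Q         # 32768: the integer that represents 1.0


def saturate(v: int) -> int:
    """Clamp an integer to the signed 16-bit range used by Q15."""
    return max(-SCALE, min(SCALE - 1, v))


def to_q15(x: float) -> int:
    """Quantize a value in [-1, 1) to a Q15 integer, saturating at the ends."""
    return saturate(int(round(x * SCALE)))


def from_q15(v: int) -> float:
    """Convert a Q15 integer back to a floating-point value."""
    return v / SCALE


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values.

    The raw product is in Q30; adding half an LSB and shifting right by 15
    rounds it back to Q15, and the result is then saturated.
    """
    prod = a * b
    prod += 1 << (Q - 1)
    return saturate(prod >> Q)
```

For instance, `q15_mul(to_q15(0.5), to_q15(0.25))` equals `to_q15(0.125)`, and multiplying -1.0 by -1.0 saturates to the largest representable value rather than overflowing.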

6.
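This paper briefly introduces the basic components of an automatic microprogram generation system. The system takes as input a source program written in MAGL, a machine-independent high-level microprogramming language, and produces the target microprogram through the compiler's lexical analysis, syntax analysis, microinstruction generation, and microaddress allocation. The paper also briefly describes applications of the system and improvements to it.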

7.
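To address the low efficiency of weight-gradient computation in neural network training accelerators, a floating-point optimization architecture is designed for a high-performance convolutional neural network (CNN) training processor. Based on an analysis of the basic principles of CNN training architectures, training-optimized architectures with 32-bit, 24-bit, 16-bit, and mixed precision are proposed in order to find the best floating-point format for low-energy, smaller edge devices. Field-programmable gate array (FPGA) verification shows that the accelerator engine can be used for inference and training on the MNIST handwritten-digit dataset. A mixed 24-bit convolution floating-point format, which combines a custom 24-bit floating-point format with the 16-bit brain floating-point format, reaches an accuracy above 93%. The optimized mixed-precision accelerator is implemented on a TSMC 55 nm chip, and the training energy per image is 8.51 μJ.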

8.
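Based on object-oriented programming, the microprogrammed controller of a computer is analyzed and designed. The microprogrammed controller experiments of the computer organization lab kit (Tec-xp) are simulated in VC++, verifying the functions of the kit's instructions and giving students an environment for independent experimentation.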

9.
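Mixed-language programming in a high-level language and assembly language is an important programming technique. This paper addresses the automatic conversion and generation of INLINE assembly-language procedure files when calling assembly-language subroutines from TurboBASIC.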

10.
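The model-computer design experiment is one of the more comprehensive experiments in a computer organization course; it requires students, having mastered the circuits of the individual components, to build a model computer. Using the TD-CMA experimental teaching system, this article examines the CPU and simple model-computer design experiments built around a microprogrammed controller, analyzes the key problems in terms of wiring checks, instruction design, microprogram design, instruction control, and program execution, and gives a solution to each.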

11.
New Products     
Michalopoulos  D.A. 《Computer》1975,8(11):85-89
Three Rivers Computer Corporation now offers PDP-11/40 and 11/35 owners the ability to microprogram their computers. This two-board option plugs into the EIS (extended instruction set) and FIS (floating point instruction set) slots of the processor. It contains 1,204 words of 80-bit read/write control memory implemented with 50 ns TTL RAM.

12.
In this note we study scaling rules and roundoff noise variances in a fixed-point implementation of the Kalman predictor for an ARMA time series observed noise free. The Kalman predictor is realized in a fast form that uses the so-called fast Kalman gain algorithm. The algorithm for the gain is fixed point. Scaling rules and expressions for rounding error variances are derived. The numerical results show that the fixed-point realization performs very close to the floating-point realization for relatively low-order ARMA time series that are not too narrow band. The predictor has been implemented in 16-bit fixed-point arithmetic on an INTEL 8086 microprocessor, and in 16-bit floating-point arithmetic on an INTEL 8080. Fixed-point code was written in Assembly language and floating-point code was written in Fortran. Experimental results were obtained by running the fixed- and floating-point filters on identical data sets. All experiments were carried out on an INTEL MIDS 230 development system.

13.
王云贵  杨靓 《微处理机》2012,33(1):7-11
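Verifying a floating-point unit is one of the most challenging tasks. Based on a Xilinx FX-series FPGA with a PowerPC 405 hard core and using the Embedded Development Kit (EDK), an embedded system is designed to verify a floating-point unit. The user IP (the floating-point unit under test) is connected to the PowerPC 405 processor core through the APU controller; a test program accesses the user IP through custom instructions, and the correctness of the IP under test is judged from the program's results.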

14.
In this article, the basic principles of designing a floating point processor made up of the bit-slice Am2903, which can be integrated into a 16-bit central processor as a subsystem, are discussed. Floating point processor microprogramming and the microflows of an instruction are also discussed.

15.
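This paper describes the behavioral design and behavioral simulation of a 32-bit microprocessor chip. Following a top-down design approach, a behavioral model of the chip was built in the standard hardware description language VHDL during the upper-level design stage, and a behavioral-level virtual machine environment for the chip was built on top of it. Debugging this virtual machine verified the correctness of the chip's internal organization and microprogram.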

16.
In this paper we present a system for automatic terminology extraction and automatic detection of the equivalent terms in the target language, to be used alongside a computer assisted translation (CAT) tool. It provides term candidates and their translations automatically each time the translator goes from one segment to the next. The system uses several sources of information: the text from the segment being translated and from the whole translation project, the translation memories assigned to the project, and a translation phrase table from a statistical machine translation system. It also uses the terminological database assigned to the project in order to avoid presenting already known terms. The use of translation phrase tables allows us to use very large parallel corpora in a very efficient way. We have used Moses to calculate and to consult the translation phrase tables. The program is written in Python and it can be used with any CAT tool. In our experiments we have used OmegaT, a well-known open source CAT tool. Evaluation results for English–Spanish and for three subjects (politics, finance, and medicine) are presented.

17.
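Because double-precision floating-point division is usually complex and slow, a method is proposed for designing a high-performance, IEEE-754-compliant double-precision floating-point divider based on the Goldschmidt algorithm. First, the division process of the Goldschmidt algorithm and the error introduced by its iterations are analyzed, and a method for controlling the error is proposed. Next, an area-efficient dual lookup table is used to determine the initial iteration value, and the iteration unit uses a parallel multiplier structure to speed up iteration. Finally, the pipeline stages are partitioned sensibly and the iteration is controlled so that floating-point divisions can be pipelined, further increasing the divider's throughput. Experimental results show that, in a 40 nm process, the double-precision divider with a 14-bit initial value and a pipelined structure has a synthesized cell area of 84902.2618 μm² and runs at up to 2.2 GHz; compared with a pipelined structure using an 8-bit initial value, speed improves by 32.73% while area grows by 5.05%. A single double-precision division has a latency of 12 clock cycles; when pipelined, the average latency per division is 3 clock cycles. Compared with SRT-based double-precision dividers in other processors, data throughput improves by a factor of 3 to 7; compared with Goldschmidt-based double-precision dividers in other processors, by a factor of 2 to 3.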
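
The Goldschmidt iteration itself is short enough to sketch in software. The version below is only an illustration of the general technique, not the hardware design from the abstract: the function name is hypothetical, and a linear seed stands in for the lookup table that supplies the initial value. Numerator and denominator are repeatedly multiplied by the same factor `f = 2 - d` until the denominator converges to 1, at which point the numerator holds the quotient.

```python
import math


def goldschmidt_divide(n: float, d: float, iterations: int = 4) -> float:
    """Approximate n / d with Goldschmidt's multiplicative iteration.

    Assumptions of this sketch: finite inputs, d != 0, and a linear
    initial estimate of 1/d instead of a hardware lookup table.
    """
    if d == 0.0:
        raise ZeroDivisionError("division by zero")
    negative = (n < 0.0) != (d < 0.0)
    n, d = abs(n), abs(d)

    # Scale the divisor into [0.5, 1); hardware does this on the exponent.
    d_i, exp = math.frexp(d)
    n_i = math.ldexp(n, -exp)

    # Seed: linear estimate of 1/d_i on [0.5, 1), relative error <= 1/17.
    # The two constants are precomputed values, not run-time divisions.
    f = 48.0 / 17.0 - (32.0 / 17.0) * d_i
    n_i *= f
    d_i *= f

    # Each step squares the error of d_i relative to 1 (quadratic convergence).
    for _ in range(iterations):
        f = 2.0 - d_i
        n_i *= f
        d_i *= f

    return -n_i if negative else n_i
```

The two multiplications in each step are independent of each other, which is why the iteration maps well onto parallel multipliers and pipelining. Unlike Newton-Raphson, rounding errors accumulate across steps, so a real divider needs the kind of error control the abstract mentions.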

18.
In this paper we describe SCDBR, a system that is able to reason automatically from specifications of database updates written in the situation calculus, a first-order language originally proposed by John McCarthy for reasoning about actions and change. The specifications handled by the system are written in the formalism proposed by Ray Reiter for solving the frame problem that appears when one expresses the effects on the database predicates of the execution of atomic transactions. SCDBR is written in PROLOG, and can solve several reasoning tasks. Among others, it is able to derive the final specification from effect axioms, to answer queries to virtually updated databases, to check legality of transactions, to prove integrity constraints from the specification, to modify the specification in order to embed a desired integrity constraint, and to answer historical queries. For some of these tasks SCDBR can call other systems, like relational database systems, automated theorem provers, and constraint solvers.

19.
On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented.

Program summary

Program title: ITER-REF
Catalogue identifier: AECO_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AECO_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 7211
No. of bytes in distributed program, including test data, etc.: 41 862
Distribution format: tar.gz
Programming language: FORTRAN 77
Computer: desktop, server
Operating system: Unix/Linux
RAM: 512 Mbytes
Classification: 4.8
External routines: BLAS (optional)
Nature of problem: On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution.
Solution method: Mixed precision algorithms stem from the observation that, in many cases, a single precision solution of a problem can be refined to the point where double precision accuracy is achieved. A common approach to the solution of linear systems, either dense or sparse, is to perform the LU factorization of the coefficient matrix using Gaussian elimination. First, the coefficient matrix A is factored into the product of a lower triangular matrix L and an upper triangular matrix U. Partial row pivoting is in general used to improve numerical stability, resulting in a factorization PA=LU, where P is a permutation matrix. The solution for the system is achieved by first solving Ly=Pb (forward substitution) and then solving Ux=y (backward substitution). Due to round-off errors, the computed solution, x, carries a numerical error magnified by the condition number of the coefficient matrix A. In order to improve the computed solution, an iterative process can be applied that produces a correction to the computed solution at each iteration; this yields the method commonly known as the iterative refinement algorithm. Provided that the system is not too ill-conditioned, the algorithm produces a solution correct to the working precision.
Running time: seconds/minutes
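
The solution method described above can be sketched in a few dozen lines. This is an illustrative reimplementation, not the ITER-REF FORTRAN 77 code: all function names are hypothetical, and single precision is emulated in pure Python by rounding every intermediate result to binary32 with `struct`. The LU factorization and triangular solves run in the emulated 32-bit arithmetic, while the residual and the solution update use 64-bit arithmetic.

```python
import struct


def f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_f32(a):
    """LU factorization with partial pivoting (PA = LU) in emulated binary32.

    Returns (lu, perm): L (unit diagonal) and U packed into one matrix,
    plus the row permutation.
    """
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve_f32(lu, perm, b):
    """Solve Ly = Pb, then Ux = y, in emulated binary32."""
    n = len(lu)
    y = [0.0] * n
    for i in range(n):                      # forward substitution
        s = f32(b[perm[i]])
        for j in range(i):
            s = f32(s - f32(lu[i][j] * y[j]))
        y[i] = s
    x = [0.0] * n
    for i in reversed(range(n)):            # backward substitution
        s = y[i]
        for j in range(i + 1, n):
            s = f32(s - f32(lu[i][j] * x[j]))
        x[i] = f32(s / lu[i][i])
    return x


def solve_mixed(a, b, max_iter=10, tol=1e-14):
    """Solve Ax = b by binary32 LU plus binary64 iterative refinement."""
    n = len(a)
    lu, perm = lu_factor_f32(a)             # the expensive step, in 32-bit
    x = lu_solve_f32(lu, perm, b)
    b_norm = max(abs(v) for v in b)
    for _ in range(max_iter):
        # Residual r = b - Ax, computed in 64-bit arithmetic.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * b_norm:
            break
        dx = lu_solve_f32(lu, perm, r)      # correction from the 32-bit factors
        x = [x[i] + dx[i] for i in range(n)]
    return x
```

As a usage example, `solve_mixed([[4, -2, 1], [-2, 4, -2], [1, -2, 4]], [11, -16, 17])` returns a vector close to `[1, -2, 3]`. The factorization is computed once and reused for every correction, which is where the speed advantage of the 32-bit arithmetic comes from.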

20.
We present a rigorous mathematical proof of the correctness of the floating point square root instruction of the AMD K5 microprocessor. The instruction is represented as a program in a formal language that was designed for this purpose, based on the K5 microcode and the architecture of its FPU. We prove a statement of its correctness that corresponds directly with the IEEE Standard. We also derive an equivalent formulation, expressed in terms of rational arithmetic, which has been encoded as a formula in the ACL2 logic and mechanically verified with the ACL2 prover. Finally, we describe a microcode modification that was implemented as a result of this analysis in order to ensure the correctness of the instruction.
