Similar literature
20 similar documents found
1.
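A new design method for instruction decoding in CISC microprocessors   Total citations: 3, self-citations: 0, citations by others: 3
After introducing the use of Gray code and one-hot encoding in designing the instruction decoding unit of a CISC microprocessor, the article proposes a new design scheme for the instruction decoding state machine, the state-splitting method, which can speed up state transitions during instruction decoding. The design methods are compared side by side.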
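The two state encodings named in this abstract can be compared with a minimal sketch. The state count and function names are hypothetical, and the state-splitting method itself is not reproduced here because the abstract does not describe it.

```python
def gray_code(index: int) -> int:
    """Return the reflected binary Gray code of a state index."""
    return index ^ (index >> 1)


def one_hot(index: int) -> int:
    """Return a code word in which only bit `index` is set."""
    return 1 << index


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two code words differ."""
    return bin(a ^ b).count("1")


NUM_STATES = 8  # hypothetical number of decoder states

gray_states = [gray_code(i) for i in range(NUM_STATES)]
one_hot_states = [one_hot(i) for i in range(NUM_STATES)]

# Register bits that toggle when the machine steps from state i to state i + 1.
gray_toggles = [
    hamming_distance(gray_states[i], gray_states[i + 1])
    for i in range(NUM_STATES - 1)
]
one_hot_toggles = [
    hamming_distance(one_hot_states[i], one_hot_states[i + 1])
    for i in range(NUM_STATES - 1)
]

# Width of the state register under each encoding.
gray_width = (NUM_STATES - 1).bit_length()
one_hot_width = NUM_STATES
```

Gray code keeps the state register narrow and toggles one bit per sequential transition. One-hot uses one flip-flop per state and toggles two bits, but each state can be recognized from a single bit, which tends to simplify next-state logic.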

2.
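Design of a high-performance 32-bit shift register unit   Total citations: 2, self-citations: 0, citations by others: 2
This paper presents the design of a dual-bus (64-bit input, 32-bit output) shift register unit for the execution unit of a 32-bit microprocessor. Matrix and tree shifter structures are discussed, and a Matrix-Tree structure combining the two is proposed together with its hardware circuit implementation. To support all shift instructions of the X86 instruction set, an instruction preprocessing technique is adopted, which saves instruction cycles and improves CPU efficiency.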
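A behavioral sketch may help show why a shifter with a 64-bit input and a 32-bit output can serve several shift types at once. This models only the arithmetic, not the paper's Matrix-Tree circuit; the function names and the way operands are placed on the two buses are assumptions.

```python
MASK32 = 0xFFFFFFFF


def funnel_shift_right(high: int, low: int, count: int) -> int:
    """Select 32 bits from the 64-bit value high:low, starting `count` bits up."""
    count &= 31
    combined = ((high & MASK32) << 32) | (low & MASK32)
    return (combined >> count) & MASK32


def shift_right_logical(value: int, count: int) -> int:
    """Logical right shift: zeros on the high bus."""
    return funnel_shift_right(0, value, count)


def rotate_right(value: int, count: int) -> int:
    """Rotate right: the same operand on both buses."""
    return funnel_shift_right(value, value, count)


def shift_left_logical(value: int, count: int) -> int:
    """Left shift by n, expressed as a funnel right shift by 32 - n."""
    count &= 31
    if count == 0:
        return value & MASK32
    return funnel_shift_right(value, 0, 32 - count)


def tree_shift_right(value: int, count: int) -> int:
    """Logarithmic (tree-style) shifter: five stages shifting by 1, 2, 4, 8, 16."""
    value &= MASK32
    for stage in range(5):
        if (count >> stage) & 1:
            value >>= 1 << stage
    return value
```

Rewriting a left shift as a right shift before it reaches the shifter is one plausible reading of "instruction preprocessing", offered here only as an illustration.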

3.
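After studying and analyzing the DAISY and Crusoe(TM) processors, the article proposes a fully hardware dynamic translation model for the X86 instruction set. The model uses a RISC core to carry out X86 operations, with instruction translation and conversion implemented entirely in hardware. Because X86 instructions vary in length and the fetch unit is therefore inefficient, the model uses multi-queue instruction fetch, and the RISC core executes with path prediction. Its advantage is that it raises processor performance as far as possible while remaining compatible.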

4.
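The article presents the controller design of NPUARM7, an embedded microprocessor compatible with the ARM7TDMI instruction set. It proposes a state machine structure organized by instruction class, and a mechanism that generates control signals through stepwise decoding under state machine control. The design simplifies the decoding control logic, improves decoding efficiency, removes the bottleneck that complex decoding logic tends to create in a pipeline, and suits high-speed, low-power embedded applications.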

5.
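The article presents a method for designing an 8-bit complex-instruction-set microprocessor using a multi-clock quantitative system. It explains in detail the decomposition of complex instructions and the techniques involved, the joint decoding algorithm that combines the instruction with a step counter, and the multi-clock implementation of sub-operation steps. It also discusses how the multi-clock method can be extended to project management and to the system-level design of other classes of microprocessors.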

6.
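Design of a 51 core with a 3-stage instruction pipeline   Total citations: 1, self-citations: 1, citations by others: 0
Pipelining is a powerful implementation technique for raising system bandwidth, and it does not require much additional hardware. Adopting it in microprocessor design is an effective way to improve performance. This paper describes the design and implementation of a self-developed 51 core with a 3-stage instruction pipeline. It covers the partitioning of the 3-stage pipeline and the corresponding system architecture, the execution of each type of instruction in the 51 instruction set, the implementation of indirect addressing, and the solution to pipeline data hazards, and finally discusses the FPGA implementation.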
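The throughput gain from a 3-stage pipeline can be estimated with a small idealized calculation. It assumes every stage takes one clock cycle and no hazards or stalls occur, which the real 51 core would not always satisfy; the function names are hypothetical.

```python
def pipelined_cycles(num_instructions: int, stages: int = 3) -> int:
    """Cycles for an ideal pipeline: fill once, then retire one instruction per cycle."""
    if num_instructions <= 0:
        return 0
    return stages + num_instructions - 1


def unpipelined_cycles(num_instructions: int, stages: int = 3) -> int:
    """Cycles when each instruction passes through all stages before the next starts."""
    return stages * max(num_instructions, 0)


def ideal_speedup(num_instructions: int, stages: int = 3) -> float:
    """Ratio of unpipelined to pipelined cycle counts under the same assumptions."""
    return unpipelined_cycles(num_instructions, stages) / pipelined_cycles(num_instructions, stages)
```

For long instruction sequences the ratio approaches the number of stages; data hazards and multi-cycle instructions lower it in practice.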

7.
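Design study of a 32-bit floating-point embedded MCU   Total citations: 3, self-citations: 2, citations by others: 1
This paper presents the design and implementation of a 32-bit floating-point embedded MCU based on a RISC architecture. The MCU contains 128 kbit of SRAM and uses a Harvard architecture, a four-stage instruction pipeline, a 32-bit instruction word, and a 43-bit internal data word. Several fast registers inside the MCU, together with hardwired logic in place of microprogrammed control, speed up the microprocessor and improve instruction execution efficiency. Writing registers synchronously and reading them asynchronously avoids data hazards.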

8.
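A dynamic instruction scheduling algorithm based on distributed control is proposed. The algorithm effectively improves instruction issue efficiency, reduces the logic complexity of the instruction dispatch unit, and raises the system clock frequency. The issue algorithm was implemented in the independently designed "龙腾R3" (Longteng R3) three-issue superscalar RISC microprocessor and met the design targets.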

9.
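Design of an 8-bit high-speed RISC microprocessor   Total citations: 1, self-citations: 0, citations by others: 1
Following a top-down, system-level design approach, this paper partitions the functional structure of the system. Using Verilog HDL for the register-transfer-level description, it completes the design of a RISC microprocessor soft IP core that is compatible with other products of its class, has a four-stage pipeline (fetch, decode, execute, write-back), and executes each instruction in a single clock cycle (a few jump instructions excepted). Layout considerations are also used to explore ways of raising the clock speed of the microprocessor.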

10.
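Design and implementation of an FPGA-based AES cryptographic coprocessor   Total citations: 3, self-citations: 1, citations by others: 2
The article designs an FPGA-based cryptographic coprocessor that performs AES encryption, exploiting the structure of Virtex-II series FPGAs to optimize the AES implementation. Experiments show that this implementation achieves high data throughput with few circuit resources. The coprocessor also provides interface logic to an ARM processor and implements coprocessor instructions for encryption/decryption and data input/output. As an extension of the ARM microprocessor instruction set, it greatly improves the efficiency with which an embedded system encrypts and decrypts data, enabling secure data transmission.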

11.
A versatile time-domain Reed-Solomon decoder   Total citations: 2, self-citations: 0, citations by others: 2
A versatile Reed-Solomon (RS) decoder structure based on the time-domain decoding algorithm (transform decoding without transforms) is developed. The algorithm is restructured, and a method is given to decode any RS code generated by any generator polynomial. The main advantage of the decoder structure is its versatility, that is, it can be programmed to decode any Reed-Solomon code defined in Galois field GF(2^m) with a fixed symbol size m. This decoder can correct errors and erasures for any RS code, including shortened and singly extended codes. It is shown that the decoder has a very simple structure and can be used to design high-speed single-chip VLSI decoders. As an example, a gate-array-based programmable RS decoder is implemented on a single chip. This decoder chip can decode any RS code defined in GF(2^5) with any code word length and any number of information symbols. The decoder chip is fabricated using low-power 1.5-μm, two-layer-metal, HCMOS technology.
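Every RS code over GF(2^5) shares the same 5-bit symbol arithmetic, which is what makes a programmable decoder with fixed symbol size feasible. The sketch below shows that arithmetic only. The primitive polynomial x^5 + x^2 + 1 is an assumption chosen for illustration, since the abstract does not say which polynomial the chip uses, and the function names are hypothetical.

```python
M = 5                       # symbol size in bits
FIELD_SIZE = 1 << M         # 32 elements in GF(2^5)
PRIMITIVE_POLY = 0b100101   # x^5 + x^2 + 1 (assumed, not taken from the paper)


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements: shift-and-add with modular reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1
        if a & FIELD_SIZE:       # degree reached m: reduce by the primitive polynomial
            a ^= PRIMITIVE_POLY
    return result


def gf_pow(a: int, exponent: int) -> int:
    """Raise a field element to a non-negative integer power."""
    result = 1
    for _ in range(exponent):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse via a^(2^m - 2), valid for non-zero a."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse in GF(2^m)")
    return gf_pow(a, FIELD_SIZE - 2)


# The element alpha = 2 (the polynomial x) generates all 31 non-zero elements.
alpha_powers = [gf_pow(2, i) for i in range(FIELD_SIZE - 1)]
```

Code length, number of information symbols, and generator polynomial change only which field elements are fed to these operations, not the operations themselves.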

12.
The instruction decoding unit (IDU), which is one of the two components used to implement a 32-bit VLSI, object-oriented general data processor is described. The instruction decoder is particularly novel in its ability to decode variable length, bit-aligned instructions at high speed. A brief discussion is given on both the organization of the variable length instructions and the microarchitecture of the general data processor. Some of the extensions made to classic state machine concepts are presented, along with a discussion of the circuits used to implement these extensions. Finally, the timing requirements and their associated circuit constraints are discussed.

13.
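To increase the decoding rate of a CAVLC decoder, an optimized CAVLC decoder structure is proposed, consisting mainly of a level decoding module and a RunBefore decoding module. The level decoding module decodes amplitudes with a pseudo-parallel structure, decoding one amplitude every half cycle. A fast RunBefore/level merging method forms the residual coefficients as soon as RunBefore decoding completes. An RTL model of the optimized structure was built and its functional correctness verified. Synthesis with Xilinx ISE 13.3 shows that the design can support real-time decoding of 1080p HD video.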

14.
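Implementation of a 5G LDPC decoder   Total citations: 1, self-citations: 0, citations by others: 1
This paper describes the characteristics of the LDPC codes in the 5G standard, compares the performance of various decoding algorithms, and proposes an overall decoder architecture that splits the decoder into a high-speed decoder and a low-SNR decoder. The high-speed decoder, the main body of the design, suits high code rates and high throughput requirements. The low-SNR decoder targets high-performance decoding at low code rates and low SNR, handling communication under extreme conditions where throughput requirements are modest. Both decoders were designed and implemented, and FPGA synthesis results and throughput analysis are given.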

15.
Turbo code is a computationally intensive channel code that is widely used in current and upcoming wireless standards. General-purpose graphics processor unit (GPGPU) is a programmable commodity processor that achieves high performance computation power by using many simple cores. In this paper, we present a 3GPP LTE compliant Turbo decoder accelerator that takes advantage of the processing power of GPU to offer fast Turbo decoding throughput. Several techniques are used to improve the performance of the decoder. To fully utilize the computational resources on GPU, our decoder can decode multiple codewords simultaneously, divide the workload for a single codeword across multiple cores, and pack multiple codewords to fit the single instruction multiple data (SIMD) instruction width. In addition, we use shared memory judiciously to enable hundreds of concurrent threads while keeping frequently used data local to keep memory access fast. To improve efficiency of the decoder in the high SNR regime, we also present a low complexity early termination scheme based on average extrinsic LLR statistics. Finally, we examine how different workload partitioning choices affect the error correction performance and the decoder throughput.
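An early-termination rule driven by average extrinsic LLR statistics might look like the sketch below. The stopping rule (mean absolute extrinsic LLR reaching a threshold), the threshold value, the sign convention, and the `toy_iteration` stand-in are all assumptions for illustration; the abstract does not give the paper's actual criterion, and no real MAP decoder is implemented here.

```python
from typing import Callable, List, Sequence, Tuple


def mean_abs_llr(llrs: Sequence[float]) -> float:
    """Average magnitude of the extrinsic LLRs, used as a reliability measure."""
    return sum(abs(value) for value in llrs) / len(llrs)


def decode_with_early_stop(
    iterate: Callable[[Sequence[float], Sequence[float]], List[float]],
    channel_llrs: Sequence[float],
    max_iterations: int = 8,
    threshold: float = 12.0,  # hypothetical threshold
) -> Tuple[List[int], int]:
    """Run decoder iterations until the average extrinsic LLR is large enough."""
    extrinsic = [0.0] * len(channel_llrs)
    iterations_used = 0
    for _ in range(max_iterations):
        extrinsic = iterate(channel_llrs, extrinsic)
        iterations_used += 1
        if mean_abs_llr(extrinsic) >= threshold:
            break  # decisions are already reliable; skip the remaining iterations
    # Assumed convention: a negative LLR means bit 1.
    hard_bits = [1 if c + e < 0 else 0 for c, e in zip(channel_llrs, extrinsic)]
    return hard_bits, iterations_used


def toy_iteration(channel_llrs: Sequence[float], extrinsic: Sequence[float]) -> List[float]:
    """Stand-in for one decoder iteration: reliabilities simply grow each pass."""
    return [e + 2.0 * c for c, e in zip(channel_llrs, extrinsic)]


bits, used = decode_with_early_stop(toy_iteration, [2.0, -3.0, 2.5, -1.5])
```

At high SNR the extrinsic values grow quickly, so a check of this kind stops decoding after a few iterations, at the cost of one reduction per iteration.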

16.
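This paper describes the characteristics of the LDPC codes in the 5G standard, compares the performance of various decoding algorithms, and proposes an overall decoder architecture that splits the decoder into a high-speed decoder and a low-SNR decoder. The high-speed decoder, the main body of the design, suits high code rates and high throughput requirements. The low-SNR decoder targets high-performance decoding at low code rates and low SNR, handling communication under extreme conditions where throughput requirements are modest. Both decoders were designed and implemented, and FPGA synthesis results and throughput analysis are given.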

17.
We present a scheme for real-time digital HDTV video decoding suitable for DVB or ATSC set-top boxes. Our technique is based on a dual decoding datapath controlled in two fixed-scheduling combinations with an efficient memory interface scheme for anchor pictures. Unlike other decoding approaches such as the slice bar decoding method and the crossing-divided method, our scheme reduces the memory access contention problem to achieve real-time HDTV decoding without a high cost in overall decoder buffers, architecture, and bus. Our simulation shows that with a relatively low-rate 81 MHz clock, our decoder can decode MPEG-2 MP@HL HDTV in real time, based on a video format of 1920 × 1080 pixels/frame at 30 frames/s, at a bit rate of 18-22 Mbps.
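The figures quoted above imply a tight cycle budget, which a short calculation makes concrete. The 16 × 16 macroblock size is standard for MPEG-2; the even split of work across the two datapaths is an assumption for illustration, not a claim about the paper's scheduling.

```python
import math

WIDTH, HEIGHT = 1920, 1080      # pixels per frame, from the abstract
FRAME_RATE = 30                 # frames per second, from the abstract
CLOCK_HZ = 81_000_000           # 81 MHz decoder clock, from the abstract
MACROBLOCK = 16                 # MPEG-2 macroblock edge in luma pixels

# Frame dimensions are rounded up to whole macroblocks (1080 rows need 68).
mb_columns = math.ceil(WIDTH / MACROBLOCK)
mb_rows = math.ceil(HEIGHT / MACROBLOCK)
macroblocks_per_frame = mb_columns * mb_rows
macroblocks_per_second = macroblocks_per_frame * FRAME_RATE

# Clock cycles available per macroblock for a single decoding datapath.
cycles_per_macroblock = CLOCK_HZ / macroblocks_per_second

# Assumption: two datapaths share the macroblocks evenly, doubling each one's budget.
cycles_per_macroblock_dual = 2 * cycles_per_macroblock

# Luma pixel rate and the clock cycles available per luma pixel.
luma_pixels_per_second = WIDTH * HEIGHT * FRAME_RATE
cycles_per_luma_pixel = CLOCK_HZ / luma_pixels_per_second
```

With roughly 331 cycles per macroblock on one datapath, memory contention for anchor pictures leaves little slack, which is consistent with the emphasis on the memory interface.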

18.
A neural network (NN)-based decoding algorithm for block Markov superposition transmission (BMST) was studied. Decoders of the basic code with different network structures and representations of training data were implemented using NN. By integrating the NN-based decoder of the basic code in an iterative manner, a sliding window decoding algorithm was presented. To analyze the bit error rate (BER) performance, genie-aided (GA) lower bounds were presented. The NN-based decoding algorithm of the BMST provides a possible way to apply NN to decode long codes, which means that part of the conventional decoder could be replaced by the NN. Numerical results show that the NN-based decoder of the basic code can achieve the BER performance of the maximum likelihood (ML) decoder. For the BMST codes, the BER performance of the NN-based decoding algorithm matches well with the GA lower bound and exhibits an extra coding gain.

19.
A lower bound to the distribution of computation for sequential decoding   Total citations: 1, self-citations: 0, citations by others: 1
In sequential decoding, the number of computations which the decoder must perform to decode the received digits is a random variable. In this paper, we derive a Paretian lower bound to the distribution of this random variable. We show that P[C > L] is bounded below on the order of L^{-ρ}, where C is the number of computations which the sequential decoder must perform to decode a block of Λ transmitted bits, and ρ is a parameter which depends on the channel and the rate of the code. Our bound is valid for all sequential decoding schemes and all discrete memoryless channels. In Section II we give an example of a special channel for which a Paretian bound can be easily derived. In Sections III and IV we treat the general channel. In Section V we relate this bound to the memory buffer requirements of real-time sequential decoders. In Section VI, we show that this bound implies that certain moments of the distribution of the computation per digit are infinite, and we determine lower bounds to the rates above which these moments diverge. In most cases, our bounds coincide with previously known upper bounds to rates above which the moments converge. We conclude that the performance of systems using sequential decoding is limited by the computational and buffer capabilities of the decoder, not by the probability of making a decoding error. We further note that our bound applies only to sequential decoding, and that, in certain special cases (Section II), algebraic decoding methods prove superior.
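The step from a Paretian tail bound to infinite moments can be sketched in one line. The constants $A$ and $L_0$ and the exact form of the inequality are assumptions made for illustration; the abstract states only the order of the bound.

```latex
% Assumption: P[C > L] \ge A L^{-\rho} for all L \ge L_0, with A > 0, and C \ge 0.
\begin{align}
E\left[C^{k}\right]
  &= \int_{0}^{\infty} k L^{k-1} \, P[C > L] \, dL \\
  &\ge A k \int_{L_0}^{\infty} L^{k-1-\rho} \, dL
   = \infty \qquad \text{for } k \ge \rho .
\end{align}
```

Under this assumption every moment of order $k \ge \rho$ diverges, so a smaller $\rho$ (which depends on the channel and the code rate) leaves fewer finite moments.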

20.
A universal lattice code decoder for fading channels   Total citations: 40, self-citations: 0, citations by others: 40
We present a maximum-likelihood decoding algorithm for an arbitrary lattice code when used over an independent fading channel with perfect channel state information at the receiver. The decoder is based on a bounded distance search among the lattice points falling inside a sphere centered at the received point. By judicious choice of the decoding radius we show that this decoder can be practically used to decode lattice codes of dimension up to 32 in a fading environment.
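The idea of searching only the lattice points inside a sphere around the received point can be illustrated with a brute-force sketch. It enumerates a small box of integer coefficients and discards points outside the radius, which is far less efficient than the search in the paper; the generator matrix, fading coefficients, coefficient range, and function names are all hypothetical.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple


def lattice_point(generator: Sequence[Sequence[float]], coefficients: Sequence[int]) -> List[float]:
    """Return the integer combination of the basis vectors (rows of `generator`)."""
    dimension = len(generator[0])
    return [
        sum(c * row[j] for c, row in zip(coefficients, generator))
        for j in range(dimension)
    ]


def sphere_decode(
    generator: Sequence[Sequence[float]],
    fading: Sequence[float],
    received: Sequence[float],
    radius: float,
    coeff_range: Sequence[int] = range(-3, 4),  # assumed search box
) -> Optional[Tuple[int, ...]]:
    """Return the coefficients of the closest faded lattice point within `radius`."""
    best, best_distance = None, radius
    for coefficients in itertools.product(coeff_range, repeat=len(generator)):
        point = lattice_point(generator, coefficients)
        # Perfect channel state information: the receiver knows each fading gain.
        faded = [h * x for h, x in zip(fading, point)]
        distance = math.dist(faded, received)
        if distance <= best_distance:
            best, best_distance = coefficients, distance
    return best  # None when no lattice point falls inside the sphere


generator = [[2.0, 0.0], [1.0, 2.0]]   # hypothetical 2-D lattice basis
fading = [0.5, 1.2]                    # hypothetical known fading gains
received = [0.6, -2.3]                 # faded point (0.5, -2.4) plus small noise

decoded = sphere_decode(generator, fading, received, radius=1.0)
too_small = sphere_decode(generator, fading, received, radius=0.05)
```

The radius trades complexity against failure: a small radius may contain no lattice point at all, while a large one contains many candidates to test.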
