期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

800Mbps准循环LDPC码译码器的FPGA实现 总被引：1，自引：0，他引：1

张仲明许拔杨军张尔扬《信号处理》2010,26(2):255-261

本文提出了一种适用于准循环低密度校验码的低复杂度的高并行度译码器架构。通常准循环低密度校验码不适于设计有效的高并行度高吞吐量译码器。我们通过利用准循环低密度校验码的奇偶校验矩阵的结构特点,将其转化为块准循环结构,从而能够并行化处理译码算法的行与列操作。使用这个架构,我们在Xilinx Virtex-5 LX330 FPGA上实现了(8176,7154)有限几何LDPC码的译码器,在15次迭代的条件下其译码吞吐量达到800Mbps。相似文献

2.

Min-Sum Decoder Architectures With Reduced Word Length for LDPC Codes

《IEEE transactions on circuits and systems. I, Regular papers》2010,57(1):105-115

In this paper, we propose an improvement of the normalized min-sum (MS) decoding algorithm and novel MS decoder architectures with reduced word length using nonuniform quantization schemes for low-density parity-check (LDPC) codes. The proposed normalized MS algorithm introduces a more exact adjustment with two optimized correction factors for check-node-updating computations, while the conventional normalized MS algorithm applies only one correction factor. The proposed algorithm provides a significant performance gain without any additional computation or hardware complexity. The finite word-length analysis in implementing an LDPC decoder is a very important factor since it directly impacts the size of memory to store the intrinsic and extrinsic messages and the overall hardware area in the partially parallel LDPC decoder. The proposed nonuniform quantization scheme can reduce the finite word length while achieving similar performances compared to a conventional quantization scheme. From simulation results, it is shown that the proposed 4-bit nonuniform quantization scheme achieves an acceptable decoding performance, unlike the conventional 4-bit uniform quantization scheme. Finally, the proposed MS decoder architectures by the nonuniform quantization scheme provide significant reductions of 20% and up to 8% for the memory area and combinational logic area, respectively, compared to the conventional 5-bit ones. 相似文献

3.

Optimal Overlapped Message Passing Decoding of Quasi-Cyclic LDPC Codes

Yongmei Dai Zhiyuan Yan Ning Chen 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(5):565-578

Efficient hardware implementation of low-density parity-check (LDPC) codes is of great interest since LDPC codes are being considered for a wide range of applications. Recently, overlapped message passing (OMP) decoding has been proposed to improve the throughput and hardware utilization efficiency (HUE) of decoder architectures for LDPC codes. In this paper, we first study the scheduling for the OMP decoding of LDPC codes, and show that maximizing the throughput gain amounts to minimizing the intra- and inter-iteration waiting times. We then focus on the OMP decoding of quasi-cyclic (QC) LDPC codes. We propose a partly parallel OMP decoder architecture and implement it using FPGA. For any QC LDPC code, our OMP decoder achieves the maximum throughput gain and HUE due to overlapping, hence has higher throughput and HUE than previously proposed OMP decoders while maintaining the same hardware requirements. We also show that the maximum throughput gain and HUE achieved by our OMP decoder are ultimately determined by the given code. Thus, we propose a coset-based construction method, which results in QC LDPC codes that allow our optimal OMP decoder to achieve higher throughput and HUE. 相似文献

4.

FPGA implementation of low complexity LDPC iterative decoder

Shivani Verma Sanjay Sharma 《International Journal of Electronics》2016,103(7):1112-1126

Low-density parity-check (LDPC) codes, proposed by Gallager, emerged as a class of codes which can yield very good performance on the additive white Gaussian noise channel as well as on the binary symmetric channel. LDPC codes have gained lots of importance due to their capacity achieving property and excellent performance in the noisy channel. Belief propagation (BP) algorithm and its approximations, most notably min-sum, are popular iterative decoding algorithms used for LDPC and turbo codes. The trade-off between the hardware complexity and the decoding throughput is a critical factor in the implementation of the practical decoder. This article presents introduction to LDPC codes and its various decoding algorithms followed by realisation of LDPC decoder by using simplified message passing algorithm and partially parallel decoder architecture. Simplified message passing algorithm has been proposed for trade-off between low decoding complexity and decoder performance. It greatly reduces the routing and check node complexity of the decoder. Partially parallel decoder architecture possesses high speed and reduced complexity. The improved design of the decoder possesses a maximum symbol throughput of 92.95 Mbps and a maximum of 18 decoding iterations. The article presents implementation of 9216 bits, rate-1/2, (3, 6) LDPC decoder on Xilinx XC3D3400A device from Spartan-3A DSP family. 相似文献

5.

标准低密度奇偶校验码译码算法中量化结构

下载免费PDF全文

兰亚柱杨海钢林郁《太赫兹科学与电子信息学报》2015,13(4):584-589

DVB-S2标准低密度奇偶校验码(LDPC)译码器在深空通信中面临着低复杂度、高灵活性及普适性方面的迫切需求。通过对LDPC译码算法中量化结构的研究,提出一种动态自适应量化结构的设计方法。该方法在常规均匀硬件量化的基础上,提出了修正化Min-Sum译码算法中的数据信息初始化及迭代译码的动态自适应量化结构,解决了DVB-S2标准LDPC码译码时存在的校验节点运算与变量节点运算之间的复杂度不平衡的问题,并由此提高了译码器的译码性能。实验证明,以DVB-S2标准LDPC码中码长为16 200,码率为1/2的为例,提供动态自适应量化结构与常规的均匀量化结构相比,节省硬件资源为4%。此外,动态自适应量化结构支持动态可配置功能,保证了DVB-S2标准LDPC译码器的灵活性及普适性。相似文献

6.

一种低复杂度的多元LDPC译码算法

黎相成陈海强覃团发孙友明万海斌《电讯技术》2017,57(3)

针对多元低密度奇偶校验(LDPC)码译码复杂度高、时延大等问题,提出了一种基于硬信息的低复杂度多元LDPC译码算法.来自信道的接收信号在初始化时,先进行非均匀量化预处理.在迭代过程中,校验节点端只需传输单个比特的二进制硬可靠度信息至变量节点.在变量节点端,可靠度信息按比特位进行简单的累加和更新,无需任何的系数修正操作.同时,变量节点使用了全信息的方式将信息传输至与其相邻的校验节点.仿真结果显示,与基于比特可靠度(BRB)的多元LDPC译码算法相比,提出的算法在较低量化比特情况下,能获得约0.3 dB的译码性能增益,且译码复杂度更低. 相似文献

7.

A Memory Efficient Partially Parallel Decoder Architecture for Quasi-Cyclic LDPC Codes

Wang Z. Cui Z. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2007,15(4):483-488

This paper presents a memory efficient partially parallel decoder architecture suited for high rate quasi-cyclic low-density parity-check (QC-LDPC) codes using (modified) min-sum algorithm for decoding. In general, over 30% of memory can be saved over conventional partially parallel decoder architectures. Efficient techniques have been developed to reduce the computation delay of the node processing units and to minimize hardware overhead for parallel processing. The proposed decoder architecture can linearly increase the decoding throughput with a small percentage of extra hardware. Consequently, it facilitates the applications of LDPC codes in area/power sensitive high-speed communication systems 相似文献

8.

Implementation aspects of LDPC convolutional codes

Pusane A.E. Feltstrom A.J. Sridharan A. Lentmaier M. Zigangirov K. Costello D.J. 《Communications, IEEE Transactions on》2008,56(7):1060-1069

Potentially large storage requirements and long initial decoding delays are two practical issues related to the decoding of low-density parity-check (LDPC) convolutional codes using a continuous pipeline decoder architecture. In this paper, we propose several reduced complexity decoding strategies to lessen the storage requirements and the initial decoding delay without significant loss in performance. We also provide bit error rate comparisons of LDPC block and LDPC convolutional codes under equal processor (hardware) complexity and equal decoding delay assumptions. A partial syndrome encoder realization for LDPC convolutional codes is also proposed and analyzed. We construct terminated LDPC convolutional codes that are suitable for block transmission over a wide range of frame lengths. Simulation results show that, for terminated LDPC convolutional codes of sufficiently large memory, performance can be improved by increasing the density of the syndrome former matrix. 相似文献

9.

Variable non-uniform quantized belief propagation algorithm for LDPC decoding 总被引：1，自引：0，他引：1

Binbin Liu Dong Bai Shunliang Mei 《电子科学学刊(英文版)》2008,25(4):539-543

Non-uniform quantization for messages in Low-Density Parity-Check（LDPC）decoding can reduce implementation complexity and mitigate performance loss.But the distribution of messages varies in the iterative decoding.This letter proposes a variable non-uniform quantized Belief Propagation（BP）algorithm.The BP decoding is analyzed by density evolution with Gaussian approximation.Since the probability density of messages can be well approximated by Gaussian distribution,by the unbiased estimation of variance,the distribution of messages can be tracked during the iteration.Thus the non-uniform quantization scheme can be optimized to minimize the distortion.Simulation results show that the variable non-uniform quantization scheme can achieve better error rate performance and faster decoding convergence than the conventional non-uniform quantization and uniform quantization schemes. 相似文献

10.

一种准循环LDPC解码器的设计与实现 总被引：5，自引：5，他引：0

李刚黑勇仇玉林《微电子学与计算机》2008,25(7)

面向准循环LDPC码的硬件实现,定点分析了各种解码算法的解码性能,偏移量最小和(OMS)算法具备较高解码性能和实现复杂度低的特点.提出一种基于部分并行方式的准循环LDPC解码器结构,在FPGA上利用该结构成功实现了WiMAX标准中的LDPC解码器.FPGA验证结果表明,采用该结构的解码器性能优良,实现复杂度低,数据吞吐率高. 相似文献

11.

基于Min-Sum近似算法的QC-LDPC译码器

刘斌彬白栋梅顺良《无线通信技术》2008,17(1):1-6

由于BP算法中的非线性运算较复杂,实现中通常采用Min-Sum近似简化译码算法.针对译码过程中需要存储大量信息的问题,本文提出了一种基于Min-Sum近似算法的QC-LDPC译码器.通过重新安排Min-Sum近似算法中的运算,并将校验节点信息以一种压缩冗余的形式表示,大大减少了译码器所需的存储空间.针对QC-LDPC码校验矩阵准循环的特性,译码过程中以块为单位对信息进行更新,且可以实现多种消息传递调度策略.为进一步减少存储空间,对变量节点信息采用了非线性量化,根据密度演进理论对量化规则进行了优化. 相似文献

12.

Efficient hardware implementation of a highly-parallel 3GPP LTE/LTE-advance turbo decoder

Yang Sun Joseph R. CavallaroAuthor vitae 《Integration, the VLSI Journal》2011,44(4):305-315

We present an efficient VLSI architecture for 3GPP LTE/LTE-Advance Turbo decoder by utilizing the algebraic-geometric properties of the quadratic permutation polynomial (QPP) interleaver. The high-throughput 3GPP LTE/LTE-Advance Turbo codes require a highly-parallel decoder architecture. Turbo interleaver is known to be the main obstacle to the decoder parallelism due to the collisions it introduces in accesses to memory. The QPP interleaver solves the memory contention issues when several MAP decoders are used in parallel to improve Turbo decoding throughput. In this paper, we propose a low-complexity QPP interleaving address generator and a multi-bank memory architecture to enable parallel Turbo decoding. Design trade-offs in terms of area and throughput efficiency are explored to find the optimal architecture. The proposed parallel Turbo decoder has been synthesized, placed and routed in a 65-nm CMOS technology with a core area of 8.3 mm² and a maximum clock frequency of 400 MHz. This parallel decoder, comprising 64 MAP decoder cores, can achieve a maximum decoding throughput of 1.28 Gbps at 6 iterations 相似文献

13.

High-throughput LDPC decoders

Mansour M.M. Shanbhag N.R. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2003,11(6):976-996

A high-throughput memory-efficient decoder architecture for low-density parity-check (LDPC) codes is proposed based on a novel turbo decoding algorithm. The architecture benefits from various optimizations performed at three levels of abstraction in system design-namely LDPC code design, decoding algorithm, and decoder architecture. First, the interconnect complexity problem of current decoder implementations is mitigated by designing architecture-aware LDPC codes having embedded structural regularity features that result in a regular and scalable message-transport network with reduced control overhead. Second, the memory overhead problem in current day decoders is reduced by more than 75% by employing a new turbo decoding algorithm for LDPC codes that removes the multiple checkto-bit message update bottleneck of the current algorithm. A new merged-schedule merge-passing algorithm is also proposed that reduces the memory overhead of the current algorithm for low to moderate-throughput decoders. Moreover, a parallel soft-input-soft-output (SISO) message update mechanism is proposed that implements the recursions of the Balh-Cocke-Jelinek-Raviv (BCJR) algorithm in terms of simple "max-quartet" operations that do not require lookup-tables and incur negligible loss in performance compared to the ideal case. Finally, an efficient programmable architecture coupled with a scalable and dynamic transport network for storing and routing messages is proposed, and a full-decoder architecture is presented. Simulations demonstrate that the proposed architecture attains a throughput of 1.92 Gb/s for a frame length of 2304 bits, and achieves savings of 89.13% and 69.83% in power consumption and silicon area over state-of-the-art, with a reduction of 60.5% in interconnect length. 相似文献

14.

一种低译码复杂度的Turbo架构LDPC码

熊磊谈振辉姚冬苹《电子与信息学报》2007,29(12):2907-2911

针对低密度奇偶校验(LDPC)码较大的译码复杂度和RAM占用,该文提出了一种低译码复杂度的Turbo架构LDPC码并行交织级联Gallager码 (Parallel Interleaved Concatenated Gallager Code,PICGC)。该文给出了PICGC的设计方法和编译码算法,并分析比较了PICGC译码器与LDPC译码器所需的RAM存储量,推导出RAM节省比的上界。理论分析和仿真结果表明,PICGC以纠错性能略微降低为代价,有效地降低译码复杂度和RAM存储量,且译码时延并未增加,是一种有效且易于实现的信道编码方案。相似文献

15.

Configurable Multi-Rate Decoder Architecture for QC-LDPC Codes Based Broadband Broadcasting System

Luoming Zhang Lin Gui Youyun Xu Wenjun Zhang 《Broadcasting, IEEE Transactions on》2008,54(2):226-235

In this paper we present a Base-matrix based decoder architecture for multi-rate QC-LDPC codes proposed in broadband broadcasting system. We use the Modified Min-Sum Algorithm (MMSA) as the decoding algorithm in this architecture, which lowers the complexity of the LDPC decoder while keeping almost the same performance or even better. Based on this algorithm, we designed a novel check node processing unit to reduce the complexity of the decoder and facilitate the multiplex of the processing units. The decoder designed with hardware constraints is not only scalable in throughput, but also easily configurable to support different QC-LDPC codes flexible in code rate and code length. 相似文献

16.

Efficient GPU and CPU-based LDPC decoders for long codewords

Stefan Gr?nroos Kristian Nybom Jerker Bj?rkqvist 《Analog Integrated Circuits and Signal Processing》2012,73(2):583-595

The next generation DVB-T2, DVB-S2, and DVB-C2 standards for digital television broadcasting specify the use of low-density parity-check (LDPC) codes with codeword lengths of up to 64800 bits. The real-time decoding of these codes on general purpose computing hardware is useful for completely software defined receivers, as well as for testing and simulation purposes. Modern graphics processing units (GPUs) are capable of massively parallel computation, and can in some cases, given carefully designed algorithms, outperform general purpose CPUs (central processing units) by an order of magnitude or more. The main problem in decoding LDPC codes on GPU hardware is that LDPC decoding generates irregular memory accesses, which tend to carry heavy performance penalties (in terms of efficiency) on GPUs. Memory accesses can be efficiently parallelized by decoding several codewords in parallel, as well as by using appropriate data structures. In this article we present the algorithms and data structures used to make log-domain decoding of the long LDPC codes specified by the DVB-T2 standard??at the high data rates required for television broadcasting??possible on a modern GPU. Furthermore, we also describe a similar decoder implemented on a general purpose CPU, and show that high performance LDPC decoders are also possible on modern multi-core CPUs. 相似文献

17.

基于矩阵分块的LDPC码快速编码结构研究

窦金芳周诠《微电子学与计算机》2007,24(1):166-168

低密度奇偶校验(LDPC)码由于具有接近香农限的性能和高速并行的译码结构而成为研究热点。然而,当码长很长时,编译码器的硬件实现变得很困难。文章从编译码实际实现的角度出发,提出一种基于分块的LDPC码下三角形校验矩阵结构,降低了编译码复杂度,不仅可以实现线性时间编码,同时还可以实现部分并行译码。仿真结果表明,具有这种结构的LDPC码和随机构造的LDPC码相比具有同样好的纠错性能。相似文献

18.

适于分层译码算法的LDPC码构造方法

下载免费PDF全文

王达董明科陈晨金野项海格《中国通信》2012,9(7):99-107

The layered decoding algorithm has been widely used in the implementation of Low Density Parity Check (LDPC) decoders, due to its high convergence speed. However, the pipeline operation of the layered decoder may introduce memory access conflicts, which heavily deteriorates the decoder throughput. To essentially deal with the issue of memory access conflicts, we propose a construction algorithm of LDPC codes, to which a constraint condition is added in the Progressive Edge-Growth (PEG) algorithm. The constraint condition can guarantee that for our constructed LDPC codes, the sets of all the variable nodes connected to the consecutive layers do not share any common variable node, which can avoid the memory access conflicts. Simulation results show that the performance of our constructed LDPC codes is close to the several other LDPC codes adopted in wireless standards. Moreover, compared with the decoder for IEEE 802. 16e LDPC codes, the throughput of our LDPC decoder has large improvement, while the chip resource consumption is unchanged. Thus, our constructed LD-PC codes can be adopted in the high-speed transmission. 相似文献

19.

CORDIC instructions for LDPC decoding on SDR platforms

Murugappan Senthilvelan Meng Yu Daniel Iancu Mihai Sima Michael Schulte 《Analog Integrated Circuits and Signal Processing》2011,69(2-3):191-206

Wireless protocols strive to increase spectral efficiency and achieve high data throughput. Low-density parity-check (LDPC) codes are advanced forward error correction (FEC) codes that use iterative decoding techniques to achieve close to the Shannon capacity. Due to their superior performance, state-of-art wireless protocols, such as WiMAX and LTE Advanced, are adopting LDPC codes. LDPC codes come with the high cost of drastically increased computational effort for decoding. Among the proposed decoding algorithms, the belief propagation (BP) algorithm leads to a good approximation of an optimal decoder; however, it uses compute-intensive hyperbolic trigonometric functions. To reduce the computational complexity, typical LDPC decoder implementations use simplified algorithms, such as the min-sum algorithm, at the expense of reduced signal processing performance. Efficient and accurate methods to compute hyperbolic trigonometric functions can facilitate the use of the BP algorithm in real-time LDPC decoder implementations. This paper investigates hyperbolic COordinate Rotation DIgital Computer (CORDIC) instruction set architecture (ISA) extensions for software-defined radio (SDR) processors to compute the hyperbolic trigonometric functions for LDPC decoding efficiently. The CORDIC ISA extensions are evaluated on the low-power multi-threaded Sandbridge Sandblaster? SB3000 platform. The computational performance, numerical accuracy, hardware estimates, power consumption estimates, and memory requirements with the CORDIC ISA extensions are compared to a baseline implementation without these extensions on the SB3000. 相似文献

20.

Memory Access Optimized Implementation of Cyclic and Quasi-Cyclic LDPC Codes on a GPGPU

Hyunwoo Ji Junho Cho Wonyong Sung 《Journal of Signal Processing Systems》2011,64(1):149-159

Software based decoding of low-density parity-check (LDPC) codes frequently takes very long time, thus the general purpose graphics processing units (GPGPUs) that support massively parallel processing can be very useful for speeding up the simulation. In LDPC decoding, the parity-check matrix H needs to be accessed at every node updating process, and the size of the matrix is often larger than that of GPU on-chip memory especially when the code length is long or the weight is high. In this work, the parity-check matrix of cyclic or quasi-cyclic (QC) LDPC codes is greatly compressed by exploiting the periodic property of the matrix. Also, vacant elements are eliminated from the sparse message arrays to utilize the coalesced access of global memory supported by GPGPUs. Regular projective geometry (PG) and irregular QC LDPC codes are used for sum-product algorithm based decoding with the GTX-285 NVIDIA graphics processing unit (GPU), and considerable speed-up results are obtained. 相似文献