Similar Literature
1.
This paper shows how a bus topology performs as a System-on-Chip (SoC) interconnection. We measure and analyze the Heterogeneous IP Block Interconnection (HIBI) bus in a multiple-clock-domain Multiprocessor System-on-Chip (MPSoC) running an MPEG-4 video encoding application on an FPGA. The studied MPSoC contains up to 22 IP blocks: 11 soft processors, 8 hardware accelerators and three other components. A novel frequency-scaling approach is used to isolate the impact of the various architecture components. The system is benchmarked in various configurations; for example, HIBI is run at 100× the processor speed to emulate an ideal interconnection. Based on measurements reaching up to 16.9 frames/s CIF (352 × 288) encoding speed, an estimate for an HDTV-resolution video encoder is presented and the required optimizations are discussed. Finally, it is shown that a 25 frames/s 1280 × 720 video encoder needs a 55 MHz HIBI but 670 MHz general-purpose soft RISC processors. In practice, processing performance has to be boosted by implementing hardware acceleration and improving the memory hierarchy. Clearly, HIBI is not the limiting factor.
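The HDTV estimate extrapolates from the measured CIF operating point. A minimal sketch, assuming for illustration only that the workload scales linearly with pixel rate (the paper's estimate also accounts for the optimizations it discusses), shows the size of that jump:

```python
# Pixel-rate comparison between the measured CIF point and the 720p target.
# Assumption (illustrative): workload scales linearly with pixels per second.

def pixel_rate(width: int, height: int, fps: float) -> float:
    """Number of pixels that must be processed per second."""
    return width * height * fps

cif_rate = pixel_rate(352, 288, 16.9)    # measured CIF encoding speed
hd_rate = pixel_rate(1280, 720, 25.0)    # 25 frames/s 1280x720 target

scale = hd_rate / cif_rate
print(f"CIF: {cif_rate / 1e6:.2f} Mpixel/s")
print(f"720p25: {hd_rate / 1e6:.2f} Mpixel/s")
print(f"required speed-up: {scale:.1f}x")
```

Under this linear assumption the 720p target needs roughly 13 times the measured CIF pixel rate.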

2.
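To address the multi-standard compatibility problem of motion compensation, this paper proposes an improved interpolation algorithm structure for multi-standard motion compensation. The new interpolation algorithm is based on the RL (Rounding Last) and DTS (Diagonal Two Step) strategies proposed in the paper, and uses a unified two-step interpolation structure that efficiently accommodates the luma and chroma interpolation of each standard. Based on the new algorithm, a reconfigurable multi-standard motion compensation hardware circuit is designed and implemented, using a motion compensation structure based on variable block sizes. Implementation results show that, compared with the 4×4 fixed-block-size motion compensation structure in JM8.4, the designed circuit reduces bandwidth requirements by 27%~50% and increases the average burst length per external memory access by 1.22~2.25 times. At an operating frequency of 125 MHz, the circuit meets the real-time decoding requirement of full HD 1080p (1920×1080) at 30 frames/s.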
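The RL and DTS strategies are the paper's own and are not detailed in the abstract. As background only: H.264/AVC already defers rounding for the centre half-sample position, keeping the first 6-tap pass at full precision and rounding once at the end (shift by 10), whereas the other half-sample positions round after a single pass (shift by 5). The minimal sketch below contrasts that with rounding after each pass; the function names are illustrative and the input window is arbitrary.

```python
# Two-step 6-tap half-sample interpolation with the H.264/AVC luma filter
# (1, -5, 20, 20, -5, 1): rounding only at the end vs. rounding after each pass.

TAPS = (1, -5, 20, 20, -5, 1)

def clip1(x: int, bit_depth: int = 8) -> int:
    """Clip to the valid sample range [0, 2**bit_depth - 1]."""
    return max(0, min((1 << bit_depth) - 1, x))

def fir6(samples) -> int:
    """Apply the 6-tap filter to six consecutive values (no rounding)."""
    return sum(t * s for t, s in zip(TAPS, samples))

def centre_round_last(window) -> int:
    """Centre half-sample: horizontal pass kept unrounded, one rounding at the end."""
    intermediate = [fir6(row) for row in window]       # one unrounded value per row
    return clip1((fir6(intermediate) + 512) >> 10)     # 1024 = 32 * 32

def centre_round_each(window) -> int:
    """Same position, but rounding and clipping after each pass."""
    intermediate = [clip1((fir6(row) + 16) >> 5) for row in window]
    return clip1((fir6(intermediate) + 16) >> 5)

window = [[0] * 6 for _ in range(6)]   # 6x6 integer samples around the position
window[2][2] = 1                       # a single non-zero sample
print(centre_round_last(window), centre_round_each(window))   # 0 1
```

The exact value here is 400/1024 ≈ 0.39, so rounding once gives 0 while rounding twice accumulates bias and gives 1.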

3.
A low-power dual-standard video decoder has been developed for mobile applications. It supports MPEG-2 SP@ML and H.264/AVC BL@L4 video decoding in a single chip and features a scalable architecture to achieve area and power efficiency. The chip integrates diverse algorithms of MPEG-2 and H.264/AVC to reduce silicon area. Three low-power techniques are proposed. First, a domain-pipelined scalability (DPS) technique optimizes the pipelined structure according to the number of processing cycles. Second, bandwidth scalability is implemented via a line-pixel-lookahead (LPL) scheme that improves external bandwidth and reduces internal memory size, giving a 51% reduction in memory power compared with a conventional design. Third, low-power motion compensation and deblocking filter units are designed to reduce the operating frequency without degrading system performance. A test chip is fabricated in a 0.18 μm one-poly six-metal CMOS technology with an area of 15.21 mm². For mobile applications, H.264/AVC and MPEG-2 decoding of quarter common intermediate format (QCIF) sequences at 15 frames per second is achieved at a 1.15 MHz clock frequency, with power dissipation of 125 μW and 108 μW, respectively, at a 1 V supply voltage.
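The 1.15 MHz figure translates into an average cycle budget per macroblock, and the power figures into energy per frame. A short sketch of that arithmetic (it gives averages only and ignores data-dependent variation between macroblocks):

```python
# Cycle and energy budget implied by the QCIF operating point in the abstract.

MB_SIZE = 16  # macroblock width/height in luma samples

def mb_per_second(width: int, height: int, fps: float) -> float:
    """Macroblocks to decode per second (dimensions rounded up to whole MBs)."""
    mbs_w = -(-width // MB_SIZE)    # ceiling division
    mbs_h = -(-height // MB_SIZE)
    return mbs_w * mbs_h * fps

def cycles_per_mb(clock_hz: float, width: int, height: int, fps: float) -> float:
    """Average clock cycles available for each macroblock."""
    return clock_hz / mb_per_second(width, height, fps)

budget = cycles_per_mb(1.15e6, 176, 144, 15)     # QCIF (176x144) at 15 frames/s
energy_h264_uj = 125e-6 / 15 * 1e6               # microjoules per frame
energy_mpeg2_uj = 108e-6 / 15 * 1e6

print(f"{budget:.0f} cycles per macroblock")
print(f"H.264: {energy_h264_uj:.2f} uJ/frame, MPEG-2: {energy_mpeg2_uj:.2f} uJ/frame")
```

QCIF has 99 macroblocks per frame, so the decoder has roughly 774 cycles per macroblock on average.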

4.
Global motion estimation and compensation (GME/GMC) is an important video processing technique that has been applied to many applications, including video segmentation, sprite/mosaic generation, and video coding. In the MPEG-4 Advanced Simple Profile (ASP), GME/GMC is adopted to compensate for camera motion. Because GME is important, many GME algorithms have been proposed. These algorithms share two characteristics: huge computational complexity and very large memory bandwidth. Hence, real-time applications require a hardware accelerator for GME. However, GME poses many hardware design challenges, such as irregular memory access and huge memory bandwidth, and only a few hardware architectures have been proposed. In this paper, we first analyze three typical GME algorithms and then propose a fast GME algorithm. By using temporal prediction and skipping redundant computation, it saves 91% of the memory bandwidth and 80% of the iterations while maintaining performance, compared with Gradient Descent in the MPEG-4 Verification Model. Based on the proposed algorithm, a hardware architecture for GME is also presented. A new scheduling scheme, Reference-Based Scheduling, is developed to solve the irregular memory access problem, and an interleaved memory arrangement is applied to satisfy the memory access requirements of interpolation. The hardware implementation has a total gate count of 131K with the Artisan 0.18 μm cell library and an internal memory size of about 7.9 Kb. It can process MPEG-4 ASP@L3 (352×288 at 30 fps) at 30 MHz.
Liang-Gee Chen

5.
Multiview video coding (MVC) plays an important role in 3-D video systems. In addition, HDTV resolutions keep increasing to give users a more vivid experience. Real-time processing at tens of TOPS requires a VLSI solution. However, implementing an MVC hardware architecture faces three main design challenges: ultra-high computational complexity; large external memory bandwidth and on-chip SRAM size; and complex MVC prediction structures. In this paper, a single-chip MVC encoder is proposed for H.264/AVC Multiview High Profile and High Profile, targeting 3-D and quad full high definition (QFHD) TV applications, respectively. The 4096 × 2160p multiview video encoder chip is implemented on an 11.46 mm² die in 90 nm CMOS technology. An eight-stage macroblock pipelined architecture with the proposed system scheduling and a cache-based prediction core supports real-time processing from one-view 4096 × 2160p to seven-view 720p video. Its 212 Mpixels/s throughput is 3.4 to 7.7 times higher than previous work. A power efficiency of 407 Mpixels/W is achieved, and the proposed techniques save 94% of on-chip SRAM size and 79% of external memory bandwidth.
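Two further figures follow from the abstract's numbers, assuming (this is not stated) that the 212 Mpixels/s throughput and the 407 Mpixels/W efficiency refer to the same operating point. Since Mpixels/W is pixels per joule, dividing throughput by efficiency gives power, and dividing throughput by frame size gives the sustainable frame rate:

```python
# Derived figures from the abstract's throughput and power-efficiency numbers.
# Assumption: both numbers describe the same operating point.

throughput_pix_s = 212e6         # pixels per second
efficiency_pix_per_j = 407e6     # Mpixels/W = pixels per joule

power_w = throughput_pix_s / efficiency_pix_per_j
fps_4k = throughput_pix_s / (4096 * 2160)                # one view
fps_720p_7view = throughput_pix_s / (7 * 1280 * 720)     # per view, seven views

print(f"implied power: {power_w * 1e3:.0f} mW")
print(f"one-view 4096x2160: {fps_4k:.1f} frames/s")
print(f"seven-view 720p: {fps_720p_7view:.1f} frames/s per view")
```

Under that assumption the chip draws about 0.52 W and sustains about 24 frames/s at 4096 × 2160, or about 33 frames/s per view for seven 720p views.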

6.
This article presents an architecture that allows efficient ASIC implementations of high-throughput applications, such as real-time video applications for EDTV, IDTV and HDTV. A key issue in the architecture is balancing memory resources against processing resources, with special attention paid to the communication between these two types of resources. Architectural techniques are proposed to resolve memory-bandwidth bottlenecks and conflicts between memory accesses, and architectures for address generation combined with location assignment are presented. The flexibility of the architectural model allows an efficient hardware realization on an ASIC that exploits the inherent parallelism of a particular application. This is illustrated using a complex video algorithm for Progressive Scan Conversion. The proposed architecture serves as the target architecture that drives the high-level synthesis approach of the PHIDEO compiler.

7.
This paper presents interpolation-free fractional-pixel motion estimation (FME) algorithms and an efficient hardware prototype of one of them. Instead of using block matching to evaluate the actual matching error at fractional-pixel locations, the proposed algorithms approximate that error with a mathematical model whose coefficients are computed from the matching errors at integer-pixel locations; hence, no interpolation is required at fractional-pixel locations. The proposed algorithms are compared with several FME algorithms, including the full quarter-pixel search (FQPS) algorithm used in the H.264 reference software. The computational cost and performance analysis show that they require about 90% less computation than FQPS with comparable reconstructed video quality (approximately 0.2 dB lower PSNR). The hardware prototype of one of the proposed algorithms was implemented in TSMC 0.18 μm CMOS technology. It has a maximum clock frequency of 312.5 MHz, at which it can process more than 70 HDTV 1080p frames per second, and uses only 13,650 gates. The proposed architecture outperforms several previously reported FME architectures.
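The abstract does not give the paper's error model. A common choice in interpolation-free FME is a quadratic (parabolic) fit through the matching errors at the best integer position and its two neighbours, applied separately along x and y. A minimal sketch under that assumption, with rounding to quarter-sample precision (the helper names are hypothetical):

```python
# Interpolation-free fractional motion estimation with a separable parabolic model.
# Assumption: a 1-D quadratic fit per axis stands in for the paper's unspecified model.

from fractions import Fraction

def parabolic_offset(e_minus: int, e_zero: int, e_plus: int) -> Fraction:
    """Offset of the minimum of the parabola through errors at -1, 0, +1.

    Lies in [-0.5, 0.5] when e_zero is the smallest of the three values;
    returns 0 when the fit is flat or non-convex.
    """
    curvature = e_minus - 2 * e_zero + e_plus
    if curvature <= 0:
        return Fraction(0)
    return Fraction(e_minus - e_plus, 2 * curvature)

def to_quarter_pel(offset: Fraction) -> int:
    """Round an offset in samples to the nearest quarter-sample step."""
    return round(offset * 4)

def fractional_mv(int_mv, sad):
    """int_mv: best integer MV (x, y); sad: dict (dx, dy) -> SAD around it.

    Returns the refined motion vector in quarter-sample units.
    """
    fx = parabolic_offset(sad[(-1, 0)], sad[(0, 0)], sad[(1, 0)])
    fy = parabolic_offset(sad[(0, -1)], sad[(0, 0)], sad[(0, 1)])
    return (4 * int_mv[0] + to_quarter_pel(fx), 4 * int_mv[1] + to_quarter_pel(fy))

sad = {(0, 0): 100, (-1, 0): 160, (1, 0): 120, (0, -1): 130, (0, 1): 130}
print(fractional_mv((3, -2), sad))   # (13, -8)
```

Only the five integer-position SADs already produced by integer motion estimation are needed, which is why no fractional-pixel interpolation is performed.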

8.
Hong Qi, Cao Wei, Tong Jiarong. Acta Electronica Sinica, 2011, 39(5): 1059-1063
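A new dynamically reconfigurable architecture supporting the 4×4 integer transforms of the MPEG-4 AVC/H.264 standard is proposed. First, two new two-dimensional direct signal flow graphs are derived for the 4×4 forward and inverse transforms, respectively. A dynamically reconfigurable multi-transform architecture for HDTV applications is then designed. The architecture needs no transpose registers, and its computation units require only 16 adders (subtractors). The circuit was implemented in a 0.18 μm CMOS process. The results show that the maximum...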
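For reference, the H.264/AVC 4×4 forward core transform can be evaluated with add, subtract and shift butterflies. The sketch below uses the conventional row-then-column form, which does involve a transpose; it is not the paper's direct 2-D signal flow graph (not detailed in the abstract), but it shows the arithmetic such a graph has to reproduce, checked against a plain matrix product.

```python
# H.264/AVC 4x4 forward core transform Y = Cf * X * Cf^T via butterflies.

CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

def butterfly_1d(x):
    """1-D forward core transform of four integers: 8 add/subtract ops plus shifts."""
    s0, s1 = x[0] + x[3], x[1] + x[2]
    d0, d1 = x[0] - x[3], x[1] - x[2]
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]

def forward_4x4(block):
    """Transform rows, then columns; the transpose is implicit in the indexing."""
    rows = [butterfly_1d(r) for r in block]                       # X * Cf^T
    cols = [butterfly_1d([rows[i][j] for i in range(4)]) for j in range(4)]
    return [[cols[j][i] for j in range(4)] for i in range(4)]     # Cf * X * Cf^T

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def transpose(m):
    return [list(r) for r in zip(*m)]

X = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
print(forward_4x4(X) == matmul(matmul(CF, X), transpose(CF)))   # True
```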

9.
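For the H.264/AVC deblocking filter, this paper proposes a new filtering order that significantly reduces the on-chip data buffer size, and on that basis designs a new VLSI hardware architecture for the deblocking filter. The architecture uses a data-reuse mechanism to reduce off-chip memory accesses and save processing time; it uses no on-chip SRAM, reducing on-chip SRAM accesses to zero. Simulation results show that the circuit performs real-time filtering for HDTV at an operating frequency of 100 MHz; in a 0.18 μm process, its equivalent gate count after synthesis is only 16.8k.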

10.
In this paper, we present a high-performance motion compensation architecture for an H.264/AVC HDTV decoder. The bottleneck in implementing motion compensation efficiently lies primarily in the high memory bandwidth demand and the complexity of six-tap fractional interpolation. To remove this bottleneck for H.264/AVC HD applications, three combined bandwidth optimization strategies are proposed to minimize the memory bandwidth of the MB-based decoding process. To improve interpolation hardware utilization and reduce interpolation cycles, an interpolation classification scheme is proposed: the fifteen fractional pixel positions are classified into five types and processed accordingly, which significantly decreases the number of interpolation cycles. A direct-mapped memory cache with circular addressing, byte-aligned addressing, and parallel horizontal and vertical access is designed to support the proposed scheme. The proposed motion compensation hardware runs at 100 MHz with 31.841K logic gates; it reduces memory bandwidth by 70–80% on average, fully utilizes the interpolation hardware, and interpolates one MB within 304 cycles, which satisfies the real-time constraint of an H.264/AVC HD (1,920 × 1,088) 30 fps decoder. The design is implemented in UMC 0.18 μm technology, and synthesis results and comparisons are given.
Yu Li
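The real-time claim for the 304-cycle interpolation engine can be checked with a simple cycle budget. This sketch assumes interpolation is the only per-macroblock stage on the critical path and ignores memory stalls:

```python
# Cycle budget check for macroblock-based processing at a given clock.

def cycles_available_per_mb(clock_hz: float, width: int, height: int, fps: float) -> float:
    """Clock cycles available per 16x16 macroblock (dimensions multiples of 16)."""
    mbs_per_frame = (width // 16) * (height // 16)
    return clock_hz / (mbs_per_frame * fps)

def meets_realtime(cycles_per_mb: int, clock_hz: float,
                   width: int, height: int, fps: float) -> bool:
    """True if a stage needing cycles_per_mb fits into the per-MB budget."""
    return cycles_per_mb <= cycles_available_per_mb(clock_hz, width, height, fps)

budget = cycles_available_per_mb(100e6, 1920, 1088, 30)
print(f"{budget:.1f} cycles available per MB; interpolation needs 304")
```

A 1,920 × 1,088 frame has 8,160 macroblocks, so at 100 MHz and 30 fps about 408 cycles are available per macroblock, leaving headroom over the 304 cycles reported.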

11.
Fractional Motion Estimation (FME) in high-definition H.264 presents a significant design challenge in terms of memory bandwidth, latency and area cost, because of its various modes and complex mode decision flow, which account for over 45% of the computational complexity of H.264 encoding. In this paper, a new high-performance VLSI architecture for FME in H.264/AVC based on the full-search algorithm is presented. The architecture consists of three different pipeline processors that trade off processing time against hardware utilization. The computing scheme, based on a 4-pixel interpolation unit with a 10-pixel input bandwidth, can process a macroblock (MB) in 870 clock cycles. The final VLSI implementation requires only 11.4k gates and 4.4 KB of RAM in a standard 180 nm CMOS technology operating at 290 MHz. The design generates the residual image and the best MVs and mode with high throughput and low area cost, while providing enough processing capacity for 1080HD (1920 × 1088 @ 30 fps) real-time video streams.

12.
Variable block-size motion estimation (VBSME) accounts for a major part of the computation of an H.264 encoder and is usually accelerated by bit-parallel hardware architectures with large I/O bit widths to meet real-time constraints. However, such architectures increase area overhead and pin count and are therefore unsuitable for area-constrained consumer electronics such as small portable multimedia devices. This paper addresses the problem by proposing two area-efficient least significant bit (LSB) bit-serial architectures with small pin counts. Both designs use data reuse, in different ways, for sum of absolute differences (SAD) computation and for reading reference pixels, which considerably reduces memory bandwidth. The first architecture propagates the partial SAD and sum results and broadcasts the reference pixel rows, whereas the second reuses the SADs of small blocks and has a reconfigurable reference buffer, giving lower memory bandwidth when hardware parallelism is used. The proposed designs benefit from several optimization techniques, including an efficient serial absolute-difference architecture, word-length reduction by parallelism, bit truncation, mode filtering, and macroblock (MB) level subsampling, which significantly improve silicon area, throughput, latency, and power consumption. The first and second designs support full-search VBSME of 720 × 480 video at 30 frames per second (fps), with two reference frames and a [-16, 15] search range, at a clock frequency of 414 MHz with 29.28k and 31.5k gates, respectively.
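Reusing the SADs of small blocks, as the second design does, rests on the fact that every H.264 partition SAD is a sum of 4×4 SADs. The sketch below shows this standard merging technique in software for one candidate motion vector; it says nothing about the bit-serial hardware itself, and the key layout is a hypothetical choice.

```python
# Variable block-size SAD reuse: the 16 SADs of 4x4 sub-blocks are summed to
# obtain all 41 H.264 partition SADs for one candidate motion vector.

def sad_4x4_grid(cur, ref):
    """cur, ref: 16x16 lists of pixel rows. Returns a 4x4 grid of 4x4-block SADs."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid

def partition_sads(g):
    """Keys are (width, height, x, y) in pixels; values are partition SADs."""
    out = {}
    for by in range(4):
        for bx in range(4):
            out[(4, 4, 4 * bx, 4 * by)] = g[by][bx]
    for by in range(4):                          # 8x4: horizontal pairs
        for bx in range(0, 4, 2):
            out[(8, 4, 4 * bx, 4 * by)] = g[by][bx] + g[by][bx + 1]
    for by in range(0, 4, 2):                    # 4x8: vertical pairs
        for bx in range(4):
            out[(4, 8, 4 * bx, 4 * by)] = g[by][bx] + g[by + 1][bx]
    for by in range(0, 4, 2):                    # 8x8 from two 8x4 SADs
        for bx in range(0, 4, 2):
            out[(8, 8, 4 * bx, 4 * by)] = (out[(8, 4, 4 * bx, 4 * by)]
                                           + out[(8, 4, 4 * bx, 4 * by + 4)])
    out[(16, 8, 0, 0)] = out[(8, 8, 0, 0)] + out[(8, 8, 8, 0)]
    out[(16, 8, 0, 8)] = out[(8, 8, 0, 8)] + out[(8, 8, 8, 8)]
    out[(8, 16, 0, 0)] = out[(8, 8, 0, 0)] + out[(8, 8, 0, 8)]
    out[(8, 16, 8, 0)] = out[(8, 8, 8, 0)] + out[(8, 8, 8, 8)]
    out[(16, 16, 0, 0)] = out[(16, 8, 0, 0)] + out[(16, 8, 0, 8)]
    return out

cur = [[(7 * x + 13 * y) % 256 for x in range(16)] for y in range(16)]
ref = [[(5 * x + 3 * y) % 256 for x in range(16)] for y in range(16)]
sads = partition_sads(sad_4x4_grid(cur, ref))
print(len(sads))   # 41
```

Only the 256 absolute differences of the 4×4 stage touch pixels; the remaining 25 partition SADs cost a handful of additions.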

13.
This paper presents a novel hardware architecture for real-time, high-throughput implementation of the adaptive deblocking filtering process specified by the H.264/AVC video coding standard. A parallel filtering order for six units, fully compliant with H.264/AVC, is proposed. With this parallel filtering order and a dedicated data arrangement in local memory banks, the proposed architecture filters one macroblock in fewer cycles than previously proposed approaches. Filtering efficiency is further improved by a novel computation schedule and a dedicated architecture composed of six filtering cores. The architecture can be used in either the decoder or the encoder as a hardware accelerator for a processor, or embedded in a full-hardware codec. The intellectual property (IP) block based on the proposed architecture supports multiple processing flows, including high-definition ones, in real time. Working at a clock frequency of 150 MHz and synthesized in a 65 nm low-power, low-voltage CMOS standard-cell technology, it easily meets the throughput requirements for 4K video at 30 fps at all levels of the H.264/AVC standard and occupies 25.08K gates.
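Whatever the filtering order, a compliant deblocking filter must evaluate, for each line of samples across an edge, the kernel defined by H.264/AVC. The sketch below covers only the luma filter for boundary strength bS < 4; the strong bS = 4 filter, chroma handling and the QP-indexed alpha/beta/tC0 tables are omitted, and the threshold values in the example are arbitrary.

```python
# H.264/AVC luma edge filter for bS < 4, applied to one line of samples.
# alpha, beta and tc0 normally come from QP-indexed tables; here they are inputs.

def clip3(lo: int, hi: int, x: int) -> int:
    return max(lo, min(hi, x))

def filter_line_bs_lt4(p, q, alpha, beta, tc0):
    """p = [p0, p1, p2], q = [q0, q1, q2], index 0 nearest the edge.

    Returns filtered copies of p and q (8-bit samples).
    """
    p, q = list(p), list(q)
    p0, p1, p2 = p
    q0, q1, q2 = q
    if not (abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta):
        return p, q                       # treated as a real edge: leave unfiltered
    ap, aq = abs(p2 - p0), abs(q2 - q0)
    tc = tc0 + (ap < beta) + (aq < beta)
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    p[0] = clip3(0, 255, p0 + delta)
    q[0] = clip3(0, 255, q0 - delta)
    avg = (p0 + q0 + 1) >> 1              # uses the unfiltered p0 and q0
    if ap < beta:
        p[1] = p1 + clip3(-tc0, tc0, (p2 + avg - (p1 << 1)) >> 1)
    if aq < beta:
        q[1] = q1 + clip3(-tc0, tc0, (q2 + avg - (q1 << 1)) >> 1)
    return p, q

p_out, q_out = filter_line_bs_lt4([60, 60, 60], [70, 70, 70], alpha=15, beta=6, tc0=2)
print(p_out, q_out)   # [64, 62, 60] [66, 68, 70]
```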

14.
Context-based Binary Arithmetic Coding (CBAC) is a normative part of the newest X Profile of the Advanced Audio Video coding Standard (AVS). This paper presents an efficient VLSI architecture for CBAC decoding in AVS. Compared with CABAC in H.264/AVC, AVS adopts simpler binarization methods and context selection schemes, and, to avoid slow multiplications, transforms the traditional arithmetic calculation into the logarithm domain. Although these features strike a better balance between compression gain and implementation cost, high-throughput implementation remains a major challenge: because decoding the current bin depends on the previous bin, latency is long and overall system performance is limited. In this paper, we present a software–hardware co-design that exploits the bin distribution. A novel pipeline-based architecture is proposed in which the arithmetic decoding engine works in parallel with the context maintainer. A finite state machine (FSM) controls the decoding procedure flexibly, and context scheduling is organized carefully to minimize the number of context RAM accesses. In addition, the critical path is optimized for timing. The proposed implementation can work at 150 MHz and achieves real-time AVS CBAC decoding for 1080i HDTV video.

15.
In this paper, an architecture for real-time digital HDTV video decoding is presented. The architecture is based on a dual decoding datapath controlled by a fixed schedule, with an efficient write-back scheme for anchor pictures. The decoding datapath is synchronized at the block (8 × 8 pixels) level. Unlike other decoding approaches, such as the slice-bar decoding method and the cross-divide method, our scheme reduces memory access contention and achieves real-time HDTV decoding without a high cost in decoder buffers, architecture, and bus. Compared with data-flow approaches, our method eliminates the complexity associated with tagged data operations. Anchor picture storage is organized to minimize page breaks during memory accesses. Simulation shows that with a relatively low 81 MHz clock, the decoder can decode MPEG-2 MP@HL HDTV in real time for the ATSC video format of 1,920 × 1,080 pixels/frame at 30 frames/s and a bit rate of 18 to 20 Mbps.

16.
In this paper, a low-cost H.264/AVC video decoder design is presented for high definition television (HDTV) applications. Through optimization from algorithmic and architectural perspectives, the proposed design achieves real-time H.264 decoding of HD1080 video (1920 × 1088 @ 30 Hz) when operating at 120 MHz with 320 mW power dissipation. Fabricated in TSMC one-poly six-metal 0.18 μm CMOS technology, the design occupies 2.9 × 2.9 mm² of silicon area, with a hardware complexity of 160K gates and 4.5K bytes of local memory.

17.
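Optimized Design of Run-Length Decoding, Inverse Scan, Inverse Quantization and Inverse Transform for AVS   Total citations: 5; self-citations: 0; citations by others: 5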
Zhao Ce, Liu Peilin. Information Technology, 2007, 31(2): 54-57
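An optimized hardware architecture for AVS run-length decoding, inverse scan, inverse quantization and inverse transform is proposed. Exploiting the properties of the AVS integer transform and quantization, a memory array that can operate in different modes is designed; it serves both for the transpose operation required by the inverse transformer and for storing intermediate results. Run-length decoding, inverse scan and inverse quantization are merged into a single pipelined unit and processed in parallel. The design eliminates the large memory otherwise needed to store intermediate results, speeds up processing, and meets the requirements of high-definition video. It was verified on an FPGA, and synthesis results show a logic gate count of only 9,076 and a maximum operating frequency above 200 MHz.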
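Run-length decoding and inverse scan can share one loop, since each (run, level) pair directly addresses a position in the scan order. A minimal sketch, assuming already entropy-decoded pairs and a classic 8×8 zig-zag order (AVS also defines a different scan for field content, and inverse quantization is left out here):

```python
# Run-level decoding combined with inverse zig-zag scan into an 8x8 block.
# Assumptions: classic zig-zag order; (run, level) pairs are already decoded.

def zigzag_order(n: int = 8):
    """Return (row, col) positions in zig-zag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):                    # s = row + col (anti-diagonal)
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            cells.reverse()                       # even diagonals: bottom-left to top-right
        order.extend(cells)
    return order

def run_level_to_block(pairs, n: int = 8):
    """pairs: iterable of (run, level); run = zeros preceding the level."""
    block = [[0] * n for _ in range(n)]
    scan = zigzag_order(n)
    pos = 0
    for run, level in pairs:
        pos += run                                # skip the zero coefficients
        r, c = scan[pos]
        block[r][c] = level                       # inverse scan: place in 2-D position
        pos += 1
    return block

block = run_level_to_block([(0, 12), (2, -3), (0, 1)])
print(block[0][0], block[2][0], block[1][1])   # 12 -3 1
```

Writing each level straight to its 2-D position avoids buffering a 64-entry 1-D coefficient list between the two steps.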

18.
In this paper, an autocorrelation-based lossless recompression (ABLR) algorithm is proposed. ABLR saves memory bandwidth in video coding systems while preserving visual quality. It consists of two core techniques: (1) a correlation-based prediction technique and (2) a correlation-adaptive Golomb-Rice code. Furthermore, dual-mode memory addressing (DMMA) is proposed to provide ABLR with random memory access. The word-length utilization rate (WLUR) of DMMA averages 92.34%. Experimental results show that ABLR achieves an average lossless compression ratio of 2.05 on 1080p test sequences, indicating that memory bandwidth can be reduced by up to 50%. The VLSI architecture of ABLR uses a three-stage pipeline and is realized in 0.18 μm 1P6M CMOS technology with a cell-based design flow. The logic gate count is about 28K and the core area is 0.69 × 0.68 mm². The encoder can process full HD (1920 × 1080) @ 30 fps at a clock rate of 62.5 MHz, with a power dissipation of 9.35 mW at that clock rate.
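The abstract names a correlation-adaptive Golomb-Rice code without describing the adaptation rule. A minimal sketch, assuming prediction residuals are already available and using a simple per-block rule (smallest k with count · 2^k ≥ sum, in the style of LOCO-I) as a stand-in for the paper's rule:

```python
# Golomb-Rice coding of prediction residuals with a per-block parameter k.
# Assumption: choose_k is a LOCO-I-style stand-in, not the paper's adaptive rule.

def map_signed(e: int) -> int:
    """Fold signed residuals onto non-negative integers: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * e if e >= 0 else -2 * e - 1

def unmap_signed(m: int) -> int:
    return m // 2 if m % 2 == 0 else -(m + 1) // 2

def choose_k(values) -> int:
    """Smallest k such that count * 2**k >= sum of the mapped values."""
    total, count, k = sum(values), len(values), 0
    while (count << k) < total:
        k += 1
    return k

def rice_encode(values, k: int) -> str:
    """Unary quotient ('1' * q + '0') followed by the k-bit remainder."""
    bits = []
    for v in values:
        q, r = v >> k, v & ((1 << k) - 1)
        bits.append("1" * q + "0" + (format(r, f"0{k}b") if k else ""))
    return "".join(bits)

def rice_decode(bits: str, k: int, n: int):
    out, i = [], 0
    for _ in range(n):
        q = 0
        while bits[i] == "1":
            q += 1
            i += 1
        i += 1                                    # skip the terminating '0'
        r = int(bits[i:i + k], 2) if k else 0
        i += k
        out.append((q << k) | r)
    return out

residuals = [0, -1, 2, 3, -2, 1, 0, -4]
mapped = [map_signed(e) for e in residuals]
k = choose_k(mapped)
bits = rice_encode(mapped, k)
decoded = [unmap_signed(m) for m in rice_decode(bits, k, len(mapped))]
print(k, len(bits), decoded == residuals)   # 2 27 True
```

Eight 8-bit samples (64 bits) become 27 bits here; small residuals from a good predictor are what make such codes effective.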

19.
Fang Jian, Zhang Ding, Wang Kuang. Acta Electronica Sinica, 2009, 37(2): 419-423
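 Targeting the four adaptive inverse transform modes of VC-1, this paper proposes an inverse transform architecture based on 8×8 blocks. By exploiting the symmetry of the VC-1 transform matrices and reorganizing data blocks, the four inverse transform modes are unified into the same structure, which greatly simplifies the hardware design. A hardware implementation is also presented that meets the application requirements while effectively reducing hardware size. Simulation shows that at an operating frequency of 108 MHz it supports the inverse transform for real-time decoding of both standard-definition and high-definition video.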
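The symmetry the abstract refers to can be seen in the VC-1 8-point matrix: even rows are symmetric and odd rows antisymmetric about the centre, so a 1-D inverse transform needs only half the outputs computed explicitly. A sketch of that even/odd decomposition; the coefficients are the VC-1 (SMPTE 421M) 8-point matrix as commonly published, and the standard's rounding and shift stages are deliberately omitted.

```python
# Even/odd (butterfly) evaluation of the 1-D inverse transform x = T^T y
# with the VC-1 8-point integer matrix; rounding and shifts are omitted.

T8 = [
    [12, 12, 12, 12, 12, 12, 12, 12],
    [16, 15, 9, 4, -4, -9, -15, -16],
    [16, 6, -6, -16, -16, -6, 6, 16],
    [15, -4, -16, -9, 9, 16, 4, -15],
    [12, -12, -12, 12, 12, -12, -12, 12],
    [9, -16, 4, 15, -15, -4, 16, -9],
    [6, -16, 16, -6, -6, 16, -16, 6],
    [4, -9, 15, -16, 16, -15, 9, -4],
]

def inverse_direct(y):
    """x[n] = sum_k T8[k][n] * y[k]: 64 multiplications."""
    return [sum(T8[k][n] * y[k] for k in range(8)) for n in range(8)]

def inverse_even_odd(y):
    """Use T8[k][7-n] = +T8[k][n] (k even) and -T8[k][n] (k odd): 32 multiplications."""
    x = [0] * 8
    for n in range(4):
        even = sum(T8[k][n] * y[k] for k in (0, 2, 4, 6))
        odd = sum(T8[k][n] * y[k] for k in (1, 3, 5, 7))
        x[n] = even + odd
        x[7 - n] = even - odd
    return x

y = [100, -20, 7, 3, 0, -5, 1, 9]
print(inverse_even_odd(y) == inverse_direct(y))   # True
```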

20.
The pursuit of high-end visual quality drives demand for higher display resolutions and higher frame rates. Hence, many powerful coding tools are combined in emerging video coding standards to improve coding efficiency. This also poses two design challenges for video coding systems: heavy computation and tremendous memory bandwidth. The first can be addressed by careful hardware architecture design with advanced semiconductor processes, but the second has become a critical design bottleneck for modern video coding systems. In this article, a lossless frame recompression scheme using a multi-orientation prediction technique is proposed to overcome this bottleneck. The work is realized as a silicon chip in the TSMC 0.18 µm CMOS process. Its encoding capability reaches full HD (1920 × 1080) @ 48 fps. The chip consumes 17.31 mW at 100 MHz. The core and chip areas are 0.83 × 0.83 mm² and 1.20 × 1.20 mm², respectively. Experimental results show that the work achieves an outstanding lossless compression ratio with competitive hardware performance.
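The abstract does not list the prediction orientations. A minimal sketch of the general idea, in which each block picks whichever causal neighbour direction minimizes its absolute residuals and the decoder reconstructs losslessly from the signalled mode; the orientation set, the 4×4 block size and the border value 128 are hypothetical choices, and entropy coding of the residuals is omitted.

```python
# Lossless block-wise prediction with a per-block choice of orientation.
# Hypothetical choices: orientations, 4x4 blocks, border value 128.

ORIENTATIONS = {"left": (0, -1), "top": (-1, 0), "top_left": (-1, -1)}
BLOCK = 4
BORDER = 128   # value assumed for neighbours outside the frame

def neighbour(img, y, x, d):
    ny, nx = y + d[0], x + d[1]
    if 0 <= ny < len(img) and 0 <= nx < len(img[0]):
        return img[ny][nx]
    return BORDER

def blocks(h, w):
    """Blocks in raster order, pixels in raster order inside each block.
    All three orientations point at pixels already decoded in this order."""
    for by in range(0, h, BLOCK):
        for bx in range(0, w, BLOCK):
            yield [(y, x) for y in range(by, min(by + BLOCK, h))
                          for x in range(bx, min(bx + BLOCK, w))]

def encode(img):
    """Return one mode per block and one residual per pixel."""
    modes, residuals = [], []
    for pix in blocks(len(img), len(img[0])):
        best = min(ORIENTATIONS, key=lambda m: sum(
            abs(img[y][x] - neighbour(img, y, x, ORIENTATIONS[m])) for y, x in pix))
        modes.append(best)
        residuals.extend(img[y][x] - neighbour(img, y, x, ORIENTATIONS[best])
                         for y, x in pix)
    return modes, residuals

def decode(modes, residuals, h, w):
    img = [[0] * w for _ in range(h)]
    it = iter(residuals)
    for mode, pix in zip(modes, blocks(h, w)):
        for y, x in pix:
            img[y][x] = next(it) + neighbour(img, y, x, ORIENTATIONS[mode])
    return img

frame = [[(3 * x + 5 * y + (x * y) % 7) % 256 for x in range(10)] for y in range(6)]
modes, res = encode(frame)
print(modes)
print(decode(modes, res, 6, 10) == frame)   # True
```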
