Similar Literature
 Found 20 similar documents (search time: 46 ms)
1.
This paper presents a high performance, power efficient and low hardware cost architecture for motion estimation (ME) targeting portable consumer applications. This hardware uses the Sub-sampled Diamond Search algorithm (SDS) with a Dynamic Iteration Control (DIC). The SDS-DIC algorithm can significantly reduce the number of SAD (Sum of Absolute Difference) calculations for block matching, thus enabling the development of an efficient hardware design for the ME. The DIC technique allows the required throughput to be achieved by restricting the number of iterations, which contributes to the reduction in the overall number of clock cycles needed for the motion vector calculation. The processing units (PU) of the ME were developed by using efficient hierarchical adder compressors, where simultaneous additions of more than two operands can be performed. The results show that, by using both the adder compressors in the PU and the DIC technique, it is possible to obtain an efficient ME architecture with higher performance and reduced power consumption. The architecture that implements this algorithm and the PUs was described in VHDL. Hardware synthesis results are presented for a 0.18 μm CMOS standard cell library. The architecture can reach real time for HDTV 1080p with less than 40 mW of power consumption.
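The block-matching idea in this abstract can be sketched in a few lines. The following is only an illustration, not the paper's SDS-DIC design: the function names, the small-diamond pattern, the `step` pixel sub-sampling, and the `max_iters` cap (standing in for an iteration limit in the spirit of DIC) are all assumptions made for the example.

```python
def sad(cur, ref, bx, by, dx, dy, n, step=1):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy).
    step=2 samples every other pixel in each direction (sub-sampling)."""
    total = 0
    for y in range(0, n, step):
        for x in range(0, n, step):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


# Hypothetical small-diamond pattern: the four direct neighbours of the centre.
SMALL_DIAMOND = [(0, -1), (-1, 0), (1, 0), (0, 1)]


def diamond_search(cur, ref, bx, by, n, max_iters=4, step=2):
    """Greedy small-diamond search with a hard cap on iterations.
    Returns ((dx, dy), cost) for the best displacement found."""
    h, w = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n, step)
    for _ in range(max_iters):
        cx, cy = best
        improved = False
        for ox, oy in SMALL_DIAMOND:
            dx, dy = cx + ox, cy + oy
            # Skip candidates whose block would fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n, step)
            if cost < best_cost:
                best, best_cost, improved = (dx, dy), cost, True
        if not improved:
            break  # local minimum reached before the iteration cap
    return best, best_cost
```

Capping `max_iters` bounds the worst-case number of SAD evaluations per block, which is the property a fixed-throughput hardware pipeline would need; the cost is that large motions may not be fully tracked.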

2.
Affine transformation is widely used in image processing, and it has recently been recommended by MPEG-4 for video motion compensation. This paper presents a novel low power parallel architecture for texture warping using affine transformation (AT). The architecture uses a novel multiplication-free algorithm that employs the algebraic properties of the AT. Low power has been achieved at different levels of the design. At the algorithmic level, replacing multiplication operations with bit shifting saves the power and delay of using a multiplier. At the architecture level, low power is achieved by using parallel computational units, where the latency constraints and/or the operating latency can be reduced. At the circuit level, using low power building blocks (such as low power adders) contributes to the power savings. The proposed architecture is used as a computational kernel in video object coders. It is compatible with MPEG-4 and VRML standards. The architecture has been prototyped in 0.6 μm CMOS technology with three layers of metal. The performance of the proposed architecture shows that it can be used in mobile and handheld applications.
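The abstract does not spell out its multiplication-free algorithm, so the sketch below shows one generic way to evaluate an affine map over a pixel grid without per-pixel multiplies: fixed-point forward differencing, where each step along a row or column is an addition and the conversion back to integer coordinates is a right shift. The name `warp_coords` and the choice of `FRAC_BITS = 8` are assumptions for the example.

```python
FRAC_BITS = 8  # assumed fixed-point precision: coefficients are scaled by 2**8


def warp_coords(a, b, c, d, e, f, width, height):
    """Evaluate u = a*x + b*y + c and v = d*x + e*y + f over a grid.
    All six coefficients are integers in fixed point (scaled by 2**FRAC_BITS).
    The loops use only additions and a right shift, no multiplications."""
    out = []
    row_u, row_v = c, f          # value of (u, v) at x = 0 for the current row
    for y in range(height):
        u, v = row_u, row_v
        for x in range(width):
            out.append((x, y, u >> FRAC_BITS, v >> FRAC_BITS))
            u += a               # moving one pixel right adds a (resp. d)
            v += d
        row_u += b               # moving one row down adds b (resp. e)
        row_v += e
    return out
```

Because the accumulators stay in integer fixed point, the result matches the directly computed `(a*x + b*y + c) >> FRAC_BITS` exactly, with no drift from repeated addition.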

3.
Hardware/software co-synthesis aims to determine an optimal architecture for an embedded application specified by a task graph or a specification language. In this paper, we present a co-synthesis approach targeting MPSoCs and distributed memory multiprocessor architectures for high performance embedded applications. Our co-synthesis approach produces pipelined multiprocessor architectures consisting of heterogeneous processing elements connected by a point-to-point communication structure. The co-synthesis process consists of four distinct phases: processing element selection for addition to the system, pipelined task allocation, scheduling, and mapping to a regular interconnection topology. Initially, an irregular topology is generated that is mapped to a regular architecture. Our co-synthesis methodology performs system partitioning and produces an irregular topology multiprocessor system. It also generates an optimal (or sub-optimal) regular topology architecture after considering some of the well-known regular topologies such as mesh, hypercube, and tree. The co-synthesis method is demonstrated by exploring embedded architectures for an MPEG encoder and for artificially generated application task graphs representing complex embedded systems.

4.
This paper proposes an analog CMOS circuit that implements a central pattern generator (CPG) for locomotion control in a quadruped walking robot. Our circuit is based on an affine transformation of a reaction-diffusion cellular neural network (CNN), and uses differential pairs with multiple-input floating-gate (MIFG) MOS transistors to implement both the nonlinearity and summation of CNN cells. As a result, the circuit operates in voltage mode, and thus it is expected to reduce power consumption. Due to good matching accuracy of devices, the circuit generates stable rhythmic patterns for robot locomotion control. From experimental results on a chip fabricated in a standard 1.5-μm CMOS process, we show that the chip yields the desired results, i.e., stable rhythmic pattern generation and low power consumption.

5.
6.
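A Motion Pattern Recognition Algorithm for Video Object Coding   Cited by: 1 (self-citations: 0, citations by others: 1)
黎洪松 (Li Hongsong), 许保华 (Xu Baohua). 《电子学报》 (Acta Electronica Sinica), 2007, 35(12): 2324-2328
To address the shortcomings of the block-matching motion estimation and compensation (ME+MC) algorithm widely used in current video coding, a motion pattern recognition (MPR) algorithm based on self-organizing maps (SOM) is proposed and applied to video object coding for videoconferencing. To improve the performance of the SOM algorithm, a frequency-sensitive self-organizing map (FSOM) algorithm is proposed. Experiments show that the FSOM-MPR algorithm offers better predictive coding performance than the ME+MC algorithm. For the Claire test sequence at a compression ratio of 170:1, the average peak signal-to-noise ratio (PSNR) of the reconstructed video improves by 2.7 dB.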
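For readers unfamiliar with the PSNR metric quoted above, a minimal sketch follows. It assumes 8-bit samples (peak value 255) and flat lists of pixel values; the function name `psnr` is chosen for the example and is not taken from the paper.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.
    PSNR = 10 * log10(peak**2 / MSE); identical inputs give infinity."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)
```

Since PSNR is logarithmic in the mean squared error, a gain of 2.7 dB corresponds to an MSE that is lower by a factor of 10^0.27, or roughly 1.86.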

7.
This work proposes a new algorithm to synthesize a low power bipartition-codec architecture for pipelined circuits. The bipartition-codec architecture has been introduced as an effective power reduction technique for circuit design. The entropy-based partition-codec (ENPCO) algorithm extends this approach by optimizing for both power and area, using entropy as the criterion to balance the two. The ENPCO algorithm is composed of two phases. First, it clusters the output vectors with high occurrence into a group, moving all remaining output vectors into another group. The first group will be encoded in order to save power. Second, based on circuit entropy, output patterns are moved between both groups in order to balance power consumption and area overhead. A number of Microelectronic Center of North Carolina (MCNC) benchmarks were used to verify the effectiveness of our algorithm. Results demonstrate that the ENPCO algorithm can achieve low power with less area overhead than the single-phase algorithm introduced previously by Shanq-Jang Ruan et al. (1999).

8.
We introduce a variable block size motion estimation architecture that is adaptive to the full search (FS) and the three-step search (3SS) algorithms. Early termination, intensive data reuse, a pipelined datapath with bit serial execution, and memory access management tailored to the search patterns of the FS and 3SS form the key features of the architecture. The design was synthesized using Synopsys Design Compiler and a 45 nm standard cell library. The architecture sustains real-time CIF format with an operational frequency as low as 17.6 MHz and consumes 1.98 mW at this clock rate. With its 500 MHz peak operational frequency, this architecture gives the end-user the flexibility of choosing between video quality and throughput based on power consumption and processing speed constraints.

9.
We present a two-dimensional (2-D) mesh-based mosaic representation, consisting of an object mesh and a mosaic mesh for each frame and a final mosaic image, for video objects with mildly deformable motion in the presence of self and/or object-to-object (external) occlusion. Unlike classical mosaic representations where successive frames are registered using global motion models, we map the uncovered regions in the successive frames onto the mosaic reference frame using local affine models, i.e., those of the neighboring mesh patches. The proposed method to compute this mosaic representation is tightly coupled with an occlusion adaptive 2-D mesh tracking procedure, which consists of propagating the object mesh frame to frame and updating both object and mosaic meshes to optimize texture mapping from the mosaic to each instance of the object. The proposed representation has been applied to video object rendering and editing, including self transfiguration, synthetic transfiguration, and 2-D augmented reality in the presence of self and/or external occlusion. We also provide an algorithm to determine the minimum number of still views needed to reconstruct a replacement mosaic, which is needed for synthetic transfiguration. Experimental results are provided to demonstrate both the 2-D mesh-based mosaic synthesis and two different video object editing applications on real video sequences.

10.
Aggressive processor design methodology using high-speed clocks and deep submicrometer technology is necessitating the use of at-speed delay fault testing. Although nearly all modern processors use a pipelined architecture, no method has been proposed in the literature to model these for the purpose of test generation. This paper proposes a graph theoretic model of pipelined processors and develops a systematic approach to path delay fault testing of such processor cores using the processor instruction set. The proposed methodology generates test vectors under the extracted architectural constraints. These test vectors can be applied in the functional mode of operation; hence, self-test becomes possible. Self-test in a functional mode can also be used for online periodic testing. Our approach uses a graph model for architectural constraint extraction and path classification. Test vectors are generated using constrained automatic test pattern generation (ATPG) under the extracted constraints. Finally, a test program consisting of an instruction sequence is generated for the application of the generated test vectors. We applied our method to two example processors, namely a 16-bit 5-stage VPRO pipelined processor and a 32-bit pipelined DLX processor, to demonstrate the effectiveness of our methodology.

11.
In H.264/AVC, the motion estimation (ME) routine supports variable block size and involves highly parallel sum of absolute difference (SAD) computations. In this study, we introduce a bit serial hybrid-grained processing element (PE) based 2D architecture that has both early termination and intensive data reuse capabilities. PEs operate on most-significant-bit-first arithmetic for early termination, and the 2D architecture enables on-chip data reuse between neighboring PEs in a bit-by-bit pipelined fashion. Hybrid-grained PEs reduce the hardware overhead of conventional adder tree structures used for implementing the variable block size ME. Our design reduces the gate count by 7x compared to its ASIC counterpart, operates at a comparable frequency while sustaining 30 fps and 60 fps, and outperforms bit parallel and bit serial architectures in terms of throughput and performance per gate for various video formats.

12.
A comparative analysis is presented of major methods for reducing power consumption in pipelined ADCs. They are classified into two groups: (1) methods for the structural or parametric optimization of the conventional architecture and (2) original circuit configurations designed to minimize the power consumption by individual units.

13.
Motion compensation using two-dimensional (2-D) mesh models requires computation of the parameters of a spatial transformation for each mesh element (patch). It is well known that the parameters of an affine (bilinear or perspective) mapping can be uniquely estimated from three (four) point correspondences (at the vertices of a triangular or quadrilateral mesh element). On the other hand, overdetermined solutions using more than the required minimum number of point correspondences provide increased robustness against correspondence-estimation errors; however, this necessitates special consideration to preserve mesh-connectivity. This paper presents closed-form, overdetermined solutions for least squares estimation of affine motion parameters for a triangular mesh, which preserve mesh-connectivity using patch-based or node-based connectivity constraints. In particular, four new algorithms are presented: patch-constrained methods using point correspondences or spatio-temporal intensity gradients, and node-constrained methods using point correspondences or spatio-temporal intensity gradients. The methods using point correspondences can be viewed as postprocessing of a dense motion field for best representation in terms of a set of irregularly spaced samples. The methods that are based on spatio-temporal intensity gradients offer closed-form solutions for direct estimation of the best node-point motion vectors (equivalently the best transformation parameters). We show that the performance of the proposed closed-form solutions is comparable to that of the alternative search-based solutions at a fraction of the computational cost.
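The "three point correspondences" statement in this abstract can be made concrete with a small sketch. It covers only the exactly determined case (one triangle, three vertices), not the paper's overdetermined least squares methods; the function name and the use of Cramer's rule are choices made for the example.

```python
def affine_from_three_points(src, dst):
    """Solve u = a*x + b*y + c and v = d*x + e*y + f from three
    correspondences (x_i, y_i) -> (u_i, v_i) using Cramer's rule.
    Returns ((a, b, c), (d, e, f)). The source points must not be collinear."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if det == 0:
        raise ValueError("source points are collinear; the affine map is not unique")

    def solve(r1, r2, r3):
        # Each unknown is the determinant with one column replaced by the
        # right-hand side (r1, r2, r3), divided by the system determinant.
        p = (r1 * (y2 - y3) - y1 * (r2 - r3) + (r2 * y3 - r3 * y2)) / det
        q = (x1 * (r2 - r3) - r1 * (x2 - x3) + (x2 * r3 - x3 * r2)) / det
        r = (x1 * (y2 * r3 - y3 * r2) - y1 * (x2 * r3 - x3 * r2)
             + r1 * (x2 * y3 - x3 * y2)) / det
        return p, q, r

    us = [p[0] for p in dst]
    vs = [p[1] for p in dst]
    return solve(*us), solve(*vs)
```

The collinearity check reflects why a triangular patch is the natural unit: three non-collinear vertices fix all six affine parameters, while any additional correspondences turn the problem into the least squares setting the abstract describes.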

14.
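Exploiting the correlation between the luminance and chrominance components of color video, a DT-mesh-based inter-frame coding scheme for color video is proposed. During motion estimation, the scheme applies a DT (Delaunay triangulation) description only to the luminance component Y, performs continuous motion estimation on the mesh nodes of the luminance component, obtains the motion vectors of the pixels inside each triangle using the six-parameter method, and derives the motion vectors of the chrominance components through a similarity transformation. In addition, the residual image is specially processed and coded. Experimental results show that, compared with the H.263 coding method suited to low-bit-rate transmission, the decoded images of this scheme have better subjective and objective quality at the same compression ratio.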

15.
We present a variational framework for deinterlacing that was originally used for inpainting and subsequently redeveloped for deinterlacing. From the framework, we derive a motion adaptive (MA) deinterlacer and a motion compensated (MC) deinterlacer and test them together with a selection of known deinterlacers. To illustrate the need for MC deinterlacing, the problem of details in motion (DIM) is introduced. It cannot be solved by MA deinterlacers or any simpler deinterlacers but only by MC deinterlacers. The major problem in MC deinterlacing is computing reliable optical flow [motion estimation (ME)] in interlaced video. We discuss a number of strategies for computing optical flows on interlaced video hoping to shed some light on this problem. We produce results on challenging real world video data with our variational MC deinterlacer that in most cases are indistinguishable from the ground truth.

16.
This paper addresses the design of power efficient dedicated structures of Radix-2 Decimation in Time (DIT) pipelined butterflies, aiming at the implementation of a low power Fast Fourier Transform (FFT), using adder compressors with a new XOR gate topology. In the FFT computation, the butterflies play a central role, since they allow calculation of complex terms. Because this calculation involves multiplications of input data with appropriate coefficients, optimizing the butterfly can contribute to the reduction of power consumption of FFT architectures. In this paper, different dedicated structures for the 16 bit-width pipelined Radix-2 DIT butterfly, running at 100 MHz, are implemented, where the main goal is to minimize both the number of real multipliers and the critical path of the structures. This is done by changing the structure of the complex multipliers and applying them in the butterflies. For logic synthesis of the implemented butterflies, the Cadence Encounter RTL Compiler tool was used with the XFAB MOSLP 0.18 μm library. Area and power consumption results are presented for the synthesized butterflies. Regarding power consumption, switching activity analysis is performed using 10,000 input vectors at the inputs of the butterflies. The main results show that combining the pipeline approach with efficient adder compressors, with a new XOR gate topology, significantly reduces the power consumption of the butterflies.
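The abstract does not say which complex multiplier structures were used, so the sketch below only illustrates the general idea of trading real multipliers for adders: a well-known rearrangement computes a complex product with three real multiplications instead of four. The function names and the tuple representation of complex values are assumptions for the example.

```python
def complex_mul_3(ar, ai, br, bi):
    """(ar + j*ai) * (br + j*bi) using three real multiplications
    and five additions/subtractions instead of four multiplications."""
    k1 = br * (ar + ai)
    k2 = ar * (bi - br)
    k3 = ai * (br + bi)
    return k1 - k3, k1 + k2   # (real part, imaginary part)


def butterfly(a, b, w):
    """Radix-2 DIT butterfly: returns (a + w*b, a - w*b).
    a, b, and the twiddle factor w are (real, imag) tuples."""
    tr, ti = complex_mul_3(w[0], w[1], b[0], b[1])
    return (a[0] + tr, a[1] + ti), (a[0] - tr, a[1] - ti)
```

For example, with `a = (1, 2)`, `b = (3, 4)` and twiddle factor `w = (0, -1)` (that is, -j), the product `w*b` is `(4, -3)`, so the butterfly outputs are `(5, -1)` and `(-3, 5)`.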

17.
The optimum architecture design and mapping of QRD-RLS adaptive filters can be achieved through filter architecture selections, look-ahead transformations, and hierarchical pipelining/folding transformations. In this paper, a relaxed annihilation-reordering look-ahead (RARL) architecture is proposed, and shown to be more power and area efficient than the pipelined processing architecture, which was considered the most area efficient. The filters with this architecture are based on relaxed weight-update through filtering approximation, where a filter tap weight is updated upon arrival of every block of input data, and are sped up with the annihilation-reordering look-ahead transformation. As a result of the computational complexity reduction, this architecture does not change the iteration bound and filter clock frequency, and leads to speedup with a linear increase in power consumption, while the pipelined processing architectures result in speedup with a quadratic increase in power consumption. Upon hardware mapping, this architecture is also more advantageous for achieving low area designs. Two design examples are presented to illustrate mapping optimization using the above transformations. These results are important for mapping designs onto ASICs, FPGAs or parallel computing machines. The results show significant improvements in throughput, power consumption and hardware requirement. We also show through mathematics and simulations that the RARL QRD-RLS filters have no performance degradation in terms of convergence rate.

18.
Motion estimation in H.264/AVC is done in two parts: integer motion estimation and fractional motion estimation. Hardware reuse for both parts is inefficient due to the differences between them. In this paper we address the hardware reuse problem by proposing a fast motion estimation algorithm as well as a pipelined FPGA-based field programmable system-on-chip (FPSoC) for integer and fractional motion estimation. Our results show that the rate-distortion loss of our algorithm is insignificant when compared to full search in H.264/AVC. Its average Y-PSNR loss is 0.065 dB, its average percentage bit rate increase is 5%, and its power consumption is 76 mW. Our FPSoC is hardware-efficient, even out-performing some state-of-the-art ASIC implementations. It can support up to high definition 1280 × 720p video at 24 Hz. Thus, our proposed algorithm and architecture are suitable for delivery of high quality video on low power devices and low bit rate applications which typically use H.264/AVC baseline profile@levels 1–3.1.

19.
This article presents a parallel architecture for the 3-D discrete wavelet transform (3-D DWT). The proposed design is based on the 1-D pipelined lifting scheme. The architecture is fully scalable beyond the present coherent Daubechies filter bank (9, 7). This 3-D DWT architecture has advantages such as no group of pictures restriction and reduced memory referencing. It offers low power consumption, low latency and high throughput. The computing technique is based on the concept that the lifting scheme minimises the storage requirement. The application specific integrated circuit implementation of the proposed architecture is done by synthesising it using a 65 nm Taiwan Semiconductor Manufacturing Company standard cell library. It offers a speed of 486 MHz with a power consumption of 2.56 mW. This architecture is suitable for real-time video compression even with large frame dimensions.
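To illustrate why a lifting scheme needs little storage, the sketch below implements one level of a 1-D lifting transform. Note the substitution: it uses the simpler integer 5/3 filter rather than the (9, 7) filter bank named in the abstract, and the function names and the clamped boundary handling are assumptions for the example. The structure (split, predict, update) is the same in both cases.

```python
def lifting_53_forward(x):
    """One level of an integer 5/3 lifting transform on an even-length list.
    Returns (s, d): the low-pass (approximation) and high-pass (detail) halves."""
    half = len(x) // 2
    s, d = x[0::2], x[1::2]                      # split into even and odd samples
    # Predict: each odd sample minus the floor-average of its even neighbours.
    d = [d[i] - ((s[i] + s[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    # Update: each even sample plus a rounded quarter of the neighbouring details.
    s = [s[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    return s, d


def lifting_53_inverse(s, d):
    """Undo lifting_53_forward by running the same steps in reverse order."""
    half = len(s)
    s = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    d = [d[i] + ((s[i] + s[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = s, d
    return x
```

Each lifting step only reads a sample's immediate neighbours and overwrites one half of the data, which is what allows hardware to compute the transform nearly in place; the inverse is exact because it subtracts precisely what the forward pass added.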

20.
The sum of absolute difference (SAD) is generally adopted as a cost function in motion estimation (ME) and temporal error concealment (TEC) algorithms owing to its efficiency. The SAD hardware also dissipates considerable power in a video codec chip. Hence, switching-activity analysis of SAD is essential from both the algorithm and architecture perspectives. This work develops switching-activity estimation formulas dedicated to the SAD engine according to probability theory. The experimental results reveal that the probability error rate (PER) of the SAD engine is as low as 5.61%. Consequently, this leads to a precise switching-activity estimation of the SAD-based algorithm/architecture for video signal processing.
