Found 19 similar documents; search time: 125 ms
1.
2.
3.
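This paper studies how to use the GPU to accelerate video decoding. It outlines the system framework of MPEG-2 video decoding and describes an implementation of MPEG-2 video decoding under Linux that uses XvMC (X video motion compensation) as the API and runs on a general-purpose programmable GPU. It focuses on the implementation of the IDCT (inverse discrete cosine transform) and motion compensation in MPEG-2 video decoding and proposes a new optimization algorithm. The MPEG-2 video decoding algorithm has a degree of generality, and experimental results show that, compared with conventional decoding, the decoder not only accelerates video decoding but also effectively reduces CPU utilization and the computer's power consumption.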
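As a point of reference for the IDCT step named in this abstract, the following is a minimal sketch of the textbook 8x8 inverse DCT in plain Python. It is not the paper's GPU implementation or its optimized algorithm; the function name `idct_8x8` and the list-of-lists block layout are assumptions made for illustration.

```python
import math

N = 8  # MPEG-2 uses 8x8 blocks


def _scale(k):
    """Normalization factor C(k) of the DCT basis: 1/sqrt(2) for k = 0, else 1."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def idct_8x8(coeffs):
    """Naive 2-D inverse DCT of an 8x8 coefficient block (hypothetical helper).

    coeffs[u][v] holds the frequency coefficient F(u, v); the result holds
    the spatial samples f(x, y). This is the direct O(N^4) definition, not a
    fast or GPU-oriented variant.
    """
    # basis[x][u] = C(u) * cos((2x + 1) * u * pi / 16)
    basis = [
        [_scale(u) * math.cos((2 * x + 1) * u * math.pi / (2 * N)) for u in range(N)]
        for x in range(N)
    ]
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            acc = 0.0
            for u in range(N):
                for v in range(N):
                    acc += basis[x][u] * basis[y][v] * coeffs[u][v]
            out[x][y] = acc * 2.0 / N  # factor 1/4 for N = 8
    return out
```

A block that contains only a DC coefficient decodes to a flat block, which makes a convenient sanity check. Because every output sample is computed independently, the transform maps naturally onto data-parallel hardware.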
4.
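Design and Implementation of an MPEG4 Video Decoding System  Total citations: 1, self-citations: 0, citations by others: 1
The article introduces the working principle of an MPEG4 video decoder and, building on the code for the basic video decoding algorithms provided by the XviD organization, implements decoding of video images from an MPEG4 bitstream.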
5.
6.
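Design of an FPGA-Based Verification Platform for Video Decoder Chips  Total citations: 1, self-citations: 0, citations by others: 1
As the complexity of video codec algorithms grows, the difficulty and cost of designing video processors also rise substantially. To meet the simulation and verification requirements of video decoder chips, the article proposes a verification framework for such chips. A verification platform for the video decoder is designed and implemented on a VirtexE-series FPGA, and the issues that arise in FPGA-based verification of video decoders are analyzed.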
7.
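Based on the characteristics of H.264/AVC and AVS, a hardware implementation suited to intra-prediction decoding is designed. Exploiting the similarity of the intra-prediction operations in H.264 and AVS, a reconfigurable parallel structure is proposed that helps increase decoding speed. Combined with the other decoder modules already designed, the structure achieves real-time decoding of high-definition H.264 and AVS video on an FPGA.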
8.
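A system-on-chip video decoding architecture compatible with multiple standards is proposed to meet the low-cost design requirements of multi-mode decoder chips, and a concrete implementation of the hardware structure of a new-generation multi-mode video decoder chip is given that covers decoding of the AVS, MPEG-2, and MPEG-4 standards. The implementation saves the multi-mode decoder 45% of the arithmetic units, 28% of the RAM resources, 86% of the RAM area, and part of the register file and general-purpose circuitry.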
9.
10.
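Targeting the on-chip memory access architecture of digital signal processors, a scheme is proposed to optimize the data layout of the motion compensation process in video decoding. A ping-pong buffer is set up in the on-chip scratchpad memory (SPM) to hold the data needed for motion compensation. While motion compensation is performed on the current macroblock, the data needed for subsequent motion compensation is prefetched to replace data that is no longer used; at the same time, a data indexing algorithm yields the addresses of the data needed for motion compensation, so that data processing and data access are pipelined in parallel. Experiments on the TMS320DM642 processor show that after optimization the decoding speed of the MPEG-4 video decoder increases by 6.7% on average, and the energy consumption of the DM642's on-chip L2 cache over the whole decoding process falls by 18.5% on average. Optimizing the data layout of motion compensation can therefore improve decoding performance and reduce energy consumption.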
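The ping-pong buffering idea in this abstract can be sketched as follows. This is a sequential Python simulation under stated assumptions, not the paper's DM642 implementation: `fetch` and `compensate` are hypothetical callbacks standing in for the DMA prefetch into scratchpad memory and for the motion compensation kernel, and on real hardware the prefetch would run concurrently with the computation rather than before it.

```python
def decode_with_ping_pong(macroblocks, fetch, compensate):
    """Process macroblocks using two alternating reference-data buffers.

    fetch(mb)           -> reference data for macroblock mb (hypothetical;
                           stands in for a DMA transfer into scratchpad memory)
    compensate(mb, ref) -> motion-compensated result (hypothetical kernel)
    """
    results = []
    if not macroblocks:
        return results

    buffers = [None, None]
    buffers[0] = fetch(macroblocks[0])  # prime the first buffer

    for i, mb in enumerate(macroblocks):
        current = i % 2        # buffer holding data for this macroblock
        standby = 1 - current  # buffer whose old contents are no longer needed

        # Prefetch data for the next macroblock into the standby buffer.
        # On a DSP this transfer would overlap with the compute step below.
        if i + 1 < len(macroblocks):
            buffers[standby] = fetch(macroblocks[i + 1])

        results.append(compensate(mb, buffers[current]))

    return results
```

For example, `decode_with_ping_pong([1, 2, 3], lambda mb: mb * 10, lambda mb, ref: mb + ref)` returns `[11, 22, 33]`. The two-buffer layout is what allows a fetch for macroblock i+1 to proceed without overwriting data that macroblock i is still reading.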
11.
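An Effective Low-Power Compiler Optimization Method: Localizing Component Usage  Total citations: 4, self-citations: 1, citations by others: 4
Using software techniques to optimize system energy is receiving growing attention. Exploiting a system's dynamic voltage scaling and functional-unit shutdown capabilities offers a new route to reducing redundant energy consumption, and compiler-directed dynamic voltage scaling (DVS) and turning off unused system units (TOSU) are among the software optimization methods. DVS and TOSU involve many technical details. This paper abstracts an analytical model that can be used for compiler research and, based on a study of the model, proposes the concept of localizing component usage. Given technical support for DVS and TOSU, localizing component usage is an effective low-power compiler optimization method.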
12.
Edward T.-H. Chu Tai-Yi Huang Cheng-Han Tsai Jian-Jia Chen Tei-Wei Kuo 《Real-Time Systems》2009,41(3):222-255
The I/O subsystem has become a major source of energy consumption in a hard real-time monitoring and control system. To reduce
its energy consumption without missing deadlines, a dynamic power management (DPM) policy must carefully consider the power
parameters of a device, such as its break-even time and wake-up latency, when switching off idle devices. This problem becomes
extremely complicated when dynamic voltage scaling (DVS) is applied to change the execution time of a task. In this paper,
we present COLORS, a composite low-power scheduling framework that includes DVS in a DPM policy to maximize the energy reduction
on the I/O subsystem. COLORS dynamically predicts the earliest-access time of a device and switches off idle devices. It makes
use of both static and dynamic slack time to extend the execution time of a task by DVS, in order to create additional switch-off
opportunities. Task workloads, processor profiles, and device characteristics all impact the performance of a low-power real-time
algorithm. We also identify a key metric that primarily determines its performance. The experimental results show that, compared
with previous work, COLORS achieves additional energy reduction up to 20%, due to the efficient utilization of slack time.
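The break-even time mentioned in this abstract can be illustrated with the standard textbook relation: switching a device off pays only if the idle interval is long enough for the sleep-state savings to cover the transition cost. The sketch below is not the COLORS algorithm; the function and parameter names are assumptions chosen for illustration.

```python
def break_even_time(p_idle, p_sleep, e_transition, t_transition):
    """Minimum idle interval for which switching off saves energy.

    p_idle       -- power drawn while idle but powered on (W)
    p_sleep      -- power drawn in the off or sleep state (W)
    e_transition -- total energy of shutting down and waking up (J)
    t_transition -- total time of shutting down and waking up (s)

    Staying on for T costs p_idle * T; switching off costs
    e_transition + p_sleep * (T - t_transition). Equating the two gives
    the energy break-even point, which can never be shorter than the
    transition time itself.
    """
    if p_idle <= p_sleep:
        raise ValueError("sleeping must draw less power than idling")
    energy_bound = (e_transition - p_sleep * t_transition) / (p_idle - p_sleep)
    return max(t_transition, energy_bound)


def should_switch_off(predicted_idle, t_break_even):
    """Switch off only if the predicted idle interval reaches break-even."""
    return predicted_idle >= t_break_even
```

With DVS in the picture, stretching a task's execution time shifts the device's next access later or earlier, which is why the predicted idle interval, and with it the switch-off decision, depends on the chosen voltage.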
13.
Energy-aware scheduling and simulation methodologies for parallel security processors with multiple voltage domains  Total citations: 1, self-citations: 1, citations by others: 0
Yung-Chia Lin Yi-Ping You Chung-Wen Huang Jenq Kuen Lee Wei-Kuan Shih Ting-Ting Hwang 《The Journal of supercomputing》2007,42(2):201-223
Dynamic voltage scaling (DVS) and power gating (PG) have become mainstream technologies for low-power optimization in recent
years. One issue that remains to be solved is integrating these techniques in correlated domains operating with multiple voltages.
This article addresses the problem of power-aware task scheduling on a scalable cryptographic processor that is designed as
a heterogeneous and distributed system-on-a-chip, with the aim of effectively integrating DVS, PG, and the scheduling of resources
in multiple voltage domains (MVD) to achieve low energy consumption. Our approach uses an analytic model as the basis for
estimating the performance and energy requirements between different domains and addressing the scheduling issues for correlated
resources in systems. We also present the results of performance and energy simulations from transaction-level models of our
security processors in a variety of system configurations. The prototype experiments show that our proposed methods yield
significant energy reductions. The proposed techniques will be useful for implementing DVS and PG in domains with multiple
correlated resources.
14.
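A DVS-Based Low-Power Algorithm for Real-Time Multi-Core Embedded Systems  Total citations: 2, self-citations: 0, citations by others: 2
Dynamic voltage scaling (DVS) is the most fundamental technique in low-power design. However, most algorithms target single-processor platforms and consider only mutually independent tasks, in which case DVS often fails to deliver good results. Building on DVS, a loop rotation scheduling technique is proposed to reduce power consumption: loops in the program are restructured so that power consumption is minimized while deadlines are met, and the time and power consumed by voltage transitions are also taken into account.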
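The basic DVS trade-off behind this abstract can be sketched with the usual first-order CMOS model, in which dynamic energy per cycle scales with the square of the supply voltage. The code below picks the lowest-energy operating point that still meets a deadline once a voltage transition time is charged. It does not reproduce the paper's loop rotation scheduling; the function name, the operating-point table, and the normalized energy model are assumptions made for illustration.

```python
def pick_operating_point(cycles, deadline, points, switch_time=0.0):
    """Choose the (frequency, voltage) pair with the lowest dynamic energy.

    cycles      -- number of processor cycles the task needs
    deadline    -- time available to finish the task (s)
    points      -- iterable of (frequency_hz, voltage_v) pairs (hypothetical table)
    switch_time -- time lost to a voltage transition (s)

    Energy is modeled as proportional to voltage**2 * cycles, with the
    switched capacitance normalized to 1. Returns None if no point meets
    the deadline.
    """
    feasible = [
        (freq, volt)
        for freq, volt in points
        if cycles / freq + switch_time <= deadline
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda point: point[1] ** 2 * cycles)
```

For instance, with `points = [(50e6, 0.8), (100e6, 1.0), (200e6, 1.2)]`, a task of one million cycles and a 15 ms deadline selects `(100e6, 1.0)`: the 50 MHz point would take 20 ms and miss the deadline, while the 200 MHz point meets it at a higher voltage and therefore higher energy.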
15.
Traditionally, code scheduling is used to optimize the performance of an application, because it can rearrange the code to allow independent instructions to execute in parallel based on instruction-level parallelism (ILP). According to our observations, it can also be applied to reduce power dissipation by taking advantage of the properties of existing low-power techniques. In this paper, we present power-aware code scheduling (PACS), a code scheduling scheme integrated with power gating (PG) and dynamic voltage scaling (DVS) to reduce power consumption while executing an application. In other words, from the viewpoint of compilation optimization, PG and DVS can be applied simultaneously to a code, and their impact can be enhanced by code scheduling to save further power. The results show that, compared with hardware power gating, the proposed PACS is better by more than 33% and 41% in terms of energy-delay product and energy-delay² product for DSPStone and Mediabench.
16.
Sylvain Durand Anne-Marie Alt Nicolas Marchand 《International journal of systems science》2013,44(8):1432-1446
Embedded devices using highly integrated chips must cope with conflicting constraints while executing computationally demanding applications under limited energy storage. Automatic control and feedback loops appear to be an effective way to simultaneously accommodate performance uncertainties due to the variability of tiny-scale gates, varying and poorly predictable computing demands, and limited energy storage. This paper presents the example of an embedded video decoder controlled by several feedback loops that manage the trade-off between decoding quality and energy consumption by exploiting the frequency and voltage scaling capabilities of the chip. The inner loop controls dynamic voltage and frequency scaling through a fast predictive control strategy. The outer loop computes the scheduling set-points that the inner loop needs to decode frames. The feedback loops have been implemented on a stock PC and experimental results are provided.
17.
Alberto Corrales-García José Luis Martínez Gerardo Fernández-Escribano Francisco J. Quiles 《Multimedia Tools and Applications》2014,68(3):717-745
The Wyner-Ziv video coding paradigm provides a framework where most of the complexity is moved from the encoder to the decoder. In this way, Wyner-Ziv coding efficiently supports multimedia services for mobile devices which have to capture, encode and send video. However, the complexity of the decoder is quite high and should be reduced. This work presents several parallel Wyner-Ziv decoding algorithms aimed at reducing this high complexity. Since technological advances provide new hardware that supports parallel data processing, these algorithms efficiently distribute the burden of the complexity over the cores available in the architecture. In particular, four parallel approaches are proposed and analyzed. In the first approach, each bitplane of a frame is decoded in parallel by a different core, achieving a time reduction of 33.21% on average, although this depends on the number of bitplanes used. The second approach proposes a spatial distribution of each frame, avoiding dependences between bitplanes and obtaining a time reduction of 67% on average. The third approach decodes each GOP in parallel, avoiding all synchronization dependences and achieving a time reduction of 71% on average, although the maximum performance is reached when the key frame buffer is full. Finally, the last approach distributes the burden of complexity over two levels, namely GOP and frame, in order to obtain the advantages of both: a negligible rate-distortion penalty from the GOP approach and a low delay from the spatial distribution approach. With this approach, the decoding time is reduced by up to 76%. In addition, parallel decoding saves 60% of the energy consumption. The proposed methods are scalable to any multicore processor architecture and adaptable to different Wyner-Ziv decoding schemes.
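The GOP-level approach described in this abstract, in which each group of pictures is decoded independently, can be sketched with Python's standard `concurrent.futures` module. This is only an illustration of the scheduling pattern: `decode_gop` is a hypothetical placeholder rather than a Wyner-Ziv decoder, and the frame representation as strings is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_gop(gop):
    """Placeholder for decoding one group of pictures (hypothetical).

    A real decoder would reconstruct each frame; here every frame label is
    simply upper-cased so the data flow is visible.
    """
    return [frame.upper() for frame in gop]


def decode_parallel(gops, workers=4):
    """Decode GOPs concurrently and return frames in display order.

    Executor.map yields results in the order of its inputs, so the output
    order does not depend on which worker finishes first.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        decoded = list(pool.map(decode_gop, gops))
    return [frame for gop in decoded for frame in gop]
```

Calling `decode_parallel([["k0", "w1"], ["k2", "w3"]])` returns `["K0", "W1", "K2", "W3"]`. Threads keep the sketch short; for CPU-bound decoding in CPython a `ProcessPoolExecutor` would be the more likely choice, since threads share the global interpreter lock.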
18.
Chaeseok Im Soonhoi Ha 《Design & Test of Computers, IEEE》2004,21(5):358-366
The trade-off between energy consumption and media quality is a fundamental design issue in mobile multimedia. This energy optimization technique, based on frame skipping and buffering, exploits the characteristics of video applications, particularly their tolerance to variation in latency and video quality, to increase slack time and use it in dynamic voltage scaling. The technique adapts the frame rate to the degree of motion activity by skipping redundant frames. Thus, it intentionally creates slack time as long as video quality is not unacceptably degraded. We implemented this technique with only a slight modification of the original encoder algorithm and no modification to the decoder.
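The frame-skipping idea in this abstract can be illustrated with a simple activity test: a frame is kept only if it differs enough from the last frame that was kept. The measure used below, mean absolute pixel difference, and the threshold are assumptions made for illustration; the abstract does not specify how motion activity is computed.

```python
def select_frames(frames, threshold):
    """Return the indices of frames to encode, skipping low-motion frames.

    frames    -- list of frames, each a flat, non-empty list of pixel values
    threshold -- minimum mean absolute difference from the last kept frame
                 (hypothetical activity measure) required to keep a frame
    """
    if not frames:
        return []

    kept = [0]  # always keep the first frame
    reference = frames[0]
    for index in range(1, len(frames)):
        frame = frames[index]
        activity = sum(abs(a - b) for a, b in zip(frame, reference)) / len(frame)
        if activity >= threshold:
            kept.append(index)
            reference = frame
    return kept
```

For example, `select_frames([[0, 0], [0, 0], [10, 10], [10, 11]], threshold=1)` returns `[0, 2]`. Each skipped frame frees the time that would have been spent encoding it, and that slack is what a DVS policy can then use to run at a lower voltage.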
19.
In this paper, a low-cost compatible motion compensator is implemented and integrated into a macroblock-level three-stage-pipelined
HDTV decoder, in which an embedded compression (EC) engine is realized as well. The decoder with EC engine is designed to
reduce the power consumption and memory bandwidth requirement since memory accesses are reduced. In the motion compensator,
a boundary judgment scheme for reference pixel fetching is proposed to provide seamless integration in HDTV video decoder
for the block-based EC engines. Furthermore, a buffer sharing mechanism is adopted to reduce extra memory requirement involved
by EC. The reference pixel fetching unit costs only 17.3 K logic gates when the working frequency is set to 166.7 MHz. On
average, when decoding HD1080 video sequence, 30% memory access reduction and 24% memory power consumption saving are achieved
when a near lossless EC algorithm is integrated in the video decoder. In other words, the proposed motion compensator makes
the EC engine an integral part of a memory reduced decoder without extra cost. Additionally, since the work in this paper
is based on EC schemes, the EC design criteria are discussed, and several useful rules on the selection of an EC algorithm are
addressed for the video decoder of the corresponding VLSI architecture.