19 similar documents found (search time: 296 ms)
1.
2.
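This paper proposes a hardware design for the transform, quantization, inverse quantization, inverse transform, and scan stages that meets the requirements of real-time high-definition AVS video encoding. The design operates on a macroblock basis and uses ping-pong buffering and pipelining to provide high-performance parallel data processing. Based on the characteristics of the AVS transform and inverse transform, a RAM-based row-column memory is designed to perform high-speed parallel transposition. In addition, a method and structure for parallel scanning using RAM are proposed, providing high data throughput...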
3.
4.
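The Tile processor is a new type of multi-core processor developed by Tilera. After introducing the multi-core processors of the Tilera platform, the article implements standard-definition and high-definition real-time AVS and H.264 video encoders on that platform, based on the architectural characteristics of the processor.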
5.
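Heterogeneous many-core systems have become an important trend in high-performance computing. This paper analyzes and compares the characteristics of current heterogeneous systems in terms of architecture, programming, and supported applications, revealing their development trends and their advantages over traditional multi-core parallel systems. It then analyzes the problems and challenges of heterogeneous systems with respect to programming models and performance optimization, surveys the state of research in China and abroad, and, in light of the open problems and difficulties, discusses directions for further research. Benchmark tests covering different application types are run on two typical heterogeneous many-core systems, CPU+GPU and CPU+MIC, verifying that the two systems suit different applications and giving users a reference for choosing a specific heterogeneous system. On this basis, the paper proposes combining the two many-core processors (GPU and MIC) within a single compute node to form a new hybrid heterogeneous system, which can exploit the different processing strengths of the two processors to cooperatively handle complex applications with differing characteristics. The key problems that must be studied and solved for this hybrid system are also analyzed. Finally, the challenges facing heterogeneous many-core systems and further research directions are summarized.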
6.
《电子技术与软件工程》2016,(17)
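Stencil computation is one of the seven major patterns in high-performance computing. It has a low compute-to-memory-access ratio and is severely limited by main-memory bandwidth. In high-performance computing, processors are moving from multi-core to many-core architectures. How to raise the performance of stencil computation on new many-core processors and improve kernel efficiency has therefore become a typical research problem. By analyzing the characteristics of stencil applications, this paper proposes performance optimization methods and parallelizes Jacobi and finite-difference time-domain (FDTD) computations, achieving a clear speedup.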
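The low compute-to-memory-access ratio mentioned in this abstract can be seen in even the simplest stencil kernel. The following is a minimal sketch of a serial 5-point Jacobi sweep, not the paper's optimized many-core implementation; the function names `jacobi_step` and `jacobi` and the fixed-boundary convention are assumptions made for illustration.

```python
def jacobi_step(grid):
    """Return a new grid after one 5-point Jacobi sweep.

    Boundary cells are copied unchanged; each interior cell becomes the
    average of its four neighbours in the OLD grid.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy so reads always see old values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Four reads, three additions and one multiplication per point:
            # very little arithmetic for each value fetched from memory.
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def jacobi(grid, sweeps):
    """Apply `sweeps` Jacobi iterations and return the final grid."""
    for _ in range(sweeps):
        grid = jacobi_step(grid)
    return grid
```

Because every interior update depends only on the previous grid, the updates within one sweep are independent of each other, which is what makes this kernel a natural candidate for parallelization.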
7.
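Based on the domestically developed Sunway TaihuLight ("神威·太湖之光") supercomputer platform, this work studies parallel computation of the uniform geometrical theory of diffraction method suited to the domestic many-core architecture. The method can be used for efficient analysis of electromagnetic ray propagation in urban environments and for ray-based prediction of the electromagnetic situation. Taking 260 heterogeneous cores as the baseline, the parallel scheme achieves a parallel efficiency above 99% on 4160 heterogeneous cores. Numerical results show that the parallel method can quickly and efficiently solve the electromagnetic field propagation prediction problem for an urban area on the domestic many-core platform, and that it provides support for subsequent safe and efficient urban electromagnetic situation prediction and analysis.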
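The efficiency figure in this abstract is quoted relative to a 260-core baseline rather than to a single core. Below is a minimal sketch of that arithmetic, assuming the usual definition of relative parallel efficiency (the abstract does not state its formula). The function names and the timings in the usage note are hypothetical.

```python
def relative_efficiency(base_cores, base_time, cores, time):
    """Parallel efficiency measured against a multi-core baseline run.

    speedup = base_time / time
    ideal   = cores / base_cores
    """
    speedup = base_time / time
    ideal = cores / base_cores
    return speedup / ideal


def implied_speedup(base_cores, cores, efficiency):
    """Speedup over the baseline implied by a given relative efficiency."""
    return efficiency * cores / base_cores
```

Going from 260 to 4160 cores is a 16-fold increase, so an efficiency of 99% corresponds to a speedup of about 15.84 over the 260-core run: `implied_speedup(260, 4160, 0.99)`.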
8.
9.
10.
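By improving the M310 of the MCORE family, this paper designs CS320, a new 32-bit microprocessor core for information security. The core has advanced anti-tracing and confidentiality mechanisms, making system chips built around CS320 more secure.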
11.
Zhenyu Wang Luhong Liang Guolei Yang Xianguo Zhang Jun Sun Debin Zhao Wen Gao 《Journal of Signal Processing Systems》2011,65(1):129-145
Implementation of video coding systems such as H.264/AVC and AVS on multi-core and many-core platforms is attracting much
attention. The slice-level parallelism is popular in parallel video coding for its simplicity and flexibility; however, the
video quality degrades greatly since the partitioning of slices breaks the dependency between macro-blocks, especially on multi-core
or many-core platforms. To address this problem, we propose a Macro-Block Group (MBG) parallel scheme for parallel AVS coding.
In the proposed scheme, video frames are equally divided into rectangular MBG regions; each MBG consists of more rows and
fewer columns of macro-blocks than the slice-level scheme. Given that MBG is not syntactically supported by AVS, a vertical
partitioning scheme is introduced. Additionally, we use mode confining and motion vector difference adjusting techniques to
keep consistent with the standard. Two MBG parallel schemes (5 × 9 MBG partition and 8 × 7 MBG partition) are developed on
a TILE64 many-core platform, where P/B frames use the MBG parallel scheme and I frames use the macro-block-level parallelism.
Experimental results show that the proposed scheme of 5 × 9 MBG partition can achieve a reduction of 52% (IPPP) and 41% (IBBP)
quality loss while keeping the same speed-up compared with the slice-level parallelism. With more cores employed, the scheme
of 8 × 7 MBG partition gains 23.9 times of speed-up compared with the single-core implementation and achieves similar coding
performance as the 5 × 9 scheme.
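The equal division of a frame into rectangular MBG regions can be sketched as follows. This is an illustration only: the abstract does not give the frame size or say which factor of "5 × 9" counts rows, so the grid orientation, the helper names, and the 120 × 68 macro-block frame in the usage note are all assumptions.

```python
def split_evenly(total, parts):
    """Split range(total) into `parts` contiguous spans whose sizes differ by at most one."""
    base, extra = divmod(total, parts)
    spans, start = [], 0
    for k in range(parts):
        size = base + (1 if k < extra else 0)  # the first `extra` spans get one more
        spans.append((start, start + size))
        start += size
    return spans


def mbg_regions(mb_rows, mb_cols, grid_rows, grid_cols):
    """Return (row_start, row_end, col_start, col_end) for each MBG region.

    Coordinates are in macro-block units; end indices are exclusive.
    """
    return [(r0, r1, c0, c1)
            for (r0, r1) in split_evenly(mb_rows, grid_rows)
            for (c0, c1) in split_evenly(mb_cols, grid_cols)]
```

For instance, `mbg_regions(68, 120, 5, 9)` yields 45 regions for a hypothetical frame of 68 rows by 120 columns of macro-blocks, and a 8 × 7 grid yields 56, one region per worker in each case.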
12.
An architecture of entropy decoder,inverse quantiser and predictor for multi-standard video decoding
Leibo Liu Yingjie Chen Shouyi Yin Hao Lei Guanghui He Shaojun Wei 《International Journal of Electronics》2013,100(7):877-893
A VLSI architecture for entropy decoder, inverse quantiser and predictor is proposed in this article. This architecture is used for decoding video streams of three standards on a single chip, i.e. H.264/AVC, AVS (China National Audio Video coding Standard) and MPEG2. The proposed scheme is called MPMP (Macro-block-Parallel based Multilevel Pipeline), which is intended to improve the decoding performance to satisfy the real-time requirements while maintaining a reasonable area and power consumption. Several techniques, such as slice level pipeline, MB (Macro-Block) level pipeline, MB level parallel, etc., are adopted. Input and output buffers for the inverse quantiser and predictor are shared by the decoding engines for H.264, AVS and MPEG2, therefore effectively reducing the implementation overhead. Simulation shows that decoding process consumes 512, 435 and 438 clock cycles per MB in H.264, AVS and MPEG2, respectively. Owing to the proposed techniques, the video decoder can support H.264 HP (High Profile) 1920 × 1088@30fps (frame per second) streams, AVS JP (Jizhun Profile) 1920 × 1088@41fps streams and MPEG2 MP (Main Profile) 1920 × 1088@39fps streams when exploiting a 200 MHz working frequency.
13.
Jing Jun Zhang You Fang Lu Bin Wang 《IEEE transactions on systems, man and cybernetics. Part C, Applications and reviews》1998,28(3):467-471
The paper focuses on the inverse dynamics formulation of manipulators that is suitable for parallel computation, and a corresponding nonrecursive Newton-Euler formulation is presented. In order to illustrate its potential parallelism, a simple parallel scheduling scheme is proposed, and the parallel computational efficiency for the inverse dynamics of the basic three links of a PUMA 560 robot is analyzed. Compared with other algorithms, the theoretical computation cost of this parallel algorithm, in which factors such as communications overhead are ignored, is smaller
14.
15.
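A block tridiagonal scalable parallel algorithm is used to solve the discrete system of the initial-boundary value problem for the three-dimensional parabolic equation with Dirichlet boundary conditions. The concept of the degree of parallelism of a difference scheme, which reflects the scheme's inherent parallelism, is introduced, and the relationship between this degree of parallelism and the performance of the parallel algorithm is discussed. Numerical experiments were carried out with this method on the Shanghai University supercomputer "自强3000" (Ziqiang 3000), and the results agree with the theoretical analysis. With accuracy preserved, linear speedup is obtained and the parallel efficiency exceeds 90%.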
16.
17.
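A block tridiagonal scalable parallel algorithm for the boundary value problem of the two-dimensional Poisson equation   Total citations: 4, self-citations: 2, citations by others: 2
A block tridiagonal scalable parallel algorithm is used to solve the discrete system of the boundary value problem for the two-dimensional Poisson equation with Dirichlet boundary conditions. The concept of the degree of parallelism of a difference scheme, which reflects the scheme's inherent parallelism, is introduced, and the relationship between this degree of parallelism and the performance of the parallel algorithm is discussed. Numerical experiments were carried out with this method on the Shanghai University supercomputer "自强3000" (Ziqiang 3000), and the results agree with the theoretical analysis. With accuracy preserved, linear speedup is obtained and the parallel efficiency exceeds 90%.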
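To show where a block tridiagonal system comes from in this setting, the sketch below writes out the matrix produced by the standard five-point difference scheme for $-\Delta u = f$. The abstract does not name its difference scheme, so the five-point stencil, the unit-square domain, the $n \times n$ interior grid with mesh width $h = 1/(n+1)$, and the row-by-row ordering of unknowns are all assumptions.

```latex
% Assumed: five-point scheme, n x n interior points, h = 1/(n+1),
% unknowns ordered row by row; I is the n x n identity matrix.
\[
A = \frac{1}{h^{2}}
\begin{pmatrix}
T  & -I     &        &    \\
-I & T      & \ddots &    \\
   & \ddots & \ddots & -I \\
   &        & -I     & T
\end{pmatrix},
\qquad
T =
\begin{pmatrix}
4  & -1     &        &    \\
-1 & 4      & \ddots &    \\
   & \ddots & \ddots & -1 \\
   &        & -1     & 4
\end{pmatrix}
\]
```

Each block row couples one grid line only to its two neighbouring lines, which is the structure a block tridiagonal solver exploits.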
18.
19.
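Optimized design of run-length decoding, inverse scan, inverse quantization, and inverse transform for AVS   Total citations: 5, self-citations: 0, citations by others: 5
An optimized hardware architecture for AVS run-length decoding, inverse scan, inverse quantization, and inverse transform is proposed. Based on the characteristics of the AVS integer transform and quantization, a memory array that can work in different modes is designed; it can perform the transposition required by the inverse transformer and also store intermediate results. Run-length decoding, inverse scan, and inverse quantization are merged into a single pipeline unit and processed in parallel. The design eliminates the large amount of memory otherwise needed for intermediate results, speeds up processing, and meets the requirements of high-definition video processing. The design was verified on an FPGA; synthesis results show a logic gate count of only 9076 and a maximum operating frequency above 200 MHz.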