Found 10 similar documents; search took 15 ms
1.
LIU LeiBo, CHEN YingJie, WANG Dong, YIN ShouYi, WANG Xing, WANG Long, LEI Hao, CAO Peng, WEI ShaoJun 《中国科学:信息科学(英文版)》 (Science China Information Sciences), 2014, (8): 208-221
This paper proposes a task-based hybrid parallel and hybrid pipeline (THPHP) scheme to implement multi-standard video algorithms, including MPEG-2, H.264, and audio video coding standard (AVS), on a heterogeneous coarse-grained reconfigurable processor called the reconfigurable multimedia system (REMUS). The proposed schemes greatly improve decoding performance and satisfy the real-time requirements of various high-definition (HD) video decoding standards. In THPHP, we propose both a task-based hybrid parallel scheme, in which macro-block (MB)-level, block-level, and sub-block-level decoding tasks are parallelized to improve data processing throughput, and a hybrid pipeline scheme, in which slice-level, MB-level, block-level, and sub-block-level computations are pipelined to improve efficiency. Computation-intensive tasks, such as motion compensation, intra prediction, inverse discrete cosine transform, reconstruction, and deblocking filter, are implemented on two reconfigurable processing units, which are the core computing engines of REMUS. Thanks to the proposed schemes, the implementations can achieve H.264 high profile (HP) 1920×1080@30 fps streams, AVS Jizhun profile (JP) 1920×1080@39 fps streams, and MPEG-2 main profile (MP) 1920×1080@41 fps streams when working at 200 MHz frequency. Compared with XPP-III (a commercial reconfigurable processor), when implementing H.264 HD decoding, the performance and energy efficiency on REMUS are improved by 1.81× and 14.3×, respectively.
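To put the frame rates quoted in this abstract in perspective, the following sketch estimates the average number of clock cycles available per macroblock at 200 MHz. It is an illustration only and not taken from the paper: it assumes 16×16 macroblocks with the frame height rounded up to whole macroblocks, and the function name `cycles_per_macroblock` is hypothetical.

```python
import math

def cycles_per_macroblock(width, height, fps, clock_hz, mb_size=16):
    """Average clock cycles available per macroblock for real-time decoding.

    Assumes square macroblocks of mb_size pixels and frame dimensions
    rounded up to a whole number of macroblocks.
    """
    mbs_per_frame = math.ceil(width / mb_size) * math.ceil(height / mb_size)
    mbs_per_second = mbs_per_frame * fps
    return clock_hz / mbs_per_second

# Frame rates as quoted in the abstract, all at 1920x1080 and 200 MHz.
for name, fps in [("H.264 HP", 30), ("AVS JP", 39), ("MPEG-2 MP", 41)]:
    budget = cycles_per_macroblock(1920, 1080, fps, 200_000_000)
    print(f"{name}: about {budget:.0f} cycles per macroblock")
```

Under these assumptions, a 1920×1080 frame holds 120 × 68 = 8160 macroblocks, so the H.264 case leaves roughly 817 cycles per macroblock on average.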
2.
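The strong processing capability of hardware and the flexibility and programmability of software have moved video decoder chip architectures from pure hardware toward hardware/software partitioned architectures. As an emerging standard, the AVS video standard poses new challenges for the hardware/software partitioning of decoders. Starting from the algorithms and implementation complexity of the AVS video standard, this paper proposes a hardware/software partitioned architecture for an AVS HD video decoder that decodes AVS HD video streams conforming to the Jizhun profile at level 6.0 in real time and supports flexible audio/video synchronization, error recovery, buffer management, and system control mechanisms. The architecture has been implemented on the AVS101 chip: the hardware uses a seven-stage macroblock-level synchronous pipeline, and the software tasks run on a RISC processor. At an operating frequency of 148.5 MHz, it decodes and displays NTSC, PAL, 720p (60 frames/s), and up to 1080i (60 fields/s) programs in real time.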
3.
ZHOU Mi 《数字社区&智能家居》 (Digital Community & Smart Home), 2009, (3)
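Video decoder chip architectures have moved from pure hardware to hardware/software partitioned architectures, owing to the strong processing capability of hardware and the flexible programmability of software. Addressing the algorithms of the AVS standard and the complexity of implementing its decoding, this paper applies hardware/software co-design to propose a reasonably partitioned hardware/software architecture for an AVS HD video decoder. Based on the characteristics of the AVS algorithms, the architecture assigns the parsing of syntax elements above the macroblock layer to software and macroblock-layer decoding to hardware. Verification shows that the design can decode AVS HD bitstreams; it has been realized in a hardware-platform simulation program written in C.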
4.
5.
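Taking dynamic partial reconfiguration and reconfiguration delay into account, a genetic algorithm and its hybrid with hill climbing are used to partition the hardware and software tasks of a reconfigurable system, and a dynamic-priority scheduling algorithm is used to evaluate the partitioning results. Experiments show that, under the resource constraints of the reconfigurable system, the algorithm effectively maps an application task graph onto the reconfigurable system in time and space.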
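A minimal sketch of how a genetic algorithm can be combined with hill climbing for hardware/software partitioning is given below. It is not the authors' implementation. The task set, `AREA_BUDGET`, and `RECONFIG_DELAY` are hypothetical, task dependencies are ignored, and the fitness function is a deliberately simplified stand-in for the dynamic-priority scheduler that the abstract uses for evaluation.

```python
import random

# Hypothetical task set: (software time, hardware time, hardware area).
TASKS = [(8, 2, 3), (6, 3, 2), (9, 4, 4), (4, 3, 1), (7, 2, 3), (5, 4, 2)]
AREA_BUDGET = 8        # assumed reconfigurable area available
RECONFIG_DELAY = 1     # assumed delay added to each hardware task

def cost(bits):
    """Simplified fitness: longer of CPU and fabric busy time, plus an area penalty."""
    sw = sum(t[0] for t, b in zip(TASKS, bits) if b == 0)
    hw = sum(t[1] + RECONFIG_DELAY for t, b in zip(TASKS, bits) if b == 1)
    area = sum(t[2] for t, b in zip(TASKS, bits) if b == 1)
    return max(sw, hw) + 100 * max(0, area - AREA_BUDGET)

def hill_climb(bits):
    """Local search: accept single-bit flips while they lower the cost."""
    bits = list(bits)
    improved = True
    while improved:
        improved = False
        for i in range(len(bits)):
            trial = bits[:i] + [1 - bits[i]] + bits[i + 1:]
            if cost(trial) < cost(bits):
                bits, improved = trial, True
    return bits

def partition(generations=30, pop_size=12, seed=0):
    """Return a bit list: 1 maps the task to hardware, 0 to software."""
    rng = random.Random(seed)
    n = len(TASKS)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]           # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)            # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:               # bit-flip mutation
                j = rng.randrange(n)
                child[j] = 1 - child[j]
            children.append(hill_climb(child))   # refine offspring locally
        pop = parents + children
    return min(pop, key=cost)

best = partition()
print(best, cost(best))
```

The hill-climbing step refines each offspring locally, while crossover and mutation keep the search from settling in a single local minimum. A realistic evaluator would schedule the task graph with its dependencies and reconfiguration overlaps instead of summing busy times.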
6.
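Heterogeneous systems-on-chip (SoCs) integrate multiple types of processors on a single chip and offer considerable advantages in processing capability, size, weight, and power consumption, so they have been applied in many fields. SoCs with dynamic partial reconfigurability (DPR-SoCs) are an important class of heterogeneous SoC that combines the flexibility of software with the efficiency of hardware. Designing such systems usually involves hardware/software co-design, in which partitioning the application between hardware and software is a key technique for guaranteeing real-time performance. Hardware/software partitioning in a DPR-SoC can be classified as a combinatorial optimization problem whose objective is the schedule with the shortest schedule length, covering task mapping, ordering, and timing. Mixed integer linear programming (MILP) is an effective method for solving combinatorial optimization problems; however, formulating a specific problem as an MILP model is a critical step, and different formulations strongly affect solution time. Existing MILP models for DPR-SoC hardware/software partitioning contain large numbers of variables and constraint equations, which lengthens solution time; they also make too many assumptions, so their results do not match real applications. To address these problems, a novel MILP model is proposed that greatly reduces model complexity and brings the results closer to real applications. Applications are modeled as directed acyclic graphs (DAGs), and an integer linear programming solver is used to solve the problem. Extensive results show that the new model effectively reduces model complexity and shortens solution time, and that its advantage in solution time becomes more pronounced as the problem size grows.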
7.
8.
Radha Guha, Nader Bagherzadeh 《Computers & Electrical Engineering》2009,35(2):258-285
There are many design challenges in the hardware-software co-design approach for performance improvement of data-intensive streaming applications with a general-purpose microprocessor and a hardware accelerator. These design challenges are mainly to prevent hardware area fragmentation to increase resource utilization, to reduce hardware reconfiguration cost, and to partition and schedule the tasks between the microprocessor and the hardware accelerator efficiently for performance improvement and power savings of the applications. In this paper a modular and block-based hardware configuration architecture named memory-aware run-time reconfigurable embedded system (MARTRES) is proposed for efficient resource management and performance improvement of streaming applications. Subsequently we design a task placement algorithm named hierarchical best fit ascending (HBFA) algorithm to prove that the MARTRES configuration architecture is very efficient in increased resource utilization and flexible in task mapping and power savings. The time complexity of the HBFA algorithm is reduced to O(n) compared to the traditional Best Fit (BF) algorithm's time complexity of O(n^2), while the quality of the placement solution by HBFA is better than that of the BF algorithm. Finally we design an efficient task partitioning and scheduling algorithm named balanced partitioned and placement-aware partitioning and scheduling algorithm (BPASA). In BPASA we exploit the temporal parallelism in streaming applications to reduce the reconfiguration cost of the hardware, while keeping in mind the required throughput of the output data. We balance the exploitation of spatial parallelism and temporal parallelism in streaming applications by considering the reconfiguration cost vs. the data transfer cost. The scheduler refers to the HBFA placement algorithm to check whether contiguous area on the FPGA is available before scheduling the task for HW or for SW.
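For reference, the traditional Best Fit baseline mentioned in this abstract can be sketched as follows. This is the classic algorithm only, not HBFA, whose details the abstract does not give. Modelling the free area as a one-dimensional list of contiguous column ranges, and the function name `best_fit_place`, are assumptions made for illustration.

```python
def best_fit_place(free_segments, width):
    """Place a task of `width` columns into the tightest free segment.

    free_segments: list of (start, length) tuples, modified in place.
    Returns the start column, or None if no contiguous segment is wide enough.
    """
    best = None
    for i, (start, length) in enumerate(free_segments):
        # Keep the smallest segment that still fits the task.
        if length >= width and (best is None or length < free_segments[best][1]):
            best = i
    if best is None:
        return None
    start, length = free_segments[best]
    if length == width:
        free_segments.pop(best)                       # segment fully consumed
    else:
        free_segments[best] = (start + width, length - width)
    return start

# Hypothetical free area and task widths, in columns.
free = [(0, 4), (10, 2), (20, 6)]
for name, width in [("A", 2), ("B", 5), ("C", 4), ("D", 3)]:
    col = best_fit_place(free, width)
    target = f"hardware at column {col}" if col is not None else "software (no contiguous area)"
    print(f"task {name}: {target}")
```

Each placement scans the whole free list, so placing n tasks costs O(n^2) in the worst case, which matches the complexity the abstract attributes to Best Fit. The fallback to software when no contiguous area is free mirrors the check the abstract describes for the scheduler.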
9.
10.