Similar literature
 20 similar articles found; search took 548 ms
1.
王永文  张民选 《计算机学报》2004,27(10):1320-1327
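Based on the Itanium 2 microprocessor architecture, two baseline models are proposed: a single-clock model and a multiple-clock-domain model. The circuit-level characteristics of the processor are abstracted at the microarchitecture level to build a parameterized peak-power estimation model. An event-scheduling algorithm is proposed to enable behavioral-level simulation of multiple-clock-domain processor systems, and a dynamic power simulation model of the processor is implemented with the IMPACT toolset as the simulation engine. Compared with Wattch, a comparable model, this model supports the simulation of multi-clock systems, improves peak-power estimation accuracy by about 3%, and runs simulations 42% faster. Experiments illustrate the power characteristics of multiple clock domains: in one multi-voltage, multi-frequency environment, the power and energy of the multiple-clock-domain processor are reduced by 21% and 38%, respectively. The model is well suited to architecture-level low-power research and design.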

2.
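Based on an analysis of the MVFAST algorithm, this paper proposes an improvement to its search window. The improved search window increases the search speed of the algorithm without degrading image quality. An architecture and processing elements supporting the algorithm are designed, together with a flexible data-flow structure and search strategy. Experiments were carried out on a large number of video sequences that differ in content, resolution, and intensity of motion. The results show that the designed architecture achieves a markedly higher processing speed and substantially lower memory access bandwidth, with image quality comparable to that of the full search algorithm.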

3.
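Based on an analysis of the MVFAST algorithm, this paper proposes an improvement to its search window. The improved search window increases the search speed of the algorithm without degrading image quality. An architecture and processing elements supporting the algorithm are designed, together with a flexible data-flow structure and search strategy. Experiments were carried out on a large number of video sequences that differ in content, resolution, and intensity of motion. The results show that the designed architecture achieves a markedly higher processing speed and substantially lower memory access bandwidth, with image quality comparable to that of the full search algorithm.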

4.
叶霜霜  申闫春 《计算机工程》2012,38(20):256-259
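To provide efficient, high-quality geographic information system (GIS) services under the resource constraints of mobile devices, the system architecture and class-library structure of a mobile GIS engine are analyzed. The A* algorithm is used to solve the optimal (shortest) path search problem in mobile GIS. By designing the internal layered structure of the engine and combining key technologies such as mobile-phone GPS positioning and electronic maps, a mobile GIS engine system is developed and implemented in VC++ on the Windows mobile platform. The results show that the engine has a clean interface and works well, and that the search efficiency of the A* algorithm is improved by 15% to 20%.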
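
The abstract above relies on A* for shortest-path search. As a rough illustration only, here is a minimal A* sketch on a 4-connected grid. The grid format (a list of strings with `#` for blocked cells), the function name `a_star`, and the Manhattan heuristic are all assumptions made for this example; the paper's road-network representation and its tuned variant of A* are not described in the abstract.

```python
import heapq


def a_star(grid, start, goal):
    """Return a shortest 4-connected path from start to goal, or None.

    grid is a list of equal-length strings; '#' marks a blocked cell
    (a hypothetical format chosen for this sketch).
    """
    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    g_cost = {start: 0}
    parent = {start: None}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if g > g_cost[cell]:
            continue  # stale entry: a cheaper route to this cell was found
        if cell == goal:
            path = []
            while cell is not None:  # walk parents back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue
            if grid[nr][nc] == '#':
                continue
            ng = g + 1
            if ng < g_cost.get((nr, nc), float('inf')):
                g_cost[(nr, nc)] = ng
                parent[(nr, nc)] = cell
                heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None
```

On a real road map the cells would become graph nodes and the heuristic would typically be a straight-line distance, but the open-list and cost-update logic stays the same.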

5.
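Data-intensive applications, typified by graph computing, are attracting ever wider attention, yet traditional high-performance computers process them inefficiently. Because future high-performance computer architectures must support data-intensive computing effectively, this work studies in depth the typical characteristics of graph computing as represented by the breadth-first search (BFS) algorithm. A lightweight heuristic-switching BFS algorithm is designed and implemented; by switching automatically between the basic search modes, it avoids redundant memory accesses and improves search efficiency. Given the scattered, random data-access pattern of BFS and the execution mechanisms of many-core processors, an analysis model of many-core processor architecture for BFS is established. The running characteristics and performance trends of BFS on typical many-core processors are studied thoroughly. Test results show that the cache hit rate, memory bandwidth, pipeline utilization, and related parameters all remain at low levels and cannot fully meet the needs of BFS, so a new many-core processor architecture that supports massive scattered random accesses and a simple execution mechanism is needed.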
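
The idea of switching between basic BFS modes can be sketched as follows. This is a minimal example that assumes the two modes are the usual top-down and bottom-up traversals and that the switch is triggered by frontier size; the function name `hybrid_bfs`, the `switch_fraction` parameter, and the threshold rule are hypothetical, since the abstract does not state the paper's actual heuristic.

```python
def hybrid_bfs(adj, source, switch_fraction=0.25):
    """Return BFS levels for an undirected graph given as adjacency lists.

    Unreachable vertices keep level -1. The switching rule (frontier
    larger than switch_fraction * n) is an assumption for illustration.
    """
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        next_frontier = set()
        if len(frontier) > switch_fraction * n:
            # Bottom-up: every unvisited vertex looks for one parent in
            # the frontier and stops at the first hit, which skips many
            # edge checks when the frontier is large.
            for v in range(n):
                if level[v] == -1:
                    for u in adj[v]:
                        if u in frontier:
                            level[v] = depth
                            next_frontier.add(v)
                            break
        else:
            # Top-down: expand every edge leaving the frontier.
            for u in frontier:
                for v in adj[u]:
                    if level[v] == -1:
                        level[v] = depth
                        next_frontier.add(v)
        frontier = next_frontier
    return level
```

Both branches produce the same levels; only the number of edge inspections differs, which is where the saving in memory accesses would come from.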

6.
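Object detection tasks typically use the non-maximum suppression (NMS) algorithm to remove redundant candidate boxes output by a convolutional neural network. Soft-NMS gradually decays the scores of candidate boxes instead of, as in Hard-NMS, directly deleting boxes that exceed a predefined threshold; this avoids mistakenly deleting candidate boxes of overlapping objects and improves detection accuracy. However, frequently updating box scores makes Soft-NMS more complex than Hard-NMS. To remove redundant candidate boxes with high accuracy, low latency, and low power, an architecture based on Soft-NMS is proposed. It uses logarithmic functions to optimize the complex floating-point computation, and a two-level optimization structure that combines fine-grained pipelining with coarse-grained parallelism further raises throughput. The architecture was evaluated on a XILINX KU-115 FPGA development board. The evaluation shows a power consumption of 6.107 W and a latency of 168.95 μs for processing 992 candidate boxes; compared with a CPU implementation of Soft-NMS, the architecture delivers a 36-fold performance improvement and a performance-per-watt 264 times that of the CPU implementation.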
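
To make the score-decay idea concrete, here is a minimal software sketch of Soft-NMS with the Gaussian penalty. It is a plain reference version, not the paper's FPGA architecture, and it does not include the logarithm-based optimization. The box format `(x1, y1, x2, y2)`, the function names, and the default values of `sigma` and `score_thresh` are assumptions for this example.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS; returns (index, decayed score) in pick order."""
    scores = list(scores)  # copy so the caller's list is not modified
    remaining = list(range(len(boxes)))
    kept = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < score_thresh:
            break  # every box still remaining scores even lower
        kept.append((best, scores[best]))
        for i in remaining:
            # Decay the score instead of deleting the box outright.
            overlap = iou(boxes[best], boxes[i])
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
    return kept
```

A box that overlaps heavily with a higher-scoring one survives with a reduced score, whereas Hard-NMS with a typical IoU threshold would have discarded it.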

7.
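To meet the information-sharing needs of collaborative work among groups in a bridge early-warning system, a database design method for the early-warning system based on a distributed management engine is proposed. It covers the architecture for database integration and the distributed management engine's command decomposition algorithm, unit templates, unit assembly templates, and model reconstruction algorithm. Finally, a simulated example verifies the feasibility and effectiveness of the distributed management engine.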

8.
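Based on an analysis of reconfigurable computer architectures in China and abroad, and given that embedded multimedia information processing is computation-intensive and must be low-power, a reconfigurable computer architecture model for embedded multimedia processing is proposed and simulated at the macro-architecture level. The simulation follows an object-oriented design: each functional unit of the reconfigurable computer corresponds directly to an object in the software implementation, and the relationships among functional units are described by the relationships among objects. Finally, a set of digital images is processed with different algorithms in simulation and the expected correct results are obtained, which verifies the correctness of the architecture model.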

9.
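A VLSI architecture design method for a hierarchical algorithm is proposed. The architecture uses a single basic search unit to carry out the entire three-level search, which reduces chip size and power consumption. It also supports the advanced prediction mode of low-bit-rate video encoders (such as H.263 and MPEG-4): through effective control of the data flow, and without additional computational burden, it obtains the motion vectors of the four 8x8 sub-blocks of a macroblock together with the motion vector of the macroblock itself.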

10.
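To reduce the enormous computational load of the full-search motion estimation algorithm and speed up motion estimation, a new hardware architecture for full-search motion estimation is proposed. The architecture searches in real time, by full-search motion estimation, for the best matching motion vector of each pixel block; an improved search-window buffer yields a considerable speed gain and effectively reduces circuit power consumption.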

11.
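Motion estimation based on block-matching algorithms is a key technology in image and video applications. The SAD operation is the dominant operation in motion estimation and has very high computational complexity and transfer-bandwidth requirements. This paper proposes a configurable SAD accelerator architecture that uses a 16×1 PE array and an adder tree to accelerate SAD computation. The pipelines of the PE array and the adder tree are finely partitioned, which effectively raises the operating frequency. The accelerator uses a DMA event mechanism so that most data transfers proceed in parallel with SAD computation, reducing the performance loss caused by data-transfer latency. Experimental results show that searching a 16×16 search window takes only 4102 cycles with the proposed architecture. Synthesized with a SMIC 0.13 μm CMOS standard-cell process, the architecture reaches a maximum operating frequency of 750 MHz, with an area of about 16.8k gates plus 3.5 KB of on-chip memory.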

12.
Visual sensor networks require low-power compression of the large amounts of video data at each camera node because of their energy-constrained and bandwidth-limited environments. In this paper, an energy-efficient architecture for Variable Block Size Motion Estimation is proposed that fully utilizes the dynamic partial reconfiguration capability of the programmable hardware fabric in distributed embedded vision processing nodes. Partial reconfiguration of the FPGA is exploited to support run-time reconfiguration of the proposed modular hardware architecture for motion estimation. According to the required search range, hardware reconfiguration is performed adaptively to reduce hardware resources and power consumption. A reconfigurable ME, ranging from a simple 1-D to a complex 2-D Sum of Absolute Differences (SAD) array that performs full search block matching, is selected to support different search window sizes. The implemented scalable SAD array can provide different resolutions and frame rates for real-time applications with multiple reconfigurable regions.

13.
Motion estimation is the computationally intensive part of video encoding. This paper presents a processing-element-based architecture for accelerating the calculation of the Sum of Absolute Differences (SAD), which is the most widely used block matching criterion in motion estimation. A clock gating method is analysed to enable or disable the required processing elements for a particular time of use. The selection of processing elements is performed based on motion analysis of the input video. The level of motion is estimated from initial frames to configure the hardware for SAD evaluation. A System-on-Chip approach, implemented in a Xilinx Zynq SoC, is proposed that is efficient in terms of power and resource utilization because the hardware is configured according to the properties of the input video. This hardware-software co-design achieves approximately a 4.6x speed-up compared to the original software implementation of the framework running on an ARM processor.

14.
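To convert interlaced video to progressive-scan format, a motion-adaptive de-interlacing algorithm is proposed. It mainly comprises motion estimation, morphological filtering of motion vectors, small-angle edge search, and interpolation with adaptive spatio-temporal weights. The algorithm performs motion estimation by comparing the sum of absolute differences (SAD) of pixel blocks in same-parity fields with a motion threshold, and applies morphological filtering to the motion vectors to remove the influence of noise. The small-angle edge search uses an adaptive search radius and a parallel search-tree strategy to reach a detection precision as fine as 6°. Finally, de-interlacing is carried out by the interpolation algorithm with adaptive spatio-temporal weights, which gives very good results.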

15.
Integer motion estimation (IME), a key component of a video encoder, removes temporal redundancy by searching for the best integer motion vectors for the dynamic partition blocks in a macro-block (MB). Huge memory bandwidth requirements and heavy computational resource demands are the two key bottlenecks in IME engine design, especially for large search windows (SW). In this paper, a three-level pipelined VLSI architecture is proposed that efficiently integrates the reference data sharing search (RDSS) into a multi-resolution motion estimation algorithm (MMEA). First, a hardware-friendly MMEA algorithm is mapped onto a three-level pipelined architecture with negligible coding quality loss. Second, sub-sampled RDSS coupled with Level C+ is adopted to reduce on-chip memory and bandwidth at the coarsest and middle levels. Data sharing between IME and fractional motion estimation (FME) is achieved by loading only a local predictive SW at the finest level. Finally, the three levels are parallelized and pipelined to guarantee the gradual refinement of MMEA and the hardware utilization. Experimental results show that the proposed architecture strikes a good balance among complexity, on-chip memory, bandwidth, and data flow regularity. Only 320 processing elements (PE) and 550 cycles are required for the IME search, where the SW is set to 256 × 256. The architecture achieves 1080P@30 fps real-time processing at a working frequency of 134.6 MHz, with 135 K gates and 8.93 KB of on-chip memory.

16.
Motion estimation is one of the major problems in developing video coding applications. Among all motion estimation approaches, block-matching (BM) algorithms are the most popular because of their effectiveness and simplicity in both software and hardware implementations. A BM approach assumes that the movement of pixels within a defined region of the current frame (macro block, MB) can be modeled as a translation of pixels contained in the previous frame. In this procedure, the motion vector is obtained by minimizing the sum of absolute differences (SAD) produced by the MB of the current frame over a determined search window from the previous frame. The SAD evaluation is computationally expensive and is the most time-consuming operation in the BM process. The most straightforward BM method is the full search algorithm (FSA), which finds the most accurate motion vector by exhaustively calculating the SAD values for all the elements of the search window. Over this decade, several fast BM algorithms have been proposed to reduce the number of SAD operations by calculating only a fixed subset of search locations, at the cost of poor accuracy. In this paper, a new algorithm based on differential evolution (DE) is proposed to reduce the number of search locations in the BM process. To avoid computing several search locations, the algorithm estimates the SAD values (fitness) for some locations using the SAD values of previously calculated neighboring positions. As the proposed algorithm does not rely on any fixed search pattern or any other assumption, a high probability of finding the true minimum (accurate motion vector) is expected. In comparison with other fast BM algorithms, the proposed method delivers more accurate motion vectors while maintaining competitive running times.
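
The full search algorithm described above can be sketched in a few lines. This is a minimal illustration of the exhaustive SAD baseline only, not the DE-based method the paper proposes. Frames are assumed to be lists of rows of integers, and the names `sad`, `full_search`, and `search_range` are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block
    displaced by (dx, dy) in ref. Frames are lists of rows."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, bx, by, n, search_range):
    """Evaluate every displacement in the search window.

    Returns (best_sad, dx, dy). Candidates that would fall outside the
    reference frame are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > width or y + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

The search evaluates up to (2 * search_range + 1)^2 candidates per block, which is the cost that fast BM algorithms try to cut.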

17.
刘艳 《图学学报》2015,36(4):576
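To improve video compression efficiency, an improved diamond search algorithm is proposed on the basis of the traditional diamond search algorithm. By introducing dynamic thresholds, the algorithm optimizes the prediction of the initial search point, the diamond search patterns, and search termination, which reduces the internal redundancy of SAD computation and the irrelevant block-matching computation within the search area; it also uses adaptive search-pattern selection to reduce computational complexity. Experimental results show that the improved diamond search algorithm suits video sequences of all motion types, especially sequences with intense motion changes. Compared with the FS algorithm, it reduces MET by about 95% for all sequences while keeping PSNR and bit rate extremely close to those of FS, which greatly reduces motion estimation time.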
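
For reference, the traditional diamond search that this abstract builds on can be sketched as below. This is only the classic two-pattern baseline; the dynamic thresholds, start-point prediction, and early termination of the improved algorithm are not reproduced because the abstract gives no details. Frames are assumed to be lists of rows of integers, and all function names are hypothetical.

```python
# Large and small diamond search patterns as (dx, dy) offsets; the
# center comes first so that ties keep the current center.
LDSP = [(0, 0), (0, -2), (1, -1), (2, 0), (1, 1),
        (0, 2), (-1, 1), (-2, 0), (-1, -1)]
SDSP = [(0, 0), (0, -1), (1, 0), (0, 1), (-1, 0)]


def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD of the n x n block; infinity if the candidate leaves the frame."""
    x, y = bx + dx, by + dy
    if x < 0 or y < 0 or x + n > len(ref[0]) or y + n > len(ref):
        return float('inf')
    return sum(abs(cur[by + j][bx + i] - ref[y + j][x + i])
               for j in range(n) for i in range(n))


def best_offset(cur, ref, bx, by, cx, cy, n, pattern):
    """Return (cost, ox, oy) of the cheapest pattern point around (cx, cy)."""
    best = None
    for ox, oy in pattern:
        cost = block_sad(cur, ref, bx, by, cx + ox, cy + oy, n)
        if best is None or cost < best[0]:
            best = (cost, ox, oy)
    return best


def diamond_search(cur, ref, bx, by, n):
    """Classic diamond search; returns (sad, dx, dy)."""
    cx = cy = 0
    while True:
        # Repeat the large pattern until its minimum sits at the center.
        # Each move strictly lowers the cost, so the loop terminates.
        _, ox, oy = best_offset(cur, ref, bx, by, cx, cy, n, LDSP)
        if (ox, oy) == (0, 0):
            break
        cx, cy = cx + ox, cy + oy
    # Refine once with the small pattern.
    cost, ox, oy = best_offset(cur, ref, bx, by, cx, cy, n, SDSP)
    return cost, cx + ox, cy + oy
```

Unlike full search, this visits only a handful of candidates per step, so it can settle in a local minimum when the SAD surface is not unimodal.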

18.
Block matching (BM) motion estimation plays a very important role in video coding. In a BM approach, image frames in a video sequence are divided into blocks. For each block in the current frame, the best matching block is identified inside a region of the previous frame, aiming to minimize the sum of absolute differences (SAD). Unfortunately, the SAD evaluation is computationally expensive and is the most time-consuming operation in the BM process. Therefore, BM motion estimation can be approached as an optimization problem, where the goal is to find the best matching block within a search space. The simplest available BM method is the full search algorithm (FSA), which finds the most accurate motion vector through an exhaustive computation of SAD values for all elements of the search window. Recently, several fast BM algorithms have been proposed to reduce the number of SAD operations by calculating only a fixed subset of search locations, at the price of poor accuracy. In this paper, a new algorithm based on Artificial Bee Colony (ABC) optimization is proposed to reduce the number of search locations in the BM process. In our algorithm, the computation of search locations is drastically reduced by a fitness calculation strategy that indicates when it is feasible to calculate, or only estimate, new search locations. Since the proposed algorithm does not rely on any fixed search pattern or any other movement assumption, as most other BM approaches do, a high probability of finding the true minimum (accurate motion vector) is expected. Simulations show that the proposed method achieves the best balance among fast BM algorithms in terms of both estimation accuracy and computational cost.

19.
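A DCT-Based Frequency-Domain Block Matching Method and Its Implementation   Total citations: 2; self-citations: 0; citations by others: 2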
曹宁  吴敏 《计算机工程》2000,26(11):129-130
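Software implementation of real-time video encoding and decoding over low-bit-rate channels is a research hotspot in image compression. For this purpose, this paper proposes a frequency-domain block matching method based on discrete cosine transform (DCT) coefficients; by introducing a threshold matrix, it reduces the computational load of the motion estimation algorithm and improves search efficiency. Computer simulation shows that the algorithm can be implemented in real time in software on a PC and has good practical value.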

20.
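Software implementation of real-time video encoding and decoding over low-bit-rate channels is a research hotspot in image compression. For this purpose, this paper proposes a frequency-domain block matching method based on discrete cosine transform (DCT) coefficients; by introducing a threshold matrix, it reduces the computational load of motion estimation and improves search efficiency. Computer simulation shows that the algorithm can be implemented in real time in software on a PC and has good practical value.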
