
1.

龚浩戚其丰《控制工程》2005,(Z1)

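SSE2 (Streaming SIMD Extensions 2) on Intel processors supports instruction-level SIMD operations and offers a way to process data in parallel on a single processor. The template matching algorithm is parallelized with SSE2 and built with GCC on Linux. Experimental results show that SSE2 greatly speeds up template matching: while preserving the original accuracy and stability, it addresses the heavy computation, long running time, and high cost of template matching, and effectively meets the real-time requirements placed on computer vision in electronics manufacturing and many other fields.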

2.

A fast background extraction method based on Pentium SIMD instructions (cited by: 3; self-citations: 0; citations by others: 3)

周西汉刘勃周荷琴袁非牛《计算机工程与应用》2004,40(27):81-83

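The paper proposes a fast background extraction method based on Intel Pentium SIMD instructions. In an improved Gaussian mixture background model, the computation of the Jeffrey value and the update of the background model exhibit a high degree of inherent SIMD parallelism; by organizing the data as SSE data types, a SIMD version of the Gaussian mixture background model is implemented. Experimental results show that the method with embedded Pentium SIMD instructions improves performance by about 75% over the conventional computation, speeds up background extraction, meets real-time processing requirements, and has considerable practical value.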

3.

A two-dimensional FFT optimization algorithm based on Intel SIMD instructions (cited by: 1; self-citations: 0; citations by others: 1)

李成军周卫峰朱重光《计算机工程与应用》2007,43(5):41-44

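In frequency-domain image processing algorithms for large data volumes, the most time-consuming step is the two-dimensional FFT of the image data. To address this, the paper proposes a 2D FFT optimization algorithm based on Intel SIMD instructions. It achieves high performance by organizing the data in a layout suited to SIMD computation, using SSE3 instructions to accelerate complex multiplication, and optimizing the two-dimensional processing for the processor cache. Experimental results show that the described algorithm is about 30% faster than FFTW, currently the most widely used public-domain FFT package. It meets the requirement of fast processing of large images and has considerable engineering value.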

4.

Fast implementation of RC6 64/r/b on the IA-32 platform based on the SSE2 instruction set

陈佳康李晖王臖邓冠阳《计算机应用与软件》2012,(10):85-88,108

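64-bit and 32-bit computers currently coexist widely. The conventional implementation of RC6 64/r/b achieves high performance on 64-bit computers but performs poorly on 32-bit ones, which limits wide adoption of the algorithm. The SSE2 instruction set is used to simplify the 64-bit operations in RC6 64/r/b and to implement SIMD parallelism, multiplying the running speed of RC6 64/r/b on the IA-32 platform. The method can also be used for fast implementation of other cryptographic algorithms that involve 64-bit operations.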

5.

SSE parallel optimization of the full search algorithm

陶志强徐萌徐荣飞《微计算机应用》2011,32(11)

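In video coding algorithms based on macroblock partitioning, motion estimation takes up most of the encoding time because of its enormous computational load. Especially when encoding high-definition video, motion estimation has become the biggest bottleneck for encoding performance. This paper modifies and optimizes the full-search motion estimation algorithm for pixel-level parallelism, using SSE instructions to drive the CPU's SIMD unit so that the SAD (sum of absolute differences) is computed over multiple pixels of the current and reference macroblocks at once, yielding a parallel implementation of motion estimation. On the same hardware and with encoding quality preserved, it achieves more than a 2x improvement in encoding performance over conventional full search on the CPU.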
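The full-search SAD computation described in this abstract can be sketched as a scalar reference in Python. This is only an illustration, not the authors' code: the function names, the list-of-rows frame layout, and the `search_range` parameter are assumptions. The inner pixel loop is the part a SIMD version would replace, for example with the SSE2 `PSADBW` instruction (intrinsic `_mm_sad_epu8`), which accumulates absolute differences over 16 unsigned bytes at a time.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the n x n block of `ref` at (rx, ry)."""
    total = 0
    for dy in range(n):
        cur_row = cur[by + dy]
        ref_row = ref[ry + dy]
        # A SIMD implementation would process many pixels of this row at once.
        for dx in range(n):
            total += abs(cur_row[bx + dx] - ref_row[rx + dx])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Test every displacement within +/- search_range and return
    (best_mv, best_sad), where best_mv is (mvx, mvy)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_sad = None, None
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = bx + mvx, by + mvy
            # Skip candidate blocks that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best_sad is None or cost < best_sad:
                best_mv, best_sad = (mvx, mvy), cost
    return best_mv, best_sad
```

Because every candidate displacement is evaluated independently, the search also parallelizes across candidates, not only across pixels.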

6.

CUDA parallel implementation of the SART reconstruction technique for CT images

史怀林孙丰荣姜威刘炜秦通李新彩《计算机应用》2011,31(5):1245-1248

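In computed tomography (CT) image reconstruction, when the projection data are incomplete or noisy, the simultaneous algebraic reconstruction technique (SART) reconstructs images of higher quality that better meet clinical diagnostic requirements than the filtered back projection (FBP) algorithm. However, SART is very time-consuming, and a parallel implementation is one effective way to address this. A parallel SART method implemented on NVIDIA's Compute Unified Device Architecture (CUDA) is proposed. Experimental results show that the method greatly reduces reconstruction time without sacrificing reconstructed image quality, making it better suited to clinical use.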

7.

Optimization of template matching based on SIMD programming on the Linux platform

陈辉龚浩张燕忠《计算机测量与控制》2004,12(12):1222-1225

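Template matching is a basic and effective method for filtering, edge detection, object recognition, and image matching. However, it is computationally intensive and time-consuming on a single processor, while a parallel array computer raises hardware and software costs. Fortunately, Intel processors provide the MMX/SSE/SSE2 instruction sets, which support instruction-level SIMD operations. The main computation in template matching can be parallelized with SIMD and programmed on Linux to achieve parallel processing on a single processor. Test results show that SIMD greatly speeds up template matching.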

8.

A study of the accuracy of the filtered back projection algorithm for CT image reconstruction

骆岩红《自动化与仪器仪表》2013,(6):27-29

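CT is a computed tomography technique widely used in medicine and industry; it is a high-quality non-destructive diagnostic technique that reconstructs cross-sectional images of an object from projection data. Inspecting precision parts demands higher accuracy from image reconstruction in non-destructive testing, and a key open problem in CT applications is how to directly reconstruct CT images that meet engineering targets. This paper systematically studies the accuracy of analytical image reconstruction algorithms. Because the filtered back projection (FBP) algorithm is the foundation of the analytical approach, the paper takes the implementation of this algorithm as its basis and, using numerical methods for definite integrals and the characteristics of back projection, proposes an accurate back projection method based on Simpson's formula.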
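The abstract does not give the details of its back projection scheme, so the following is only a generic sketch of the composite Simpson rule it builds on. In back projection the integrand would be the filtered projection evaluated over the projection angle; here `f`, `a`, `b`, and `n` are illustrative names chosen for this example.

```python
import math


def simpson(f, a, b, n):
    """Approximate the integral of f over [a, b] with the composite
    Simpson rule using n equal subintervals (n must be even)."""
    if n <= 0 or n % 2 != 0:
        raise ValueError("n must be a positive even integer")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior nodes alternate between weight 4 (odd) and 2 (even).
        weight = 4 if i % 2 else 2
        total += weight * f(a + i * h)
    return total * h / 3


# Usage: integrate sin over [0, pi]; the exact value is 2.
approx = simpson(math.sin, 0.0, math.pi, 100)
```

For smooth integrands the error shrinks with the fourth power of the step size, which is the usual reason to prefer this rule over a plain rectangle or trapezoid sum when accuracy matters.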

9.

An algebraic image reconstruction algorithm based on POCS constraints (cited by: 1; self-citations: 0; citations by others: 1)

胡小舟孔斌成二康胡戎翔《模式识别与人工智能》2009,22(5)

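Algebraic reconstruction from incomplete projection data has long been an active topic in CT applications. By analyzing the relationship between projection images taken at mutually perpendicular angles, the paper proposes an improved algebraic reconstruction technique (ART) algorithm. The algorithm computes the projection coefficient matrix by recording the indices of the grid cells each ray passes through and the length of the ray's intersection with each cell, and during back projection it reconstructs from incomplete projection data under a projection onto convex sets (POCS) constraint. Experiments show that, compared with the ART algorithm, both reconstruction speed and reconstruction quality improve considerably.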
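For orientation, the basic ART (Kaczmarz) update that such methods start from can be sketched as below. This is not the paper's improved algorithm: the function name, the dense row representation of the projection matrix, the `relax` factor, and the non-negativity clamp (used here as one simple example of a convex-set constraint) are all assumptions made for illustration.

```python
def art(rows, b, sweeps=10, relax=1.0, nonneg=True):
    """Basic ART: for each ray i, move the image estimate x toward the
    hyperplane a_i . x = b_i. `rows` holds the rows a_i of the projection
    matrix and `b` the measured ray sums."""
    x = [0.0] * len(rows[0])
    for _ in range(sweeps):
        for a_i, b_i in zip(rows, b):
            norm_sq = sum(a * a for a in a_i)
            if norm_sq == 0.0:
                continue  # the ray misses every cell
            residual = b_i - sum(a * xj for a, xj in zip(a_i, x))
            step = relax * residual / norm_sq
            x = [xj + step * a for xj, a in zip(x, a_i)]
            if nonneg:
                # Assumed convex constraint: attenuation values are not negative.
                x = [max(xj, 0.0) for xj in x]
    return x


# Usage: two rays through a two-cell image whose true values are [2, 1].
estimate = art([[1.0, 1.0], [1.0, -1.0]], [3.0, 1.0])
```

In a real reconstruction each row is extremely sparse, which is why storing only the indices of the cells a ray crosses and the intersection lengths, as the abstract describes, pays off.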

10.

Research on region-of-interest image reconstruction for cone-beam CT

张顺利张定华程云勇李小林《计算机应用研究》2012,29(9):3521-3524

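To address image reconstruction from truncated projection data in region-of-interest scanning with cone-beam CT, the iterative algebraic reconstruction technique (ART) is proposed for reconstruction. The drawbacks of cone-beam ART are heavy computation and slow reconstruction. To speed it up, a fast parallel image reconstruction method for multicore platforms is proposed. First, the 3D reconstruction region is divided into equal upper and lower halves, and the detector plane is correspondingly split into upper and lower parts; then the virtual detector projection data are computed by bilinear interpolation; finally, ART is reconstructed in parallel on a multicore platform using multithreading, achieving a speedup of about two while maintaining high reconstruction accuracy. On this basis, simulation experiments reconstruct different regions of interest of the 3D Shepp-Logan phantom, and the results show that ART is feasible for region-of-interest image reconstruction.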

11.

Optimizing Gaussian filtering of volumetric data using SSE

A. Va&#x;ko M. &#x;rmek 《Concurrency and Computation》2011,23(1):100-116

Gaussian filtering is a basic operation commonly used in numerous image and volume processing algorithms. It is, therefore, desirable to perform it as efficiently as possible. Over the last decade CPUs have been successfully extended with several SIMD (Single Instruction Multiple Data) extensions, such as MMX, 3DNow!, and SSE series. In this paper we introduce a new technique for Gaussian filtering of volume data sets (the extended volume) together with its SIMD implementation using the SSE technology. We further introduce a SIMD optimized recursive IIR implementation of the Gaussian filter, and finally, we parallelize the SSE versions with the help of OpenMP (Open Multi-Processing). Experimental evaluation indicates that the SIMD implementation can significantly speed up both versions of the Gaussian filtering and that the non-recursive extended volume version is faster than the recursive IIR one for small widths of the Gaussian filter. Copyright © 2010 John Wiley & Sons, Ltd.

12.

Embedded GPU and multicore processors for emotional-based mobile robotic agents

《Future Generation Computer Systems》2016

Control architectures based on emotions are becoming promising solutions for the implementation of future robotic systems. The basic controllers of this architecture are the emotional processes that decide which behaviors the robot must activate to fulfill the objectives. The number of emotional processes increases (hundreds of millions/s) with the complexity level of the application, limiting the capacity of a main processor to solve complex problems. Fortunately, the potential parallelism of emotional processes permits their execution in parallel, hence providing the computing power to tackle complex dynamic problems. In this paper, graphics processing units (GPUs), multicore processors and single instruction multiple data (SIMD) instructions are used to provide parallelism for the emotional processes. Different GPUs, multicore processors and SIMD instruction sets are evaluated and compared to analyze their suitability for robotic applications. The applications are set up taking into account different environmental conditions, robot dynamics and emotional states. Experimental results show that, despite the fact that GPUs have a bottleneck in the data transmission between the host and the device, the evaluated GTX 670 GPU provides a performance more than one order of magnitude greater than the initial implementation of the architecture on a single core. Thus, all complex proposed application problems can be solved using the GPU technology, in contrast to the first prototype where only 55% of them could be solved. Using AVX SIMD instructions, the performance of the architecture is increased by a factor of 3.25 relative to the first implementation. Thus, of the 27 proposed applications, about 88.8% are solved. In the case of the SSE SIMD instructions, the performance is almost doubled and the robot could solve about 74% of the proposed application problems. The use of AVX and SSE SIMD instructions provides almost the same performance as a quad- and a dual-core, respectively, with the advantage that they do not add any additional hardware cost.

13.

A fast, streaming SIMD Extensions 2, logistic squashing function

Milner JJ Grandison AJ 《Neural computation》2008,20(12):2967-2972

Schraudolph proposed an excellent exponential approximation providing increased performance particularly suited to the logistic squashing function used within many neural networking applications. This note applies Intel's streaming SIMD Extensions 2 (SSE2), where SIMD is single instruction multiple data, of the Pentium IV class processor to Schraudolph's technique, further increasing the performance of the logistic squashing function. It was found that the calculation of the new 32-bit SSE2 logistic squashing function described here was up to 38 times faster than the conventional exponential function and up to 16 times faster than a Schraudolph-style 32-bit method on an Intel Pentium D 3.6 GHz CPU.
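To make the underlying trick concrete, here is a scalar Python sketch of a Schraudolph-style exponential and the logistic function built on it. It is not the SSE2 routine from the note: the constants follow Schraudolph's published double-precision variant as best recalled, and the function names and the input range check are assumptions for this example. The idea is to write a scaled, shifted integer into the upper 32 bits of an IEEE 754 double so that the exponent field does most of the work.

```python
import math
import struct

EXP_A = 1048576 / math.log(2)  # 2**20 / ln 2: scales y into the exponent field
EXP_B = 1072693248             # 1023 * 2**20: IEEE 754 double exponent bias
EXP_C = 60801                  # assumed correction constant that reduces the error


def fast_exp(y):
    """Approximate exp(y) to within a few percent by building the upper
    32 bits of a double directly (the lower 32 bits are left at zero)."""
    if not -700.0 < y < 700.0:
        raise ValueError("y is outside the range this approximation supports")
    high = int(EXP_A * y) + (EXP_B - EXP_C)
    # Little-endian layout: low word first, then the high word we computed.
    return struct.unpack("<d", struct.pack("<ii", 0, high))[0]


def fast_logistic(x):
    """Logistic squashing function 1 / (1 + exp(-x)) using fast_exp."""
    return 1.0 / (1.0 + fast_exp(-x))
```

In Python this is slower than `math.exp`; the sketch only shows the arithmetic. The speedups reported in the abstract come from doing the same integer operations on several values at once in SSE2 registers.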

14.

Improved ART2 neural network text clustering using LSA dimensionality reduction

徐晨凯高茂庭《计算机工程与应用》2014,(24):133-138,177

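To address the high dimensionality of text data and the dynamic requirements of clustering, an improved ART2 neural network text clustering algorithm combined with latent semantic analysis (LSA) dimensionality reduction is proposed. LSA highlights the semantic relationships between documents and terms, reduces useless noise, and lowers data dimensionality and computational complexity; an improved compromise learning method reduces the number of computation steps and speeds up the ART2 network; and a nearest-neighbor dynamic regrouping method improves the stability of ART2 clustering and weakens the algorithm's dependence on the input order of samples. Experiments show that the improved algorithm effectively performs dynamic text clustering.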

15.

An improved SIMD vectorization method for control flow

高伟李颖颖孙回回李雁冰赵荣彩《软件学报》2017,28(8):2046-2063

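SIMD extensions are acceleration units integrated into general-purpose processors in recent years to exploit data-level parallelism in multimedia and scientific computing programs. Control dependences hinder the exploitation of data-level parallelism. Current control-flow vectorization methods, whether loop-based or SLP, require if-conversion and do not consider the vector parallelism contained within loops, so the generated vector code is inefficient. Imprecise cost models guiding control-flow vectorization likewise lead to inefficient vector code. An improved SIMD vectorization method for control flow is therefore proposed. First, a loop distribution algorithm for loops with control dependences separates the vectorizable and non-vectorizable parts of a loop while accounting for data locality during distribution. Second, a method for directly vectorizing control flow is proposed that takes vector reuse across basic blocks into account. Finally, a precise cost model guides the generation of superword select instructions and superword conditional branch instructions. Experimental results show that, compared with existing control-flow vectorization methods, the vector code generated by the proposed method performs 24% better.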

16.

Wireless smart camera network for real-time human 3D pose reconstruction

Zoran Zivkovic 《Computer Vision and Image Understanding》2010,114(11):1215-1222

A multiple-camera system for 3D pose reconstruction is presented. First, body parts of the user are detected. Each camera has a single-instruction multiple-data (SIMD) processor used to perform this heavy-load image processing task. The detected hand and head candidate positions are then transmitted wirelessly from each camera to a central processor using a low-power ZigBee network. Finally, the 3D pose reconstruction is performed at the central processor by combining the data in a probabilistic manner.