Similar literature
 20 similar documents found
1.
2.
This paper presents architectures for supporting dynamic data scaling in pipeline fast Fourier transforms (FFTs), suitable for implementing large FFTs in applications such as digital video broadcasting and digital holographic imaging. In a pipeline FFT, data streams continuously and must therefore be scaled without stalling the dataflow. We propose a hybrid floating-point scheme with a tailored exponent datapath, and a co-optimized architecture combining hybrid floating point and block floating point (BFP) to reduce memory requirements for 2-D signal processing. The presented co-optimization yields a higher signal-to-quantization-noise ratio and requires less memory than, for instance, convergent BFP. A 2048-point pipeline FFT has been fabricated in a standard CMOS process from AMI Semiconductor (Lenart and Owall, 2003), and a field-programmable gate array prototype integrating a 2-D FFT core in a larger design shows that the architecture is suitable for image reconstruction in digital holographic imaging.
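For readers unfamiliar with block floating point, the following is a minimal sketch of the general idea only: every value in a block shares one exponent, chosen so that the largest magnitude fits the mantissa word. It is not the architecture described in the abstract; the function names and the default `mantissa_bits=8` are assumptions made for illustration.

```python
def bfp_quantize(block, mantissa_bits=8):
    """Quantize a block of integers to signed mantissas sharing one exponent."""
    limit = (1 << (mantissa_bits - 1)) - 1   # largest positive mantissa
    peak = max(abs(x) for x in block)
    exponent = 0
    # Raise the shared exponent until the largest magnitude fits.
    while (peak >> exponent) > limit:
        exponent += 1
    # Arithmetic right shift: low-order bits are dropped (rounds toward -inf).
    mantissas = [x >> exponent for x in block]
    return mantissas, exponent


def bfp_dequantize(mantissas, exponent):
    """Rebuild approximate values from mantissas and the shared exponent."""
    return [m << exponent for m in mantissas]


mantissas, exponent = bfp_quantize([1000, -3, 250, 7])
restored = bfp_dequantize(mantissas, exponent)
```

Small values in a block lose precision when one large value forces a high shared exponent, which is the kind of quantization noise that scaling schemes for FFT pipelines try to limit.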

3.
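Building on a fixed-point system-on-chip (SoC) architecture composed of a general-purpose RISC processor core and attached fixed-point hardware accelerators, this paper proposes a novel word-length design method for fixed-point hardware accelerators based on statistical analysis. The method uses statistical parameters to solve mathematically for the minimum word length that satisfies a given signal-to-noise ratio requirement. It effectively reduces chip area, power consumption, and manufacturing cost, so that applications with high computational complexity can run on low-cost RISC-core SoC chips that have no DSP coprocessor.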

4.
In image processing, pattern recognition, and computer vision, one of the most powerful techniques for feature extraction is to use moments. Real-time applications of this method, however, have been prohibited by the intensive computation required to calculate the moments. One solution to this problem is to adopt specially designed hardware accelerators. This paper describes, from a practical standpoint, the design of a custom hardware accelerator for speeding up the moment computation. The design of the core functional units and the design of the overall system based on a wavefront array architecture are discussed. The moment accelerator can be easily configured into different sizes to meet diverse application requirements cost-effectively. Test results based on an implementation using field-programmable gate array devices show that, at an affordable cost, the proposed hardware accelerator can deliver real-time speeds for moment computation. Eliminating this computational bottleneck makes it possible to use moment-based features in real-time industrial applications.
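To make the computational load concrete, here is a small sketch of the standard raw geometric moment, m_pq = Σ x^p · y^q · f(x, y), evaluated in plain Python. It shows the textbook definition only, not the accelerator's wavefront array design; the function names are illustrative assumptions.

```python
def raw_moment(image, p, q):
    """Raw geometric moment m_pq of a 2-D image given as a list of rows."""
    return sum((x ** p) * (y ** q) * value
               for y, row in enumerate(image)
               for x, value in enumerate(row))


def centroid(image):
    """Centroid (x, y) derived from the zeroth- and first-order moments."""
    m00 = raw_moment(image, 0, 0)
    return raw_moment(image, 1, 0) / m00, raw_moment(image, 0, 1) / m00


image = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
]
center = centroid(image)
```

Each moment touches every pixel and needs a multiply-accumulate per pixel, so computing many orders on large images quickly becomes expensive in software.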

5.
This paper presents a novel hardware implementation of a disparity estimation scheme targeted at real-time Integral Photography (IP) image and video sequence compression. The software developed for IP image compression achieves high quality ratios over classic methodologies by exploiting the inherent redundancy that is present in IP images. However, the software approach faces time constraints that must be overcome in order to address real-time applications. Our main effort is to achieve real-time performance by implementing the most time-consuming parts of the compression algorithm in hardware. The proposed novel digital architecture features minimized memory read operations and extensive simultaneous processing, while taking into account the memory and data bandwidth limitations of a single FPGA implementation. Our results demonstrate that the implemented hardware system can successfully process high-resolution IP video sequences in real time, addressing a vast range of applications, from mobile systems to demanding desktop displays.

6.
In this paper, we present a hardware architecture for real-time three-dimensional (3D) surface model reconstruction from Integral Images (InIms). The proposed parallel digital system performs a number of computationally heavy calculations in order to achieve real-time operation. The processing elements are deployed in a systolic architecture and operate on multiple image areas simultaneously. Moreover, the memory organization allows random access to image data and copes with the increased processing throughput of the system. Operating results reveal that the proposed architecture is able to process 3D data at a real-time rate. The proposed system can handle large InIms in real time and outputs 3D scenes of enhanced depth and detailed texture, which apply to emerging 3D applications.

7.
The method of moments is one of the most powerful techniques for image analysis. However, real-time applications of this method have been prohibited by the computational intensity of calculating the moments. This paper presents a novel configurable hardware accelerator for expediting the moment computation. The fundamental building block of the proposed accelerator is a custom-designed floating-point moment processing element (MPE). Running at 75 MHz, the MPE can provide a 12× speedup over a 166 MHz TMS320C6701 digital signal processor. On top of this, a linear performance boost can be obtained by connecting up to eight MPEs into a one-dimensional (1-D) array.

8.
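To address the slow computation and weak real-time capability of digital holographic reconstruction algorithms, and the poor cross-platform portability of existing GPU acceleration strategies, this paper proposes a scheme that uses the Open Computing Language (OpenCL) framework to improve the execution efficiency of digital holographic reconstruction. The scheme exploits the heterogeneous cooperative computing capability of OpenCL: the convolution-based digital holographic reconstruction algorithm is designed to run heterogeneously on CPU+GPU and is implemented in a data-parallel programming model. Tests on digital holograms of different resolutions and on different GPU platforms show that the average execution time is one order of magnitude lower than on the CPU, the highest overall speedup reaches 54.2, and the speedup of the parallel computation is as high as 94.7. The strategy scales with problem size and has good cross-platform behavior, making it better suited to engineering implementations and real-time applications of digital holography.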

9.
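Research on space-division-multiplexed color holographic display based on liquid crystal on silicon   Total citations: 2, self-citations: 0, citations by others: 2
王岳  沈川  张成  刘凯峰  韦穗 《中国激光》2012,39(12):1209001-183
Color holographic display is an important research goal in holographic display. Color holographic display using RGB lasers is studied, and a color holographic display method based on space-division multiplexing is proposed. The size and center position of the imaging region of the opto-electronically reconstructed holographic image depend on the wavelengths of the RGB lasers; by adjusting the sizes of the original R, G, and B component images and loading digital blazed gratings, the region sizes and imaging centers of the three reconstructed color components are made to coincide. A color holographic display system is built on the space-division multiplexing method; the final system reconstructs color images by loading computer-generated 24-bit holograms onto a spatial light modulator. Experimental results verify the feasibility of the method.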

10.
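To obtain reconstructed digital holographic microscopy images of several planes simultaneously and thereby extend the imaging depth, a multi-plane digital holographic microscopy imaging method is proposed. A quadratic distortion phase factor with preset parameters is applied to the experimentally recorded digital hologram, so that a single Fresnel reconstruction yields sharp reconstructed images of multiple imaging planes at once. First, the imaging transfer function with the quadratic distortion phase factor is derived from the transfer function of a Fresnel imaging system, and the rules for selecting the parameters and the frequency-domain filter are determined. Next, the experimentally recorded digital hologram is filtered in the frequency domain to remove the zero-order (directly transmitted) light and the conjugate image. Finally, the quadratic distortion phase factor is applied to the filtered hologram and Fresnel reconstruction is performed. Compared with other methods, this method obtains in-focus images of multiple planes from a single reconstruction, the reconstruction distances can be chosen freely, and the reconstructed images are free of interference from the zero-order light and the conjugate image.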

11.
Image scaling is a frequent operation in medical image processing. This paper presents how two-dimensional (2-D) image scaling can be accelerated with a new coarse-grained parallel processing method. The method assumes evenly divisible image sizes, which is, in practice, the case with most medical images. In the proposed method, the image is divided into slices and all the slices are scaled in parallel. The complexity of the method is examined with two parallel architectures while considering memory consumption and data throughput. Several scaling functions can be handled with these generic architectures, including linear, cubic B-spline, cubic, Lagrange, Gaussian, and sinc interpolations. Parallelism can be adjusted independently of the complexity of the computational units. The most promising architecture is implemented as a simulation model, and the hardware resources as well as the performance are evaluated. All the significant resources are shown to be linearly proportional to the parallelization factor. With contemporary programmable logic, real-time scaling is achievable with high-resolution 2-D images and good-quality interpolation. The proposed block-level scaling is also shown to increase software scaling performance more than fourfold.

12.
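SoC hardware/software design of a face detection system based on the AdaBoost algorithm   Total citations: 1, self-citations: 0, citations by others: 1
The AdaBoost face detection algorithm is computationally intensive and difficult to run in real time in pure software on an embedded platform. This paper profiles the AdaBoost detection algorithm and designs a suitable hardware/software partitioning scheme. Most of the algorithm's computation is moved into a hardware accelerator, which greatly increases detection speed. A cycle-accurate model of the whole system is described. Simulation shows that the SoC solution is 11 times faster than pure software and, at a 200 MHz clock, can process 384×288 images at 28 frames/s.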

13.
The computational power required in many multimedia applications is well beyond the capabilities of today's multimedia systems. Therefore, embedding additional high-performance multimedia accelerator components into these systems is crucial. This paper presents the embedding of multimedia components into computer systems using reconfigurable coprocessor boards. The goal of such reconfigurable platforms, which can be adapted to several applications and which include programmable digital signal processors, control and memory devices, and dedicated multimedia ASICs, is set out. As steps toward such a platform, four ASICs for image and text processing are presented. The embedding of these components into a computing system using a CardBus-based coprocessor board is shown. Such a reconfigurable coprocessor board is an important intermediate stage on the way to future hybrid reconfigurable systems on chip.

14.
Fast Fourier transform algorithms on large data sets achieve poor performance on various platforms because of inefficient strided memory access patterns. These access patterns need to be reshaped to achieve high-performance implementations. In this paper we formally restructure 1D, 2D and 3D FFTs targeting a generic machine model with a two-level memory hierarchy requiring block data transfers, and derive algorithms with efficient memory access patterns using custom block data layouts. These algorithms need to be carefully mapped to the targeted platform's architecture, particularly the memory subsystem, to fully utilize performance and energy efficiency potentials. Using the Kronecker product formalism, we integrate our optimizations into the Spiral framework and evaluate a family of DRAM-optimized FFT algorithms and their hardware implementation design space via automated techniques. In our evaluations, we demonstrate DRAM-optimized accelerator designs over a large tradeoff space given various problem (single/double precision 1D, 2D and 3D FFTs) and hardware platform (off-chip DRAM, 3D-stacked DRAM, ASIC, FPGA, etc.) parameters. We show that Spiral-generated Pareto-optimal designs can achieve close to the theoretical peak performance of the targeted platform, offering 6× and 6.5× system performance and power efficiency improvements, respectively, over conventional row-column FFT algorithms.
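The conventional row-column baseline mentioned in the abstract can be sketched as follows: transform every row, then every column. The sketch uses a naive O(n²) DFT from the standard library instead of a real FFT, purely to stay self-contained, and the function names are illustrative assumptions. It does not reproduce the block data layouts or Spiral-generated designs from the paper.

```python
import cmath


def dft(values):
    """Naive 1-D discrete Fourier transform (stand-in for a real FFT)."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * u * k / n)
                for k, v in enumerate(values))
            for u in range(n)]


def dft2_row_column(matrix):
    """2-D DFT computed as a row pass followed by a column pass."""
    # Pass 1: rows, which are contiguous in a row-major layout.
    rows = [dft(list(row)) for row in matrix]
    # Pass 2: columns, which are strided accesses in a row-major layout.
    cols = [dft(list(col)) for col in zip(*rows)]
    # Transpose back so that result[v][u] is indexed like the input.
    return [list(row) for row in zip(*cols)]


spectrum = dft2_row_column([[1, 2], [3, 4]])
```

The column pass is where the strided memory accesses arise when the data sit in row-major order in DRAM, which is the access pattern the abstract describes as inefficient.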

15.
This paper presents the design of an embedded automated digital video surveillance system with real-time performance. Hardware accelerators for video segmentation, morphological operations, labeling and feature extraction are required to achieve real-time performance, while tracking is handled in software on an embedded processor. By implementing a complete embedded system, bottlenecks in computational complexity and memory requirements can be identified and addressed. Accordingly, a memory reduction scheme for the video segmentation unit, reducing bandwidth by more than 70%, and a low-complexity morphology architecture that only requires memory proportional to the input image width have been developed. On a system level, it is shown that a labeling unit based on a contour tracing technique does not require unique labels, resulting in more than 50% memory reduction. The hardware accelerators provide the tracking software with image object properties, i.e. features, thereby decoupling the tracking algorithm from the image stream. A prototype of the embedded system runs in real time, at 25 fps, on a field-programmable gate array development board. Furthermore, the system scalability for higher image resolution is evaluated.

16.
Otsu’s global automatic image thresholding operation is used in various image processing applications. It requires computation of the normalized cumulative histogram, mean and cumulative moments, which are compute-intensive operations. In this paper, a custom architecture is presented for efficient computation of Otsu’s algorithm, along with its use as an intellectual property (IP) core in a field-programmable gate array (FPGA) based system-on-chip (SoC) environment for connected component analysis (CCA). A self-normalization technique is employed, where single-cycle read–modify–write operations are performed with block random access memories (BRAMs) and digital signal processing (DSP) slices. The architecture is designed for 640 × 480 images that are captured by a high-resolution analog camera and buffered in the DDR2 SDRAM of a Xilinx ML-507 platform at a 25.175 MHz clock frequency. The embedded PowerPC processor core is used to control the frame acquisition process. Experimental results on a Virtex-5 xc5vfx70t FPGA device show that the architecture utilizes 1.4% of the slices, 2.7% of the BRAMs and 3.9% of the DSP48E slices. The total power consumption of the design is 1440.59 mW. The proposed architecture, as an IP core, is able to work in real time with standard VGA resolution video and requires low computational resources.
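As background, Otsu's method picks the threshold that maximizes the between-class variance of the two pixel classes it creates. The sketch below is a plain software version working on a histogram; it is not the hardware architecture or the self-normalization technique from the abstract, and the function name and the convention that levels up to and including the threshold form class 0 are assumptions of this example.

```python
def otsu_threshold(histogram):
    """Return the level t maximizing between-class variance (class 0 = levels <= t)."""
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    best_t, best_score = 0, -1.0
    w0 = 0      # pixel count in class 0 (cumulative histogram)
    sum0 = 0    # cumulative first moment of class 0
    for t, count in enumerate(histogram):
        w0 += count
        sum0 += t * count
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Proportional to between-class variance; dividing by total**2
        # would not change which t wins, so it is skipped.
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


threshold = otsu_threshold([5, 5, 0, 0, 0, 0, 5, 5])
```

With ties, this version keeps the lowest qualifying level, so the bimodal histogram above yields the upper edge of the dark cluster.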

17.
Digital image coding using vector quantization (VQ) based techniques provides low bit rates and high-quality coded images, at the expense of intensive computational demands. The computational requirement of the encoding search process has hindered the application of VQ to real-time high-quality coding of color TV images. Reduction of the encoding search complexity through partitioning of a large codebook into the on-chip memories of a concurrent VLSI chip set is proposed. A real-time vector quantizer architecture for encoding color images is developed. The architecture maps the mean/quantized residual vector quantizer (MQRVQ) (an extension of mean/residual VQ) onto a VLSI/LSI chip set. The MQRVQ contributes to the feasibility of the VLSI architecture through the use of a simple multiplication-free distortion measure and a reduction of the required memory per code vector. Running at a clock rate of 25 MHz, the proposed hardware implementation of this architecture is capable of real-time processing of 480×768 pixels per frame with a refresh rate of 30 frames/s. The result is a real-time high-quality composite color image coder operating at a fixed rate of 1.12 bits per pixel.
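The figures quoted in the abstract imply a pixel rate and a coded bit rate that are easy to work out. The short calculation below uses only the stated numbers; the helper name `coder_rates` is hypothetical, and the cycles-per-pixel value is simply clock rate divided by pixel rate, not a claim about how the chip set schedules its work.

```python
def coder_rates(width, height, frames_per_second, clock_hz, bits_per_pixel):
    """Return (pixels per second, clock cycles per pixel, coded bits per second)."""
    pixel_rate = width * height * frames_per_second
    cycles_per_pixel = clock_hz / pixel_rate
    bit_rate = pixel_rate * bits_per_pixel
    return pixel_rate, cycles_per_pixel, bit_rate


# 480x768 pixels per frame, 30 frames/s, 25 MHz clock, 1.12 bits per pixel
pixel_rate, cycles_per_pixel, bit_rate = coder_rates(768, 480, 30, 25_000_000, 1.12)
```

This gives about 11.06 million pixels per second, roughly 2.26 clock cycles per pixel, and a coded stream of about 12.4 Mbit/s.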

18.
赵亚迪  曹晓华  陈波  孙天齐 《红外与激光工程》2018,47(6):626002-0626002(5)
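Aperture synthesis is an effective way to improve imaging resolution. For synthetic-aperture digital holography, an aperture synthesis method with sub-pixel displacement accuracy is proposed. The object light field is reconstructed with the Fresnel diffraction formula based on the fast Fourier transform, and the shift property of the Fourier transform is used to convert the displacement of a hologram into piston and tilt phases, enabling digital holographic aperture synthesis for arbitrary displacements. A digital off-axis holographic synthetic-aperture experiment was carried out, and the results verify the effectiveness of the method. In addition, at the cost of some resolution, the method significantly reduces the computation required for aperture synthesis, offers good real-time performance, and lowers the hardware requirements on the processor.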
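The shift property that the abstract relies on states that a displacement in the spatial domain equals a linear phase ramp in the frequency domain, which is why shifts need not be whole pixels. The 1-D sketch below illustrates that property alone, with a naive DFT from the standard library; it is not the paper's 2-D Fresnel reconstruction, and the function names are assumptions.

```python
import cmath
import math


def dft(values, sign=-1):
    """Naive DFT (sign=-1) or unnormalized inverse DFT (sign=+1)."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * cmath.pi * m * k / n)
                for k, v in enumerate(values))
            for m in range(n)]


def shift_signal(samples, delta):
    """Circularly shift a real signal by delta samples (delta may be fractional)."""
    n = len(samples)
    shifted = []
    for k, value in enumerate(dft(samples)):
        freq = k if k <= n // 2 else k - n   # signed frequency index
        # Shift theorem: multiply bin `freq` by a linear phase ramp.
        shifted.append(value * cmath.exp(-2j * cmath.pi * freq * delta / n))
    return [v.real / n for v in dft(shifted, sign=+1)]


samples = [math.cos(2 * math.pi * i / 8) for i in range(8)]
half_pixel = shift_signal(samples, 0.5)
```

For a band-limited signal such as the cosine above, the half-sample shift reproduces the cosine evaluated at `i - 0.5`, and a whole-sample shift reduces to an ordinary circular shift.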

19.
We present multifocus holographic 3-D image fusion based on independent component analysis (ICA). In this paper, the ICA technique is used to fuse multiple holographic images reconstructed at different distances from the image sensor. A hologram of two dice located at distances of 315 and 345 mm, respectively, from the sensor is recorded using phase-shifting digital holography and used in our experiments. The resulting fused holographic image shows both dice clearly in focus. This is compared with a single reconstructed holographic image, in which only one of the dice is in focus at a particular reconstruction distance.

20.
The spatial resolution of a hyperspectral image is often coarse because of the limitations of the imaging hardware. Super-resolution reconstruction (SRR) is a promising signal post-processing technique for hyperspectral image resolution enhancement. This paper proposes a maximum a posteriori (MAP) based multi-frame super-resolution algorithm for hyperspectral images. Principal component analysis (PCA) is utilized in both parts of the proposed algorithm: motion estimation and image reconstruction. A simultaneous motion estimation method with the first few principal components, which contain most of the information of a hyperspectral image, is proposed to reduce computational load and improve motion field accuracy. In the image reconstruction part, different image resolution enhancement techniques are applied to different groups of components, to reduce computational load and simultaneously remove noise. The proposed algorithm is tested on both synthetic images and real image sequences. The experimental results and comparative analyses verify the effectiveness of this algorithm.
