
1.

System-Level Data-Flow Transformation Exploration and Power-Area Trade-offs Demonstrated on Video Codecs

Francky Catthoor Martin Janssen Lode Nachtergaele Hugo De Man 《The Journal of VLSI Signal Processing》1998,18(1):39-50


2.

High Throughput, Scalable VLSI Architecture for Block Matching Motion Estimation

You Jaehee Lee Sang Uk 《Journal of Signal Processing Systems》1998,19(1):39-50

This paper describes a VLSI architecture for block matching motion estimation. The proposed architecture achieves 100% PE utilization and alleviates the I/O bottleneck using a small amount of distributed on-chip image memory. The number of processing elements is scalable according to the degree of parallel processing and the throughput requirement. The overall computations are performed in a pipelined manner, and the data fill time for contiguous blocks is eliminated to increase throughput. The VLSI system implementation methodologies and the layouts are also described. Finally, the performance is evaluated and the advantages over other architectures are outlined.
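As a point of reference for the computation that such an architecture parallelizes, the following is a minimal software sketch of full-search block matching. It assumes the sum of absolute differences (SAD) as the matching criterion, which the abstract does not specify; the function names `sad` and `full_search` and the list-of-rows frame layout are hypothetical and chosen only for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` whose
    top-left corner is (bx, by) and the block of `ref` displaced by (dx, dy).
    Frames are lists of rows, indexed as frame[y][x]."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, p):
    """Evaluate every displacement in [-p, p] x [-p, p] that keeps the
    candidate block inside `ref`; return (best_dx, best_dy, best_sad)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip candidates that would fall outside the reference frame.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

Every candidate displacement costs n × n absolute differences, and neighbouring candidates reuse most of the same reference pixels. That regular, data-overlapping structure is the general reason block matching is a common target for arrays of processing elements.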

3.

Parallel algorithms/architectures for neural networks Total citations: 1, self-citations: 0, citations by others: 1

J. N. Hwang S. Y. Kung 《The Journal of VLSI Signal Processing》1989,1(3):221-251

This paper advocates digital VLSI architectures for implementing a wide variety of artificial neural networks (ANNs). A programmable systolic array is proposed, which maximizes the strength of VLSI in terms of intensive and pipelined computing and yet circumvents the limitation on communication. The array is meant to be more general purpose than most other ANN architectures proposed. It may be used for a variety of algorithms in both the retrieving and learning phases of ANNs, e.g., single-layer feedback networks, competitive learning networks, and multilayer feed-forward networks. A unified approach to modeling existing neural networks is proposed. This unified formulation leads to a basic structure for a universal simulation tool and neurocomputer architecture. A fault-tolerance approach and a partitioning scheme for large or non-homogeneous networks are also proposed. Finally, implementations based on commercially available VLSI chips (e.g., Inmos T800) and custom VLSI technology are discussed in detail.

4.

VLSI architectures for video compression-a survey Total citations: 3, self-citations: 0, citations by others: 3

Pirsch P. Demassieux N. Gehrke W. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》1995,83(2):220-246

The paper presents an overview of architectures for VLSI implementations of video compression schemes as specified by the standardization committees of the ITU and ISO. VLSI implementation strategies are discussed and split into function-specific and programmable architectures. As examples of the function-oriented approach, alternative architectures for the DCT and block matching are evaluated. Dedicated decoder chips are also included. Programmable video signal processors are classified as homogeneous or heterogeneous processor architectures. Architectures are presented for design examples reported in the literature. Heterogeneous processors outperform homogeneous processors because dedicated modules adapt them to the requirements of special subtasks. The majority of heterogeneous processors incorporate dedicated modules for high-performance subtasks of high regularity, such as the DCT and block matching. By normalization to a fictive 1.0 μm CMOS process, typical linear relationships between silicon area and throughput rate have been determined for the different architectural styles. This relationship indicates a figure of merit for silicon efficiency.

5.

Software or silicon? The designer's option

《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》1986,74(6):861-874

Traditionally, the bulk of computer system functionality is implemented in the software medium, as a sequence of instructions for a general-purpose processor. Historically, this has provided the best balance of flexibility, cost, and performance. The new economics of VLSI and continuing advances in VLSI CAD capability open the possibility of application-specific functionality embedded in silicon as a matter of routine. This paper presents several case studies of silicon solutions used in typical software areas, including regular language recognition, Ada program unit replacement, dictionary machines, and string pattern matching. Either software or hardware designers may benefit from a study of such architectures, and Organick's notion of heterosystems designers proficient in both domains is supported.

6.

An Integrated Systolic Array Design for Video Compression

Pol Lin Tai Chii Tung Liu Jia Shung Wang 《The Journal of VLSI Signal Processing》2003,33(1-2):157-169

This paper presents an integrated systolic array design for implementing full-search block matching, the 2-D discrete wavelet transform, and full-search vector quantization on the same VLSI architecture. These functions are the prime components of video compression and require a great amount of computation. To meet real-time application requirements, many systolic array architectures have been proposed, each performing one of these functions individually. However, the functions share a similar computational procedure: their matrix-vector product forms are quite analogous. After extracting the common computation component, we design an integrated one-dimensional systolic array that can perform all three functions. The proposed architecture can efficiently perform three typical functions: (1) full-search block matching with a block size of 16 × 16 and a search area from –8 to 7; (2) the 2-D two-level Haar transform with a block size of 8 × 8; and (3) full-search vector quantization with an input vector of size 2 × 2. A utilization rate of 97% to 100% is achieved when executing full-search block matching and full-search vector quantization. For the 2-D discrete wavelet transform, the utilization rate is about 32%. The proposed integrated architecture lowers hardware cost and reduces the hardware structure. It suits VLSI implementation for video/image compression applications.

7.

2D DWT VLSI architecture for wavelet image processing

Seung-Kwon Paek Lee-Sup Kim 《Electronics letters》1998,34(6):537-538

A cost-effective VLSI architecture with separate data-paths and their corresponding filter structure is proposed for performing a two-dimensional discrete wavelet transform (2D DWT). Compared with the conventional 2D DWT VLSI architectures, the proposed semi-recursive 2D DWT VLSI architecture has minimum hardware cost, and optimised data-bus utilisation, scheduling control overhead and storage size.

8.

High-speed VLSI architectures for soft-output Viterbi decoding

Olaf J. Joeressen Martin Vaupel Heinrich Meyr 《The Journal of VLSI Signal Processing》1994,8(2):169-181

In recent years, decoding algorithms that not only make use of soft-quantized inputs but also deliver soft-decision outputs have attracted considerable attention, because additional coding gains are obtainable in concatenated systems. A prominent member of this class of algorithms is the Soft-Output Viterbi Algorithm. In this paper, two architectures for high-speed VLSI implementations of the Soft-Output Viterbi Algorithm are proposed, and area estimates are given for both. The well-known trade-off between computational complexity and storage requirements is exploited to obtain new VLSI architectures with increased implementation efficiency. Area savings of up to 40% in comparison to straightforward solutions are reported. This work was supported by the Deutsche Forschungsgemeinschaft (DFG) under contract Me 651/12-1.

9.

A tree-matching chip

Krishna V. Ranganathan N. Ejnioui A. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1999,7(2):277-280

Tree matching is an important problem used for three-dimensional object recognition in image understanding and vision systems. The objective of tree matching is to find the set of nodes at which a pattern tree matches a subject tree. In this paper, we describe the design and implementation of a very large scale integration (VLSI) chip for tree pattern matching. The architecture is based on an iterative algorithm that is mapped to a systolic array computational model and takes O(t(n+a)) time to process a subject of size n using a processors, where a is the length of the largest substring in the pattern and t is the number of substrings in the pattern. The variables and nonvariables of the pattern tree are processed separately, which simplifies the hardware in each processing element. The proposed partitioning strategy is independent of the problem size and allows larger strings to be processed based on the array size. A prototype CMOS VLSI chip has been designed using the Cadence design tools, and the simulation results indicate that it will operate at 33.3 MHz.

10.

Architectures for finite radon transform

Rahman C.A. Badawy W. 《Electronics letters》2004,40(15):931-932

Two VLSI architectures for the finite Radon transform are presented. The first is a reference architecture using memory blocks and the second is a memoryless architecture. The proposed architectures use 7×7 size image blocks and are prototyped for processing the CIF image sequence. The simulation and synthesis results show that the core speeds of the two proposed architectures are around 100 and 82 MHz, respectively.
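For readers unfamiliar with the transform, here is a minimal software sketch of the finite Radon transform of a p × p block with p prime (7 × 7 in the abstract). It assumes the common unnormalized definition, in which each of the p + 1 projections sums the pixels along one family of wrapped (modulo p) lines. The function name `finite_radon` is hypothetical, and the sketch says nothing about the two hardware architectures themselves.

```python
def finite_radon(img):
    """Unnormalized finite Radon transform of a p x p image (p prime).

    Returns p + 1 projections of p sums each:
      proj[k][l] = sum_i img[i][(k*i + l) mod p]   for k = 0 .. p-1
      proj[p][l] = sum_j img[l][j]                 (the remaining direction)
    """
    p = len(img)
    if any(len(row) != p for row in img):
        raise ValueError("image must be square")
    if p < 2 or any(p % q == 0 for q in range(2, int(p ** 0.5) + 1)):
        raise ValueError("block size must be prime")

    proj = []
    for k in range(p):  # one projection per slope k
        proj.append(
            [sum(img[i][(k * i + l) % p] for i in range(p)) for l in range(p)]
        )
    # Final projection: sum along each row of the image.
    proj.append([sum(img[l][j] for j in range(p)) for l in range(p)])
    return proj
```

Because p is prime, the lines of each projection partition the block, so every projection sums to the total of all pixels. This gives a simple sanity check on any implementation.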

11.

VLSI Architectures for the Finite Impulse Response Filter Total citations: 1, self-citations: 0, citations by others: 1

Kam Cheng Sahni S. 《Selected Areas in Communications, IEEE Journal on》1986,4(1):92-99

We review the various VLSI architectures that have been proposed for the finite impulse response filter problem. In addition, new architectures are proposed and improved designs for some of the earlier architectures are developed.
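The finite impulse response filter problem referred to here is the computation of y[n] = Σ_k h[k]·x[n−k] over a finite set of coefficients h. The sketch below is a plain direct-form software version, given only to fix the arithmetic; the name `fir_filter` and the zero-padding convention for samples before n = 0 are assumptions, not taken from the paper.

```python
def fir_filter(x, h):
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k].

    Samples before the start of `x` are treated as zero, and the output
    has the same length as the input.
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:  # ignore taps that reach before the first sample
                acc += coeff * x[n - k]
        y.append(acc)
    return y
```

Feeding a unit impulse through the filter returns the coefficient list itself, which is a quick way to check an implementation.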

12.

VLSI computing architectures for Haar transform

Ray Liu K.J. 《Electronics letters》1990,26(23):1962-1963

The Haar transform is very useful in many signal and image processing applications where real-time implementation is essential. Three VLSI computing architectures are proposed for fast implementation of the Haar transform. Comparisons of the advantages and disadvantages of the proposed architectures are also presented.
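As a reminder of what is being computed, the following is a minimal sketch of a multi-level 1-D Haar transform. It assumes the averaging/differencing variant (scaling by 1/2 instead of the orthonormal 1/√2) and a power-of-two input length; the name `haar_1d` and the output ordering are illustrative choices, not the paper's.

```python
def haar_1d(x):
    """Multi-level 1-D Haar transform using pairwise averages and differences.

    Output layout: [overall average, coarsest detail, ..., finest details].
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    out = [float(v) for v in x]
    length = n
    while length > 1:
        half = length // 2
        # Pairwise averages (approximation) and half-differences (detail).
        avg = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:length] = avg + diff
        length = half  # recurse on the approximation part only
    return out
```

For example, `haar_1d([4, 6, 10, 12])` gives `[8.0, -3.0, -1.0, -1.0]`: the overall mean, one coarse detail, and two fine details. Each level uses only additions, subtractions and halving, which is part of why the transform is attractive for fast hardware.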

13.

A reconfigurable VLSI coprocessing system for the block matching algorithm

Bugeja A. Yang W. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1997,5(3):329-337

Several VLSI architectures for the full-search block matching algorithm have been proposed in recent years owing to its computation- and I/O-intensive nature and its importance in various computer vision and image processing applications. This paper presents a new coarse-grained reconfigurable coprocessor that is suitable for integration with general-purpose microprocessors. The 180,000-transistor custom VLSI design was implemented in 0.6 μm CMOS on a 4.12 mm × 2.59 mm die and has been fully tested up to 33 MHz. For a typical image database search application, a sample system consisting of four coprocessors interfaced through a 33 MHz PCI bus will provide a speedup of 320× over an 80486 DX2/66 MHz and 64× over a 150 MHz Pentium running fully optimized assembly code.

14.

Cost Effective VLSI Architectures for Full-Search Block-Matching Motion Estimation Algorithm

Zhong L. He Ming L. Liou 《The Journal of VLSI Signal Processing》1997,17(2-3):225-240

In this paper, we present efficient VLSI architectures for the full-search block-matching motion estimation (BMME) algorithm. Given a search range, we partition it into sub-search arrays called tiles. By fully exploiting data dependency within a tile, efficient VLSI architectures can be obtained. Using the proposed VLSI architectures, all the block matchings in a tile can be processed in parallel. All the tiles within a search range can be processed serially or concurrently, depending on the requirements. Taking processing speed, hardware cost, and I/O bandwidth into consideration, the optimal tile size for a specific video application is analyzed. By partitioning a search range into tiles of appropriate size, flexible VLSI designs with different throughputs can be obtained. In this way, cost-effective VLSI designs for a wide range of video applications, from H.261 to HDTV, can be achieved.

15.

A Systematic Approach for Synthesizing VLSI Architectures of Lifting-Based Filter Banks and Transforms

《IEEE transactions on circuits and systems. I, Regular papers》2008,55(7):1939-1952

The lifting scheme has become an important tool for designing filter banks and transforms of digital signal processing. Recently, the conventional lifting scheme that concerns the construction of 2-channel filter banks has been extended to M-channel filter banks (M > 2), bringing up the desirable properties of the lifting scheme to a broader range of applications. Many hand-crafted lifting-based VLSI architectures exist, which mostly concentrate on a single and specific target application having fixed data throughput and resource consumption. However, the reusability of such architectures is limited due to the lack of scalability. To overcome this issue, we present a design methodology for automatic synthesis of VLSI architectures suitable for arbitrary lifting-based M-channel filter banks and transforms. The proposed methodology enables high parameterizability in terms of data throughput, resource consumption, and arithmetic precision for the generated architectures. The concept of parameterizing design elements is important for modern system-on-chip design, since it features design space exploration and increases reusability. The proposed methodology is implemented as a high-level compilation tool that generates VLSI architectures at the register transfer level. We present results on the implementation of different architectures that were generated by our tool.
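To illustrate the conventional 2-channel lifting scheme that the abstract takes as its starting point, here is a minimal sketch of one lifting stage: an integer Haar-like transform built from a predict step and an update step. It does not represent the paper's M-channel methodology or its synthesis tool; the function names and the choice of the Haar predictor are assumptions made for illustration.

```python
def lift_forward(x):
    """One 2-channel lifting stage on an even-length list of integers.

    Split into even/odd samples, predict each odd sample from its even
    neighbour, then update the even samples with half the detail.
    """
    if len(x) % 2:
        raise ValueError("input length must be even")
    even, odd = x[0::2], x[1::2]
    d = [o - e for e, o in zip(even, odd)]          # predict step (detail)
    s = [e + (di >> 1) for e, di in zip(even, d)]   # update step (approximation)
    return s, d


def lift_inverse(s, d):
    """Undo lift_forward by running the same steps in reverse order
    with the signs flipped."""
    even = [si - (di >> 1) for si, di in zip(s, d)]  # undo update
    odd = [di + e for e, di in zip(even, d)]         # undo predict
    x = [0] * (2 * len(s))
    x[0::2], x[1::2] = even, odd                     # merge the two channels
    return x
```

The inverse reuses exactly the same predict and update operators, so reconstruction is exact even with the integer rounding in the update step. This structural invertibility is one of the desirable properties of lifting that the abstract alludes to.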

16.

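A Matching Criterion Function for Block Motion Estimation: The Block Feature Matching Function

Luo Lijun (骆立俊) Zou Cairong (邹采荣) He Zhenya (何振亚) 《电子与信息学报》1998,20(4):486-491

This paper proposes a new matching criterion function for block motion estimation, the block feature matching (BFM) function, which can be used in codecs for international video compression standards such as H.261, H.263, MPEG1, MPEG2, and HDTV. In these standards, the complexity of the video encoder depends mainly on the motion estimation algorithm. A real-time VLSI implementation of block matching motion estimation must take the following into account: the complexity of the motion search within a given search area; the computational complexity of each block matching operation; the amount of data that must be read from frame memory into the motion estimation processor for each block matching operation; and suitability for real-time hardware implementation. Simulations show that the BFM algorithm is simple and effective, and that it greatly reduces both the computational complexity of block matching and the data transfer time during matching. The BFM function lends itself to parallel implementation, which can effectively shorten the encoding time of a video encoder. The paper also gives a detailed comparison between the BFM function and other commonly used matching criterion functions.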

17.

Efficient VLSI architectures for Columnsort

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1999,7(1):135-138

This paper presents novel very large scale integration (VLSI) architectures in support of an efficient implementation of Leighton's well-known Columnsort. The designs take advantage of reconfigurable bus architectures enhanced with simple shift switches. Our first main contribution is to show that Columnsort can be partitioned into two components: a hardware scheme involving the task of sorting arrays of small size and a hardware or software scheme that involves simple data movement tasks. Our second main contribution is to demonstrate that the dynamically reconfigurable mesh architecture can be exploited to obtain a small and efficient hardware sorter. The resulting architectures feature high regularity of circuitry, simplicity of control structure, and adaptability. Both theoretical analyses and simulation tests have shown that the proposed VLSI architectures for sorting are superior to existing designs in the context of sorting small and moderate size arrays.
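The split into "sort small arrays" and "move data" can be seen in a plain software rendering of Columnsort's eight steps. The sketch below follows the usual textbook formulation for an r × s matrix stored column by column, with s dividing r and r ≥ 2(s − 1)²; the function names are hypothetical, and nothing here reflects the reconfigurable-bus hardware described in the paper.

```python
import math


def sort_columns(flat, r):
    """Sort each length-r column of a matrix stored in column-major order."""
    out = []
    for start in range(0, len(flat), r):
        out.extend(sorted(flat[start:start + r]))
    return out


def columnsort(values, r, s):
    """Columnsort on r*s values viewed as an r x s column-major matrix.

    Odd steps sort columns; even steps are fixed data movements.
    """
    if len(values) != r * s or r % s or r < 2 * (s - 1) ** 2:
        raise ValueError("need len == r*s, s dividing r, and r >= 2*(s-1)**2")

    a = sort_columns(list(values), r)                      # step 1: sort
    b = [None] * (r * s)                                   # step 2: "transpose"
    for t, v in enumerate(a):                              # read column-major,
        b[(t % s) * r + t // s] = v                        # write row-major
    b = sort_columns(b, r)                                 # step 3: sort
    a = [b[(t % s) * r + t // s] for t in range(r * s)]    # step 4: inverse of 2
    a = sort_columns(a, r)                                 # step 5: sort
    half = r // 2
    a = [-math.inf] * half + a + [math.inf] * (r - half)   # step 6: shift
    a = sort_columns(a, r)                                 # step 7: sort
    return a[half:half + r * s]                            # step 8: unshift
```

Only `sort_columns` compares keys, and it always works on columns of length r. Steps 2, 4, 6 and 8 are input-independent permutations, which corresponds to the "simple data movement tasks" in the abstract.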

18.

Simultaneous multiple target recognition using polarization agile waves

Xinwei Chen Jianzhong Zhao Wen Wu 《电子科学学刊(英文版)》2012,29(3-4):237-241

A novel matching method for simultaneous multi-target recognition is proposed by jointly considering the target’s prior scattering knowledge and the polarization parameters of radar echoes. The matching coefficients are calculated for the judgment. MATLAB simulations show that several targets can be accurately recognized simultaneously, and a high recognition probability can be achieved in Monte Carlo simulations. The total execution time can be remarkably reduced in the Field Programmable Gate Array (FPGA) implementation of the matching procedure.

19.

SIMULTANEOUS MULTIPLE TARGET RECOGNITION USING POLARIZATION AGILE WAVES

Chen Xinwei Zhao Jianzhong Wu Wen 《电子科学学刊(英文版)》2012,(Z2):237-241

A novel matching method for simultaneous multi-target recognition is proposed by jointly considering the target’s prior scattering knowledge and the polarization parameters of radar echoes. The matching coefficients are calculated for the judgment. MATLAB simulations show that several targets can be accurately recognized simultaneously, and a high recognition probability can be achieved in Monte Carlo simulations. The total execution time can be remarkably reduced in the Field Programmable Gate Array (FPGA) implementation of the matching procedure.

20.

Efficient realizations of the discrete and continuous wavelet transforms: from single chip implementations to mappings on SIMD array computers

Chakrabarti C. Vishwanath M. 《Signal Processing, IEEE Transactions on》1995,43(3):759-771

This paper presents a wide range of algorithms and architectures for computing the 1D and 2D discrete wavelet transform (DWT) and the 1D and 2D continuous wavelet transform (CWT). The algorithms and architectures presented are independent of the size and nature of the wavelet function. New on-line algorithms are proposed for the DWT and the CWT that require only a small amount of storage. The proposed systolic array and the parallel filter architectures implement these on-line algorithms and are optimal with respect to both area and time (under the word-serial model). Moreover, these architectures are very regular and support single chip implementations in VLSI. The proposed SIMD architectures implement the existing pyramid and à trous algorithms and are optimal with respect to time.