首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Novel algorithmic features of multimedia applications and advances in VLSI technologies are driving forces behind the new multimedia signal processors. We propose an architecture platform which could provide high performance and flexibility, and would require less external I/O and memory access. It is comprised of array processors to be used as the hardware accelerator and RISC cores to be used as the basis of the programmable processor. It is a hierarchical and scalable architecture style which facilitates the hardware-software codesign of multimedia signal processing circuits and systems. While some control-intensive functions can be implemented using programmable CPUs, other computation-intensive functions can rely on hardware accelerators.To compile multimedia algorithms, we also present an operation placement and scheduling scheme suitable for the proposed architectural platform. Our scheme addresses data reusability and exploits local communication in order to avoid the memory/communication bandwidth bottleneck, which leads to faster program execution. Our method shows a promising performance: a linear speed-up of 16 times can be achieved for the block-matching motion estimation algorithm and the true motion tracking algorithm, which have formed many multimedia applications (e.g., MPEG-2 and MPEG-4).  相似文献   

2.
The increasing use of embedded software, often implemented on a core processor in a single-chip system, is a clear trend in the telecommunications, multimedia, and consumer electronics industries. A companion paper (Paulin et al., 1997) presents a survey of application and architecture trends for embedded systems in these growth markets. However, the lack of suitable design technology remains a significant obstacle in the development of such systems. One of the key requirements is more efficient software compilation technology. Especially in the case of fixed-point digital signal processor (DSP) cores, it is often cited that commercially available compilers are unable to take full advantage of the architectural features of the processor. Moreover, due to the shorter lifetimes and the architectural specialization of many processor cores, processor designers are often compelled to neglect the issue of compiler support. This situation has resulted in an increased research activity in the area of design tool support for embedded processors. This paper discusses design technology issues for embedded systems using processor cores, with a focus on software compilation tools. Architectural characteristics of contemporary processor cores are reviewed and tool requirements are formulated. This is followed by a comprehensive survey of both existing and new software compilation techniques that are considered important in the context of embedded processors  相似文献   

3.
4.
The programmable video signal processor (VSP) is an important category of processors for multimedia systems. Programmable video processors combine the flexibility of programmability with special architectural features that improve performance on video processing applications. VSPs are typically multiple processors with several processing elements (PEs) and a parallel memory system. This paper focuses on the architectural design of the PE's in a video processor and shows how technology and circuit parameters influence the structure of the datapath and, hence, the overall architecture of a programmable VSP. We emphasize the need to consider technological and circuit-level issues during the design of a system architecture and present a method whereby the conceptual organization of the PEs-the number of PEs, pipelining of the datapath, size of the register file, and number of register ports-can be evaluated in terms of a target set of applications before a detailed design is undertaken. We use motion-estimation and discrete cosine transform as example applications to illustrate how various technology parameters affect the architectural design choices. We show that the design of the register file and the datapath-pipeline depth can drastically affect PE utilization and, therefore, the number of PEs required for different applications. Our results demonstrate that pursuing the fastest cycle time can greatly increase the silicon area which must be devoted to PEs, due to both increased pipeline latency and reduced register file bandwidth  相似文献   

5.
Three-dimensional chip (3-D) stacking technology provides a new approach to address the so-called memory wall problem. Memory processor chip stacking reduces this memory wall problem, permitting faster clock rates (with suitable processor logic) or permitting multicore access to shared memory using a large number of vertical vias between tiers in the stack, for ultrawide bit path transfer of data and address information to and from various levels of cache. Although a limited amount of parallel access is possible using conventional two-dimensional (2-D) chip memory-processor approaches, 3-D memory-processor stacking greatly extends this to much larger capacity memories. We evaluate high-clock-rate processors as well as shared memory processors with a large number of cores. Various architectural design options to reduce the impact of the memory wall on the processor performance are explored and validated through simulations. Certain architectural features can be implemented in a 3-D chip, such as an ultrawide, ultrashort vertical bus with low parasitic resistance and the elimination of conventional electrostatic discharge, and packaging parasitics required in multiple package 2-D solutions. The objective is to reduce the clocks per instruction figure of merit for high clock speeds in order to deliver significant performance levels. High-clock-rate processors can be designed with SiGe heterostructure bipolar transistors to obtain processors operating on the order of 16 or 32 GHz.   相似文献   

6.
Low power and high performance are the two most important criteria for many signal-processing system designs, particularly in real-time multimedia applications. There have been many approaches to achieve these two design goals at many different implementation levels ranging from very-large-scale-integration fabrication technology to system design. We review the works that have been done at various levels and focus on the algorithm-based approaches for low-power and high-performance design of signal processing systems. We present the concept of multirate computing that originates from filterbank design, then show how to employ it along with the other algorithmic methods to develop low-power and high-performance signal processing systems. The proposed multirate design methodology is systematic and applicable to many problems. We demonstrate that multirate computing is a powerful tool at the algorithmic level that enables designers to achieve either significant power reduction or high throughput depending on their choice. Design examples on basic multimedia processing blocks such as filtering, source coding, and channel coding are given. A digital signal-processing engine that is an adaptive reconfigurable architecture is also derived from the common features of our approach. Such an architecture forms a new generation of high-performance embedded signal processor based on the adaptive computing model. The goal of this paper is to demonstrate the flexibility and effectiveness of algorithm-based approaches and to show that the multirate approach is an effective and systematic design methodology to achieve low-power and high throughput signal processing at the algorithmic and architectural level  相似文献   

7.
Multimedia processors   总被引:5,自引:0,他引:5  
This paper describes large-scale-integration programmable processors designed for multimedia processing such as real-time compression and decompression of audio and video as well as the generation of computer graphics. As the target of these processors is to handle audio and video in real time, the processing capability must be increased tenfold compared to that of conventional microprocessors, which were designed to handle mainly texts, figures, tables, and photographs. To clarify the advantages of a high-speed multimedia processing capability, we define these chips as multimedia processors. General-purpose microprocessors for workstations and personal computers (PCs) use special built-in hardware for multimedia processing, so the multimedia processors described include these modified general-purpose microprocessors. After reviewing the history of programmable processors, we classify multimedia processors into five categories depending on their basic architecture. The categories are reduced instruction set computer (RISC) microprocessors for workstations, complex instruction set computer microprocessors for PCs, embedded RISCs, low-power digital signal processors (DSPs), which are mainly used for mobile communications devices, and media processors that support PCs for multimedia applications. These five classes are then grouped into two: microprocessors with a multimedia instruction set and highly parallel DSPs. An architectural comparison between these two groups on the basis of Moving Picture Experts Group decoding applications is made, and the advantages and disadvantages of each class are clarified. Future processors, including “system on a chip,” and their applications are also discussed  相似文献   

8.
VLSI architectures for video compression-a survey   总被引:3,自引:0,他引:3  
The paper presents an overview on architectures for VLSI implementations of video compression schemes as specified by standardization committees of the ITU and ISO. VLSI implementation strategies are discussed and split into function specific and programmable architectures. As examples for the function oriented approach, alternative architectures for DCT and block matching will be evaluated. Also dedicated decoder chips are included Programmable video signal processors are classified and specified as homogeneous and heterogenous processor architectures. Architectures are presented for reported design examples from the literature. Heterogenous processors outperform homogeneous processors because of adaptation to the requirements of special, subtasks by dedicated modules. The majority of heterogenous processors incorporate dedicated modules for high performance subtasks of high regularity as DCT and block matching. By normalization to a fictive 1.0 μm CMOS process typical linear relationships between silicon area and through-put rate have been determined for the different architectural styles. This relationship indicates a figure of merit for silicon efficiency  相似文献   

9.
An Algorithm-Hardware-System Approach to VLIW Multimedia Processors   总被引:2,自引:0,他引:2  
Very Long Instruction Word (VLIW) processor architectures for multimedia applications are discussed from an algorithm, hardware and system based point of view. VLIW processors show high flexibility and processing power, as well as a good utilization of resources by compiler-generated code, but their exclusive exploitation of instruction level parallelism (ILP) decreases in efficiency as the degree of parallelism increases. This is mainly caused by characteristics of multimedia algorithms, increasing wiring delays, compiler restrictions, and a widening gap between on-chip processing speed and available bandwidth to external memory. As new multimedia applications and standards continue to evolve (MPEG-4), the demand for higher processing power will continue. Therefore, parallel processing in all its available forms will have to be exploited to achieve significant performance improvements. We show that, due to the diminishing returns from a further increase in ILP, multimedia applications will benefit more from an additional exploitation of parallelism at thread-level. We examine how simultaneous multithreading (SMT), a novel architectural approach combining VLIW techniques with parallel processing of threads, can efficiently be used to further increase performance of typical multimedia workloads.  相似文献   

10.
多核DSP编程技术研究   总被引:1,自引:0,他引:1  
数字信号处理器(DSP)是对数字信号进行高速实时处理的专用处理器。当前,基于单核结构的嵌入式处理器越来越不能满足日益增长的数据处理应用方面的要求,单纯的增加单个处理器的处理速度更会带来难以接受的能耗。多核嵌入式结构已成为解决这一问题的有效途径,也使整个系统只用DSP搭建成为可能,但同时也为如何开发充分利用多核结构的应用...  相似文献   

11.
The digital signal processor Derby   总被引:8,自引:0,他引:8  
《Spectrum, IEEE》2001,38(6):62-68
Applications that use digital signal processing chips are flourishing, buoyed by increasing performance and falling prices. Concurrently, the market has expanded enormously. Vendors abound. Many newcomers have entered the market, while established companies compete for market share by creating ever more novel, efficient, and higher-performing architectures. The range of digital signal-processing (DSP) architectures available is unprecedented. In addition to expanding competition among DSP processor vendors, a new threat is coming from general-purpose processors with DSP enhancements. So, DSP vendors have begun to adapt their architectures to stave off the outsiders. The author provides a framework for understanding the recent developments in DSP processor architectures, including the increasing interchange of architectural techniques between DSPs and general-purpose processors  相似文献   

12.
13.
Aggressive processor design methodology using high-speed clock and deep submicrometer technology is necessitating the use of at-speed delay fault testing. Although nearly all modern processors use pipelined architecture, no method has been proposed in literature to model these for the purpose of test generation. This paper proposes a graph theoretic model of pipelined processors and develops a systematic approach to path delay fault testing of such processor cores using the processor instruction set. The proposed methodology generates test vectors under the extracted architectural constraints. These test vectors can be applied in functional mode of operation, hence, self-test becomes possible. Self-test in a functional mode can also be used for online periodic testing. Our approach uses a graph model for architectural constraint extraction and path classification. Test vectors are generated using constrained automatic test pattern generation (ATPG) under the extracted constraints. Finally, a test program consisting of an instruction sequence is generated for the application of generated test vectors. We applied our method to two example processors, namely a 16-bit 5-stage VPRO pipelined processor and a 32-bit pipelined DLX processor, to demonstrate the effectiveness of our methodology  相似文献   

14.
15.
This paper describes the architecture, functionality, and design of NX-2700, a digital television and media processor chip from Philips Semiconductors. The NX-2700 is the second generation of an architectural family of programmable multimedia processors targeted at the digital television (DTV) markets, including the United States Advanced Television Systems Committee (ATSC) DTV-standard-based applications. The chip not only supports all of the 18 ATSC formats from standard-definition to wide-angle, high-definition video, but has also the power to handle high-definition television (HDTV) video and audio source decoding (high-level MPEG-5 AC-3 and ProLogic audio, closed captioning, etc.) as well as the flexibility to process advanced interactive services. NX-2700 is a programmable processor with a very powerful, general-purpose very long instruction word (VLIW) central processing unit (CPU) core that implements many nontrivial multimedia algorithms, coordinates all on-chip activities, and runs a small real-time operating system. The CPU core, aided by an array of peripheral devices (multimedia coprocessors and input-output units) and high-performance buses, facilitates concurrent processing of audio, video, graphics, and communication-data  相似文献   

16.
Telecommunications controllers worldwide are becoming increasingly stored-program oriented. Electromechanical and electronic wired logic controllers lack the flexibility, maintainability, and (ultimately) the economy of stored program processors. While similar to commercial computers, most stored program telecommunications processors have features unique either in nature or degree of application. There also exists a remarkable variety of overall architectural features which are useful in telecommunications applications, and a similar variety of administrations and manufacturers willing to espouse them. Pertinent stored program processor architectural features are discussed as a dass, then extant and proposed processors are described and placed in perspective, Finally, likely future trends are discussed.  相似文献   

17.
In this paper, we propose a novel reconfigurable processor using dynamically partitioned single‐instruction multiple‐data (DP‐SIMD) which is able to process multimedia data. The SIMD processor and parallel SIMD (P‐SIMD) processor, which is composed of a number of SIMD processors, are usually used these days. But these processors are inefficient because all processing units (PUs) should process the same operations all the time. Moreover, the PUs can process different operations only when every SIMD group operation is predefined. We propose a processor control method which can partition parallel processors into multiple SIMD‐based processors dynamically to enhance efficiency. For performance evaluation of the proposed method, we carried out the inverse transform, inverse quantization, and motion compensation operations of H.264 using processors based on SIMD, P‐SIMD, and DP‐SIMD. Experimental results show that the DP‐SIMD control method is more efficient than SIMD and P‐SIMD control methods by about 15% and 14%, respectively.  相似文献   

18.
The Micro-Grain Array Processor-2 (MGAP-2) is a two-dimensional SIMD array of 49152 fine-grain processors designed primarily for high-performance signal and image processing. Each processor can compute two arbitrary three-input Boolean functions, contains local RAM, and has additional logic for interprocessor communication. The MGAP-2 differs from existing fine-grain arrays in that it has a high degree of integration while incorporating processor level interconnect control. Each processor can independently select its communication direction. This allows a programmer to map algorithms onto the array in a more efficient manner than if the processors communicated in the standard SIMD fashion. Also, the MGAP-2's processor level interconnect allows groups of processors to be clustered into larger computational units, making the basic computational units as powerful as they need to be for a given problem  相似文献   

19.
This paper presents systematic techniques to find low-power high-performance superscalar processors tailored to specific user applications. The model of power is novel because it separates power into architectural and technology components. The architectural component is found via trace-driven simulation, which also produces performance estimates. An example technology model is presented that estimates the technology component, along with critical delay time and real estate usage. This model is based on case studies of actual designs. It is used to solve an important problem: decreasing power consumption in a superscalar processor without greatly impacting performance. Results are presented from runs using simulated annealing to reduce power consumption subject to performance reduction bounds. The major contributions of this paper are the separation of architectural and technology components of dynamic power the use of trace-driven simulation for architectural power measurement, and the use of a near-optimal search to tailor a processor design to a benchmark  相似文献   

20.
Lapsley  P. Blalock  G. 《Spectrum, IEEE》1996,33(7):74-78
The market for products based on digital signal processing (DSP) technology-wireless communication devices and PC multimedia peripherals, for example-is growing rapidly. Semiconductor manufacturers have responded to this demand with a bewildering array of DSP processors. Selecting the best one for a given application presents a difficult and time-consuming challenge for DSP system designers. Simple, familiar performance measures like MIPS and MOPS are misleading and neglect factors like memory usage, power consumption and application execution time. Complex alternatives-such as application benchmarks-suffer from limitations that virtually preclude fair comparisons. Fortunately, a compromise methodology that combines algorithm kernel benchmarking with application profiling yields good estimates of processor performance weighted to the target application  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号