Similar literature
20 similar documents found.
1.
The HP-PA8000 is a 180-MHz quad-issue custom VLSI implementation of the HP-PA 2.0 64-b architecture delivering 11.84 SPECint95 and 20.18 SPECfp95 with 3.8 million transistors integrated on a 17.68 mm×19.1 mm die in a 3.3-V, 0.5-μm CMOS process. Specialized clock circuits and extensive use of dynamic logic are key factors in this microprocessor's performance. Attention to clock analysis and distribution resulted in a 170 ps clock skew between any two clock nodes. This microprocessor utilizes a 56-entry instruction reorder buffer (IRB), register renaming, and dual functional units to fully exploit instruction level parallelism.

2.
A quad-issue custom VLSI microprocessor is described. This microprocessor implements the Alpha architecture and achieves an estimated performance of 13.3 SPECint95 and 18.4 SPECfp95 at 433 MHz. The 9.6 million transistor die measures 14.4 mm×14.5 mm, and is fabricated in a 0.35-μm, four-metal layer CMOS process. This chip dissipates less than 25 W at 433 MHz using a 2.0 V internal power supply. The design was leveraged from a prior 300-MHz, 3.3-V, 0.50-μm CMOS design. It includes several significant architectural enhancements and required circuit solutions for operation at 2.0 V. The chip will operate at nominal internal power supply voltages up to 2.5 V, allowing improved performance at the cost of increased power consumption. At 2.5 V, the chip operates at 500 MHz and delivers 15.4 SPECint95 (est) and 21.1 SPECfp95 (est). This paper describes the chip implementation details and the strategy for efficiently migrating the existing design to the 0.35-μm technology.

3.
A 300-MHz 64-b quad-issue CMOS RISC microprocessor   Cited by: 1 (self-citations: 0, citations by others: 1)
This 300 MHz quad-issue custom VLSI implementation of the Alpha architecture delivers 1200 MIPS (peak), 600 MFLOPS (peak), 341 SPECint92, and 512 SPECfp92. The 16.5 mm×18.1 mm die contains 9.3 M transistors and dissipates 50 W at 300 MHz. It is fabricated in a 3.3 V, four-layer metal, 0.5 μm, CMOS process. The upper metal layers (metal-3 and metal-4) are primarily used for power, ground, and clock distribution. The chip supports 3.3 V/5.0 V interfaces and is packaged in a 499-pin ceramic IPGA. It contains an 8-kbyte instruction cache; an 8-kbyte, dual-ported, data cache; and a 96-kbyte, unified, second-level, 3-way set associative, fully pipelined, writeback cache. This paper describes the circuit and implementation techniques that were used to attain the 300 MHz operating frequency.

4.
An 80-MFLOPS (peak) 64-b microprocessor that employs superscalar architecture to execute two instructions simultaneously in one 25-ns cycle, including the combination of 64-b floating-point add and multiply instructions, is described. The processor, implemented in a 0.8-μm CMOS technology, contains 1300 K transistors. The processor also employs a RISC architecture and Harvard-style bus organization. The authors provide an overview of the processor, focusing especially on processor architecture, floating-point hardware, and performance.

5.
A 167 MHz 64 b VLSI CPU chip is described. The chip delivers 333 MFLOPS (peak) with an estimated system performance of 270 SPECint92/380 SPECfp92 (@167 MHz, 2 MB E-cache). The 17.7×17.8 mm die is fabricated with a 0.5 micron CMOS technology with four metal layers and contains 5.2 M transistors. The superscalar processor is capable of sustaining an execution rate of four instructions per cycle even in the presence of conditional branches and cache misses. Four fully pipelined 8×16 b multipliers and four single-cycle latency 16 b adders combine to speed up image processing, 2-D and 3-D graphics, and video compression/decompression by up to an order of magnitude. High clock speed was obtained by the use of delayed reset logic, a new register file design, and novel comparators. Strict design methodology allowed fully functional first silicon which met all speed targets. The power dissipation of the chip is 28 W.

6.
A 32-b RISC/DSP microprocessor with reduced complexity   Cited by: 2 (self-citations: 0, citations by others: 2)
This paper presents a new 32-b reduced instruction set computer/digital signal processor (RISC/DSP) architecture which can be used as a general purpose microprocessor and in parallel as a 16-/32-b fixed-point DSP. This has been achieved by using RISC design principles for the implementation of DSP functionality. A DSP unit operates in parallel to an arithmetic logic unit (ALU)/barrel shifter on the same register set. This architecture provides the fast loop processing, high data throughput, and deterministic program flow absolutely necessary in DSP applications. Besides offering a basis for general purpose and DSP processing, the RISC philosophy offers a higher degree of flexibility for the implementation of DSP algorithms and achieves higher clock frequencies compared to conventional DSP architectures. The integrated DSP unit provides instruction set support for highly specialized DSP algorithms. Subword processing optimized for DSP algorithms has been implemented to provide maximum performance for 16-b data types. While creating a unified base for both application areas, we also minimized transistor count and reduced complexity by using a short instruction pipeline. A parallelism concept based on a varying number of instruction latency cycles made superscalar instruction execution superfluous.

7.
This quad-issue processor achieves 1-GHz operation through improved dynamic circuit techniques in critical paths and a more extensive on-chip memory system which scales in both bandwidth and latency. Critical logic paths use domino, delayed clocked domino, and logic embedded in dynamic flip-flops for minimum delay. A 64-KB sum-addressed memory data cache combines the address offset add with the cache decode, allowing the average memory latency to scale by more than the clock ratio. Memory bandwidth is improved by using wave pipelined SRAM designs for on-chip caches and a write cache for store traffic. Memory power is controlled without increased latency by use of delayed-reset logic decoders. The chip operates at 1000 MHz and dissipates less than 80 W from a 1.6-V supply. It contains 23 million transistors (12 million in RAM cells) on a 244 mm² die.

8.
A 400-MIPS/200-MFLOPS (peak) custom 64-b VLSI CPU is described. The chip is fabricated in a 0.75-μm CMOS technology utilizing three levels of metallization and optimized for 3.3-V operation. The die measures 16.8 mm×13.9 mm and contains 1.68 M transistors. The chip includes separate 8-kbyte instruction and data caches and a fully pipelined floating-point unit (FPU) that can handle both IEEE and VAX standard floating-point data types. It is designed to execute two instructions per cycle among scoreboarded integer, floating-point, address, and branch execution units. Power dissipation is 30 W at 200-MHz operation.

9.
This paper describes a 160 MHz 500 mW 32 b StrongARM® microprocessor designed for low-power, low-cost applications. The chip implements the ARM® V4 instruction set and is bus compatible with earlier implementations. The pin interface runs at 3.3 V but the internal power supplies can vary from 1.5 to 2.2 V, providing various options to balance performance and power dissipation. At 160 MHz internal clock speed with a nominal Vdd of 1.65 V, it delivers 185 Dhrystone 2.1 MIPS while dissipating less than 450 mW. The range of operating points runs from 100 MHz at 1.65 V dissipating less than 300 mW to 200 MHz at 2.0 V for less than 900 mW. An on-chip PLL provides the internal clock based on a 3.68 MHz clock input. The chip contains 2.5 million transistors, 90% of which are in the two 16 kB caches. It is fabricated in a 0.35-μm three-metal CMOS process with 0.35 V thresholds and 0.25 μm effective channel lengths. The chip measures 7.8 mm×6.4 mm and is packaged in a 144-pin plastic thin quad flat pack (TQFP) package.

10.
This 533-MHz BiCMOS very large scale integration (VLSI) implementation of the PowerPC architecture contains three pipelines and a large on-chip secondary cache to achieve a peak performance of 1600 MIPS. The 15 mm×10 mm die contains 2.7 M transistors (2 M CMOS and 0.7 M bipolar) and dissipates less than 85 W. The die is fabricated in a six-level metal, 0.5-μm BiCMOS process and requires 3.6 and 2.1 V power supplies.

11.
The authors describe circuit techniques for wide input/output (I/O) data path and high-speed 64-Mb dynamic RAMs (DRAMs). A hierarchical data bus structure using double-level metallization has been developed to form 64-b parallel data bus lines without increasing the chip size. A current-sensing data bus amplifier, developed to sense the 64-b data bus signal in parallel, has made the wide I/O data path structure possible. A direct-sensing type column gate circuit with the READ/WRITE separated select line scheme achieves 40-ns RAS access. A shielded bit-line three-dimensional stacked-capacitor cell with a double-fin storage capacitor stores sufficient charge while the bit-line capacitance shows a reasonable value for sensing the data.

12.
A 32-b single-chip VLSI CPU which implements all 140 instructions of the Hewlett-Packard precision architecture (HPPA) using direct hardwired decoding and execution is described. A sustained pipeline performance of 10.8 million instructions per second (MIPS), 15-MIPS peak, is achieved. The chip is fabricated in a 1.5-μm NMOS production process which utilizes two levels of tungsten interconnect and contains 115,000 transistors on an 8.4×8.4-mm die. A 30-MHz operating frequency is achieved under worst-case operating conditions.

13.
A 28 mW/MHz at 80 MHz structured-custom RISC microprocessor design is described. This 32-b implementation of the PowerPC architecture is fabricated in a 3.3 V, 0.5 μm, 4-level metal CMOS technology, resulting in 1.6 million transistors in a 7.4 mm by 11.5 mm chip size. Dual 8-kilobyte instruction and data caches coupled to a high performance 32/64-b system bus and separate execution units (float, integer, load/store, and system units) result in peak instruction rates of three instructions per clock cycle. Low-power design techniques are used throughout the entire design, including dynamically powered down execution units. Typical power dissipation is kept under 2.2 W at 80 MHz. Three distinct levels of software-programmable, static, low-power operation for system power management are offered, resulting in standby power dissipation from 2 mW to 350 mW. CPU to bus clock ratios of 1×, 2×, 3×, and 4× are implemented to allow control of system power while maintaining processor performance. As a result, workstation level performance is packed into a low-power, low-cost design ideal for notebooks and desktop computers.

14.
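Architecture design of a 64-bit RISC microprocessor   Cited by: 1 (self-citations: 0, citations by others: 1)
This article presents the architecture design of a 64-bit RISC microprocessor that uses the MIPS instruction set. It analyzes in detail the processor's main functional units and its five-stage pipeline control, and provides a complete solution to the potential pipeline hazards in the design. Finally, the correctness of the design is verified through online simulation and debugging and by configuring an FPGA.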

15.
A 1.3-GHz fifth-generation SPARC64 microprocessor   Cited by: 1 (self-citations: 0, citations by others: 1)
A fifth-generation SPARC64 processor is fabricated in 130-nm partially depleted silicon-on-insulator CMOS with eight layers of Cu metallization. At Vdd = 1.2 V and Ta = 25°C, it runs at 1.3 GHz and dissipates 34.7 W. The chip contains 191 M transistors with 19 M logic circuits in an area of 18.14 mm×15.99 mm and is covered with 5858 bumps, of which 269 are for I/O signals. It is mounted in a 1360-pin land-grid-array package. The 16-byte-wide system bus operates with a 260-MHz clock in single-data-rate or double-data-rate modes. This processor implements an error-detection mechanism for execution units and data path logic circuits in addition to on-chip arrays to detect data corruption. Intermittent errors detected in execution units and data paths are recovered via instruction retry. A soft barrier clocking scheme allows amortization of the clock skew and jitter over multiple cycles and helps to achieve high clock frequency. Tunability of the clock timing makes timing closure easier. A relatively small amount of custom circuit design and the use of mostly static circuits contribute to a short development time.

16.
This superscalar microprocessor is the first implementation of a 32-bit RISC architecture specification incorporating a single-instruction, multiple-data vector processing engine. Two instructions per cycle plus a branch can be dispatched to two of seven execution units in this microarchitecture designed for high execution performance, high memory bandwidth, and low power for desktop, embedded, and multiprocessing systems. The processor features an enhanced memory subsystem, 128-bit internal data buses for improved bandwidth, and 32-KB eight-way instruction/data caches. The integrated L2 tag and cache controller with a dedicated L2 bus interface supports L2 cache sizes of 512 KB, 1 MB, or 2 MB with two-way set associativity. At 450 MHz, and with a 2-MB L2 cache, this processor is estimated to have a floating-point and integer performance metric of 20 while dissipating only 7 W at 1.8 V. The 10.5 million transistor, 83-mm² die is fabricated in a 1.8-V, 0.20-μm CMOS process with six layers of copper interconnect.

17.
The first single-chip 64-b vector-pipelined processor (VPP) ULSI is described. It executes vector operations indispensable to high-speed scientific computation. The VPP ULSI attains a 200-MFLOPS peak performance at a 100-MHz clock frequency. This extremely high performance is made possible by the integration on the VPP of a 64-b five-stage pipelined adder/shifter, a 64-b five-stage pipelined multiplier/divider/logic operation unit, and a 40-kb register file. Various new high-speed circuit techniques have also been developed for 100-MHz operation. The chip, which was fabricated with a 0.8-μm BiCMOS and triple-layer metallization process technology, has a 17.2-mm×17.3-mm area and contains about 693 K transistors. It consumes 13.2 W at a 100-MHz clock frequency with a single 5-V power supply.

18.
A 550-MHz 64-b PowerPC processor in 0.2-μm silicon-on-insulator (SOI) copper technology achieves a 22% frequency gain over a similar design in a CMOS bulk technology. Performance gains are 15%-40% at the circuit level and 24%-28% for critical paths. Unique SOI design aspects such as history effect, lowered noise margins, parasitic bipolar current, and self-heating are considered.

19.
A full-custom single-chip bipolar ECL RISC microprocessor was implemented in a 1.0-μm single-poly bipolar technology. This research prototype contains a CPU and on-chip 2-KB instruction and 2-KB data caches. Worst-case power dissipation with a nominal -5.2 V supply is 115 W. The chip has been designed for a worst-case clock frequency of 275 MHz at a nominal supply. The chip verifies a new style of CAD tools developed during the design process, advanced packaging techniques for high-power microprocessors, and VLSI ECL circuit techniques.

20.
A dual-core 64-bit UltraSPARC microprocessor for dense server applications   Cited by: 1 (self-citations: 0, citations by others: 1)
A dual-core 64-bit microprocessor optimized for compute-dense systems such as rack-mount and blade servers for network computing was developed. The chip consists of two UltraSPARC II cores, each with its own 512 kB L2 cache, a DDR-1 memory controller, and symmetric multiprocessor bus (JBus) controllers. The 206-mm² die is fabricated in 0.13-μm CMOS technology with seven layers of Cu and a low-k dielectric. The chip offers a highly efficient performance-per-watt ratio with a typical power dissipation of 23 W at 1.3 V and 1.2 GHz. A short design cycle was achieved by leveraging existing designs wherever possible and developing effective design methodologies and flows. Significant design challenges faced by this project are described. These include deep-submicron design issues, such as negative bias temperature instability (NBTI), leakage, coupling noise, intra-die process variation, and electromigration (EM). A second important design challenge was implementing a high-performance L2 cache subsystem with a short four-cycle core-to-L2 latency including ECC.
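
Several abstracts in this list quote peak throughput figures that are consistent with a simple product of operations completed per cycle and clock frequency. The sketch below checks that arithmetic for entries 3, 4, 8, and 17. The function name `peak_rate_millions` is hypothetical, and the assumption that the quoted peaks were derived this way (in particular, two floating-point operations per cycle for entry 17) is an inference from the figures, not something the abstracts state.

```python
def peak_rate_millions(per_cycle: int, clock_mhz: float) -> float:
    """Peak rate in millions per second, assuming `per_cycle` instructions
    (or floating-point operations) complete on every clock cycle."""
    return per_cycle * clock_mhz


# Figures taken from the abstracts in this list.
alpha_300 = peak_rate_millions(4, 300)  # entry 3: quad-issue at 300 MHz
cpu_200 = peak_rate_millions(2, 200)    # entry 8: two instructions per cycle at 200 MHz
vpp_100 = peak_rate_millions(2, 100)    # entry 17: assumed one add + one multiply per cycle

# Entry 4 gives a cycle time rather than a frequency: 25 ns corresponds to 40 MHz.
clock_mhz_25ns = 1_000 / 25
superscalar_80 = peak_rate_millions(2, clock_mhz_25ns)

print(alpha_300, cpu_200, vpp_100, superscalar_80)
```

Under these assumptions the results agree with the quoted 1200 MIPS, 400 MIPS, 200 MFLOPS, and 80 MFLOPS peaks. Sustained rates on real workloads are lower, as the 10.8-MIPS sustained versus 15-MIPS peak figures in entry 12 illustrate.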
