1.

A 1.5-GHz 130-nm Itanium® 2 Processor with 6-MB on-die L3 cache

Rusu S. Stinson J. Tam S. Leung J. Muljono H. Cherkauer B. 《Solid-State Circuits, IEEE Journal of》2003,38(11):1887-1895

This 130-nm Itanium 2 processor implements the explicitly parallel instruction computing (EPIC) architecture and features an on-die 6-MB 24-way set-associative level-3 cache. The 374-mm² die contains 410 M transistors and is implemented in a dual-V_t process with six Cu interconnect layers and FSG dielectric. The processor runs at 1.5 GHz at 1.3 V and dissipates a maximum of 130 W. This paper reviews circuit design and package details, power delivery, the reliability, availability, and serviceability (RAS) features, design for test (DFT), and design for manufacturability (DFM) features, as well as an overview of the design and verification methodology. The fuse-based clock deskew circuit achieves 24-ps skew across the entire die, while the scan-based skew control further reduces it to 7 ps. The 128-bit front-side bus has a bandwidth of 6.4 GB/s and supports up to four processors on a single bus.
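
As a rough cross-check of the bus figures quoted in this abstract, the sketch below derives the transfer rate implied by a 128-bit bus that delivers 6.4 GB/s. It assumes 1 GB = 10^9 bytes and one full-width transfer per bus event; the function name `implied_transfer_rate` is illustrative and does not come from the paper.

```python
def implied_transfer_rate(bandwidth_bytes_per_s: float, bus_width_bits: int) -> float:
    """Return the number of full-width transfers per second needed
    to sustain the given bandwidth on a bus of the given width."""
    bytes_per_transfer = bus_width_bits / 8  # 128 bits -> 16 bytes
    return bandwidth_bytes_per_s / bytes_per_transfer


# Figures from the abstract: 128-bit bus, 6.4 GB/s (assuming decimal gigabytes).
rate = implied_transfer_rate(6.4e9, 128)
print(f"{rate / 1e6:.0f} million transfers per second")
```

Under these assumptions the quoted bandwidth corresponds to 400 million transfers per second.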

2.

A 130-nm triple-V_t 9-MB third-level on-die cache for the 1.7-GHz Itanium® 2 processor

Chang J. Rusu S. Shoemaker J. Tam S. Ming Huang Haque M. Siufu Chiu Kevin Truong Karim M. Leong G. Desai K. Goe R. Kulkarni S. 《Solid-State Circuits, IEEE Journal of》2005,40(1):195-203

The 18-way set-associative, single-ported 9 MB cache for the Itanium 2 processor uses 210 identical 48-kB sub-arrays with a 2.21-µm² cell in a 130-nm 6-metal technology. The processor runs at 1.7 GHz at 1.35 V and dissipates 130 W. The 432-mm² die contains 592 M transistors, the largest transistor count reported for a microprocessor. This paper reviews circuit design and implementation details for the L3 cache data and tag arrays. The staged mode ECC scheme avoids a latency increase in the L3 tag. A high V_t implant improves the read stability and reduces the sub-threshold leakage.

3.

A 65-nm Dual-Core Multithreaded Xeon® Processor With 16-MB L3 Cache

Rusu S. Tam S. Muljono H. Ayers D. Chang J. Cherkauer B. Stinson J. Benoit J. Varada R. Leung J. Limaye R. D. Vora S. 《Solid-State Circuits, IEEE Journal of》2007,42(1):17-25

This paper describes a dual-core 64-b Xeon MP processor implemented in a 65-nm eight-metal process. The 435-mm² die has 1.328-B transistors. Each core has two threads and a unified 1-MB L2 cache. The 16-MB shared, 16-way set-associative L3 cache implements both sleep and shut-off leakage reduction modes. Long channel transistors are used to reduce subthreshold leakage in cores and uncore (all portions of the die that are outside the cores) control logic. Multiple voltage and clock domains are employed to reduce power

4.

A 9-GHz 65-nm Intel® Pentium 4 Processor Integer Execution Unit

Wijeratne S. B. Siddaiah N. Mathew S. K. Anders M. A. Krishnamurthy R. K. Anderson J. Ernest M. Nardin M. 《Solid-State Circuits, IEEE Journal of》2007,42(1):26-37

This paper describes a fourth generation Intel Pentium 4 processor integer execution core operating at 9 GHz in a 1.3-V, 65-nm CMOS technology at 70 °C. Low-voltage-swing circuits of the 90-nm design are replaced by: 1) 2× frequency fast clock (FCLK)-optimized domino clocking scheme; 2) segmented arithmetic and logic unit (ALU) front-end multiplexer; 3) sparse-tree ALU adder; 4) merged add/subtract sparse-tree address generation unit (AGU) design; 5) speculative RC-delay-optimized rotator; and 6) single-rail L0 cache and alignment multiplexer, resulting in 8.4% reduction in integer core normalized active power and 42% reduction in normalized leakage power. The use of standard domino/static tools and methodologies lowers design complexity, reducing development cost and time. The redesign also reduces integer core thermal density, resulting in an 8 °C reduction in CPU operating temperature

5.

A 90-nm variable frequency clock system for a power-managed Itanium architecture processor

Fischer T. Desai J. Doyle B. Naffziger S. Patella B. 《Solid-State Circuits, IEEE Journal of》2006,41(1):218-228

An Itanium Architecture microprocessor in 90-nm CMOS with 1.7B transistors implements a dynamically-variable-frequency clock system. Variable frequency clocks support a power management scheme which maximizes processor performance within a configured power envelope. Core supply voltage and clock frequency are modulated dynamically in order to remain within the power envelope. The Foxton controller and dynamically-variable clock system reside on die while the variable voltage regulator and power measurement resistors reside off chip. In addition, high-bandwidth frequency adjustment allows the clock period to adapt during on-die supply transients, allowing higher frequency processor operation during transients than possible with a single-frequency clock system.

6.

Low standby power state storage for sub-130-nm technologies

Clark L.T. Ricci F. Biyani M. 《Solid-State Circuits, IEEE Journal of》2005,40(2):498-506

Handheld and other battery-powered ICs require process scaling to increase functional integration and reduce active power consumption. Scaling also increases leakage current components to the point where standby power is frequently a limiting design factor. A scheme combining low-leakage thick-gate shadow latches and high-performance transistors is presented that decouples performance from standby power in sub-130-nm technologies. Circuit design and operation, including pulse-clocked latches, use of dynamic circuits, and inclusion of scan is presented. The approach is validated by experimental results on a 90-nm process.

7.

All-digital TX frequency synthesizer and discrete-time receiver for Bluetooth radio in 130-nm CMOS

Staszewski R.B. Muhammad K. Leipold D. Chih-Ming Hung Yo-Chuol Ho Wallberg J.L. Fernando C. Maggio K. Staszewski R. Jung T. Jinseok Koh John S. Irene Yuanying Deng Sarda V. Moreira-Tamayo O. Mayega V. Katz R. Friedman O. Eliezer O.E. de-Obaldia E. Balsara P.T. 《Solid-State Circuits, IEEE Journal of》2004,39(12):2278-2291

We present a single-chip fully compliant Bluetooth radio fabricated in a digital 130-nm CMOS process. The transceiver is architectured from the ground up to be compatible with digital deep-submicron CMOS processes and be readily integrated with a digital baseband and application processor. The conventional RF frequency synthesizer architecture, based on the voltage-controlled oscillator and the phase/frequency detector and charge-pump combination, has been replaced with a digitally controlled oscillator and a time-to-digital converter, respectively. The transmitter architecture takes advantage of the wideband frequency modulation capability of the all-digital phase-locked loop with built-in automatic compensation to ensure modulation accuracy. The receiver employs a discrete-time architecture in which the RF signal is directly sampled and processed using analog and digital signal processing techniques. The complete chip also integrates power management functions and a digital baseband processor. Application of the presented ideas has resulted in significant area and power savings while producing structures that are amenable to migration to more advanced deep-submicron processes, as they become available. The entire IC occupies 10 mm² and consumes 28 mA during transmit and 41 mA during receive at 1.5-V supply.

8.

A fully pipelined single-precision floating-point unit in the synergistic processor element of a CELL processor

Hwa-Joon Oh Mueller S.M. Jacobi C. Tran K.D. Cottier S.R. Michael B.W. Nishikawa H. Totsuka Y. Namatame T. Yano N. Machida T. Dhong S.H. 《Solid-State Circuits, IEEE Journal of》2006,41(4):759-771

The floating-point unit (FPU) in the synergistic processor element (SPE) of a CELL processor is a fully pipelined 4-way single-instruction multiple-data (SIMD) unit designed to accelerate media and data streaming with 128-bit operands. It supports 32-bit single-precision floating-point and 16-bit integer operands with two different latencies, six-cycle and seven-cycle, with 11 FO4 delay per stage. The FPU optimizes the performance of critical single-precision multiply-add operations. Since exact rounding, exceptions, and de-norm number handling are not important to multimedia applications, IEEE correctness on the single-precision floating-point numbers is sacrificed for performance and simple design. It employs fine-grained clock gating for power saving. The design has 768K transistors in 1.3 mm², fabricated in 90-nm SOI technology. Correct operations have been observed up to 5.6 GHz with 1.4 V and 56 °C, delivering 44.8 GFlops. Architecture, logic, circuits, and integration are codesigned to meet the performance, power, and area goals.
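
The 44.8-GFlops figure in this abstract is consistent with a simple peak-throughput estimate, sketched below. The sketch assumes the usual convention that a fused multiply-add counts as two floating-point operations and that the pipeline issues one 4-way SIMD multiply-add per cycle; the function name `peak_gflops` is illustrative and not taken from the paper.

```python
def peak_gflops(clock_ghz: float, simd_lanes: int, flops_per_lane_op: int = 2) -> float:
    """Peak throughput in GFlops for a fully pipelined SIMD unit that
    issues one operation per lane per cycle.

    flops_per_lane_op defaults to 2 on the assumption that a
    multiply-add is counted as two floating-point operations.
    """
    return clock_ghz * simd_lanes * flops_per_lane_op


# Figures from the abstract: 5.6 GHz, 4-way SIMD.
print(f"{peak_gflops(5.6, 4):.1f} GFlops")
```

With the abstract's figures this gives 5.6 × 4 × 2 = 44.8 GFlops, matching the reported value.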

9.

The First Fully Integrated Quad-Band GSM/GPRS Receiver in a 90-nm Digital CMOS Process

《Solid-State Circuits, IEEE Journal of》2006,41(8):1772-1783

We present the receiver in the first single-chip GSM/GPRS transceiver that incorporates full integration of quad-band receiver, transmitter, memory, power management, dedicated ARM processor and RF built-in self test in a 90-nm digital CMOS process. The architecture uses Nyquist rate direct RF sampling in the receiver and an all-digital phase-locked loop (PLL) for generating the local oscillator (LO). The receive chain uses discrete-time analog signal processing to down-convert, down-sample, filter and analog-to-digital convert the received signal. A feedback loop is provided at the mixer output and can be used to cancel DC-offsets as well as to study linearization of the receive chain. The receiver meets a sensitivity of −110 dBm at 60 mA in a 1.4-V digital CMOS process in the presence of more than one million digital gates.

10.

The 65-nm 16-MB Shared On-Die L3 Cache for the Dual-Core Intel Xeon Processor 7100 Series (total citations: 1; self-citations: 0; citations by others: 1)

Chang J. Ming Huang Shoemaker J. Benoit J. Szu-Liang Chen Wei Chen Siufu Chiu Ganesan R. Leong G. Lukka V. Rusu S. Srivastava D. 《Solid-State Circuits, IEEE Journal of》2007,42(4):846-852

The 16-way set associative, single-ported 16-MB cache for the Dual-Core Intel Xeon Processor 7100 Series uses a 0.624-µm² cell in a 65-nm 8-metal technology. Low power techniques are implemented in the L3 cache to minimize both leakage and dynamic power. Sleep transistors are used in the SRAM array and peripherals, reducing the cache leakage by more than 2X. Only 0.8% of the cache is powered up for a cache access. Dynamic cache line disable (Intel Cache Safe Technology) with a history buffer protects the cache from latent defects and infant mortality failures

11.

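All-solid-state 556-nm yellow laser with LBO intracavity frequency doubling (total citations: 11; self-citations: 4; citations by others: 7)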

贾富强 薛庆华 郑权 卜轶坤 谭成桥 钱龙生 《中国激光》 (Chinese Journal of Lasers) 2005,32(8):017-1021

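Most all-solid-state yellow lasers are obtained by intracavity sum-frequency mixing of the ⁴F₃/₂-⁴I₁₁/₂ and ⁴F₃/₂-⁴I₁₃/₂ transitions of Nd³⁺-doped laser crystals. Because this approach has long suffered from difficulties with output beam quality and power stability, finding a suitable fundamental laser line and applying intracavity frequency doubling is a practical alternative. An analysis of the Nd:YAG laser lines shows that several lines of the ⁴F₃/₂-⁴I₁₁/₂ transition (1112 nm, 1116 nm, and 1123 nm) give yellow laser output when frequency doubled. A comparison of the laser parameters of the main Nd:YAG lines shows that oscillation on the lower-gain 1112-nm, 1116-nm, and 1123-nm lines can be obtained by applying specially designed cavity mirror coatings that suppress the higher-gain 1064-nm, 1319-nm, and 946-nm lines. With a suitably designed cavity coating and an appropriately chosen and positioned doubling crystal, intracavity frequency doubling in an LBO crystal, with a 2-W laser diode (LD) pumping Nd:YAG, produced 556-nm yellow laser output. At a pump power of 1.6 W, the maximum output power was 102 mW, an optical-to-optical conversion efficiency of 6.4%.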
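
The wavelength and efficiency figures in this abstract follow from two simple relations, sketched below: frequency doubling halves the wavelength, and the optical-to-optical efficiency is output power divided by pump power. The function names are illustrative and not taken from the paper.

```python
def second_harmonic_nm(fundamental_nm: float) -> float:
    """Frequency doubling halves the vacuum wavelength."""
    return fundamental_nm / 2


def optical_efficiency(output_w: float, pump_w: float) -> float:
    """Optical-to-optical conversion efficiency as a fraction."""
    return output_w / pump_w


# Nd:YAG fundamental lines named in the abstract.
for line_nm in (1112, 1116, 1123):
    print(f"{line_nm} nm -> {second_harmonic_nm(line_nm)} nm")

# 102 mW output at 1.6 W pump power, as reported.
print(f"efficiency: {optical_efficiency(0.102, 1.6) * 100:.1f} %")
```

The 1112-nm line doubles to the reported 556 nm, and 102 mW out of 1.6 W is about 6.4%.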

12.

A 90-nm CMOS Low-Power GSM/EDGE Multimedia-Enhanced Baseband Processor With 380-MHz ARM926 Core and Mixed-Signal Extensions

Lueftner T. Berthold J. Pacha C. Georgakos G. Sauzon G. Hoemke O. Beshenar J. Mahrla P. Just K. Hober P. Henzler S. Schmitt-Landsiedel D. Yakovleff A. Klein A. Knight R. J. Acharya P. Bonnardot A. Buch S. Sauer M. 《Solid-State Circuits, IEEE Journal of》2007,42(1):134-144

To meet the widely varying speed and power requirements of multifunctional mobile devices, an appropriate combination of technology features, circuit-level low-power techniques, and system architecture is implemented in a GSM/EDGE baseband processor with multimedia and mixed-signal extensions. Power reduction techniques and performance requirements are derived from an analysis of relevant use cases and applications. The 44 mm² baseband processor is fabricated in a 90-nm low-power CMOS technology with triple-well option and dual-gate oxide core devices. The ARM926 core achieves a maximum clock frequency of 380 MHz at 1.4-V supply due to the usage of thin oxide (1.6 nm) devices. Power dissipation can be adapted to the performance requirements by means of combined voltage and frequency scaling to reduce active power consumption in medium-performance mode by 68%. To reduce leakage currents during standby mode, large SRAM blocks, nFET sleep transistors, and circuit components with relaxed performance requirements are implemented using devices with 2.2-nm gate oxide thickness

13.

A 4-GHz 130-nm address generation unit with 32-bit sparse-tree adder core

Mathew S. Anders M. Krishnamurthy R.K. Borkar S. 《Solid-State Circuits, IEEE Journal of》2003,38(5):689-695

This paper describes a 32-bit address generation unit designed for 4-GHz operation in 1.2-V 130-nm technology. The AGU utilizes a 152-ps sparse-tree adder core to achieve 20% delay reduction, 80% lower interconnect complexity, and a low (1%) active energy leakage component. The dual-V_T semidynamic implementation of the adder core provides the performance of a dynamic CMOS design with an average energy profile similar to static CMOS, enabling 71% savings in average energy with a good sub-130-nm scaling trend.

14.

Clock generation and distribution for the 130-nm Itanium® 2 processor with 6-MB on-die L3 cache

Tam S. Limaye R.D. Desai U.N. 《Solid-State Circuits, IEEE Journal of》2004,39(4):636-642

The clock generation and distribution system for the 130-nm Itanium 2 processor operates at 1.5 GHz with a skew of 24 ps. The Itanium 2 processor features 6 MB of on-die L3 cache and has a die size of 374 mm². Fuse-based clock de-skew enables post-silicon clock optimization to gain higher frequency. This paper describes the clock generation, global clock distribution, local clocking, and the clock skew optimization feature.
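
To put the quoted skew in context, the sketch below expresses it as a fraction of the clock period. This is only an illustrative calculation on the abstract's two figures (1.5 GHz, 24 ps); the function name `skew_fraction` is hypothetical.

```python
def skew_fraction(skew_ps: float, clock_ghz: float) -> float:
    """Clock skew as a fraction of one clock period."""
    period_ps = 1000.0 / clock_ghz  # 1 GHz corresponds to a 1000-ps period
    return skew_ps / period_ps


# Figures from the abstract: 24-ps skew at 1.5 GHz.
print(f"{skew_fraction(24, 1.5) * 100:.1f} % of the clock period")
```

At 1.5 GHz the period is about 667 ps, so a 24-ps skew is roughly 3.6% of a cycle.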

15.

A Case Study: Power and Performance Improvement of a Chip Multiprocessor for Transaction Processing

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2005,13(7):865-868

Current high-end microprocessor designs focus on increasing instruction parallelism and clock frequency at the expense of power dissipation. This paper presents a case study of a different direction, a chip multiprocessor (CMP) with a smaller processor core than a baseline high-end 130-nm 64-bit SPARC server uniprocessor. We demonstrate that the size of the baseline processor core can be reduced by 2/3 using a combination of logical resource reduction and dense custom macros while still delivering about 70% of the TPC-C performance. Circuit speed is traded for power reduction by reducing the power supply from 1.0 to 0.8 V and increasing transistor channel lengths by 12.5% above the minimum. The resulting CMP with six reduced size cores and 4-MB L2 cache is estimated to run at 1.8 GHz while consuming less than 30% of the power compared to the scaled baseline dual-core processor running at 2.4 GHz. The proposed CMP is more than four times higher in TPC/W than the dual-core processor, facilitating the design of high-density servers.

16.

A power efficient 26-GHz 32:1 static frequency divider in 130-nm bulk CMOS (total citations: 2; self-citations: 0; citations by others: 2)

Changhua Cao O K.K. 《Microwave and Wireless Components Letters, IEEE》2005,15(11):721-723

A 32:1 static frequency divider consisting of five stages of 2:1 dividers using current mode logic (CML) was fabricated in a 130-nm bulk complementary metal-oxide semiconductor (CMOS) logic process. By optimizing transistor sizes, high operating speed is achieved with limited power consumption. For an input power of 0 dBm, the 32:1 divider operates up to 26 GHz with a 1.5-V supply voltage. The whole 32:1 chain including buffers consumes 8.97 mW and the first stage consumes only 3.88 mW in 26-GHz operation. The power consumption of the first 2:1 stage is less than 15% of other bulk CMOS static frequency dividers operating at the same frequency.
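
The relationship between the stage count and the overall division ratio in this abstract can be sketched as follows. The calculation uses only figures quoted above (five 2:1 stages, 26-GHz input, 3.88 mW and 8.97 mW); the function names are illustrative.

```python
def cascade_ratio(stages: int, ratio_per_stage: int = 2) -> int:
    """Overall division ratio of identical dividers connected in series."""
    return ratio_per_stage ** stages


def output_frequency_ghz(input_ghz: float, stages: int) -> float:
    """Frequency at the output of a chain of 2:1 dividers."""
    return input_ghz / cascade_ratio(stages)


ratio = cascade_ratio(5)                    # five 2:1 stages
f_out = output_frequency_ghz(26.0, 5)       # 26-GHz input
first_stage_share = 3.88 / 8.97             # first stage vs. whole chain

print(f"ratio {ratio}:1, output {f_out * 1000:.1f} MHz")
print(f"first stage draws {first_stage_share * 100:.0f} % of chain power")
```

Five 2:1 stages give 2^5 = 32, so a 26-GHz input leaves the chain at 812.5 MHz. The first stage, which runs at the highest frequency, accounts for roughly 43% of the reported chain power.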

17.

A low-power 2.5-GHz 90-nm level 1 cache and memory management unit

Haigh J.R. Wilkerson M.W. Miller J.B. Beatty T.S. Strazdus S.J. Clark L.T. 《Solid-State Circuits, IEEE Journal of》2005,40(5):1190-1199

The design of a 90-nm virtually addressed cache subsystem with separate 32-kB instruction and data caches is described. The circuits and microarchitecture are illustrated, including architecture level trace data validating low-power features and provisions to support snooping while maintaining the latency and power of virtual addressing. Low-power memory management unit design including a translation lookaside buffer with process identifier mapping is also described. Level 1 caches with support for high bandwidth, single cycle 256 bit fill and evict, as well as features for low power are also described. The design approaches are validated through both simulation and experimental results.

18.

A sub-130-nm conditional keeper technique

Alvandpour A. Krishnamurthy R.K. Soumyanath K. Borkar S.Y. 《Solid-State Circuits, IEEE Journal of》2002,37(5):633-638

Increasing leakage currents combined with reduced noise margins significantly degrade the robustness of wide dynamic circuits. In this paper, we describe two conditional keeper topologies for improving the robustness of sub-130-nm wide dynamic circuits. They are applicable in normal mode of operation as well as during burn-in test. A large fraction of the keepers is activated conditionally, allowing the use of strong keepers with leaky precharged circuits without significant impact on performance of the circuits. Compared to conventional techniques, up to 28% higher performance has been observed for wide dynamic gates in a 130-nm technology. In addition, the proposed burn-in keeper results in 64% active area reduction

19.

155-nm Continuous-Wave Two-Pump Parametric Amplification

Boggio J. Moro S. Myslivets E. Windmiller J.R. Alic N. Radic S. 《Photonics Technology Letters, IEEE》2009,21(10):612-614

We investigate the synthesis of flat parametric response of a dual-pumped device with a distant pump separation ranging from 130 to 180 nm. The Raman contribution to the nonlinear polarization introduced predictable gain ripple, and dispersion fluctuation along the highly nonlinear fiber had to be precisely accounted for. A 3-dB equalized gain was observed over 100 nm using 130-nm separated pumps. Record gain bandwidth of 155 nm was also measured for the first time.

20.

Pump interactions in a 100-nm bandwidth Raman amplifier (total citations: 11; self-citations: 0; citations by others: 11)

Kidorf H. Rottwitt K. Nissov M. Ma M. Rabarijaona E. 《Photonics Technology Letters, IEEE》1999,11(5):530-532

A design for a 100-nm bandwidth Raman amplifier is presented. The amplifier is pumped with eight 130-mW lasers with wavelengths ranging from 1416 to 1502 nm. The peak-to-peak gain ripple is 1.1 dB. A new model was developed for this design that includes pump-to-pump and signal-to-signal interactions in addition to double Rayleigh scattering and amplified spontaneous emission. An understanding of the interactions among these various effects was essential to this design. These modeling results are based on measurements of the physical characteristics of the transmission fiber