
1.

A high-speed multiplier using a redundant binary adder tree

《Solid-State Circuits, IEEE Journal of》1987,22(1):28-34

A 16-bit × 16-bit multiplier for 2 two's-complement binary numbers based on a new algorithm is described. This multiplier has been fabricated on an LSI chip using a standard n-E/D MOS process technology with a 2.7-μm design rule. This multiplier is characterized by use of a binary tree of redundant binary adders. In the new algorithm, n-bit multiplication is performed in a time proportional to log₂ n and the physical design of the multiplier is constructed of a regular cellular array. This new algorithm has been proposed by N. Takagi et al. (1982, 1983). The 16-bit × 16-bit multiplier chip size is 5.8 × 6.3 mm² using the new layout for a binary adder tree. The chip contains about 10600 transistors, and the longest logic path includes 46 gates. The multiplication time was measured as 120 ns. It is estimated that a 32-bit × 32-bit multiplication time is about 140 ns.
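
The logarithmic depth mentioned in this abstract can be illustrated with a minimal sketch. It is not the chip's design: ordinary Python integers stand in for the redundant binary adders (which, in hardware, add without carry propagation so that each tree level costs roughly constant time), the operands are assumed unsigned rather than two's-complement, and the name `tree_multiply` is hypothetical.

```python
def tree_multiply(a, b, n):
    """Multiply two unsigned n-bit integers by summing partial products in a binary tree.

    Returns (product, number_of_tree_levels).
    """
    # One partial product per multiplier bit: a shifted left by i if bit i of b is set.
    terms = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(terms) > 1:
        # Add neighbours pairwise; each pass corresponds to one level of the adder tree.
        paired = [terms[i] + terms[i + 1] for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:
            paired.append(terms[-1])  # an odd term passes through to the next level
        terms = paired
        levels += 1
    return terms[0], levels


product, levels = tree_multiply(40000, 50000, 16)
print(product, levels)
```

Under these assumptions a 16-bit multiplication needs 4 tree levels and a 32-bit multiplication needs 5, which is consistent with the modest increase from 120 ns to an estimated 140 ns reported above.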

2.

A sub-10-ns 16×16 multiplier using 0.6-μm CMOS technology

《Solid-State Circuits, IEEE Journal of》1987,22(5):762-767

A 16×16-b parallel multiplier fabricated in a 0.6-μm CMOS technology is described. The chip uses a modified array scheme combined with Booth's algorithm to reduce the number of adding stages of partial products. The combination of scaled 0.6-μm CMOS technology and advanced arithmetic architecture achieves a multiplication time of 7.4 ns while dissipating only 400 mW. This multiplication time is shorter than that of other MOS high-speed multipliers previously reported and is comparable to those for advanced bipolar and GaAs multipliers.
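
As a rough illustration of how Booth recoding reduces the number of partial products to add, the sketch below uses radix-4 (modified) Booth recoding. The abstract does not say which Booth variant the chip uses, so the radix-4 choice is an assumption, and the function names are hypothetical.

```python
def booth_radix4_digits(y, n):
    """Recode an n-bit two's-complement multiplier y (n even) into n/2 digits in {-2, ..., 2}."""
    # Python's >> is arithmetic, so this extracts two's-complement bits for negative y too.
    bits = [(y >> i) & 1 for i in range(n)]
    digits = []
    prev = 0  # implicit bit to the right of the least significant bit
    for i in range(0, n, 2):
        # Each digit looks at an overlapping group of three bits.
        digits.append(bits[i] + prev - 2 * bits[i + 1])
        prev = bits[i + 1]
    return digits


def booth_multiply(x, y, n):
    """Multiply x by the n-bit two's-complement value y using n/2 partial products."""
    return sum(d * x * 4**i for i, d in enumerate(booth_radix4_digits(y, n)))


print(booth_radix4_digits(7, 4))    # digits, least significant first
print(booth_multiply(12345, -3, 16))
```

With this recoding a 16-bit multiplier yields 8 partial products instead of 16, each a multiple of the multiplicand by 0, ±1, or ±2, so fewer adding stages are needed.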

3.

A fast 16 bit NMOS parallel multiplier

《Solid-State Circuits, IEEE Journal of》1984,19(3):338-342

An architecture for a fast parallel array multiplier is described. Using a 3 μm E/D NMOS process, a 16×16 bit trial circuit has been designed. A multiplication time of 120 ns has been achieved with a power dissipation of 200 mW and a silicon area of 5 mm². This architecture concept greatly reduces the logical depth of the array by rearranging internal delays. It is applicable in principle to any MOS, CMOS, GaAs, or bipolar technology.

4.

Efficient algorithm and systolic architecture for modular division

Chuanpeng Chen 《International Journal of Electronics》2013,100(6):813-823

A new efficient modular division algorithm suitable for systolic implementation and its systolic architecture are proposed in this article. With a new exit condition of the while loop and a new updating method for a control variable, the new algorithm reduces the average number of iterations by more than 14.3% compared to the algorithm proposed by Chen, Bai and Chen. Based on the new algorithm, we design a fast systolic architecture with an optimised core computing cell. Compared to the architecture proposed by Chen, Bai and Chen, our systolic architecture has reduced the critical path delay by about 18% and the total computational time for one modular division by almost 30%, at the cost of about 1% more cells. Moreover, by the addition of a flag signal and three logic gates, the proposed systolic architecture can also perform Montgomery modular multiplication and a fast unified modular divider/multiplier is realised.

5.

Low-Energy Digit-Serial/Parallel Finite Field Multipliers (total citations: 5, self-citations: 0, citations by others: 5)

Leilei Song Keshab K. Parhi 《The Journal of VLSI Signal Processing》1998,19(2):149-166

Digit-serial architectures are best suited for systems requiring moderate sample rate and where area and power consumption are critical. This paper presents a new approach for designing digit-serial/parallel finite field multipliers. This approach combines both array-type and parallel multiplication algorithms, where the digit-level array-type algorithm minimizes the latency for one multiplication operation and the parallel architecture inside of each digit cell reduces both the cycle-time and the switching activities, hence power consumption. By appropriately constraining the feasible primitive polynomials, the mod p(x) operation involved in finite field multiplication can be performed in a more efficient way. As a result, the computation delay and energy consumption of one finite field multiplication using the proposed digit-serial/parallel architectures are significantly less than those obtained by folding the parallel semi-systolic multipliers. Furthermore, their energy-delay products are reduced by an even larger percentage. Therefore, the proposed digit-serial/parallel architectures are attractive for both low-energy and high-performance applications.

6.

CMOS four-quadrant current multiplier using switched current techniques (total citations: 2, self-citations: 0, citations by others: 2)

Akl Y. El-Sayed M. Aboul-Seoud A.K. 《Electronics letters》2004,40(6):359-360

A new CMOS four-quadrant switched current multiplier, operating from a single 3 V power supply and employing a two-phase clocking scheme, is proposed. The circuit is designed to perform one multiplication per clock cycle. SPICE simulations using 0.5 μm CMOS process parameters have been carried out to verify the multiplier performance.

7.

CMOS image sensor with mixed-signal processor array

Graupner A. Schreiter J. Getzlaff S. Schuffny R. 《Solid-State Circuits, IEEE Journal of》2003,38(6):948-957

We present a single-chip integration of a CMOS image sensor with an embedded flexible processing array and dedicated analog-to-digital converter. The processor array is designed to perform convolution and transformation algorithms with arbitrary kernels. It has been designed to carry out the multiplication of analog image data with given digital kernel coefficients and to add up the results. The processor array is an analog implementation of a highly parallel architecture which is scalable to any desired sensor resolution while preserving video-rate operation. A prototype implementation has been realized in a 0.6-μm CMOS technology. Switched current technique has been applied to obtain compact and robust circuits. The prototype's sensor resolution is 64 × 128 pixels. The processor array occupies a small chip area and consumes only a small percentage of the power (250 μW) of the whole image sensor.

8.

Bit-level pipelined digit-serial multiplier

A. AGGOUN A. ASHUR M. K. IBRAHIM 《International Journal of Electronics》2013,100(6):1209-1219

A new cell architecture for high performance digit-serial computation is presented. The design of this cell is based on the feed forward of the carry digit, which allows a high level of pipelining to increase the throughput rate with minimum latency. This will give designers greater flexibility in finding the best trade-off between hardware cost and throughput rate. A twin-pipe architecture to double the throughput rate of digit-serial/parallel multipliers is also presented. The effects of the number of pipelining levels and the twin architecture on the throughput rate and hardware cost are presented. A two's complement digit-serial/parallel multiplier which can operate on both negative and positive numbers is also presented.

9.

High speed multiplier using Nikhilam Sutra algorithm of Vedic mathematics

Manoranjan Pradhan Rutuparna Panda 《International Journal of Electronics》2013,100(3):300-307

This article presents the design of a new high-speed multiplier architecture using Nikhilam Sutra of Vedic mathematics. The proposed multiplier architecture finds the complement of the large operand from its nearest base to perform the multiplication. The multiplication of two large operands is reduced to the multiplication of their complements and addition. It is more efficient when the magnitudes of both operands are more than half of their maximum values. The carry save adder in the multiplier architecture increases the speed of addition of partial products. The multiplier circuit is synthesised and simulated using Xilinx ISE 10.1 software and implemented on Spartan 2 FPGA device XC2S30-5pq208. The output parameters such as propagation delay and device utilisation are calculated from synthesis results. The performance evaluation results in terms of speed and device utilisation are compared with earlier multiplier architecture. The proposed design has speed improvements compared to multiplier architecture presented in the literature.
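
The reduction of a large multiplication to a product of complements plus an addition can be sketched in a few lines. This is only the arithmetic identity behind the Nikhilam Sutra, not the paper's architecture; the function name is hypothetical, and a decimal base is assumed purely for readability (a hardware design would presumably work with a power of two).

```python
def nikhilam_multiply(a, b, base):
    """Multiply a and b via their complements from a common base (e.g. 100 or 2**k)."""
    ca = base - a          # complement of a from the base
    cb = base - b          # complement of b from the base
    left = a - cb          # equals b - ca; this part is scaled by the base
    right = ca * cb        # the only true multiplication, on the (small) complements
    return left * base + right


print(nikhilam_multiply(97, 96, 100))   # complements 3 and 4: 93 * 100 + 12
```

The identity holds for any integers, but it only pays off when the complements are much smaller than the operands, which matches the abstract's remark that the method is more efficient when both operands exceed half of their maximum value.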

10.

High-Speed Array Multipliers Based on On-the-Fly Conversion

Sang-Man Moh Suk-Han Yoon 《ETRI Journal》1997,19(4):317-325

A new on-the-fly conversion algorithm is proposed, and high-speed array multipliers with the on-the-fly conversion are presented. The new on-the-fly conversion logic is used to speed up carry-propagate addition at the last stage of multiplication, and provides constant delay independent of the number of input bits. In this paper, the multiplication architecture and the on-the-fly conversion algorithm are presented and discussed in detail. The proposed architecture has multiplication time of (n + 1)t_FA, where n is the number of input bits and t_FA is the delay of a full adder. According to our comparative performance evaluation, the proposed architecture has shorter delay and requires less area than the conventional array multiplier with on-the-fly conversion.

11.

On Parallelization of High-Speed Processors for Elliptic Curve Cryptography

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(9):1162-1175

This paper discusses parallelization of elliptic curve cryptography hardware accelerators using elliptic curves over binary fields $\mathbb{F}_{2^{m}}$. Elliptic curve point multiplication, which is the operation used in every elliptic curve cryptosystem, is hierarchical in nature, and parallelism can be utilized in different hierarchy levels as shown in many publications. However, a comprehensive analysis on the effects of parallelization has not been previously presented. This paper provides tools for evaluating the use of parallelism and shows where it should be used in order to maximize efficiency. Special attention is given to a family of curves called Koblitz curves because they offer very efficient point multiplication. A new method where the latency of point multiplication is reduced with parallel field arithmetic processors is introduced. It is shown to outperform the previously presented multiple field multiplier techniques in the cases of Koblitz curves and generic curves with fixed base points. A highly efficient general elliptic curve cryptography processor architecture is presented and analyzed. Based on this architecture and analysis on the effects of parallelization, a few designs are implemented on an Altera Stratix II field-programmable gate array (FPGA).

12.

On-chip high-voltage generation in MNOS integrated circuits using an improved voltage multiplier technique (total citations: 5, self-citations: 0, citations by others: 5)

《Solid-State Circuits, IEEE Journal of》1976,11(3):374-378

An improved voltage multiplier technique has been developed for generating +40 V internally in p-channel MNOS integrated circuits to enable them to be operated from standard +5- and -12-V supply rails. With this technique, the multiplication efficiency and current driving capability are both independent of the number of multiplier stages. A mathematical model and simple equivalent circuit have been developed for the multiplier and the predicted performance agrees well with measured results. A multiplier has already been incorporated into a TTL compatible nonvolatile quad-latch, in which it occupies a chip area of 600 μm × 240 μm. It is operated with a clock frequency of 1 MHz and can supply a maximum load current of about 10 μA. The output impedance is 3.2 MΩ.

13.

A fast multispeed comma-free Reed-Solomon decoder for W-CDMA applications using foldable systolic array architecture

Chi-Fang Li Wern-Ho Sheen Chong-Ren Wang Yuan-Sun Chu 《Solid-State Circuits, IEEE Journal of》2003,38(4):677-682

This brief proposes a fast multispeed comma-free Reed-Solomon (CFRS) decoder for the frame synchronization and code-group identification in the cell search of the Third Generation Partnership Project wide-band code-division multiple access/frequency division duplexing (W-CDMA/FDD) system. A foldable systolic array is proposed to achieve fast decoding and provide flexible tradeoffs between power consumption, chip size, and decoding latency. Multispeed decoding, an idea that is useful for cell search in different application scenarios, can also be achieved with the same array architecture. The proposed CFRS decoder is implemented in a 3.3-V 0.35-μm CMOS technology with 2.2 × 2.2 mm² core area and power dissipation of 13.3 and 1.23 mW in high- and low-speed decoding modes, respectively.

14.

Reduced computational redundancy implementation of DSP algorithms using computation sharing vector scaling

Muhammad K. Roy K. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2002,10(3):292-300

In this paper, we present a general approach which specifically targets reduction of redundant computation in common digital-signal processing (DSP) tasks such as filtering and matrix multiplication. We show that such tasks can be expressed as multiplication of vectors by scalars and this allows fast multiplication by sharing computation. Vector scaling operation is decomposed to find the most effective precomputations which yield a fast multiplier implementation. Two decomposition approaches are presented, one based on a greedy decomposition and the other based on fixed-size lookup, and this leads to two multiplier architectures for vector-scalar products. Analog simulation of an example multiplier shows a speed advantage by a factor of about 1.85 over a conventional carry save array multiplier. Further simulations using 0.18 μ technology show up to 20% speed advantage over Booth encoded Wallace tree multipliers.

15.

An efficient and high-speed VLSI implementation of optimal normal basis multiplication over GF(2^m)

《Integration, the VLSI Journal》2016

Finite field multiplication is one of the most important operations in finite field arithmetic and the main determining building block in terms of overall speed and area in public key cryptosystems. In this work, efficient and high-speed VLSI implementations of the bit-serial, digit-serial and bit-parallel optimal normal basis multipliers with parallel-input serial-output (PISO) and parallel-input parallel-output (PIPO) structures are presented. Two general multipliers, namely, Massey–Omura (MO) and Reyhani Masoleh–Hassan (RMH), are considered as case studies for implementation. These multipliers are constructed by using AND, XOR–AND and XOR tree components. In the MO multiplier, to have strong input signals and a better implementation, the row of AND gates is implemented by using inverter and NOR components. Also, the XOR–AND component in the RMH structure is implemented using a new low-cost structure. The XOR tree in both multipliers consists of a high number of logic stages and many inputs; therefore, to optimally decrease the delay and increase the drive ability of the circuit for different loads, the logical effort method is employed as an efficient method for sizing the transistors. The multipliers are first designed for different load capacitances using different structures and different numbers of stages. Then, using the logical effort method and a newly proposed 4-input XOR gate structure, the circuits are modified to achieve minimum delay. Using 0.18 μm CMOS technology, the bit-serial, digit-serial and bit-parallel structures with type-1 and type-2 optimal normal basis are implemented over the finite fields GF(2²²⁶) and GF(2²³³) respectively. The results show that the proposed structures have better delay and area characteristics compared to previous designs.

16.

A low logic depth complex multiplier using distributed arithmetic

Berkeman A. Owall V. Torkelson M. 《Solid-State Circuits, IEEE Journal of》2000,35(4):656-659

A combinatorial complex multiplier has been designed for use in a pipelined fast Fourier transform processor. The performance in terms of throughput of the processor is limited by the multiplication. Therefore, the multiplier is optimized to make the input-to-output delay as short as possible. A new architecture based on distributed arithmetic, Wallace-trees, and carry-lookahead adders has been developed. The multiplier has been fabricated using standard cells in a 0.5-μm process and verified for functionality, speed, and power consumption. Running at 40 MHz, a multiplier with input wordlengths of 16+16 times 10+10 bits consumes 54% less power compared to a distributed arithmetic array multiplier fabricated under equal conditions.

17.

Design and application of a 2500-gate bipolar macrocell array

《Solid-State Circuits, IEEE Journal of》1985,20(5):1025-1031

A very high-speed 2500-gate Si bipolar macrocell array has been developed using a novel macrocell design approach and a 1-μm rule advanced super self-aligned process technology (SST-1A). Using this macrocell array, a 16-bit parallel multiplier is designed and fabricated. The sophisticated circuit design of the macrocell array approach permits this complex function, which is equivalent to having 3024 NOR gates, using only 70% of the total of 756 internal cells. Consequently, a fast multiplication time of 7.5 ns is achieved with a 2.07-W power dissipation. Excellent performance with an average gate delay of 120 ps and average power dissipation of 0.365 mW is demonstrated for an equivalent NOR gate.

18.

Three hardware architectures for the binary modular exponentiation: sequential, parallel, and systolic

Nedjah N. Mourelle Ld.M. 《IEEE transactions on circuits and systems. I, Regular papers》2006,53(3):627-633

Modular exponentiation is the cornerstone computation in public-key cryptography systems such as RSA cryptosystems. The operation is time consuming for large operands. This paper describes the characteristics of three architectures designed to implement modular exponentiation using the fast binary method: the first field-programmable gate array (FPGA) prototype has a sequential architecture, the second has a parallel architecture, and the third has a systolic array-based architecture. The paper compares the three prototypes as well as Blum and Paar's implementation using the time × area classic factor. All three prototypes implement the modular multiplication using the popular Montgomery algorithm.
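
The combination named in this abstract, binary exponentiation on top of Montgomery multiplication, can be sketched in software as follows. This is a plain sequential model under stated assumptions, not any of the paper's three hardware architectures: the modulus is assumed odd, the left-to-right bit scan is one common form of the binary method, and the function names are hypothetical. It relies on `pow(n, -1, R)`, available from Python 3.8.

```python
def mont_mul(a, b, n, k, n_prime):
    """Return a * b * R^-1 mod n, where R = 2**k and n * n_prime ≡ -1 (mod R)."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask   # chosen so that t + m * n is divisible by R
    u = (t + m * n) >> k                # exact division by R
    return u - n if u >= n else u       # single conditional subtraction


def mod_exp(base, exp, n):
    """Compute base**exp mod n for odd n with the left-to-right binary method."""
    if n % 2 == 0:
        raise ValueError("Montgomery multiplication needs an odd modulus")
    k = n.bit_length()
    r = 1 << k                          # R = 2**k > n
    n_prime = (-pow(n, -1, r)) % r
    x = (base % n) * r % n              # base in Montgomery form
    acc = r % n                         # 1 in Montgomery form
    for bit in bin(exp)[2:]:            # scan exponent bits, most significant first
        acc = mont_mul(acc, acc, n, k, n_prime)      # square
        if bit == "1":
            acc = mont_mul(acc, x, n, k, n_prime)    # multiply
    return mont_mul(acc, 1, n, k, n_prime)           # leave Montgomery form


print(mod_exp(4, 13, 497) == pow(4, 13, 497))
```

Each loop iteration performs one squaring and at most one multiplication, and Montgomery's method replaces the division by the modulus with shifts and masks.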

19.

Systolic and Non-Systolic Scalable Modular Designs of Finite Field Multipliers for Reed–Solomon Codec

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(6):747-757

In this paper, we present efficient algorithms for modular reduction to derive novel systolic and non-systolic architectures for polynomial basis finite field multipliers over $GF(2^{m})$ to be used in Reed–Solomon (RS) codec. Using the proposed algorithm for unit degree reduction and optimization of implementation of the logic functions in the processing elements (PEs), we have derived an efficient bit-parallel systolic design for finite field multiplier which involves nearly two-thirds of the area-complexity of the existing design having the same time-complexity. The proposed modular reduction algorithms are also used to derive efficient non-systolic serial/parallel designs of field multipliers over $GF(2^{8})$ with different digit-sizes, where the critical path and the hardware-complexity are further reduced by optimizing the implementation of modular reduction operations and finite field accumulations. The proposed bit-serial design involves nearly 55% of the minimum of area, and half the minimum of area-time complexity of the existing bit-serial designs. Similarly, the proposed digit-serial/parallel designs involve significantly less area, and less area-time complexities compared with the existing designs of the same digit-size. By parallel modular reduction through multiple degrees followed by appropriate logic-level sub-expression sharing, a hardware-efficient regular and modular form of a balanced-tree bit-parallel non-systolic multiplier is also derived. The proposed bit-parallel non-systolic pipelined design involves less than 65% of the area and nearly two-thirds of the area-time complexity of the existing bit-parallel design for a RS codec, while the non-pipelined form offers nearly 25% saving of area with less time-complexity.
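
For orientation, polynomial basis multiplication over $GF(2^{8})$ with reduction by one degree per step can be sketched as a textbook shift-and-add loop. This is a generic bit-serial model, not the paper's proposed algorithms or architectures; the field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), often used for Reed–Solomon codes, is an assumption since the abstract does not name one, and `gf256_mul` is a hypothetical name.

```python
def gf256_mul(a, b, poly=0x11D):
    """Multiply two GF(2^8) elements in polynomial basis, reducing modulo poly."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^m) is bitwise XOR
        b >>= 1
        a <<= 1              # multiply a by x
        if a & 0x100:
            a ^= poly        # degree reached 8: reduce by one degree
    return result


print(hex(gf256_mul(0x02, 0x80)))   # x * x^7 = x^8, reduced modulo the field polynomial
```

Each iteration multiplies the running operand by x and, when its degree reaches 8, folds it back with a single XOR.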

20.

SIGMA: a VLSI systolic array implementation of a Galois field GF(2^m) based multiplication and division algorithm

Kovac M. Ranganathan N. Varanasi M. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1993,1(1):22-30

Finite or Galois fields are used in numerous applications like error correcting codes, digital signal processing and cryptography. The design of efficient methods for Galois field arithmetic such as multiplication and division is critical for these applications. A new algorithm based on a pattern matching technique for computing multiplication and division in GF(2^m) is presented. An efficient systolic architecture is described for implementing the algorithm which can produce a new result every clock cycle, and the multiplication and division operations can be interleaved. The architecture has been implemented using 2-μm CMOS technology. The chip yields a computational rate of 33.3 million multiplications/divisions per second.