Similar literature
 Found 20 similar documents; search took 375 ms
1.
《Real》2000,6(4):297-312
This paper presents a VLSI implementation of the One Dimensional Direct Discrete Wavelet Transform (1-D DWT). The DWT can be viewed as a multi-resolution decomposition of a signal: it decomposes a signal into its components in different frequency bands (octave bands). We propose a new architecture using parallel filters. We consider the implementation of a three-level 1-D DWT. The proposed architecture is simple and offers 16-bit precision on input and output data. It consists of three basic units: one register bank, four filters, and a control unit. The filters are of different lengths, with new coefficients derived from Daubechies filter coefficients. The designed processor architecture requires no interface circuitry for interconnection to a standard communication bus. The architecture can compute the DWT at a data rate of 12×10⁶ samples/s, corresponding to a typical clock speed of 12 MHz. The architecture is simulated at the gate level in VLSI.
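The multi-level decomposition described in this abstract can be sketched in software. The sketch below is only an illustration: it uses the standard Daubechies-4 coefficients and periodic signal extension, both of which are assumptions, since the abstract does not give the paper's derived coefficients or its boundary handling. The function names are hypothetical.

```python
import math

# Standard Daubechies-4 low-pass coefficients (assumption: the paper uses
# its own coefficients derived from Daubechies filters, not listed here).
SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
LOW = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM,
       (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
# Matching high-pass filter: g[k] = (-1)^k * h[3 - k].
HIGH = [LOW[3], -LOW[2], LOW[1], -LOW[0]]


def dwt_level(signal):
    """One decomposition level: filter, then keep every second output."""
    n = len(signal)
    approx, detail = [], []
    for i in range(0, n, 2):
        # Periodic extension at the boundary (an assumption of this sketch).
        window = [signal[(i + k) % n] for k in range(4)]
        approx.append(sum(c * x for c, x in zip(LOW, window)))
        detail.append(sum(c * x for c, x in zip(HIGH, window)))
    return approx, detail


def dwt(signal, levels=3):
    """Return the final approximation and one detail band per level."""
    approx = list(signal)
    bands = []
    for _ in range(levels):
        approx, detail = dwt_level(approx)
        bands.append(detail)
    return approx, bands
```

Each level halves the band: a 16-sample input yields detail bands of 8, 4, and 2 samples plus a 2-sample approximation, which corresponds to the octave-band split mentioned in the abstract.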

2.
Kerneltron: support vector "machine" in silicon.   Total citations: 1, self-citations: 0, citations by others: 1  
Detection of complex objects in streaming video poses two fundamental challenges: training from sparse data with proper generalization across variations in the object class and the environment; and the computational power required of the trained classifier running real-time. The Kerneltron supports the generalization performance of a support vector machine (SVM) and offers the bandwidth and efficiency of a massively parallel architecture. The mixed-signal very large-scale integration (VLSI) processor is dedicated to the most intensive of SVM operations: evaluating a kernel over large numbers of vectors in high dimensions. At the core of the Kerneltron is an internally analog, fine-grain computational array performing externally digital inner-products between an incoming vector and each of the stored support vectors. The three-transistor unit cell in the array combines single-bit dynamic storage, binary multiplication, and zero-latency analog accumulation. Precise digital outputs are obtained through oversampled quantization of the analog array outputs combined with bit-serial unary encoding of the digital inputs. The 256 input, 128 vector Kerneltron measures 3 mm × 3 mm in 0.5 μm CMOS, delivers 6.5 GMACS throughput at 5.9 mW power, and attains 8-bit output resolution.

3.
Recently, non-volatile memory-based computing-in-memory has been regarded as a promising competitor to ultra-low-power AI chips. Implementations based on both binarized (BIN) and multi-bit (MB) schemes are proposed for DNNs/CNNs. However, there are challenges in accuracy and power efficiency in the practical use of both schemes. This paper proposes a hybrid precision architecture and circuit-level techniques to overcome these challenges. According to measured experimental results, a test chip based on the proposed architecture achieves (1) precision ranging from binarized weights and inputs up to 8-bit input, 5-bit weight, and 7-bit output, (2) an accuracy loss reduction of 86% to 96% for multiple complex CNNs, and (3) a power efficiency of 2.15 TOPS/W based on a 0.22 μm CMOS process, which greatly reduces costs compared to digital designs with similar power efficiency. With a more advanced process, the architecture can achieve a higher power efficiency. According to our estimation, a power efficiency of over 20 TOPS/W can be achieved with a 55 nm CMOS process.

4.
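Asynchronous circuits effectively address problems that arise in synchronous integrated circuit design, such as clock skew and excessive clock power consumption. This paper uses an asynchronous integrated circuit design methodology to design a 32-bit asynchronous sub-word parallel multiply-accumulate unit and implements it in a 0.18 μm process. By using a special partial-product decoding circuit, the multiply-accumulate unit supports multiple sub-word parallel modes and is suited to multimedia processing. Evaluation results show that the asynchronous multiply-accumulate unit outperforms a synchronous multiply-accumulate unit of the same structure in both performance and power consumption.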

5.
A new parallel chaotic Hash function, based on a four-dimensional cellular neural network, is proposed in this paper. The message is expanded by iterating the chaotic logistic map and then divided into blocks with a length of 512 bits each. All blocks are processed in parallel, which is one of the significant characteristics of the proposed algorithm. Each 512-bit block is divided into four 128-bit sub-blocks, each of which is further separated into four 32-bit values, and the four values are then mixed into four new values generated by the chaotic cat map. The four new values are combined by bit-wise exclusive OR with four initial values or the four previously generated values, and the results are used as the inputs of the cellular neural network. By iterating the cellular neural network, another four values are generated as the middle Hash value. The generated values of all blocks are input into the compression function to produce the final 128-bit Hash value. Theoretical analysis and computer simulation indicate that the proposed algorithm satisfies the requirements of a secure Hash function.

6.
This paper presents a low-latency algorithm designed for parallel computer architectures to compute the scalar multiplication of elliptic curve points based on approaches from cryptographic side-channel analysis. A graphics processing unit implementation using a standardized elliptic curve over a 224-bit prime field, complying with the new 112-bit security level, computes the scalar multiplication in 1.9 ms on the NVIDIA GTX 500 architecture family. The presented methods and implementation considerations can be applied to any parallel 32-bit architecture.
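For readers unfamiliar with the operation being accelerated, the sketch below shows plain left-to-right double-and-add scalar multiplication. It is the textbook baseline, not the paper's low-latency parallel algorithm, and it is not side-channel resistant. The tiny curve y² = x³ + 2x + 3 over F_97 and the base point (3, 6) are toy assumptions chosen for readability, not the standardized 224-bit curve; all names are hypothetical.

```python
# Toy curve parameters (assumption: illustrative only, not a real standard curve).
P_MOD, A, B = 97, 2, 3
G = (3, 6)  # 6^2 = 36 = 3^3 + 2*3 + 3 (mod 97)


def on_curve(point):
    """Check the curve equation; None stands for the point at infinity."""
    if point is None:
        return True
    x, y = point
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def add(p, q):
    """Affine point addition, covering doubling and inverse points."""
    if p is None:
        return q
    if q is None:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p) is the point at infinity
    if p == q:
        slope = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    slope %= P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point by scanning the bits of k from the top."""
    result = None
    for bit in bin(k)[2:]:
        result = add(result, result)      # double
        if bit == "1":
            result = add(result, point)   # conditional add
    return result
```

The loop is inherently sequential in the bits of k, which is the latency bottleneck that a parallel formulation has to work around.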

7.
8.

A comparator is an essential building block in many digital circuits, such as biometric authentication, data sorting, and exponent comparison in floating-point architectures, among others. Quantum-dot Cellular Automata (QCA) is a recent nanotechnology that overcomes the drawbacks of Complementary Metal Oxide Semiconductor (CMOS) technology. In this paper, a novel area-optimized 2^n-bit comparator architecture is proposed. To achieve this objective, 1-bit stack-type and 4-bit tree-based stack-type (TB-ST) comparators are proposed using QCA. Then, two tree-based architectures of 4-bit comparators are arranged in two layers to optimize the number of quantum cells and the area of an 8-bit comparator. Thus, this design can be extended to any 2^n-bit comparator. Simulation results of 4-bit and 8-bit comparators using QCADesigner 2.0.3 show a significant improvement in the number of quantum cells and area occupancy. The proposed TB-ST 8-bit comparator uses 2.5 clock cycles and 622 quantum cells with an area occupancy of 0.49 µm², an improvement of 10.5% and 38%, respectively, compared to existing designs. Scaled to a 32-bit comparator, the proposed architecture requires only 2675 quantum cells in an area of 2.05 µm² with a delay of 3.5 clock cycles, indicating 9.35% and 28.8% improvements, respectively, demonstrating the merit of the proposed architecture. In addition, energy dissipation analysis of the proposed TB-ST 8-bit comparator, simulated with the QCADesigner-E tool, indicates an average energy dissipation reduction of 17.3% compared to existing works.
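The tree structure that lets a comparator scale to any power-of-two width can be illustrated at the logic level. The sketch below is a generic tree comparator in software, assuming each node reports a (greater, equal) pair; it does not model the QCA stack-type cell layout, clocking, or cell counts from the paper, and the function names are hypothetical.

```python
def compare_1bit(a, b):
    """Leaf node: return (a > b, a == b) for single bits."""
    return (a == 1 and b == 0, a == b)


def combine(high, low):
    """Merge results of the more significant and less significant halves."""
    gt_high, eq_high = high
    gt_low, eq_low = low
    # The low half only decides the result when the high half is equal.
    return (gt_high or (eq_high and gt_low), eq_high and eq_low)


def compare_tree(a_bits, b_bits):
    """Compare two MSB-first bit lists whose length is a power of two."""
    nodes = [compare_1bit(a, b) for a, b in zip(a_bits, b_bits)]
    while len(nodes) > 1:
        # Each pass is one tree layer; adjacent pairs merge in parallel.
        nodes = [combine(nodes[i], nodes[i + 1])
                 for i in range(0, len(nodes), 2)]
    return nodes[0]


def to_bits(value, width):
    """MSB-first bit list of a non-negative integer."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]
```

For example, `compare_tree(to_bits(200, 8), to_bits(199, 8))` evaluates three merge layers over eight leaves; doubling the width adds one layer.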


9.
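YHFT-D4 is a DSP with a clustered VLIW architecture. It has multiple functional units and can execute multiple instructions in parallel within a single clock cycle. Which functional unit executes an instruction, and which instructions execute in parallel, are decided statically by the compiler or the programmer. This article presents the design and implementation of the YHFT-D4 assembler.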

10.
As semiconductor technology advances, more devices can be accommodated in a single VLSI chip. The feasibility of putting multiple functional units in a chip is then worth studying. Such an approach, however, will face software and hardware difficulties (and also tradeoffs). CRISC is a 32-bit single-chip VLSI processor architecture achieving high performance by means of RISC and multiple functional unit approaches. Dual ALUs are used to execute instructions concurrently for fine-grained parallelism. Up to three instructions can be executed simultaneously by CRISC. Here, CRISC architecture design considerations and the instruction cache scheme are investigated. The final microarchitecture and its incorporated software technique to produce object code for fine-grained parallel execution are described; its upper bound performance is estimated by an architectural model. A preliminary evaluation of CRISC is also conducted, showing satisfactory results.

11.
In this work, a reversible single precision floating-point square root is proposed using modified non-restoring algorithm. To our knowledge, this is the first work proposed for floating-point square root using reversible logic. The main block involved in the implementation of reversible square root using modified non-restoring technique is Reversible Controlled-Subtract-Multiplex. Further, optimized Reversible Controlled-Subtract-Multiplex blocks are introduced in order to minimize the number of reversible gates used, number of constant inputs used, number of garbage outputs produced as well as the quantum cost. The proposed reversible single precision floating-point square root is realized using an 8-bit reversible adder, an 8-bit and a 25-bit reversible shift register, 12-bit reversible unsigned square root, 6-bit reversible unsigned square root, 4-bit reversible unsigned square root, 3-bit reversible unsigned square root and ten 1-bit reversible unsigned square root units.
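As background for the unsigned square root units mentioned here, the sketch below shows the classical integer non-restoring square root, which processes the radicand two bits at a time and either subtracts or adds depending on the sign of the running remainder. It is the conventional algorithm, not the paper's modified reversible version, and the function name and interface are hypothetical.

```python
def isqrt_nonrestoring(value, pairs):
    """Return (root, remainder) of a radicand that fits in 2 * pairs bits."""
    root = 0
    rem = 0
    for i in reversed(range(pairs)):
        two_bits = (value >> (2 * i)) & 3  # next pair of radicand bits
        if rem >= 0:
            # Non-negative remainder: bring down two bits, then subtract.
            rem = 4 * rem + two_bits - ((root << 2) | 1)
        else:
            # Negative remainder: do not restore; add on the next step.
            rem = 4 * rem + two_bits + ((root << 2) | 3)
        # The new root bit is 1 exactly when the remainder is non-negative.
        root = (root << 1) | (1 if rem >= 0 else 0)
    if rem < 0:
        # Single final correction so that value == root**2 + rem.
        rem += (root << 1) | 1
    return root, rem
```

Because a negative remainder is corrected by the next addition rather than restored immediately, each iteration needs one add-or-subtract, which is the role a controlled subtract-and-select block plays in hardware.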

12.
We introduce a mechanism for constructing and training a hybrid architecture of projection-based units and radial basis functions. In particular, we introduce an optimisation scheme which includes several steps and assures a convergence to a useful solution. During network architecture construction and training, it is determined whether a unit should be removed or replaced. The resulting architecture often has a smaller number of units compared with competing architectures. A specific overfitting resulting from shrinkage of the RBF radii is addressed by introducing a penalty on small radii. Classification and regression results are demonstrated on various benchmark data sets and compared with several variants of RBF networks [1,2]. A striking performance improvement is achieved on the vowel data set [3]. Received: 03 November 2000, Received in revised form: 25 October 2001, Accepted: 04 January 2002

13.
The paper describes an open-architecture, microcontroller-based distributed measurement and control system with automatic generation of the application program. Interpretation of functions and generation of the program for controlling a newly added distributed unit, or a distributed unit of a new type connected to the system, is performed automatically, without user assistance. The elements of the system are interconnected by means of a serial common bus according to the reduced OSI protocol. The proposed concept was tested in a system developed using 8-bit Atmel microcontrollers of the 89S and 89C series. Apart from the central unit, intelligent distributed units were developed for the control of a stepper motor, programmable linear movement, control of halogen lamps, acquisition and generation of analogue, digital and timing pulses, and a real time clock (RTC).

14.
This paper presents a dynamically scheduled parallel DSP architecture for general purpose DSP computations. The architecture consists of multiple DSP processors and of one or more scheduling units. DSP applications are first captured by stream flow graphs, and then stream flow graphs are statically mapped onto a parallel architecture. The ordering and starting time of DSP tasks are determined by the scheduling unit(s) using a dynamic scheduling algorithm. The main contributions of this paper are summarized as follows:
• A scalable parallel DSP architecture: The parallel DSP architecture proposed in this paper is scalable to meet signal processing requirements. For parallel DSP architectures with large configurations, the scheduling unit may become a performance bottleneck. A distributed scheduling mechanism is proposed to address this problem.
• A mapping algorithm: An algorithm is proposed to systematically map a stream flow graph onto a parallel DSP architecture.
• A dynamic scheduling algorithm: We propose a dynamic scheduling algorithm that will only schedule a node for execution when both input data and output storage space are available. Such a scheduling algorithm will allow buffer sizes to be determined at compile time.
• A simulation study: Our simulation study reveals the relationships among the grain size, the processor utilization, and the scheduling capability. We believe these relationships have significant impact on parallel computer architecture design involving dynamic scheduling.
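The firing rule stated in this abstract (a node runs only when its input data and its output storage space are both available) can be sketched as a small simulation. Everything below is an illustrative assumption: the `Edge`, `Node`, `run`, and `demo` names, the single-scheduler loop, and the "first ready node wins" policy are not taken from the paper.

```python
from collections import deque
from itertools import count


class Edge:
    """Bounded FIFO buffer between two nodes of a stream flow graph."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tokens = deque()

    def has_data(self):
        return len(self.tokens) > 0

    def has_space(self):
        return len(self.tokens) < self.capacity


class Node:
    def __init__(self, name, func, inputs=(), outputs=()):
        self.name, self.func = name, func
        self.inputs, self.outputs = list(inputs), list(outputs)

    def ready(self):
        # Firing rule: every input has data AND every output has space.
        return (all(e.has_data() for e in self.inputs)
                and all(e.has_space() for e in self.outputs))

    def fire(self):
        args = [e.tokens.popleft() for e in self.inputs]
        result = self.func(*args)
        for e in self.outputs:
            e.tokens.append(result)


def run(nodes, max_steps):
    """Fire one ready node per step; return the peak buffer occupancy."""
    peak = 0
    for _ in range(max_steps):
        ready = [n for n in nodes if n.ready()]
        if not ready:
            break
        ready[0].fire()  # simplistic policy: first ready node in list order
        peak = max([peak] + [len(e.tokens) for n in nodes for e in n.outputs])
    return peak


def demo(max_steps=20, capacity=2):
    """Source -> doubler -> sink pipeline with fixed-size buffers."""
    a, b = Edge(capacity), Edge(capacity)
    counter, collected = count(), []
    nodes = [
        Node("source", lambda: next(counter), outputs=[a]),
        Node("double", lambda x: 2 * x, inputs=[a], outputs=[b]),
        Node("sink", collected.append, inputs=[b]),
    ]
    return collected, run(nodes, max_steps)
```

Because a node never fires into a full buffer, occupancy never exceeds the declared capacity, which is why buffer sizes can be fixed ahead of time under this rule.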

15.
The WEDSP32C high-performance, programmable digital signal processor supports 32-bit floating-point arithmetic and is upwardly compatible with its predecessor, the WEDSP32. Because it is implemented in 0.75-μm (effective channel length) CMOS technology, the second-generation device achieves high functional density with low power consumption. The DSP32C offers the following features: 25-Mflop operation; 16-Mb/s serial-input and serial-output ports; a 160-bit, parallel I/O port for control and data transfer; interrupt facilities; single-instruction μ-law and A-law data conversions; single-instruction conversions between integers and floating-point data; a byte-addressable, on-chip memory that is extendable off chip; direct memory access to and from internal and external memory via parallel and serial I/O ports; 16 Mbytes of address space; and IEEE Std. 754 floating-point format conversion. The authors describe the DSP32C's instruction set, architecture, and application development tools. The latter include an assembler, a simulator, an optimizing C compiler, and special-purpose hardware.

16.
Recursive digital filters (RDFs) are one of the most commonly used methods of baseflow separation. However, how accurately they estimate baseflow and how to select appropriate values of filter parameters is generally unknown. In this paper, the output of fully integrated surface water/groundwater (SW/GW) models is used to obtain optimal parameters for, and assess the accuracy of, three commonly used RDFs under a range of physical catchment characteristics and hydrological inputs. The results indicate that the Lyne and Hollick (LH) filter performs better than the Boughton and Eckhardt filters, over a larger range of conditions. In addition, the optimal values of the filter parameters vary considerably for all three filters, depending on catchment characteristics and hydrological inputs. The dataset of the 66 catchment characteristics and hydrological inputs, as well as the corresponding simulated total streamflow and baseflow hydrographs obtained using the SW/GW model, can be downloaded as Supplementary material.
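To make the idea of a recursive digital filter concrete, here is a minimal single-pass sketch of the Lyne and Hollick filter in its commonly quoted form, where the quickflow is f(t) = α·f(t−1) + (1 + α)/2 · (Q(t) − Q(t−1)) and baseflow is Q(t) − f(t). The default α = 0.925, the zero initial quickflow, and the clipping of quickflow to [0, Q(t)] are assumptions of this sketch; the abstract itself stresses that optimal parameter values vary considerably between catchments. The function name is hypothetical.

```python
def lyne_hollick_baseflow(streamflow, alpha=0.925):
    """Single forward pass of the Lyne-Hollick filter over a flow series."""
    if not streamflow:
        return []
    quick_prev = 0.0  # assumption: the first sample is treated as all baseflow
    baseflow = [streamflow[0]]
    for t in range(1, len(streamflow)):
        quick = (alpha * quick_prev
                 + 0.5 * (1.0 + alpha) * (streamflow[t] - streamflow[t - 1]))
        # Keep quickflow physically plausible: 0 <= quickflow <= total flow.
        quick = min(max(quick, 0.0), streamflow[t])
        baseflow.append(streamflow[t] - quick)
        quick_prev = quick
    return baseflow
```

A steady flow passes through unchanged as baseflow, while a sharp rise is attributed mostly to quickflow. Practical applications often run several forward and backward passes, which this sketch omits.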

17.
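Development of an ARM-based integrated DeviceNet adapter   Total citations: 1, self-citations: 0, citations by others: 1  
Starting from the compatibility problem between the legacy RS232 serial ports common in the field and the emerging DeviceNet network, and using Atmel's 32-bit ARM7 high-speed processor as the development platform, this work takes full advantage of the processor's high speed and versatile functions, combined with the efficiency and diagnostic strengths of the DeviceNet fieldbus, to develop an integrated DeviceNet communication adapter with 8 digital inputs, 8 digital outputs, 4 analog inputs, and an RS232 serial port. The paper describes both the hardware and the software development of the system, and the adapter was successfully tested with an OMRON PLC master-station test system.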

18.
Block Cipher SEED is one of the standard 128-bit block ciphers of ISO/IEC together with AES and Camellia (Aoki et al., 2000, ISO/IEC 18033-3, 2005; Korea Information Security Agency, 1999; National Institute of Standards and Technology, 2001) [1], [4], [5] and [6]. Since SEED was developed, no distinguishing cryptanalysis has been reported except a 7-round differential attack in 2002 [7]. That attack used six-round differential characteristics with probability 2^-124 and analyzed seven-round SEED with 2^126 chosen plaintexts. In this paper, we propose a new seven-round differential characteristic with probability 2^-122 and analyze eight-round SEED with 2^125 chosen plaintexts. The attack requires about 2^122 eight-round encryptions. This is the best-known attack on a reduced version of SEED so far.

19.
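In recent years, with the development of mixed-domain oscilloscope technology, an oscilloscope must provide not only traditional oscilloscope functions but also frequency-domain and modulation-domain functions. This requires real-time digital down-conversion (DDC) in digital-domain signal processing. Real-time DDC is the basis for extending oscilloscopes to frequency-domain and modulation-domain functions; it enables value-added oscilloscope applications and greatly broadens the oscilloscope's field of application. Based on the characteristics of high-speed signal sampling, this paper presents a real-time DDC architecture consisting of digital quadrature mixing, filters FIR1-FIR3, and filters HB1-HB10. For a 20 GSa/s sampled data stream, it supports I/Q data stream output at up to 1.25 GSa/s and down to 305 kSa/s, which covers the vast majority of application scenarios. The digital quadrature mixer and the FIR1, FIR2, FIR3, and HB filters are designed and analyzed in detail and their implementation architectures are given; for the FIR and HB filters, the optimal filter orders and magnitude-frequency response curves are also given. Because the data rate of the digital quadrature mixer and the FIR1-FIR3 filters exceeds the normal operating clock range of an FPGA, signal processing is implemented through multi-channel parallel processing. Finally, vector signal analysis software is used to evaluate the EVM performance of the DDC at 13 I/Q rates, at carrier frequencies of 1.5 GHz and 3 GHz; the EVM values are mostly below 0.5%, which meets the usage requirements.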
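The rate figures quoted in this abstract are consistent with power-of-two decimation: 20 GSa/s divided by 2^4 is 1.25 GSa/s, 20 GSa/s divided by 2^16 is about 305 kSa/s, and the exponents 4 through 16 give exactly 13 rates. That reading is an inference from the numbers, not something the abstract states, and how the decimation is split across the FIR and HB stages is not specified. The sketch below checks the arithmetic and shows the quadrature mixing step in its simplest form; the function names are hypothetical.

```python
import cmath
import math

INPUT_RATE = 20e9  # Sa/s, the sampled data stream rate from the abstract


def iq_rates(input_rate=INPUT_RATE, min_log2=4, max_log2=16):
    """Output I/Q rates, assuming decimation factors 2**4 .. 2**16."""
    return [input_rate / 2 ** k for k in range(min_log2, max_log2 + 1)]


def quadrature_mix(samples, carrier_hz, sample_rate_hz):
    """Shift the carrier to 0 Hz: multiply by exp(-j*2*pi*f_c*n/f_s).

    The real part of each output is the I sample, the imaginary part is Q.
    Low-pass filtering and decimation would follow in a full DDC.
    """
    step = -2j * math.pi * carrier_hz / sample_rate_hz
    return [x * cmath.exp(step * n) for n, x in enumerate(samples)]
```

Mixing a real cosine at the carrier frequency produces a constant term of 0.5 plus a component at twice the carrier, which the subsequent low-pass filters are there to remove.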

20.
A wireless sensor network (WSN) commonly requires lower level security for public information gathering, whilst a body sensor network (BSN) must be secured with strong authenticity to protect personal health information. In this paper, some practical problems with the message authentication codes (MACs), which were proposed in the popular security architectures for WSNs, are reconsidered. The analysis shows that the recommended MACs for WSNs, e.g., CBC-MAC (TinySec), OCB-MAC (MiniSec), and XCBC-MAC (SenSee), might not be exactly suitable for BSNs. In particular, an existential forgery attack on XCBC-MAC is elaborated. Considering the hardware limitations of BSNs, we propose a new family of tunable lightweight MACs based on the PRESENT block cipher. The first scheme, which is named TuLP, is a new lightweight MAC with a 64-bit output range. The second scheme, which is named TuLP-128, is a 128-bit variant which provides a higher resistance against internal collisions. Compared with the existing schemes, our lightweight MACs are both time and resource efficient on hardware-constrained devices.
