1.

Bottom-up broadcast neural network for music genre classification

Liu Caifeng Feng Lin Liu Guochao Wang Huibing Liu Shenglan 《Multimedia Tools and Applications》2021,80(5):7313-7331

Music genre classification based on visual representations has been successfully explored in recent years. Recently, there has been increasing interest in applying convolutional neural networks (CNNs) to the task. However, most existing methods employ mature CNN structures proposed for image recognition without any modification, so the learned features are not adequate for music genre classification. To address this issue, we fully exploit the low-level information in audio spectrograms and develop a novel CNN architecture in this paper. The proposed CNN architecture takes multi-scale time-frequency information into consideration, which passes more suitable semantic features to the decision-making layer to discriminate the genre of an unknown music clip. The experiments are conducted on benchmark datasets including GTZAN, Ballroom, and Extended Ballroom. The experimental results show that the proposed method achieves classification accuracies of 93.9%, 96.7%, and 97.2%, respectively, which, to the best of our knowledge, are the best results on these public datasets so far. Notably, the model trained with our proposed network is tiny, only 0.18M, so it can be deployed on mobile phones or other devices with limited computational resources. Code and models will be available at https://github.com/CaifengLiu/music-genre-classification.

2.

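Clothing recognition system based on ZYNQ and a CNN model

Xiong Wei Huang Lu 《Computer Systems & Applications》2019,28(11):101-106

Product retrieval is an important problem in the intelligent development of the e-commerce industry. This design implements a clothing recognition system based on ZYNQ and a CNN model. A custom network is trained with TensorFlow, and its weight parameters are converted to fixed point. The system is built on the ARM+FPGA hardware/software co-design capability of the ZYNQ device: OpenCV on the ARM side performs image preprocessing, and a CNN IP core on the FPGA side performs real-time recognition. A reloadable-weight structure between the ARM and the FPGA allows online upgrades without modifying the FPGA hardware. The system uses the Fashion-MNIST dataset as training samples, and the number of acceleration engines in the CNN IP is configured according to the available system resources to increase the parallelism of the convolution operations. Experiments show that the system recognizes and displays images from e-commerce platforms accurately and in real time, with an accuracy of 92.39%. At an operating frequency of 100 MHz, the image processing speed reaches 1.361 ms per frame, with a power consumption of only 0.53 W.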
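
The abstract mentions converting trained weights to fixed point but does not give the format. As a rough illustration only, the sketch below assumes a signed 8-bit format with 6 fractional bits; the function names and bit widths are hypothetical and are not taken from the paper.

```python
def quantize_fixed_point(weights, total_bits=8, frac_bits=6):
    """Round floats to signed fixed-point integer codes (assumed Q1.6-style format)."""
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    codes = []
    for w in weights:
        q = int(round(w * scale))
        # Saturate values that fall outside the representable range.
        codes.append(max(q_min, min(q_max, q)))
    return codes


def dequantize_fixed_point(codes, frac_bits=6):
    """Map integer codes back to the real values they represent."""
    scale = 1 << frac_bits
    return [q / scale for q in codes]


weights = [0.5, -0.25, 1.0, -3.0]
codes = quantize_fixed_point(weights)
restored = dequantize_fixed_point(codes)
print(codes)     # integer codes that would be stored on the FPGA side
print(restored)  # -3.0 saturates because it is outside the assumed range
```

With these assumed widths, values outside roughly [-2, 2) saturate, which is why the choice of fractional bits matters for accuracy.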

3.

Design and implementation of an efficient sparse convolutional neural network accelerator

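Yu Chengyu Li Zhiyuan Mao Wenyu Lu Huaxiang 《CAAI Transactions on Intelligent Systems》2020,15(2):323-333

Because convolutional neural network computation is difficult to implement in hardware, most previous CNN accelerator designs have focused on computational performance and bandwidth bottlenecks, overlooking the importance of CNN sparsity to accelerator design. The few recent accelerator designs that do exploit sparsity often struggle to balance computational flexibility, parallel efficiency, and resource overhead. This paper first compares how different parallel unrolling schemes affect the exploitation of sparsity and analyzes different methods of exploiting it. It then proposes a parallel unrolling method that uses activation sparsity to accelerate CNN computation while achieving higher parallel efficiency and lower additional resource overhead than other designs in the field. Finally, a CNN accelerator based on this method is designed and implemented on an FPGA. The results show that, running the VGG-16 network on the ImageNet dataset, the sparse CNN accelerator built with this parallel unrolling method improves convolution performance by 108.8% and overall performance by 164.6% compared with a dense-network design on the same device, a clear performance advantage.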
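
The general idea of exploiting activation sparsity can be shown in software with a dot product that skips zero activations. This is only a minimal sketch of zero-skipping; it does not reproduce the paper's parallel unrolling scheme, and `sparse_dot` is a hypothetical name.

```python
def sparse_dot(activations, weights):
    """Dot product that skips the multiply-accumulate when the activation is zero."""
    total = 0
    macs = 0  # number of multiply-accumulates actually performed
    for a, w in zip(activations, weights):
        if a == 0:
            continue  # a zero activation (e.g. after ReLU) contributes nothing
        total += a * w
        macs += 1
    return total, macs


acts = [0, 3, 0, 0, 2, 0, 1, 0]
wts = [5, -1, 4, 2, 3, 7, -2, 6]
result, macs = sparse_dot(acts, wts)
dense = sum(a * w for a, w in zip(acts, wts))
print(result, macs, dense)
```

The result matches the dense dot product while performing only as many multiplications as there are non-zero activations.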

4.

CoNNa–Hardware accelerator for compressed convolutional neural networks

《Microprocessors and Microsystems》2020

In this paper, we propose a novel Convolutional Neural Network hardware accelerator called CoNNA, capable of accelerating pruned, quantized CNNs. In contrast to most existing solutions, CoNNA offers a complete solution to compressed CNN acceleration, being able to accelerate all layer types commonly found in contemporary CNNs. CoNNA is designed as a coarse-grained reconfigurable architecture, which uses rapid, dynamic reconfiguration during CNN layer processing. The CoNNA architecture enables on-the-fly selection of the CNN that should be accelerated and also supports the acceleration of CNNs with dynamic topology. Furthermore, by being able to directly process compressed feature and kernel maps and to skip all ineffectual computations during CNN layer processing, the CoNNA accelerator is able to achieve higher CNN processing rates than some of the previously proposed solutions. The CoNNA architecture has been implemented using the Xilinx Zynq UltraScale+ FPGA family and compared with seven previously proposed CNN hardware accelerators. Results of the experiments seem to indicate that the CoNNA architecture is up to 14.10, 6.05, 4.91, 2.67, 11.30, 3.08 and 3.58 times faster than the previously proposed MIT's Eyeriss, NullHop, NVIDIA's Deep Learning Accelerator (NVDLA), NEURAghe, CNN_A1, fpgaConvNet, and Deephi's Aristotle CNN accelerators, respectively, while using an identical number of computing units and operating at the same clock frequency.

5.

High-efficient MPSoC-based CNNs accelerator with optimized storage and dataflow

Zhang Yonghua Jiang Hongxu Liu Xiaojian Cao Haiheng Du Yu 《The Journal of supercomputing》2022,78(3):3205-3225

Convolutional neural networks (CNNs) are widely used in modern AI systems for their superior accuracy, but at the cost of high computational complexity, which entails enormous communication bandwidth and storage requirements. The computation requirement can be addressed effectively, and high throughput achieved, by the highly parallel compute paradigms of current CNN accelerators. However, energy consumption remains high because communication can be more expensive than computation, especially on low-power embedded platforms. To address this problem, this paper proposes a CNN accelerator based on a novel storage and dataflow design on a multi-processor system-on-chip (MPSoC) platform. By minimizing data access and movement and maximizing data reuse, it achieves energy-efficient CNN inference acceleration. The optimization strategies involve four main aspects. First, an external memory sharing architecture that adopts a two-dimensional array storage mode for CPU-FPGA collaborative processing is proposed to achieve high data throughput and a low bandwidth requirement for off-chip data transmission. Second, on-chip data access and movement are minimized by a multi-level hierarchical storage architecture. Third, a cyclic data shifting method is proposed to maximize both spatial and temporal data reuse. In addition, a bit fusion method based on 8-bit dynamic fixed-point quantization is adopted to double the throughput and computational efficiency of a single DSP. The proposed accelerator is implemented on a Zynq UltraScale+ MPSoC ZCU102 evaluation board. Throughput and energy efficiency are evaluated by running the VGG16 and Tiny-YOLO benchmark networks on the accelerator. Compared with current typical accelerators, the proposed accelerator increases system throughput by up to 41x, single-DSP throughput by up to 7.63x, and system energy efficiency by up to 6.3x.
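
The abstract does not describe how its bit fusion method works. One widely used way to get two 8-bit products out of a single wide multiplier is to pack two weights into one operand, and the sketch below illustrates that packing idea for unsigned values only. It is an assumption for illustration, not the paper's design; `fused_multiply` is a hypothetical name, and signed operands would need an extra correction step that is omitted here.

```python
def fused_multiply(a, w1, w2):
    """Compute a*w1 and a*w2 with one wide multiplication (unsigned 8-bit inputs)."""
    for v in (a, w1, w2):
        if not 0 <= v <= 255:
            raise ValueError("operands must be unsigned 8-bit values")
    # Place w1 in the upper field and w2 in the lower 16-bit field.
    packed = (w1 << 16) | w2
    product = a * packed  # one multiplication yields both partial products
    # a*w2 is at most 255*255 = 65025 < 2**16, so it never spills into the upper field.
    return product >> 16, product & 0xFFFF


hi_prod, lo_prod = fused_multiply(200, 17, 255)
print(hi_prod, lo_prod)
```

The 16-bit guard field is what keeps the two products from interfering; narrower fields would require handling carries.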

6.

Research progress on FPGA-based recurrent neural network accelerators

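Gao Chen Zhang Fan 《Chinese Journal of Network and Information Security》2019,5(4):1-13

In recent years, recurrent neural networks (RNNs) have been used more and more widely in machine learning, and they outperform CNNs and other neural networks particularly on sequence learning tasks. However, RNNs and their variants, such as LSTM and GRU, are fully connected networks with high computational and storage complexity, which makes inference slow and makes them hard to deploy in products. On the one hand, the CPU, the traditional computing platform, is poorly suited to the large-scale matrix operations of RNNs; on the other hand, the shared memory and global memory of the GPU, a hardware acceleration platform, give GPU-based RNN accelerators relatively high power consumption. Thanks to their parallel computing capability and low power consumption, FPGAs have increasingly been used as the hardware platform for RNN accelerators in recent years. This paper surveys recent FPGA-based RNN accelerators, summarizes the data optimization algorithms and hardware architecture design techniques they use, and proposes directions for future research.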

7.

CNN-Grinder: From Algorithmic to High-Level Synthesis descriptions of CNNs for Low-end-low-cost FPGA SoCs

《Microprocessors and Microsystems》2020

8.

1D convolutional neural networks for chart pattern classification in financial time series

Liu Liying Si Yain-Whar 《The Journal of supercomputing》2022,78(12):14191-14214

This paper proposes a novel deep learning-based approach for financial chart pattern classification. Convolutional neural networks (CNNs) have made notable achievements in image recognition and computer vision applications. These networks are usually based on two-dimensional convolutional neural networks (2D CNNs). In this paper, we describe the design and implementation of one-dimensional convolutional neural networks (1D CNNs) for the classification of chart patterns from financial time series. The proposed 1D CNN model is compared against support vector machine, extreme learning machine, long short-term memory, rule-based, and dynamic time warping methods. Experimental results on synthetic datasets reveal that the accuracy of the 1D CNN is the highest among all the methods evaluated. Results on real datasets reveal that chart patterns identified by the 1D CNN are also the most recognized instances when compared with those classified by the other methods.

9.

Investigating data representation for efficient and reliable Convolutional Neural Networks

《Microprocessors and Microsystems》2021

Nowadays, Convolutional Neural Networks (CNNs) are widely used as prediction models in different fields, with intensive use in real-time safety-critical systems. Recent studies have demonstrated that hardware faults induced by external perturbations or aging effects may significantly impact CNN inference, leading to prediction failures. Therefore, ensuring the reliability of CNN platforms is crucial, especially when they are deployed in critical applications. A lot of effort has been made to reduce the memory and energy footprint of CNNs, paving the way to the adoption of approximate computing techniques such as quantization, reduced precision, weight sharing, and pruning. Unfortunately, approximate computing reduces the intrinsic redundancy of CNNs, making them more efficient but less resilient to hardware faults. The goal of this work is twofold. First, we assess the reliability of a CNN when reduced bit widths and two different data types (floating- and fixed-point) are used to represent the network parameters (i.e., synaptic weights). Second, we intend to investigate the best compromise between data type, bit-width reduction, and reliability. The characterization is performed through a fault injection environment built on the darknet open-source framework and targets two CNNs: LeNet-5 and YOLO. Experimental results show that fixed-point data provide the best trade-off between memory footprint reduction and CNN resilience. In particular, for LeNet-5, we achieved a 4X memory footprint reduction at the cost of slightly reduced reliability (0.45% of critical faults) without retraining the CNN.
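
To make the notion of a hardware fault in a stored weight concrete, the following sketch flips a single bit in a fixed-point weight and decodes the result. It is a toy illustration under assumed parameters (8-bit two's complement with 6 fractional bits) and is unrelated to the darknet-based fault injection environment used in the paper; all function names are hypothetical.

```python
def to_fixed(value, frac_bits=6, total_bits=8):
    """Encode a float as a two's-complement fixed-point bit pattern (assumed format)."""
    q = int(round(value * (1 << frac_bits)))
    q = max(-(1 << (total_bits - 1)), min((1 << (total_bits - 1)) - 1, q))
    return q & ((1 << total_bits) - 1)


def from_fixed(bits, frac_bits=6, total_bits=8):
    """Decode a two's-complement fixed-point bit pattern back to a float."""
    if bits & (1 << (total_bits - 1)):
        bits -= 1 << total_bits  # sign bit set: interpret as a negative value
    return bits / (1 << frac_bits)


def flip_bit(bits, position):
    """Model a single-bit upset by inverting one bit of the stored word."""
    return bits ^ (1 << position)


stored = to_fixed(0.5)
lsb_fault = from_fixed(flip_bit(stored, 0))  # fault in the least significant bit
msb_fault = from_fixed(flip_bit(stored, 7))  # fault in the sign bit
print(lsb_fault, msb_fault)
```

A flip in a low-order bit barely changes the weight, while a flip in the sign bit changes it drastically, which is one reason bit position matters when classifying faults as critical.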

10.

A study of deep neural networks for human activity recognition

Emilio Sansano Raúl Montoliu Óscar Belmonte Fernández 《Computational Intelligence》2020,36(3):1113-1139

Human activity recognition and deep learning are two fields that have attracted attention in recent years: the former because of its relevance in many application domains, such as ambient assisted living or health monitoring, and the latter for its recent and excellent performance achievements in application domains such as image and speech recognition. In this article, an extensive analysis of the deep learning architectures best suited to activity recognition is conducted to compare their performance in terms of accuracy, speed, and memory requirements. In particular, convolutional neural networks (CNN), long short-term memory networks (LSTM), bidirectional LSTM (biLSTM), gated recurrent unit networks (GRU), and deep belief networks (DBN) have been tested on a total of 10 publicly available datasets, with different sensors, sets of activities, and sampling rates. All tests have been designed under a multimodal approach to take advantage of synchronized raw sensor signals. Results show that CNNs are efficient at capturing local temporal dependencies of activity signals, as well as at identifying correlations among sensors. Their performance in activity classification is comparable with, and in most cases better than, the performance of recurrent models. Their faster response and lower memory footprint make them the architecture of choice for wearable and IoT devices.

11.

Compressing convolutional neural networks with cheap convolutions and online distillation

《Displays》2023

Visual impairment assistance systems play a vital role in improving the standard of living for visually impaired people (VIP). With the development of deep learning technologies and assistive devices, many assistive technologies for VIP have achieved remarkable success in environmental perception and navigation. In particular, convolutional neural network (CNN)-based models have surpassed the level of human recognition and achieved a strong generalization ability. However, the large memory and computation consumption of CNNs has been one of the main barriers to deploying them in resource-limited systems for visual impairment assistance applications. To this end, cheap convolutions (e.g., group convolution, depth-wise convolution, and shift convolution) have recently been used for memory and computation reduction, but they require a specific architecture design. Furthermore, directly replacing the standard convolution with these cheap ones results in low discriminability of the compressed networks. In this paper, we propose to use knowledge distillation to improve the performance of compact student networks with cheap convolutions. In our case, the teacher is a network with the standard convolution, while the student is a simple transformation of the teacher architecture without complicated redesigning. In particular, we introduce a novel online distillation method, which constructs the teacher network online without pre-training and conducts mutual learning between the teacher and student networks, to improve the performance of the student model. Extensive experiments demonstrate that the proposed approach achieves superior performance while simultaneously reducing the memory and computation overhead of cutting-edge CNNs on different datasets, including CIFAR-10/100 and ImageNet ILSVRC 2012, compared with previous CNN compression and acceleration methods. The code is publicly available at https://github.com/EthanZhangYC/OD-cheap-convolution.
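
For readers unfamiliar with knowledge distillation, the sketch below shows the commonly used temperature-softened KL divergence between teacher and student outputs. It is the generic distillation term, not the online mutual-learning scheme proposed in the paper; the function names and the temperature value are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    return kl * temperature ** 2


teacher = [6.0, 2.0, -1.0]
student = [3.0, 3.5, 0.0]
loss_mismatch = distillation_loss(student, teacher)
loss_match = distillation_loss(teacher, teacher)
print(loss_mismatch, loss_match)
```

The loss is zero when the student reproduces the teacher's logits and grows as the softened distributions diverge; in practice it is usually combined with the ordinary cross-entropy on the labels.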

12.

SRNET: A Shallow Skip Connection Based Convolutional Neural Network Design for Resolving Singularities

Yasrab Robail 《Journal of Computer Science and Technology》2019,34(4):924-938

Convolutional neural networks (CNNs) have shown tremendous progress and performance in recent years. Since their emergence, CNNs have exhibited excellent performance in most classification and segmentation tasks. Currently, the CNN family includes various architectures that dominate major vision-based recognition tasks. However, building a neural network (NN) by simply stacking convolution blocks inevitably limits its optimization ability and introduces overfitting and vanishing gradient problems. One of the key reasons for these issues is network singularities, which cause degenerate manifolds in the loss landscape. This situation leads to a slow learning process and lower performance. In this scenario, skip connections have turned out to be an essential unit of CNN design for mitigating network singularities. The idea proposed in this research is to introduce skip connections into the NN architecture to augment the information flow, mitigate singularities, and improve performance. This research experimented with different levels of skip connections and proposed a placement strategy for these links in any CNN. To test the proposed hypothesis, we designed an experimental CNN architecture, named Shallow Wide ResNet or SRNet, as it uses a wide residual network as its base network design. We performed numerous experiments to assess the validity of the proposed idea. CIFAR-10 and CIFAR-100, two well-known datasets, are used for training and testing the CNNs. The final empirical results show many promising outcomes in terms of performance, efficiency, and reduction of network singularity issues.

13.

Hybrid memristor/RTD structure-based cellular neural networks with applications in image processing (total citations: 1, self-citations: 0, citations by others: 1)

Shukai Duan Xiaofang Hu Lidan Wang Shiyong Gao Chuandong Li 《Neural computing & applications》2014,25(2):291-296

The cellular neural network (CNN) has gradually come to serve as a high-speed parallel analog signal processor. Recently, however, as transistor sizes approach their lower limit, transistor-based integrated circuit technology has hit a bottleneck. As a result, the advantage of very-large-scale integration implementations of CNNs is hard to realize, and further development in this area faces severe challenges. In this study, two types of memristor-based cellular neural networks are proposed. The first uses a memristor to replace the linear resistor in a conventional CNN cell circuit. The second places a resonant tunneling diode (RTD) in this position and uses memristive synaptic connections to form a hybrid memristor/RTD CNN model. The excellent performance of the proposed CNNs is verified by conventional means, such as stability analysis, and by efficient applications in image processing. Since both the memristor and the resonant tunneling diode are nanoscale devices, the size of the network circuits can be greatly reduced and the integration density of the system significantly improved.

14.

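Compression and acceleration of YOLOv3 on the ZYNQ platform

Guo Wenxu Su Yuanqi Liu Yuehu 《Journal of Computer Applications》2021,41(3):669-676

The sharply growing parameter counts and computational loads of high-accuracy object detection networks make them difficult to deploy directly on edge devices such as vehicles and drones. To address this problem from the two angles of network compression and computation acceleration, a new compression scheme for residual networks is proposed to compress YOLOv3, and the compressed network is accelerated on the ZYNQ platform. First, a network compression algorithm covering both network pruning and network quantization is proposed. For pruning, a strategy designed for residual structures divides network pruning into two granularities, channel pruning and residual-chain pruning, which overcomes the inability of channel pruning to handle residual connections and further reduces the number of model parameters. For quantization, a simulated quantization method based on relative entropy is implemented: parameters are quantized per channel, and the parameter distribution of the model and the information loss caused by quantization are measured online, which helps select the best quantization strategy and reduces the accuracy loss of quantization. Then, an 8-bit convolution acceleration module is designed and improved on the ZYNQ platform, optimizing the on-chip cache structure and, combined with the Winograd algorithm, accelerating the compressed YOLOv3. Experimental results show that the proposed compression algorithm yields a smaller model than YOLOv3 tiny while improving detection accuracy by 7 percentage points; in addition, the hardware acceleration method on the ZYNQ platform achieves a higher energy-efficiency ratio than other platforms, advancing the practical deployment of YOLOv3 and other residual networks on ZYNQ edge devices.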

15.

Interpretable Relative Squeezing bottleneck design for compact convolutional neural networks model

《Image and vision computing》2019

Convolutional neural networks (CNNs) are mainly used for image recognition tasks. However, some huge models are infeasible for mobile devices because of limited computing and memory resources. In this paper, feature maps of DenseNet and CondenseNet are visualized. It can be observed that some feature channels are in a locked state and some have similar distribution properties, so they could be compressed further. Thus, in this work, a novel architecture, RSNet, is introduced to improve the computing efficiency of CNNs. This paper proposes the Relative-Squeezing (RS) bottleneck design, in which the output is a weighted percentage of the input channels. In addition, RSNet contains multiple compression layers and learned group convolutions (LGCs). By eliminating superfluous feature maps, the relative squeezing and compression layers transmit only the most significant features to the next layer. Fewer parameters are employed and much computation is saved. The proposed model is evaluated on three benchmark datasets: CIFAR-10, CIFAR-100 and ImageNet. Experimental results show that RSNet performs better with fewer parameters and FLOPs than state-of-the-art baselines, including CondenseNet, MobileNet and ShuffleNet.

16.

Power equipment detection with an efficient convolutional neural network for embedded devices

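Lin Weixian 《Computer Systems & Applications》2019,28(5):238-243

With the emergence of large image datasets and the rapid development of computer hardware, especially GPUs, convolutional neural networks (CNNs) have become a successful algorithm in artificial intelligence and perform well on a variety of machine learning tasks. However, the computational complexity of CNNs is much higher than that of traditional algorithms, and the limited resources of embedded devices make efficient embedded computing a challenging problem. In this paper, we propose an efficient convolutional neural network for embedded devices for power equipment detection and evaluate it in terms of processing speed. The results show that the algorithm meets the requirements of real-time video processing on embedded devices.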

17.

Building efficient CNN architecture for offline handwritten Chinese character recognition

Zhiyuan Li Nanjun Teng Min Jin Huaxiang Lu

《International Journal on Document Analysis and Recognition》

Deep convolutional neural network-based methods have brought a great breakthrough in image classification, providing an end-to-end solution to the handwritten Chinese character recognition (HCCR) problem by learning discriminative features automatically. Nevertheless, state-of-the-art CNNs incur a huge computational cost and require the storage of a large number of parameters, especially in fully connected layers, which makes it difficult to deploy such networks on alternative hardware devices with limited computation capacity. To solve the storage problem, we propose a novel technique called weighted average pooling that reduces the parameters in the fully connected layer without loss in accuracy. In addition, we implement a cascaded model in a single CNN by adding a mid output so that recognition completes as early as possible, which significantly reduces average inference time. Experiments are performed on the ICDAR-2013 offline HCCR dataset. Our proposed approach needs only 6.9 ms on average to classify a character image and achieves state-of-the-art accuracy of 97.1% while requiring only 3.3 MB of storage.

18.

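Research on binary neural network acceleration on an ARM+FPGA platform

Sun Xiaohui Song Qingzeng Jin Guanghao Jiang Wenchao 《Application Research of Computers》2020,37(3):779-783

Because of their complex structure and dependence on large datasets, existing convolutional neural networks struggle to meet the computing performance requirements and energy constraints of some practical applications and computing platforms. For such applications and platforms, a binarization algorithm for an ARM+FPGA platform is studied and a binary neural network is designed, which reduces the storage required for data and lowers computational complexity. In the ARM+FPGA implementation, the multiply-accumulate operations of convolution are converted into XNOR logic operations and popcount operations, which improves overall computational efficiency and reduces energy and resource consumption. In addition, a new improved row-processing algorithm is proposed based on the data storage characteristics of binary neural networks, which increases network throughput. This implementation outperforms existing FPGA neural network acceleration methods in GOPS, energy efficiency, and resource efficiency.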
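
The replacement of multiply-accumulate by XNOR and popcount can be checked with a small software model. The sketch below assumes the usual encoding in which +1 is stored as bit 1 and -1 as bit 0; the helper names are hypothetical, and the code models only the arithmetic identity, not the FPGA datapath or the row-processing algorithm from the paper.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an integer (bit 1 means +1, bit 0 means -1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, w_word, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    xnor = ~(a_word ^ w_word) & mask  # bit is 1 where the two signs agree
    matches = bin(xnor).count("1")    # popcount
    # Each agreement contributes +1 and each disagreement -1.
    return 2 * matches - n


a = [1, -1, 1, 1, -1, -1, 1, -1]
w = [1, 1, -1, 1, -1, 1, 1, -1]
fast = binary_dot(pack_bits(a), pack_bits(w), len(a))
reference = sum(x * y for x, y in zip(a, w))
print(fast, reference)
```

The identity `dot = 2 * popcount(XNOR) - n` is what lets hardware replace n multipliers with simple logic gates and a bit counter.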

19.

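Real-time defect detection technology and system design for nonwoven fabrics

Deng Zelin Liu Xing Dong Yunlong Yuan Ye 《Acta Automatica Sinica》2021,47(3):583-593

Defects produced during nonwoven fabric production seriously affect product quality and limit production efficiency. Increasing the automation of defect detection is crucial for the production efficiency and quality control of nonwoven fabrics. Traditional defect detection methods cope poorly with variations in texture, defect type, and environment, which limits their range of application. In recent years, methods based on convolutional neural networks have been widely used in defect detection and offer strong generalization and high accuracy. In nonwoven fabric production, however, the large fabric width and high speed generate large volumes of image data, and CNN-based methods struggle to achieve real-time detection. To address these difficulties, this paper proposes a real-time defect detection method that combines maximally stable extremal region analysis with a convolutional neural network, and designs a distributed computing architecture to handle the excessive data flow. In an actual production deployment, the system and algorithm require no dedicated computing hardware (GPU, FPGA, etc.): 8 industrial PCs and 16 industrial cameras perform distributed real-time online inspection of nonwoven fabric 2.8 m wide moving at 30 m/min on a rewinder, greatly improving the automation and efficiency of defect detection in nonwoven fabric production. The proposed system achieves a recall of 100% for defects larger than 0.3 mm and a recall of 98.8% for 0.1 mm filamentary defects.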

20.

Analog and digital FPGA implementation of BRIN for optimization problems.

H S Ng K P Lam 《Neural Networks, IEEE Transactions on》2003,14(5):1413-1425

The binary relation inference network (BRIN) shows promise in obtaining the globally optimal solution to optimization problems in a time that is independent of the problem size. However, the realization of this method depends on the implementation platform. We studied analog and digital FPGA implementation platforms. Analog implementation of BRIN for two different directed graph problems is studied. As transitive closure problems can be transformed into a special case of shortest path problems or a special case of maximum spanning tree problems, two different forms of BRIN are discussed. Their circuits using common analog integrated circuits are investigated. The BRIN solution for critical path problems is expressed and implemented using the separated building block circuit and the combined building block circuit. As these circuits are different, the response times of these networks differ. The advancement of field programmable gate arrays (FPGAs) in recent years, allowing millions of gates on a single chip and accompanied by high-level design tools, has allowed the implementation of very complex networks. With manual circuit construction no longer needed and an efficient design platform available, the BRIN architecture can be built much more efficiently. Bandwidth problems are removed by moving all previously external connections inside the chip. By transforming BRIN to FPGA (Xilinx XC4010XL and XCV800 Virtex), we implement a synchronous network with computations in a finite number of steps. Two case studies are presented, with correct results verified by simulation. Resource consumption on FPGAs is studied, showing that Virtex devices are more suitable for network expansion in future developments.