Music genre classification based on visual representation has been successfully explored over the last years. Recently, there has been increasing interest in attempting convolutional neural networks (CNNs) to achieve the task. However, most of the existing methods employ the mature CNN structures proposed in image recognition without any modification, which results in the learning features that are not adequate for music genre classification. Faced with the challenge of this issue, we fully exploit the low-level information from spectrograms of audio and develop a novel CNN architecture in this paper. The proposed CNN architecture takes the multi-scale time-frequency information into considerations, which transfers more suitable semantic features for the decision-making layer to discriminate the genre of the unknown music clip. The experiments are evaluated on the benchmark datasets including GTZAN, Ballroom, and Extended Ballroom. The experimental results show that the proposed method can achieve 93.9%, 96.7%, 97.2% classification accuracies respectively, which to the best of our knowledge, are the best results on these public datasets so far. It is notable that the trained model by our proposed network possesses tiny size, only 0.18M, which can be applied in mobile phones or other devices with limited computational resources. Codes and model will be available at https://github.com/CaifengLiu/music-genre-classification.
In this paper, we propose a novel Convolutional Neural Network hardware accelerator called CoNNA, capable of accelerating pruned, quantized CNNs. In contrast to most existing solutions, CoNNA offers a complete solution to the compressed CNN acceleration, being able to accelerate all layer types commonly found in contemporary CNNs. CoNNA is designed as a coarse-grained reconfigurable architecture, which uses rapid, dynamic reconfiguration during CNN layer processing. The CoNNA architecture enables the on-the-fly selection of the CNN network that should be accelerated and also supports the acceleration of CNN networks with dynamic topology. Furthermore, by being able to directly process compressed feature and kernel maps, and skip all ineffectual computations during CNN layer processing, the CoNNA CNN accelerator is able to achieve higher CNN processing rates than some of the previously proposed solutions. The CoNNA architecture has been implemented using Xilinx ZynqUtrascale+ FPGA family and compared with seven previously proposed CNN hardware accelerators. Results of the experiments seem to indicate that the CoNNA architecture is up to 14.10, 6.05, 4.91, 2.67, 11.30, 3.08 and 3.58 times faster than previously proposed MIT's Eyeriss, NullHop, NVIDIA's Deep Learning Accelerator (NVDLA), NEURAghe, CNN_A1, fpgaConvNet, and Deephi's Aristotle CNN accelerators respectively, while using identical number of computing units and operating at the same clock frequency. 相似文献
The convolutional neural networks (CNNs) are widely used in modern AI systems for their superior accuracy but at the cost of high computational complexity, which involve enormous communication bandwidth and storage resources requirement. The computation requirement can be addressed effectively to achieve high throughput by highly parallel compute paradigms of current CNNs accelerators. But the energy consumption still remains high as communication can be more expensive than computation, especially for low power embedded platform. To address this problem, this paper proposes a CNNs accelerator based on a novel storage and dataflow on multi-processor system on chip (MPSoC) platform. By minimizing data access and movement and maximizing data reuse, it can achieve the energy efficient CNNs inference acceleration. The optimization strategies mainly involve four aspects. Firstly, an external memory sharing architecture adopting two-dimensional array storage mode for CPU-FPGA collaborative processing is proposed to achieve high data throughput and low bandwidth requirement for off-chip data transmission. Secondly, the minimized data access and movement on chip are realized by designing a multi-level hierarchical storage architecture. Thirdly, a cyclic data shifting method is proposed to achieve maximized data reuse based on both spatial and temporal. In addition, a bit fusion method based on the 8-bit dynamic fixed-point quantization is adopted to achieve double throughput and computational efficiency of a single DSP. The accelerator proposed in this paper is implemented on Zynq UltraScale?+?MPSoC ZCU102 evaluation board. By running the benchmark network of VGG16 and Tiny-YOLO on the accelerator, the throughput and the energy efficiency are evaluated. Compared with the current typical accelerators, the proposed accelerator can increase system throughput by up to 41x, single DSP throughput by up to 7.63x, and system energy efficiency by up to 6.3x.
This paper proposes a novel deep learning-based approach for financial chart patterns classification. Convolutional neural networks (CNNs) have made notable achievements in image recognition and computer vision applications. These networks are usually based on two-dimensional convolutional neural networks (2D CNNs). In this paper, we describe the design and implementation of one-dimensional convolutional neural networks (1D CNNs) for the classification of chart patterns from financial time series. The proposed 1D CNN model is compared against support vector machine, extreme learning machine, long short-term memory, rule-based and dynamic time warping. Experimental results on synthetic datasets reveal that the accuracy of 1D CNN is highest among all the methods evaluated. Results on real datasets also reveal that chart patterns identified by 1D CNN are also the most recognized instances when they are compared to those classified by other methods.
Nowadays, Convolutional Neural Networks (CNNs) are widely used as prediction models in different fields, with intensive use in real-time safety-critical systems. Recent studies have demonstrated that hardware faults induced by an external perturbation or aging effects, may significantly impact the CNN inference, leading to prediction failures. Therefore, ensuring the reliability of CNN platforms is crucial, especially when deployed in critical applications. A lot of effort has been made to reduce the memory and energy footprint of CNNs, paving the way to the adoption of approximate computing techniques such as quantization, reduced precision, weight sharing, and pruning. Unfortunately, approximate computing reduces the intrinsic redundancy of CNNs making them more efficient but less resilient to hardware faults. The goal of this work is twofold. First, we assess the reliability of a CNN when reduced bit widths and two different data types (floating- and fixed-point) are used to represent the network parameters (i.e., synaptic weights). Second, we intend to investigate the best compromise between data type, bit-widths reduction, and reliability. The characterization is performed through a fault injection environment built on the darknet open-source framework and targets two CNNs: LeNet-5 and YOLO. Experimental results show that fixed-point data provide the best trade-off between memory footprint reduction and CNN resilience. In particular, for LeNet-5, we achieved a 4X memory footprint reduction at the cost of a slightly reduced reliability (0.45% of critical faults) without retraining the CNN. 相似文献
Human activity recognition and deep learning are two fields that have attracted attention in recent years. The former due to its relevance in many application domains, such as ambient assisted living or health monitoring, and the latter for its recent and excellent performance achievements in different domains of application such as image and speech recognition. In this article, an extensive analysis among the most suited deep learning architectures for activity recognition is conducted to compare its performance in terms of accuracy, speed, and memory requirements. In particular, convolutional neural networks (CNN), long short-term memory networks (LSTM), bidirectional LSTM (biLSTM), gated recurrent unit networks (GRU), and deep belief networks (DBN) have been tested on a total of 10 publicly available datasets, with different sensors, sets of activities, and sampling rates. All tests have been designed under a multimodal approach to take advantage of synchronized raw sensor' signals. Results show that CNNs are efficient at capturing local temporal dependencies of activity signals, as well as at identifying correlations among sensors. Their performance in activity classification is comparable with, and in most cases better than, the performance of recurrent models. Their faster response and lower memory footprint make them the architecture of choice for wearable and IoT devices. 相似文献
Visual impairment assistance systems play a vital role in improving the standard of living for visually impaired people (VIP). With the development of deep learning technologies and assistive devices, many assistive technologies for VIP have achieved remarkable success in environmental perception and navigation. In particular, convolutional neural network (CNN)-based models have surpassed the level of human recognition and achieved a strong generalization ability. However, the large memory and computation consumption in CNNs have been one of the main barriers to deploying them into resource-limited systems for visual impairment assistance applications. To this end, most cheap convolutions (e.g., group convolution, depth-wise convolution, and shift convolution) have recently been used for memory and computation reduction but with a specific architecture design. Furthermore, it results in a low discriminability of the compressed networks by directly replacing the standard convolution with these cheap ones. In this paper, we propose to use knowledge distillation to improve the performance of compact student networks with cheap convolutions. In our case, the teacher is a network with the standard convolution, while the student is a simple transformation of the teacher architecture without complicated redesigning. In particular, we introduce a novel online distillation method, which online constructs the teacher network without pre-training and conducts mutual learning between the teacher and student network, to improve the performance of the student model. Extensive experiments demonstrate that the proposed approach achieves superior performance to simultaneously reduce memory and computation overhead of cutting-edge CNNs on different datasets, including CIFAR-10/100 and ImageNet ILSVRC 2012, compared to the previous CNN compression and acceleration methods. The codes are publicly available at https://github.com/EthanZhangYC/OD-cheap-convolution. 相似文献
Convolutional neural networks (CNNs) have shown tremendous progress and performance in recent years. Since emergence, CNNs have exhibited excellent performance in most of classification and segmentation tasks. Currently, the CNN family includes various architectures that dominate major vision-based recognition tasks. However, building a neural network (NN) by simply stacking convolution blocks inevitably limits its optimization ability and introduces overfitting and vanishing gradient problems. One of the key reasons for the aforementioned issues is network singularities, which have lately caused degenerating manifolds in the loss landscape. This situation leads to a slow learning process and lower performance. In this scenario, the skip connections turned out to be an essential unit of the CNN design to mitigate network singularities. The proposed idea of this research is to introduce skip connections in NN architecture to augment the information flow, mitigate singularities and improve performance. This research experimented with different levels of skip connections and proposed the placement strategy of these links for any CNN. To prove the proposed hypothesis, we designed an experimental CNN architecture, named as Shallow Wide ResNet or SRNet, as it uses wide residual network as a base network design. We have performed numerous experiments to assess the validity of the proposed idea. CIFAR-10 and CIFAR-100, two well-known datasets are used for training and testing CNNs. The final empirical results have shown a great many of promising outcomes in terms of performance, efficiency and reduction in network singularities issues.
Cellular neural network (CNN) has been acted as a high-speed parallel analog signal processor gradually. However, recently, since the decrease in the size of transistor is going to approach the utmost, the transistor-based integrated circuit technology hits a bottleneck. As a result, the advantage of very large scale integration implementation of CNN becomes hard to really present, and further development of this era faces severe challenges unavoidably. In this study, two types of memristor-based cellular neural networks have been proposed. One type uses a memristor to replace the linear resistor in a conventional CNN cell circuit. And the other places a resonant tunneling diode (RTD) in this position and uses memristive synaptic connections to structure a hybrid memristor RTD CNN model. The excellent performances of the proposed CNNs are verified by conventional means of, for instance, stability analysis and efficient applications in image processing. Since both the memristor and the resonant tunneling diode are nanoscale, the size of the network circuits can be greatly reduced, and the integration density of the system will be significantly improved. 相似文献
Convolutional neural networks (CNN) are mainly used for image recognition tasks. However, some huge models are infeasible for mobile devices because of limited computing and memory resources. In this paper, feature maps of DenseNet and CondenseNet are visualized. It could be observed that there are some feature channels in locked state and some have similar distribution property, which could be compressed further. Thus, in this work, a novel architecture — RSNet is introduced to improve the computing efficiency of CNNs. This paper proposes Relative-Squeezing (RS) bottleneck design, where the output is the weighted percentage of input channels. Besides, RSNet also contains multiple compression layers and learned group convolutions (LGCs). By eliminating superfluous feature maps, relative squeezing and compression layers only transmit the most significant features to the next layer. Less parameters are employed and much computation is saved. The proposed model is evaluated on three benchmark datasets: CIFAR-10, CIFAR-100 and ImageNet. Experiment results show that RSNet performs better with less parameters and FLOPs, compared to the state-of-the-art baseline, including CondenseNet, MobileNet and ShuffleNet. 相似文献
Deep convolutional neural networks-based methods have brought great breakthrough in image classification, which provides an end-to-end solution for handwritten Chinese character recognition (HCCR) problem through learning discriminative features automatically. Nevertheless, state-of-the-art CNNs appear to incur huge computational cost and require the storage of a large number of parameters especially in fully connected layers, which is difficult to deploy such networks into alternative hardware devices with limited computation capacity. To solve the storage problem, we propose a novel technique called weighted average pooling for reducing the parameters in fully connected layer without loss in accuracy. Besides, we implement a cascaded model in single CNN by adding mid output to complete recognition as early as possible, which reduces average inference time significantly. Experiments are performed on the ICDAR-2013 offline HCCR dataset. It is found that our proposed approach only needs 6.9 ms for classifying a character image on average and achieves the state-of-the-art accuracy of 97.1% while requires only 3.3 MB for storage. 相似文献
The binary relation inference network (BRIN) shows promise in obtaining the global optimal solution for optimization problem, which is time independent of the problem size. However, the realization of this method is dependent on the implementation platforms. We studied analog and digital FPGA implementation platforms. Analog implementation of BRIN for two different directed graph problems is studied. As transitive closure problems can transform to a special case of shortest path problems or a special case of maximum spanning tree problems, two different forms of BRIN are discussed. Their circuits using common analog integrated circuits are investigated. The BRIN solution for critical path problems is expressed and is implemented using the separated building block circuit and the combined building block circuit. As these circuits are different, the response time of these networks will be different. The advancement of field programmable gate arrays (FPGAs) in recent years, allowing millions of gates on a single chip and accompanying with high-level design tools, has allowed the implementation of very complex networks. With this exemption on manual circuit construction and availability of efficient design platform, the BRIN architecture could be built in a much more efficient way. Problems on bandwidth are removed by taking all previous external connections to the inside of the chip. By transforming BRIN to FPGA (Xilinx XC4010XL and XCV800 Virtex), we implement a synchronous network with computations in a finite number of steps. Two case studies are presented, with correct results verified from simulation implementation. Resource consumption on FPGAs is studied showing that Virtex devices are more suitable for the expansion of network in future developments. 相似文献