Similar Documents
A total of 20 similar documents were found.
1.
Because beat strength, tempo, and duration are important semantic features that distinguish music genres, and beats mostly occupy the low-frequency band produced by percussion instruments, a 6-level wavelet decomposition of the music signal is used to extract low-frequency beat features. For genres whose beat features differ little, the paper proposes combining the beat features with MFCC acoustic features, which describe the spectral energy envelope, and replacing the commonly used 12th-order MFCC with an 8th-order MFCC motivated by an analysis of genre mechanisms. Simulation experiments on eight music genres show that combining semantic and acoustic features achieves an overall classification accuracy of 68.37%, while the increase in feature dimensionality has little effect on classification time.
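A minimal sketch of the feature pipeline described above, assuming librosa and PyWavelets. The 6-level decomposition and the 8-coefficient MFCC setting follow the abstract; the wavelet family (db4) and the beat summary statistics are illustrative assumptions, not the paper's exact choices.

```python
import librosa
import numpy as np
import pywt

def genre_features(path, sr=22050):
    """Fuse low-frequency beat features (6-level wavelet) with 8th-order MFCCs."""
    y, sr = librosa.load(path, sr=sr)
    # 6-level wavelet decomposition; the deepest approximation band holds
    # the low-frequency percussive/beat content.
    coeffs = pywt.wavedec(y, "db4", level=6)
    env = np.abs(coeffs[0])  # low-frequency envelope
    # Simple beat descriptors: strength, variability, and a rough
    # periodicity taken from the envelope autocorrelation peak.
    ac = np.correlate(env - env.mean(), env - env.mean(), mode="full")
    ac = ac[ac.size // 2:]
    beat_feats = np.array([env.mean(), env.std(), np.argmax(ac[1:]) + 1])
    # 8th-order MFCC (instead of the usual 12) for the spectral envelope.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=8)
    mfcc_feats = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
    return np.concatenate([beat_feats, mfcc_feats])
```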

2.
To automatically recognize the hypernasality grade of cleft-palate speech, and building on a study of wavelet processing and feature extraction for speech signals, an automatic recognition algorithm based on cepstral features of wavelet-decomposition coefficients is proposed. Existing studies of cleft-palate speech mostly rely on features such as MFCC, Teager energy, and Shannon energy, and suffer from low recognition accuracy and heavy computation. Cepstral features of wavelet-decomposition coefficients were extracted from 1,789 vowel /a/ recordings covering four hypernasality grades, and a KNN classifier was used to recognize the four grades automatically. The results were compared against five classical acoustic features (MFCC, LPCC, pitch period, formants, and short-time energy), and an SVM classifier was also evaluated against KNN on the same task. Experiments show that the wavelet-based cepstral feature outperforms the classical acoustic features and that KNN outperforms SVM: the wavelet feature reaches recognition rates of up to 91.67% with KNN and 87.60% with SVM, whereas the classical features achieve 21.69%-84.54% with KNN and 30.61%-78.24% with SVM.
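The paper's exact cepstral construction is not given above; as one plausible rendering, the sketch below takes the DCT of the log subband energies of the wavelet coefficients and compares KNN and SVM on the resulting features. The wavelet family, decomposition depth, and the stand-in random data are assumptions.

```python
import numpy as np
import pywt
from scipy.fftpack import dct
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def wavelet_cepstral(signal, wavelet="db4", level=5):
    """Cepstrum-like feature: DCT of the log energies of wavelet subbands."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    log_energy = np.log([np.sum(c ** 2) + 1e-10 for c in coeffs])
    return dct(log_energy, type=2, norm="ortho")

# Hypothetical stand-in for the vowel /a/ recordings (four grades, 0-3).
rng = np.random.default_rng(0)
X = np.stack([wavelet_cepstral(rng.standard_normal(4096)) for _ in range(80)])
y = rng.integers(0, 4, size=80)

# KNN vs. SVM on the same features, mirroring the paper's comparison.
for clf in (KNeighborsClassifier(n_neighbors=5), SVC(kernel="rbf")):
    print(type(clf).__name__, cross_val_score(clf, X, y, cv=5).mean())
```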

3.
To detect singing voice in popular music, an SVM classifier is trained and applied on MFCC features. Exploiting the temporal continuity of audio features, the frame-level classification results are low-pass filtered as a post-processing step. Experiments show that the method reaches a frame-level recognition rate of 85.76%. The experiments also reveal large statistical differences in pronunciation, especially in MFCC features, between singers of different languages. The song classification results can serve as one basis for a further music similarity measure.
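A sketch of the post-processing idea, assuming scikit-learn: smooth the frame-level SVM decision values with a moving-average (low-pass) filter before thresholding. The window length is an assumption.

```python
import numpy as np
from sklearn.svm import SVC

def smooth_frame_decisions(clf: SVC, mfcc_frames: np.ndarray, win: int = 21):
    """Low-pass filter frame-level vocal/non-vocal decisions.

    mfcc_frames: (n_frames, n_mfcc) matrix of per-frame MFCC vectors.
    win: moving-average window in frames (assumed; should be odd).
    """
    # Signed distance to the SVM hyperplane, one value per frame.
    scores = clf.decision_function(mfcc_frames)
    # A moving average is a simple low-pass filter; it exploits the
    # temporal continuity of the vocal/non-vocal state.
    kernel = np.ones(win) / win
    smoothed = np.convolve(scores, kernel, mode="same")
    return smoothed > 0  # True = vocal frame
```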

4.
Toward intelligent music information retrieval
Efficient and intelligent music information retrieval is a very important topic of the 21st century. With the ultimate goal of building personal music information retrieval systems, this paper studies the problem of intelligent music information retrieval. Huron points out that since the preeminent functions of music are social and psychological, the most useful characterization would be based on four types of information: genre, emotion, style, and similarity. This paper introduces Daubechies Wavelet Coefficient Histograms (DWCH) for music feature extraction in music information retrieval. The histograms are computed from the coefficients of the db8 Daubechies wavelet filter applied to 3 s of music. A comparative study of sound features and classification algorithms on a dataset compiled by Tzanetakis shows that combining DWCH with timbral features (MFCC and FFT), using multiclass extensions of the support vector machine, achieves approximately 80% accuracy, a significant improvement over the previously known result on this dataset. On another dataset the combination achieves 75% accuracy. The paper also studies the detection of emotion in music, using ratings from two subjects on three bipolar adjective pairs; accuracy of around 70% was achieved in predicting the emotional labels for these pairs. The paper further studies the problem of identifying groups of artists from their lyrics and sound using a semi-supervised classification algorithm, attempting to identify artist groups based on the Similar Artist lists at All Music Guide; the semi-supervised learning algorithm yielded nontrivial accuracy increases, to more than 70%. Finally, the paper conducts a proof-of-concept experiment on similarity search using the feature set.
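A sketch of the DWCH idea, assuming PyWavelets: apply a db8 wavelet decomposition to a roughly 3 s excerpt and histogram the coefficients of each subband. The number of bins and the added per-subband moments are assumptions, since the paper's exact binning is not given here.

```python
import numpy as np
import pywt

def dwch(excerpt: np.ndarray, wavelet: str = "db8", level: int = 7,
         bins: int = 16) -> np.ndarray:
    """Daubechies Wavelet Coefficient Histograms for a ~3 s mono excerpt."""
    feats = []
    for band in pywt.wavedec(excerpt, wavelet, level=level):
        # Normalized histogram of the subband coefficients ...
        hist, _ = np.histogram(band, bins=bins, density=True)
        # ... plus low-order moments summarizing the histogram shape.
        feats.extend(hist.tolist() + [band.mean(), band.std()])
    return np.asarray(feats)
```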

5.
To improve how well deep convolutional neural networks extract genre features from music spectrograms, a music genre classification model based on spatial-domain spectral feature attention, DCNN-SSA, is proposed. DCNN-SSA effectively annotates the genre features of different mel spectrograms in the spatial domain and modifies the network structure, improving feature extraction while preserving model effectiveness and thereby raising genre classification accuracy. First, the raw audio signal is mel-filtered, mimicking the filtering of the human ear to capture variations in intensity and rhythm; the resulting mel spectrograms are sliced and fed to the network. Then the model's genre feature extraction is strengthened by deepening the network, changing the convolution structure, and adding a spatial attention mechanism. Finally, multiple rounds of training and validation on the dataset extract and learn genre features, yielding a model that classifies music genres effectively. Experiments on the GTZAN dataset show that, compared with other deep learning models, the spatial-attention model improves both classification accuracy and convergence, with accuracy gains of 5.36 to 10.44 percentage points.
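The paper's exact DCNN-SSA topology is not reproduced here; the sketch below shows the kind of spatial attention block one might add to a mel-spectrogram CNN (a CBAM-style spatial gate), assuming PyTorch. All layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Re-weights each time-frequency position of a feature map."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):  # x: (batch, channels, mel_bins, frames)
        # Channel-wise average and max describe where activity concentrates.
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn  # spatially re-weighted feature map

# Usage: insert after a conv stage of a genre-classification CNN.
block = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                      SpatialAttention())
out = block(torch.randn(8, 1, 128, 216))  # 8 mel-spectrogram slices
```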

6.
In music genre classification, when the local genre features of a piece disagree with its global features, the common approach of taking the maximum vote over local predictions (MaxVote) performs poorly when segment-level classification accuracy is low, and gives unreasonable results when the genre distribution over segments is fairly balanced. To address these problems, the paper proposes a neural-network voting scheme (NNVote) based on the genre distribution over music segments, and a RhythmNNVote scheme that adds high-level rhythm features. Experiments show that NNVote reaches an overall accuracy of 68.9% on seven genres, nearly 10 percentage points higher than MaxVote.
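A sketch contrasting MaxVote with the NNVote idea, assuming scikit-learn: instead of taking the most frequent segment-level label, feed the piece's segment-level genre distribution to a small neural network that learns the final decision. The MLP size and the stand-in random data are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

N_GENRES = 7

def max_vote(segment_probs: np.ndarray) -> int:
    """Baseline: most frequent segment-level label (MaxVote)."""
    labels = segment_probs.argmax(axis=1)
    return np.bincount(labels, minlength=N_GENRES).argmax()

def distribution_feature(segment_probs: np.ndarray) -> np.ndarray:
    """NNVote input: the piece's genre distribution over its segments."""
    return segment_probs.mean(axis=0)

# Hypothetical data: per-piece segment probabilities plus true genres.
rng = np.random.default_rng(0)
pieces = [rng.dirichlet(np.ones(N_GENRES), size=30) for _ in range(200)]
X = np.stack([distribution_feature(p) for p in pieces])
y = rng.integers(0, N_GENRES, size=200)

nn_vote = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)
print(nn_vote.predict(X[:3]), [max_vote(p) for p in pieces[:3]])
```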

7.
Automatic discrimination of speech and music is an important tool in many multimedia applications. The paper presents a robust and effective approach for speech/music discrimination, which relies on a set of features derived from fundamental frequency (F0) estimation. Comparison between the proposed set of features and some commonly used timbral features is performed, aiming to assess the good discriminatory power of the proposed F0-based feature set. The classification scheme is composed of a classical Statistical Pattern Recognition classifier followed by a Fuzzy Rules Based System. Comparison with other well-proven classification schemes is also performed. Experimental results reveal that our speech/music discriminator is robust enough, making it suitable for a wide variety of multimedia applications.
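The paper's F0 feature set is not spelled out above; as one plausible rendering, the sketch below estimates F0 with librosa's pYIN tracker and summarizes it with statistics that tend to separate speech from music (voiced fraction, F0 spread, pitch continuity). The pitch range and the statistics chosen are assumptions.

```python
import librosa
import numpy as np

def f0_features(y: np.ndarray, sr: int) -> np.ndarray:
    """Statistics of the F0 track, usable for speech/music discrimination."""
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr)
    voiced = f0[~np.isnan(f0)]
    if voiced.size == 0:
        return np.zeros(4)
    return np.array([
        voiced_flag.mean(),              # fraction of voiced frames
        voiced.std(),                    # F0 spread (speech drifts, notes hold)
        np.abs(np.diff(voiced)).mean(),  # frame-to-frame F0 movement
        voiced.mean(),                   # average pitch height
    ])

y, sr = librosa.load(librosa.example("trumpet"))  # bundled demo clip
print(f0_features(y, sr))
```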

8.
Genre is an abstract feature, yet it is considered one of the important characteristics of music, and genre recognition is an essential component of a large number of commercial music applications. Most existing music genre recognition algorithms are based on manual feature extraction techniques, with the extracted features used to build a classifier model that identifies the genre. However, it has often been observed that a feature set giving excellent accuracy fails to explain the underlying characteristics of music genres, and that features performing satisfactorily on one dataset fail to perform similarly on others. Hence, each dataset mostly requires manual selection of appropriate acoustic features to reach an adequate level of performance. In this paper, we propose a genre recognition algorithm that uses almost no handcrafted features. The convolutional recurrent neural network-based model proposed in this study is trained on mel spectrograms extracted from 3-s audio clips taken from the GTZAN dataset, and provides an accuracy of 85.36% on 10-class genre classification. The same model, trained and tested on 10 genres of the MagnaTagATune dataset (18,476 clips of 29-s duration), yields an accuracy of 86.06%. The experimental results suggest that the proposed architecture with mel spectrogram input is capable of consistent performance across different datasets.
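A compact rendering of the described setup, assuming PyTorch: a mel spectrogram of a 3 s clip feeds a few convolution layers, a GRU summarizes the time axis, and a linear layer predicts one of 10 genres. Layer sizes are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Conv layers learn local spectral patterns; a GRU models their order."""
    def __init__(self, n_mels=128, n_classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.gru = nn.GRU(input_size=64 * (n_mels // 4), hidden_size=64,
                          batch_first=True)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):            # x: (batch, 1, n_mels, frames)
        h = self.conv(x)             # (batch, 64, n_mels/4, frames/4)
        h = h.permute(0, 3, 1, 2).flatten(2)  # (batch, time, features)
        _, last = self.gru(h)        # final hidden state summarizes the clip
        return self.fc(last[-1])     # genre logits

logits = CRNN()(torch.randn(4, 1, 128, 130))  # 4 mel spectrograms, ~3 s each
```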

9.
Tea (Camellia sp.) and its plantation are very important on a worldwide scale, as tea is the second-most consumed beverage after water. It is therefore necessary to map the widely distributed tea plantations under various geographies and conditions. Remote-sensing techniques are effective tools to map and monitor the impact of tea plantation on land-use/land-cover (LULC). Remote sensing of tea plantations suffers from spectral mixing, as these plantation areas are generally surrounded by similar types of green vegetation such as orchards and bushes. The problem is mainly tied to the planting style, topography, and spectral characteristics of tea plantations, and it manifests as low classification accuracy. In this study, to overcome this problem, a three-step approach was proposed and implemented on a test area with high slope. As a first step, spectral and multi-scale textural features based on Gabor filters were extracted from high-resolution multispectral digital aerial images. In addition, based on the wavelength range of the sensor, a modified normalized difference vegetation index (MNDVI) was applied to distinguish green vegetation cover from other LULCs. The second step involves the classification of multidimensional textural and spectral feature combinations using a support vector machine (SVM) algorithm. As a final step, two different techniques were applied to evaluate classification accuracy. The first is a traditional site-specific accuracy assessment based on a confusion matrix, calculating statistical metrics for different feature combinations. The overall accuracy and kappa values were 93.68% and 0.92, 93.82% and 0.92, and 97.40% and 0.97 for LULC maps produced by red, green, and blue (RGB); RGB + MNDVI; and RGB + MNDVI + Gabor features, respectively. The second technique was a pattern-based accuracy assessment involving polygon-based fuzzy local matching. Three comparison maps showing local matching indices were obtained and used to compute the global matching index (g) for the LULC maps of each feature-set combination: g(RGB) = 0.745, g(RGB+MNDVI) = 0.745, and g(RGB+MNDVI+Gabor) = 0.765. Finally, based on the accuracy assessment metrics, the study area was successfully classified and tea plantation features were extracted with high accuracy.
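A sketch of the feature stack, assuming scikit-image and scikit-learn. The Gabor frequencies/orientations and the MNDVI band arithmetic (here a plain NDVI-style ratio on whatever bands the sensor provides) are illustrative assumptions, as is the random stand-in tile.

```python
import numpy as np
from skimage.filters import gabor
from sklearn.svm import SVC

def mndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """NDVI-style vegetation index; the 'modified' variant is sensor-specific."""
    return (nir - red) / (nir + red + 1e-10)

def gabor_stack(gray, freqs=(0.1, 0.2, 0.4),
                thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Multi-scale, multi-orientation Gabor magnitude responses per pixel."""
    maps = []
    for f in freqs:
        for t in thetas:
            real, imag = gabor(gray, frequency=f, theta=t)
            maps.append(np.hypot(real, imag))
    return np.stack(maps, axis=-1)  # (rows, cols, n_filters)

# Per-pixel feature vector = spectral bands + MNDVI + Gabor textures,
# then an SVM separates tea plantations from lookalike vegetation.
rng = np.random.default_rng(0)
img = rng.random((64, 64, 4))  # hypothetical R,G,B,NIR tile
feats = np.dstack([img, mndvi(img[..., 3], img[..., 0]),
                   gabor_stack(img[..., 1])])
X = feats.reshape(-1, feats.shape[-1])
y = rng.integers(0, 2, size=X.shape[0])  # stand-in labels
svm = SVC(kernel="rbf").fit(X[:500], y[:500])
```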

10.
In bird sound recognition, the choice of acoustic features strongly affects classification accuracy. To improve the recognition rate, and because traditional mel-frequency cepstral coefficients (MFCC) under-represent the high-frequency content of bird sounds, a Fisher-criterion-based fusion of MFCC and inverted mel-frequency cepstral coefficients (IMFCC) is proposed. The resulting feature, MFCC-IMFCC, is applied to bird sound recognition to better represent high-frequency information. In addition, a genetic algorithm (GA) optimizes the penalty factor C and kernel parameter g of a support vector machine (SVM), yielding a trained GA-SVM classification model. Experiments show that, under the same conditions, MFCC-IMFCC improves the recognition rate compared with MFCC alone.
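The IMFCC construction is sketched below by flipping a mel filterbank along the frequency axis (so resolution is fine at high frequencies), assuming librosa. The GA step is replaced here by a plain grid search over C and gamma, named as such, since the abstract's GA settings are not given.

```python
import librosa
import numpy as np
from scipy.fftpack import dct
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def mfcc_imfcc(y, sr, n_fft=2048, hop=512, n_mels=26, n_ceps=13):
    """Concatenate MFCC with inverted-mel cepstral coefficients (IMFCC)."""
    power = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    inv_fb = mel_fb[::-1, ::-1]  # flipped bank: narrow filters at high freq
    feats = []
    for fb in (mel_fb, inv_fb):
        logmel = np.log(fb @ power + 1e-10)
        ceps = dct(logmel, axis=0, type=2, norm="ortho")[:n_ceps]
        feats.append(ceps.mean(axis=1))
    return np.concatenate(feats)

# Stand-in for GA optimization of (C, g): an exhaustive grid search.
param_grid = {"C": [1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)
# search.fit(X, y) would then select the best (C, gamma) pair.
```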

11.
Recognition of respiratory sounds of patients with chronic obstructive pulmonary disease based on HHT-MFCC and short-time energy
常峥, 罗萍, 杨波, 张晓晓. 《计算机应用》, 2021, 41(2): 598-603
To optimize the mel-frequency cepstral coefficient (MFCC) feature extraction algorithm, improve the accuracy of respiratory sound recognition, and thereby recognize chronic obstructive pulmonary disease (COPD), a feature extraction algorithm fusing Hilbert-Huang transform (HHT) based MFCC with short-time energy, HHT-MFCC+Energy, is proposed. First, the Hilbert marginal spectrum and marginal spectrum energy of the preprocessed respiratory sound signal are computed via the HHT. Second, the spectral energy is passed through a Mel filter bank to obtain a feature vector, and taking the logarithm followed by a discrete cosine transform yields the HHT-MFCC coefficients. Finally, the short-time energy of the signal is fused with the HHT-MFCC feature vector to form a new feature, which a support vector machine (SVM) classifies. Comparing the MFCC, HHT-MFCC, and HHT-MFCC+Energy features, each combined with an SVM, on respiratory sound recognition, the proposed fusion outperforms the other two algorithms for both COPD patients and healthy subjects: with 24-dimensional features and 100 training samples, the average recognition rate reaches 97.8%, which is 6.9 and 1.4 percentage points higher than MFCC and HHT-MFCC respectively.
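A rough sketch of the pipeline under stated assumptions: the EMD step uses the PyEMD package (pip name EMD-signal), the marginal spectrum is accumulated over coarse frequency bins via scipy's Hilbert transform, and the Mel-filter, log, and DCT steps follow the abstract. Bin counts and frame settings are assumptions.

```python
import numpy as np
import librosa
from PyEMD import EMD
from scipy.fftpack import dct
from scipy.signal import hilbert

def hht_mfcc_energy(y, sr, n_bins=513, n_mels=26, n_ceps=12):
    """HHT-based MFCC fused with short-time energy (HHT-MFCC+Energy)."""
    # 1) EMD -> IMFs; Hilbert transform gives instantaneous amplitude/frequency.
    imfs = EMD().emd(y)
    marginal = np.zeros(n_bins)  # Hilbert marginal spectrum over freq bins
    for imf in imfs:
        analytic = hilbert(imf)
        amp = np.abs(analytic)[1:]
        inst_freq = np.diff(np.unwrap(np.angle(analytic))) * sr / (2 * np.pi)
        idx = np.clip((inst_freq / (sr / 2) * (n_bins - 1)).astype(int),
                      0, n_bins - 1)
        np.add.at(marginal, idx, amp)  # accumulate amplitude per freq bin
    # 2) Mel filter, log, DCT -> HHT-MFCC coefficients.
    mel_fb = librosa.filters.mel(sr=sr, n_fft=2 * (n_bins - 1), n_mels=n_mels)
    hht_mfcc = dct(np.log(mel_fb @ marginal + 1e-10),
                   type=2, norm="ortho")[:n_ceps]
    # 3) Fuse with short-time (frame) energy statistics; an SVM would
    #    then classify the fused vectors.
    frames = librosa.util.frame(y, frame_length=1024, hop_length=512)
    ste = np.log((frames ** 2).sum(axis=0) + 1e-10)
    return np.concatenate([hht_mfcc, [ste.mean(), ste.std()]])
```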

12.

To improve the efficiency and accuracy of collecting theater music data and recognizing genres, and to address the inability of the single features used in traditional algorithms to classify music genres efficiently, a technology for data collection and genre recognition based on the Internet of Things (IoT) and a Deep Belief Network (DBN) is proposed. The study first introduces the IoT-based music data collection scheme and the theoretical basis for genre recognition and classification with a DBN. It then focuses on constructing and improving the DBN-based genre recognition algorithm: the network is optimized by adding Dropout and momentum, and the best-performing trained model is adopted. Finally, experiments confirm the effectiveness of the algorithm: the optimized algorithm identifies genres in the music library with an accuracy of 75.8%, well above traditional classical algorithms. It is concluded that IoT- and DBN-based music data collection and genre recognition have strong advantages, greatly reducing the workload of manual collection and labeling while improving efficiency, and that the optimized model can be extended to other fields.


13.
In recent years, emotion recognition from electroencephalogram (EEG) signals has received growing attention from researchers. To enrich the feature representation and obtain higher emotion classification accuracy, this work applies the speech feature mel-frequency cepstral coefficients (MFCC) to EEG signals. MFCC features extracted after a wavelet transform of the EEG are fused with EEG features, and a deep residual network (ResNet18) performs the emotion classification. Experiments show that, compared with using EEG features alone, adding MFCC features raises recognition accuracy on the Arousal and Valence dimensions by 6 and 4 percentage points respectively, reaching 86.01% and 85.46%.
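A sketch of the fusion idea, assuming torchvision's ResNet18 with its input stem adapted to the fused features. How the paper stacks the EEG and MFCC features is not specified above, so here they are simply stacked as two input channels on a common grid; that layout is an assumption of this sketch.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Two-channel "image": channel 0 = wavelet-based EEG feature map,
# channel 1 = MFCC-style map computed from the same EEG segment,
# resampled to the same grid (assumed fusion layout).
model = resnet18(num_classes=2)  # e.g. high/low Arousal
model.conv1 = nn.Conv2d(2, 64, kernel_size=7, stride=2, padding=3, bias=False)

x = torch.randn(8, 2, 64, 64)  # batch of fused EEG+MFCC feature maps
logits = model(x)              # per-class scores for the emotion dimension
```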

14.
Automatic discrimination of speech and music is an important tool in many multimedia applications. The paper presents an effective approach based on an adaptive network-based fuzzy inference system (ANFIS) for the classification stage required in a speech/music discrimination system. A simple new feature, called the warped LPC-based spectral centroid (WLPC-SC), is also proposed. A comparison between WLPC-SC and the classical features proposed in the literature for audio classification is performed to assess the discriminatory power of the proposed feature. The vector describing the proposed psychoacoustic-based feature is reduced to a few statistical values (mean, variance, and skewness). To increase the classification accuracy, the feature space is then transformed to a new space by LDA, and the classification task is performed by applying ANFIS to the features in the transformed space. To evaluate the performance of the ANFIS system for speech/music discrimination, a comparison to other commonly used classifiers is reported. The classification results for different types of music and speech signals show the good discriminating power of the proposed approach.

15.
With the essential demand for understanding human emotional behavior and for human-machine interaction in recent electronic applications, speaker emotion recognition is a key component that has attracted a great deal of attention among researchers. Although a handful of works on speaker emotion classification are available in the literature, important challenges such as distinct emotions, low-quality recordings, and independent affective states still need to be addressed with a good classifier and discriminative features. Accordingly, a new classifier, called the fractional deep belief network (FDBN), is developed by combining a deep belief network (DBN) with fractional calculus. This classifier is trained with multiple features (tonal power ratio, spectral flux, pitch chroma, and mel-frequency cepstral coefficients, MFCC) to make the emotion classes more separable through their spectral characteristics. The proposed FDBN classifier with integrated feature vectors is tested on two databases: the Berlin database of emotional speech and a real-time Telugu database. The performance of the proposed FDBN and existing DBN classifiers is validated using false acceptance rate (FAR), false rejection rate (FRR), and accuracy. The experimental results show that FDBN achieves accuracies of 98.39% and 95.88% on the Berlin and Telugu databases.

16.
In this paper, a novel automatic speaker recognition (ASR) system is presented. The new ASR system includes novel feature extraction and vector classification steps utilizing distributed discrete cosine transform (DCT-II) based mel-frequency cepstral coefficients (MFCC) and fuzzy vector quantization (FVQ). The ASR algorithm utilizes an MFCC-based approach to identify dynamic features for speaker recognition. A series of experiments was performed using three feature extraction methods: (1) conventional MFCC; (2) delta-delta MFCC (DDMFCC); and (3) DCT-II based DDMFCC. The experiments were then expanded to include four classifiers: (1) FVQ; (2) K-means vector quantization (VQ); (3) Linde-Buzo-Gray VQ; and (4) the Gaussian mixture model (GMM). The combination of DCT-II based MFCC, DMFCC, and DDMFCC with FVQ was found to have the lowest equal error rate among the VQ-based classifiers. The results improve on previously reported non-GMM methods and approach those of the computationally expensive GMM-based method. Speaker verification tests highlighted the overall performance improvement of the new ASR system. The National Institute of Standards and Technology Speaker Recognition Evaluation corpora provided the speaker source data for the experiments.

17.
As typical nonlinear, non-stationary signals, motor imagery EEG signals are difficult to classify well with traditional methods based on a single extracted feature. To address this problem, the fractional Fourier transform (FrFT) is introduced into the EEG feature extraction process. The signal is first analyzed with the FrFT, which expands the feature domain while extracting useful information from different dimensions to form feature vectors; a support vector machine (SVM) classifier then classifies the extracted feature vectors. Experiments on the Graz dataset show that the method achieves up to 92.57% correct classification, clearly higher than traditional methods using a single extracted feature.

18.
Objective: In hyperspectral classification, the large number of bands, the presence of noise, and the uneven distribution of samples across land-cover classes make it difficult to balance classification accuracy against training efficiency, and accuracy is low on small samples. A hyperspectral image classification method based on cascaded multiple classifiers is therefore proposed. Method: First, principal component analysis condenses the highly correlated high-dimensional features into uncorrelated low-dimensional ones, speeding up texture extraction with Gabor filters. Gabor filters then extract texture information at each scale and orientation; every filter produces a feature map, in which a d×d neighborhood centered on the sample to be classified is taken, and the mean and variance of the data in that neighborhood serve as the sample's spatial information. The spatial and spectral information are then fused to reduce the influence of illumination and noise. Finally, the joint spectral-spatial features are fed into the cascaded multi-classifier, which outputs the average of the predicted class probability distributions. Result: Experiments on the Indian Pines, Pavia University, and Salinas datasets compare the method with classical algorithms such as support vector machines and convolutional neural networks, using overall accuracy, average accuracy, and the Kappa coefficient as evaluation criteria. The proposed method achieves overall accuracies of 97.24%, 99.57%, and 99.46% on the three datasets, which is 13.2, 4.8, and 5.68 percentage points higher than a support vector machine with a radial basis function kernel (RBF-SVM); 2.18, 0.36, and 0.83 percentage points higher than RBF-SVM with joint spectral-spatial features; and 3.27, 3.2, and 0.3 percentage points higher than a convolutional neural network. The Kappa coefficients, 0.9686, 0.9943, and 0.9956, are also improved. Conclusion: The experimental results show that the method classifies hyperspectral images well, trains efficiently without relying on a GPU, and retains high accuracy on small samples.
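A sketch of the spectral-spatial feature construction, assuming scikit-learn and scikit-image. The neighborhood size d, the Gabor bank, and the final classifier are simplified: a soft-voting ensemble (which averages predicted probabilities) stands in for the paper's cascaded multi-classifier.

```python
import numpy as np
from skimage.filters import gabor
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

def spectral_spatial_features(cube, n_components=8, d=5):
    """cube: (rows, cols, bands) hyperspectral image -> per-pixel features."""
    rows, cols, bands = cube.shape
    # 1) PCA compresses correlated bands into a few uncorrelated components.
    pcs = PCA(n_components=n_components).fit_transform(
        cube.reshape(-1, bands)).reshape(rows, cols, n_components)
    # 2) Gabor texture on the first component; mean/variance in a d x d
    #    neighborhood give each pixel its spatial information.
    real, _ = gabor(pcs[..., 0], frequency=0.2)
    pad = d // 2
    padded = np.pad(real, pad, mode="reflect")
    mean = np.zeros_like(real)
    var = np.zeros_like(real)
    for i in range(rows):
        for j in range(cols):
            win = padded[i:i + d, j:j + d]
            mean[i, j], var[i, j] = win.mean(), win.var()
    # 3) Fuse spectral (PCs) and spatial (mean, variance) information.
    return np.dstack([pcs, mean, var]).reshape(-1, n_components + 2)

# Stand-in for the cascaded multi-classifier: average class probabilities.
ensemble = VotingClassifier(
    [("rf", RandomForestClassifier()),
     ("lr", LogisticRegression(max_iter=500))],
    voting="soft")
```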

19.
Automatic mood detection and tracking of music audio signals
Music mood describes the inherent emotional expression of a music clip. It is helpful in music understanding, music retrieval, and some other music-related applications. In this paper, a hierarchical framework is presented to automate the task of mood detection from acoustic music data, by following some music psychological theories in western cultures. The hierarchical framework has the advantage of emphasizing the most suitable features in different detection tasks. Three feature sets, including intensity, timbre, and rhythm are extracted to represent the characteristics of a music clip. The intensity feature set is represented by the energy in each subband, the timbre feature set is composed of the spectral shape features and spectral contrast features, and the rhythm feature set indicates three aspects that are closely related with an individual's mood response, including rhythm strength, rhythm regularity, and tempo. Furthermore, since mood is usually changeable in an entire piece of classical music, the approach to mood detection is extended to mood tracking for a music piece, by dividing the music into several independent segments, each of which contains a homogeneous emotional expression. Preliminary evaluations indicate that the proposed algorithms produce satisfactory results. On our testing database composed of 800 representative music clips, the average accuracy of mood detection achieves up to 86.3%. We can also on average recall 84.1% of the mood boundaries from nine testing music pieces.
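A sketch of the three feature sets, assuming librosa: subband energies for intensity, spectral shape and spectral contrast for timbre, and onset-based tempo and strength statistics for rhythm. The subband split and the particular statistics are assumptions.

```python
import librosa
import numpy as np

def mood_features(y, sr):
    """Intensity, timbre, and rhythm feature sets for music mood detection."""
    S = np.abs(librosa.stft(y))
    # Intensity: log energy in a few subbands of the spectrogram.
    bands = np.array_split(S ** 2, 5, axis=0)
    intensity = np.log([b.sum() + 1e-10 for b in bands])
    # Timbre: spectral shape (centroid, rolloff) plus spectral contrast.
    centroid = librosa.feature.spectral_centroid(S=S, sr=sr).mean()
    rolloff = librosa.feature.spectral_rolloff(S=S, sr=sr).mean()
    contrast = librosa.feature.spectral_contrast(S=S, sr=sr).mean(axis=1)
    # Rhythm: tempo plus onset-strength statistics (strength / regularity).
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    tempo, _ = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
    rhythm = np.array([np.atleast_1d(tempo)[0],
                       onset_env.mean(), onset_env.std()])
    return np.concatenate([intensity, [centroid, rolloff], contrast, rhythm])
```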

20.
Music classification has been studied for many years, but retrieval efficiency is still unsatisfactory. A music classification method based on entropy and support vector machines is proposed. A filter bank decomposes a music clip into different frequency channels, the channels are converted to spectrograms via the discrete Fourier transform, and the information entropy of each is computed; a support vector machine is then trained and tested on a four-class music collection. Three different filter banks were compared, among which the Bark filter achieved an 80% recognition rate; the experimental results show that it classifies better than MFCC features.
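A sketch of the entropy feature, assuming librosa: decompose the power spectrogram into subbands and compute the Shannon entropy of each band's distribution over time, then classify with an SVM. librosa has no built-in Bark filterbank, so a mel filterbank stands in for the Bark bank here; that substitution and the band count are assumptions.

```python
import librosa
import numpy as np
from sklearn.svm import SVC

def subband_entropy(y, sr, n_bands=20, n_fft=2048):
    """Per-band spectral entropy of a music clip (Bark-like subbands)."""
    S = np.abs(librosa.stft(y, n_fft=n_fft)) ** 2
    # Mel filterbank as a stand-in for the Bark bank used in the paper.
    fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_bands)
    band_energy = fb @ S  # (n_bands, frames)
    p = band_energy / (band_energy.sum(axis=1, keepdims=True) + 1e-12)
    return -(p * np.log2(p + 1e-12)).sum(axis=1)  # one entropy per band

# Train/test an SVM on four genre classes (hypothetical data loading):
# X = np.stack([subband_entropy(*librosa.load(f)) for f in files]); y = labels
clf = SVC(kernel="rbf")
```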
