1.

郑继明俞佳《计算机工程与应用》2009,45(11):158-161

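Audio feature extraction is the basis of audio classification, which in turn is key to content-based audio retrieval. After a comprehensive analysis of the features that distinguish speech from music, this paper proposes an audio feature extraction and classification method based on the wavelet transform and support vector machines (SVM) for classifying pure speech, music, speech with background music, and environmental sound, and evaluates the new feature set on an SVM classifier. Experimental results show that the proposed audio features are effective and reasonable and that classification performance is good.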

2.

Feature Analysis and Extraction in Automatic Audio Classification (total citations: 8; self-citations: 1; citations by others: 8)

白亮老松杨陈剑赟吴玲达《小型微型计算机系统》2005,26(11):2029-2034

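Audio feature analysis and extraction are the basis of automatic audio classification. This paper divides audio into five classes: silence, noise, pure speech, speech with background sound, and music. It analyzes the features that distinguish these classes at the frame level and the clip level, including frame-level MFCC, frequency-domain energy, sub-band energy, zero-crossing rate, and spectral centroid. From these it computes basic clip-level audio features, including silence ratio and mean sub-band energy ratio, and proposes three audio "stream" features: High-ZCR ratio, Low-Frequency-Energy ratio, and spectral flux. An automatic classifier based on the support vector machine is designed and implemented, and the classification performance of the feature set built from these features is examined with it. Experiments show that the proposed features are effective and that classification performance is good.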
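The abstract names zero-crossing rate and a High-ZCR ratio but does not define them. The following is a minimal sketch of one common reading, in which the High-ZCR ratio is the share of frames in a clip whose ZCR exceeds a multiple of the clip mean. The function names and the 1.5 factor are assumptions for illustration, not the authors' definitions.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def high_zcr_ratio(frames, factor=1.5):
    """Share of frames whose ZCR exceeds `factor` times the clip mean.

    The default factor of 1.5 is an assumed value, not taken from the abstract.
    """
    rates = [zero_crossing_rate(f) for f in frames]
    if not rates:
        return 0.0
    mean_rate = sum(rates) / len(rates)
    return sum(1 for r in rates if r > factor * mean_rate) / len(rates)
```

A clip that mixes a few noisy, rapidly alternating frames with many smooth ones gets a higher ratio than a clip whose frames all have similar ZCR.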

3.

Classification of Stress-Altered Speech Based on a Subspace Method

吕成国韩纪庆《计算机工程与应用》2007,43(1):16-18

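Stress-altered speech arises when the speaker is subjected to changes in gravitational acceleration; compared with normal speech, its spectral energy is more widely dispersed across the frequency band. The whole band is divided into eight sub-bands, and the ratios of sub-band spectral energies are used as features in a proposed subspace-based method for classifying normal versus altered speech. The method designs the initial vector subspaces with the CLAFIC method, trains the subspaces of the two classes with the LSM algorithm using different rotation schemes, and uses the pre-classification results to adjust the classifier parameters and improve its performance. Experimental results show that the method classifies stress-altered and normal speech well, with an average classification accuracy of 95.9%.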
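The sub-band feature described above can be sketched as follows. The abstract does not say how the ratios are formed, so this sketch assumes equal-width sub-bands and takes each band's energy relative to the total; `subband_energy_ratios` and its input (a precomputed power spectrum) are illustrative names.

```python
def subband_energy_ratios(power_spectrum, n_bands=8):
    """Split a power spectrum into equal-width bands and return each band's
    share of the total energy (assumed ratio definition)."""
    n = len(power_spectrum)
    if n < n_bands:
        raise ValueError("need at least one spectral bin per band")
    total = sum(power_spectrum)
    if total == 0:
        return [0.0] * n_bands
    # Integer band edges so every bin lands in exactly one band.
    edges = [i * n // n_bands for i in range(n_bands + 1)]
    return [
        sum(power_spectrum[edges[i]:edges[i + 1]]) / total
        for i in range(n_bands)
    ]
```

Under this definition, energy that is spread more evenly across the band yields a flatter ratio vector, which is the kind of difference the abstract attributes to altered speech.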

4.

A Passive Speech Forensics Method Combining Multiple Features

林晓丹王佳斌《微型机与应用》2012,31(20):39-41

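Based on an analysis of how audio features change under tampering, a passive speech forensics method is proposed. Mel-cepstral parameters of the speech and their dynamic features, together with statistical moment features in the wavelet domain, are used to build the model, and a support vector machine (SVM) is chosen as the classifier to find the optimal separating hyperplane, enabling blind forensics of the authenticity of suspect speech signals. Experimental results show that the method achieves high detection accuracy for tampering operations that alter the authenticity of speech content, such as deletion, splicing, and replacement of speech segments.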

5.

An Improved System for Language Identification of Short Utterances and Confusable Languages

李卓茜高镇王化刘俊南朱光旭《中文信息学报》2019,33(10):135-142

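This paper studies language identification for short utterances (segment duration of at most 1 s) and confusable speech. Using the Oriental Language Recognition challenge dataset, it compares phoneme log-likelihood ratio features, Mel-frequency cepstral coefficient features, and deep bottleneck features (DBF) on short-utterance and confusable-language identification, and shows that DBF performs well on both tasks. To raise identification accuracy, an improved DBF-I-VECTOR language identification system is proposed. Relative to the baseline DBF-I-VECTOR system, it lowers the best equal error rate (EER) for short-utterance identification from 12.26% to 10.55% and the best EER for confusable-language identification from 5.53% to 2.86%. A comparison of cosine distance scoring (CDS), probabilistic linear discriminant analysis (PLDA), support vector machine (SVM), extreme gradient boosting (XGBoost), and random forest (RF) as back ends of the improved system finds that RF classifies best on the short-utterance task and SVM classifies best on the confusable-language task.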

6.

A Novel Speech/Music Segmentation and Classification Method

孟永辉蒋冬梅付中华谢磊《计算机工程与科学》2009,31(4)

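Speech/music discrimination is an important step in audio processing and analysis tasks such as efficient audio coding, audio retrieval, and automatic speech recognition. This paper proposes a novel speech/music segmentation and classification method. It first detects audio change points from the difference in mean-square energy between adjacent frames to perform segmentation. It then extracts eight-dimensional features from each audio segment, including low-band energy variance ratio, cepstral energy modulation, and entropy modulation, and classifies them with an artificial neural network. Experimental results show that the algorithm and features achieve high segmentation accuracy and classification accuracy.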
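The change-point step described above can be sketched as follows. The abstract gives no frame length or decision rule, so this sketch assumes non-overlapping frames and a fixed threshold on the absolute energy difference; both function names and both parameters are illustrative.

```python
def frame_energies(samples, frame_len):
    """Mean-square energy of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def change_points(samples, frame_len, threshold):
    """Indices of frames whose energy differs from the previous frame's
    by more than `threshold` (an assumed, fixed decision rule)."""
    energies = frame_energies(samples, frame_len)
    return [
        i for i in range(1, len(energies))
        if abs(energies[i] - energies[i - 1]) > threshold
    ]
```

For example, a quiet stretch followed by a loud one produces a single change point at the first loud frame.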

7.

Speech Emotion Recognition with Unsupervised Autoencoders in Autism Intervention

葛磊强彦赵涓涓《软件学报》2016,27(S2):130-136

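Speech emotion recognition is an important research topic in human-computer interaction, and a speech emotion recognition system used in intervention therapy for childhood autism can aid the rehabilitation of autistic children. However, the emotional features in speech signals are numerous and heterogeneous, so feature extraction is itself a challenging task, which limits the recognition performance of the whole system. To address this problem, a speech emotion feature extraction algorithm is proposed that uses an unsupervised autoencoder network to learn emotional features from speech signals automatically. A three-layer autoencoder network is built to extract the speech emotion features, and the high-level features learned by the multilayer encoder network are fed to an extreme learning machine classifier. The recognition rate is 84.14%, an improvement over traditional recognition methods based on hand-crafted features.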

8.

Fusion of Fuzzy Cognitive Maps for Speech Emotion Recognition

张卫张雪英孙颖《计算机工程与应用》2017,53(15):14-17

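The fuzzy cognitive map (FCM), a graph analysis method, has already been applied to data classification. To improve its classification accuracy in speech emotion recognition, FCM fusion methods are proposed, covering both feature-level fusion and decision-level fusion. The two approaches are analyzed in detail, and the numeric output of the traditional FCM is converted into a probabilistic output, which gives preliminary recognition results on a common scale for different features. On this basis, an adaptive-weight decision-level fusion method is proposed that takes into account the differences in a classifier's recognition accuracy across features. Experiments show that the proposed FCM fusion method classifies better than a single feature or a single classifier and greatly reduces the confusion between emotions.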
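Decision-level fusion of probabilistic outputs can be illustrated with a small sketch. The abstract does not give the weighting formula, so this sketch simply assumes that each classifier's weight is proportional to its measured accuracy; `fuse_decisions` is a hypothetical name and this is not the authors' adaptive scheme.

```python
def fuse_decisions(prob_outputs, accuracies):
    """Combine per-classifier class probabilities with accuracy-based weights.

    prob_outputs: one probability list per classifier, all of equal length.
    accuracies:   one accuracy per classifier (assumed weighting signal).
    Returns (index of winning class, fused probabilities).
    """
    total = sum(accuracies)
    weights = [a / total for a in accuracies]
    n_classes = len(prob_outputs[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, prob_outputs))
        for c in range(n_classes)
    ]
    return fused.index(max(fused)), fused
```

With outputs `[0.6, 0.4]` and `[0.2, 0.8]` and accuracies `0.9` and `0.6`, the fused scores favor the second class even though the more accurate classifier preferred the first, because the second classifier is far more confident.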

9.

Research on Speech Emotion Recognition Based on Deep Belief Networks

黄晨晨巩微伏文龙冯东煜《计算机研究与发展》2014,(Z1)

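To address feature extraction in speech emotion recognition, a new feature extraction approach is proposed that uses deep belief networks (DBNs), a kind of deep neural network (DNN), to extract emotional features from speech signals automatically. A five-layer deep belief network is trained to extract speech emotion features. Several consecutive frames of speech are concatenated into one high-dimensional feature, and the features learned by the deep belief network are fed to a nonlinear support vector machine (SVM) classifier, yielding a multi-classifier system for speech emotion recognition. The recognition rate is 86.5%, which is 7% higher than traditional methods based on extracting features such as the temporal structure, amplitude structure, and fundamental-frequency structure of the sentence.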

10.

Detection of Chinese Child-Directed Speech Based on the Adaboost Algorithm

《计算机工程》2017,(5)

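Child-directed speech has a considerable influence on early child development, so detecting it correctly and making full use of it is of practical value. To this end, a detection model for Chinese child-directed speech based on the Adaboost algorithm is built to improve detection accuracy. Decision trees serve as weak classifiers that learn the extracted features of Chinese child-directed speech and are combined into a weak-classifier ensemble, whose classification results are weighted to determine the class of the speech under test. Experimental results show that vowels last longer in Chinese child-directed speech than in non-child-directed speech; that increasing the number of weak classifiers raises detection accuracy; that longer speech segments yield higher detection accuracy; and that the improved Adaboost algorithm achieves higher accuracy and precision than the v-SVM algorithm while also making the system more robust.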
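The weighted combination of weak classifiers can be sketched with textbook AdaBoost. This sketch assumes a single scalar feature (for instance a vowel duration), labels of +1 and -1, and depth-1 decision trees (stumps) as the weak learners; it is the standard algorithm, not the improved variant the abstract refers to, and all names are illustrative.

```python
import math


def train_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best 1-D stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds=5):
    """Train `rounds` stumps; each gets a vote weight alpha."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified samples, then renormalize.
        preds = [polarity if x >= threshold else -polarity for x in xs]
        weights = [
            w * math.exp(-alpha * y * p)
            for w, y, p in zip(weights, ys, preds)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote over all stumps."""
    score = sum(
        alpha * (polarity if x >= threshold else -polarity)
        for alpha, threshold, polarity in ensemble
    )
    return 1 if score >= 0 else -1
```

Calling `adaboost` on the feature values and labels returns the ensemble, and `predict` then classifies a new value by the weighted vote.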

11.

Improvement to speech-music discrimination using sinusoidal model based features

Jalil Shirazi Shahrokh Ghaemmaghami 《Multimedia Tools and Applications》2010,50(2):415-435

This paper addresses a model-based audio content analysis for classification of speech-music mixed audio signals into speech and music. A set of new features is presented and evaluated based on sinusoidal modeling of audio signals. The new feature set, including variance of the birth frequencies and duration of the longest frequency track in sinusoidal model, as a measure of the harmony and signal continuity, is introduced and discussed in detail. These features are used and compared to typical features as inputs to an audio classifier. Performance of these sinusoidal model features is evaluated through classification of audio into speech and music using both the GMM (Gaussian Mixture Model) and the SVM (Support Vector Machine) classifiers. Experimental results show that the proposed features are quite successful in speech/music discrimination. By using only a set of two sinusoidal model features, extracted from 1-s segments of the signal, we achieved 96.84% accuracy in the audio classification. Experimental comparisons also confirm superiority of the sinusoidal model features to the popular time domain and frequency domain features in audio classification.

12.

Hierarchical audio content classification system using an optimal feature selection algorithm

P. Krishnamoorthy Sarvesh Kumar 《Multimedia Tools and Applications》2011,54(2):415-444

This paper proposes a hierarchical time-efficient method for audio classification and also presents an automatic procedure to select the best set of features for audio classification using the Kolmogorov-Smirnov test (KS-test). The main motivation for our study is to propose a framework for a general genre (e.g., action, comedy, drama, documentary, musical, etc.) movie video abstraction scheme for embedded devices, based only on the audio component. Accordingly, simple audio features are extracted to ensure the feasibility of real-time processing. Five audio classes are considered in this paper: pure speech, pure music or songs, speech with background music, environmental noise and silence. Audio classification is processed in three stages: (i) silence or environmental noise detection, (ii) speech and non-speech classification and (iii) pure music or songs and speech with background music classification. The proposed system has been tested on various real-time audio sources extracted from movies and TV programs. Our experiments in the context of real-time processing have shown that the algorithms produce very satisfactory results.
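The abstract does not spell out how the KS-test drives feature selection. One plausible reading, sketched below under that assumption, is to compute the two-sample KS statistic of each feature between two classes and prefer the features with the largest statistic. The function names and the ranking rule are illustrative; both samples must be non-empty.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for value in set(a) | set(b):
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def rank_features(features):
    """features maps a name to (values in class A, values in class B).
    Returns names ordered from most to least separating (assumed criterion)."""
    scores = {name: ks_statistic(a, b) for name, (a, b) in features.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A statistic near 1 means the two classes barely overlap on that feature, while a statistic near 0 means the feature carries little class information.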

13.

An Audio Scene Segmentation Method for Content-Based Video Retrieval

朱映映明仲周景洲《小型微型计算机系统》2008,29(3):557-562

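The audio stream in video data carries rich semantic information, and analyzing audio is an integral part of content-based video retrieval. This paper discusses content-based audio scene segmentation, analyzes various audio features and their extraction methods, and on that basis proposes a new audio stream segmentation method that segments the audio stream in video data into audio scenes according to the audio features of six audio types (speech, music, silence, environmental sound, pure speech, speech over background music, and speech over background environmental sound). Experiments show that the method is effective: while maintaining a given segmentation accuracy, precision and recall both improve considerably.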

14.

Pitch-density-based features and an SVM binary tree approach for multi-class audio classification in broadcast news

Lei Xie Zhong-Hua Fu Wei Feng Yong Luo 《Multimedia Systems》2011,17(2):101-112

Audio classification is an essential task in multimedia content analysis, which is a prerequisite to a variety of tasks such as segmentation, indexing and retrieval. This paper describes our study on multi-class audio classification on broadcast news, a popular multimedia repository with rich audio types. Motivated by the tonal regulations of music, we propose two pitch-density-based features, namely average pitch-density (APD) and relative tonal power density (RTPD). We use an SVM binary tree (SVM-BT) to hierarchically classify an audio clip into five classes: pure speech, music, environment sound, speech with music and speech with environment sound. Since SVM is a binary classifier, we use the SVM-BT architecture to realize coarse-to-fine multi-class classification with high accuracy and efficiency. Experiments show that the proposed one-dimensional APD and RTPD features are able to achieve comparable accuracy with popular high-dimensional features in speech/music discrimination, and the SVM-BT approach demonstrates superior performance in multi-class audio classification. With the help of the pitch-density-based features, we can achieve a high average accuracy of 94.2% in the five-class audio classification task.
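The coarse-to-fine idea behind a binary tree of two-class classifiers can be sketched as follows. The tree layout, the feature names, and the threshold rules are all hypothetical: each lambda merely stands in for a trained binary SVM, and the abstract does not state how the authors arranged the five classes.

```python
class Node:
    """Internal node: a binary decision with a subtree or label per outcome."""

    def __init__(self, decide, if_true, if_false):
        self.decide = decide
        self.if_true = if_true
        self.if_false = if_false


def classify(node, features):
    """Walk from the root until a leaf (a plain class-label string)."""
    while isinstance(node, Node):
        node = node.if_true if node.decide(features) else node.if_false
    return node


# Hypothetical layout; every lambda is a stand-in for a binary SVM.
tree = Node(
    lambda f: f["speech_score"] > 0.5,
    Node(
        lambda f: f["background_score"] > 0.5,
        Node(
            lambda f: f["tonal_score"] > 0.5,
            "speech with music",
            "speech with environment sound",
        ),
        "pure speech",
    ),
    Node(lambda f: f["tonal_score"] > 0.5, "music", "environment sound"),
)
```

A clip reaches a label after at most three binary decisions, so fewer classifiers are evaluated per clip than in a one-versus-one scheme over five classes.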

15.

Indexing and Retrieval of Audio: A Survey (total citations: 3; self-citations: 0; citations by others: 3)

Lu Guojun 《Multimedia Tools and Applications》2001,15(3):269-290

With more and more audio being captured and stored, there is a growing need for automatic audio indexing and retrieval techniques that can retrieve relevant audio pieces quickly on demand. This paper provides a comprehensive survey of audio indexing and retrieval techniques. We first describe main audio characteristics and features and discuss techniques for classifying audio into speech and music based on these features. Indexing and retrieval of speech and music is then described separately. Finally, significance of audio in multimedia indexing and retrieval is discussed.

16.

Content-Based Audio Retrieval: Concepts and Methods (total citations: 38; self-citations: 1; citations by others: 37)

李国辉李恒峰《小型微型计算机系统》2000,21(11):1173-1177

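Much research has been done on the retrieval of visual media such as images and video. Audio, however, is also a typical medium in multimedia and a common carrier of information. Conventional processing treats digital audio as an unstructured media stream, yet audio carries speech, contains rich auditory features, and has structural information, so it needs to be, and can be, accessed on the basis of this content. Drawing on current progress in related research, this paper surveys content-based audio retrieval methods, including retrieval oriented to speech, music, and audio analysis, as well as audio segmentation, and analyzes and summarizes audio content and its…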

17.

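Multi-Class Audio Example Recognition Based on Fractional Brownian Motion and Ada Boosting (total citations: 2; self-citations: 0; citations by others: 2)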

吴飞庄永真潘红《计算机研究与发展》2003,40(7):941-949

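An audio feature extraction and recognition method based on fractional Brownian motion is proposed. The method uses a fractional Brownian motion model to compute the fractal dimension of an audio example and uses it as the example's fractal feature. Because audio fractal features follow a Gaussian distribution, the Ada Boosting algorithm is used for feature reduction. The reduced features are then classified with an Ada-weighted Gaussian classifier and with a support vector machine, and a multi-class classification model is built on top of the two-class classifiers. Experiments show that the reduced audio fractal features outperform other audio features in classifying both music and speech.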

18.

Content-based audio classification and segmentation by using support vector machines (total citations: 9; self-citations: 0; citations by others: 9)

Lie Lu Hong-Jiang Zhang Stan Z. Li 《Multimedia Systems》2003,8(6):482-492

Content-based audio classification and segmentation is a basis for further audio/video analysis. In this paper, we present our work on audio segmentation and classification which employs support vector machines (SVMs). Five audio classes are considered in this paper: silence, music, background sound, pure speech, and non-pure speech which includes speech over music and speech over noise. A sound stream is segmented by classifying each sub-segment into one of these five classes. We have evaluated the performance of SVM on different audio type-pairs classification with testing units of different lengths and compared the performance of SVM, K-Nearest Neighbor (KNN), and Gaussian Mixture Model (GMM). We also evaluated the effectiveness of some newly proposed features. Experiments on a database composed of about 4 hours of audio data show that the proposed classifier is very efficient on audio classification and segmentation. They also show that the accuracy of the SVM-based method is much better than that of the methods based on KNN and GMM.
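Segmenting a stream by classifying each sub-segment amounts to merging runs of identical labels into time spans. The sketch below assumes the per-unit labels have already been produced by some classifier and that every unit has the same length; `merge_segments` and the 1-second default are illustrative choices, not values confirmed by the abstract.

```python
def merge_segments(labels, unit_seconds=1.0):
    """Merge consecutive identical labels into (start, end, label) spans.

    labels: one class label per fixed-length sub-segment, in stream order.
    unit_seconds: assumed sub-segment length.
    """
    segments = []
    for i, label in enumerate(labels):
        end = (i + 1) * unit_seconds
        if segments and segments[-1][2] == label:
            start = segments[-1][0]
            segments[-1] = (start, end, label)  # extend the current span
        else:
            segments.append((i * unit_seconds, end, label))
    return segments
```

For the label sequence `["music", "music", "speech"]` this yields one two-second music span followed by a one-second speech span.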