首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
基于支持向量机的音频分类与分割   总被引:8,自引:0,他引:8  
音频分类与分割是提取音频结构和内容语义的重要手段,是基于内容的音频、视频检索和分析的基础。支持向量机(SVM)是一种有效的统计学习方法。本文提出了一种基于SVM的音频分类算法。将音频分为5类:静音、噪音、音乐、纯语音和带背景音的语音。在分类的基础上,采用3个平滑规则对分类结果进行平滑。分析了SVM分类嚣的分类性能,同时也评估了本文提出的新的音频特征在SVM分类嚣上的分类效果。实验结果显示,基于SVM的音频分类算法分类效果良好,平滑处理后的音频分割结果比较准确。  相似文献   

2.
Audio classification is an essential task in multimedia content analysis, which is a prerequisite to a variety of tasks such as segmentation, indexing and retrieval. This paper describes our study on multi-class audio classification on broadcast news, a popular multimedia repository with rich audio types. Motivated by the tonal regulations of music, we propose two pitch-density-based features, namely average pitch-density (APD) and relative tonal power density (RTPD). We use an SVM binary tree (SVM-BT) to hierarchically classify an audio clip into five classes: pure speech, music, environment sound, speech with music and speech with environment sound. Since SVM is a binary classifier, we use the SVM-BT architecture to realize coarse-to-fine multi-class classification with high accuracy and efficiency. Experiments show that the proposed one-dimensional APD and RTPD features are able to achieve comparable accuracy with popular high-dimensional features in speech/music discrimination, and the SVM-BT approach demonstrates superior performance in multi-class audio classification. With the help of the pitch-density-based features, we can achieve a high average accuracy of 94.2% in the five-class audio classification task.  相似文献   

3.
研究一种用支持向量机(SVM)进行多类音频分类的方法,其中引入增广两类分类法(AB法)设计多类分类器。该算法把音频分为四类:音乐、纯语音、带背景音的语音和典型的环境音,并分析了这几类音频的八个区别性特征,包括修正低能量成分比率(MLER)和修正基频(MPF)两个新特征以及频域总能量、子带能量、频率中心等其它六个基本特征,综合考察了不同特征集在基于SVM分类器中的分类精度。实验结果表明,提取的音频特征有效,基于SVM的多类音频分类效果良好。  相似文献   

4.
基于小波变换和支持向量机的音频分类   总被引:2,自引:0,他引:2       下载免费PDF全文
音频特征提取是音频分类的基础,而音频分类又是内容的音频检索的关键。综合分析了语音和音乐的区别性特征,提出一种基于小波变换和支持向量机的音频特征提取和分类的方法,用于纯语音、音乐、带背景音乐的语音以及环境音的分类,并且评估了新特征集合在SVM分类器上的分类效果。实验结果表明,提出的音频特征有效、合理,分类性能较好。  相似文献   

5.
This paper addresses a model-based audio content analysis for classification of speech-music mixed audio signals into speech and music. A set of new features is presented and evaluated based on sinusoidal modeling of audio signals. The new feature set, including variance of the birth frequencies and duration of the longest frequency track in sinusoidal model, as a measure of the harmony and signal continuity, is introduced and discussed in detail. These features are used and compared to typical features as inputs to an audio classifier. Performance of these sinusoidal model features is evaluated through classification of audio into speech and music using both the GMM (Gaussian Mixture Model) and the SVM (Support Vector Machine) classifiers. Experimental results show that the proposed features are quite successful in speech/music discrimination. By using only a set of two sinusoidal model features, extracted from 1-s segments of the signal, we achieved 96.84% accuracy in the audio classification. Experimental comparisons also confirm superiority of the sinusoidal model features to the popular time domain and frequency domain features in audio classification.  相似文献   

6.
视频数据中的音频流包含了丰富的语义信息.在基于内容的视频检索中,对音频信息的分析是不可分割的一部分.本文主要讨论基于内容的音频场景分割,分析各种音频特征及提取方法,并在此基础上提出一种新的音频流分割方法,根据六种音频类型(语音、音乐、静音、环境音、纯语音、音乐背景下的语音和环境音背景下的语音)的音频特征对视频数据中的音频流分割音频场景.实验证明该方法是有效的,在保证一定的分割精度的同时,准确率和查全率都得到了较大的提高.  相似文献   

7.
基于单类支持向量机的音频分类   总被引:1,自引:0,他引:1  
研究一种基于单类支持向量机的音频分类方法,能够使每一类样本都独立地获得一个决策函数,通过决策函数的最大值来判断样本所属的类。通过使用小波包变换提取语音特征向量,并融合多特征向量,将音频分为5类:纯语音、音乐、环境音、含背景音语音和静音。实验结果表明这种方法具有较好的分类精度,性能优于贝叶斯、隐马尔可夫模型和神经网络分类器。  相似文献   

8.
This paper proposes a hierarchical time-efficient method for audio classification and also presents an automatic procedure to select the best set of features for audio classification using Kolmogorov-Smirnov test (KS-test). The main motivation for our study is to propose a framework of general genre (e.g., action, comedy, drama, documentary, musical, etc...) movie video abstraction scheme for embedded devices-based only on the audio component. Accordingly simple audio features are extracted to ensure the feasibility of real-time processing. Five audio classes are considered in this paper: pure speech, pure music or songs, speech with background music, environmental noise and silence. Audio classification is processed in three stages, (i) silence or environmental noise detection, (ii) speech and non-speech classification and (iii) pure music or songs and speech with background music classification. The proposed system has been tested on various real time audio sources extracted from movies and TV programs. Our experiments in the context of real time processing have shown the algorithms produce very satisfactory results.  相似文献   

9.
The content‐based classification and retrieval of real‐world audio clips is one of the challenging tasks in multimedia information retrieval. Although the problem has been well studied in the last two decades, most of the current retrieval systems cannot provide flexible querying of audio clips due to the mixed‐type form (e.g., speech over music and speech over environmental sound) of audio information in real world. We present here a complete, scalable, and extensible content‐based classification and retrieval system for mixed‐type audio clips. The system gives users an opportunity for flexible querying of audio data semantically by providing four alternative ways, namely, querying by mixed‐type audio classes, querying by domain‐based fuzzy classes, querying by temporal information and temporal relationships, and querying by example (QBE). In order to reduce the retrieval time, a hash‐based indexing technique is introduced. Two kinds of experiments were conducted on the audio tracks of the TRECVID news broadcasts to evaluate the performance of the proposed system. The results obtained from our experiments demonstrate that the Audio Spectrum Flatness feature in MPEG‐7 standard performs better in music audio samples compared to other kinds of audio samples and the system is robust under different conditions. © 2011 Wiley Periodicals, Inc.  相似文献   

10.
Automated audio segmentation and classification play important roles in multimedia content analysis. In this paper, we propose an enhanced approach, called the correlation intensive fuzzy c-means (CIFCM) algorithm, to audio segmentation and classification that is based on audio content analysis. While conventional methods work by considering the attributes of only the current frame or segment, the proposed CIFCM algorithm efficiently incorporates the influence of neighboring frames or segments in the audio stream. With this method, audio-cuts can be detected efficiently even when the signal contains audio effects such as fade-in, fade-out, and cross-fade. A number of audio features are analyzed in this paper to explore the differences between various types of audio data. The proposed CIFCM algorithm works by detecting the boundaries between different kinds of sounds and classifying them into clusters such as silence, speech, music, speech with music, and speech with noise. Our experimental results indicate that the proposed method outperforms the state-of-the-art FCM approach in terms of audio segmentation and classification.  相似文献   

11.
基于SVM模型的自然环境声音的分类   总被引:1,自引:0,他引:1  
提出了一种基于支持向量机(SVM)模型对自然环境声音进行分类的方法。首先,提取Mel频率倒谱系数(MFCCs)来分析声音信号;其次,对自然环境的声音基于MFCC特征集建立SVM模型;最后,使用交叉验证的测试方法得到基于SVM算法的分类结果。使用SVM模型对50类自然环境中的声音进行分类的正确率可达99.5704%,分类效果明显优于K最近邻(KNN)和二分嵌套整合(END)这两种算法。  相似文献   

12.
Content-based audio signal classification into broad categories such as speech, music, or speech with noise is the first step before any further processing such as speech recognition, content-based indexing, or surveillance systems. In this paper, we propose an efficient content-based audio classification approach to classify audio signals into broad genres using a fuzzy c-means (FCM) algorithm. We analyze different characteristic features of audio signals in time, frequency, and coefficient domains and select the optimal feature vector by employing a noble analytical scoring method to each feature. We utilize an FCM-based classification scheme and apply it on the extracted normalized optimal feature vector to achieve an efficient classification result. Experimental results demonstrate that the proposed approach outperforms the existing state-of-the-art audio classification systems by more than 11% in classification performance.  相似文献   

13.
The audio channel conveys rich clues for content-based multimedia indexing. Interesting audio analysis includes, besides widely known speech recognition and speaker identification problems, speech/music segmentation, speaker gender detection, special effect recognition such as gun shots or car pursuit, and so on. All these problems can be considered as an audio classification problem which needs to generate a label from low audio signal analysis. While most audio analysis techniques in the literature are problem specific, we propose in this paper a general framework for audio classification. The proposed technique uses a perceptually motivated model of the human perception of audio classes in the sense that it makes a judicious use of certain psychophysical results and relies on a neural network for classification. In order to assess the effectiveness of the proposed approach, large experiments on several audio classification problems have been carried out, including speech/music discrimination in Radio/TV programs, gender recognition on a subset of the switchboard database, highlights detection in sports videos, and musical genre recognition. The classification accuracies of the proposed technique are comparable to those obtained by problem specific techniques while offering the basis of a general approach for audio classification.
Liming ChenEmail:
  相似文献   

14.
In this paper, Autoassociative Neural Network (AANN) models are explored for segmentation and indexing the films (movies) using audio features. A two-stage method is proposed for segmenting the film into sequence of scenes, and then indexing them appropriately. In the first stage, music and speech plus music segments of the film are separated, and music segments are labelled as title and fighting scenes based on their position. At the second stage, speech plus music segments are classified into normal, emotional, comedy and song scenes. In this work, Mel frequency cepstral coefficients (MFCCs), zero crossing rate and intensity are used as audio features for segmentation and indexing the films. The proposed segmentation and indexing method is evaluated on manual segmented Hindi films. From the evaluation results, it is observed that title, fighting and song scenes are segmented and indexed without any errors, and most of the errors are observed in discriminating the comedy and normal scenes. Performance of the proposed AANN models used for segmentation and indexing of the films, is also compared with hidden Markov models, Gaussian mixture models and support vector machines.  相似文献   

15.
为实现对腭裂高鼻音等级的自动识别,通过对语音信号小波处理和特征提取方法的综合研究,提出基于小波分解系数倒谱特征的腭裂高鼻音等级自动识别算法。目前,研究人员对腭裂语音的研究多基于MFCC、Teager能量、香农能量等特征,识别正确率偏低,且计算量过大。文中对4种等级腭裂高鼻音的1789个元音\a\语音数据提取小波分解系数倒谱特征参数,使用KNN分类器对4种不同等级的高鼻音进行自动识别,将识别结果与MFCC、LPCC、基音周期、共振峰和短时能量共5种经典声学特征的识别结果作比较,同时使用SVM分类器对不同等级的腭裂高鼻音进行自动识别,并与KNN分类器进行对比。实验结果表明,基于小波分解系数倒谱特征的识别结果优于经典声学特征,且KNN分类器的识别结果优于SVM分类器。小波分解系数倒谱特征在KNN中的识别率最高达到91.67%,在SVM中达到87.60%,经典声学特征在KNN分类器中的识别率为21.69%~84.54%,在SVM中的识别率为30.61%~78.24%。  相似文献   

16.
语音/音乐自动分类中的特征分析   总被引:16,自引:0,他引:16  
综合分析了语音和音乐的区别性特征,包括音调,亮度,谐度等感觉特征与MFCC(Mel-Frequency Cepstral Coefficients)系数等,提出一种left-right DHMM(Discrete Hidden Markov Model)的分类器,以极大似然作为判别规则,用于语音,音乐以及它们的混合声音的分类,并且考察了上述特征集合在该分类器中的分类性能,实验结果表明,文中提出的音频特征有效,合理,分类性能较好。  相似文献   

17.
《Information Fusion》2002,3(3):225-236
In this paper, we propose a multi-expert classification system (MES) for the audio classification of MPEG movies. The system has been designed according to an hybrid architecture which is made of three cascaded stages and constitutes an ensemble of different classifiers, each one implemented by means of a multi-expert architecture.Classification of the audio tracks exploits four pure classes (music, speech, silence and noise) plus three hybrid classes associated to complex patterns resulting from the overlap of different components (e.g., speech overlapped with music or noise).The soundtracks of 30 movies selected from various genres have been used for building a wide database of samples and for the successive assessment of system performance. A significant amount of experimental results obtained by the proposed MES, by other classification systems using a single classifier, and by another MES using a parallel fusion scheme, are reported in the paper together with comments and comparative analyses.In addition, the paper demonstrates the application of the knowledge arising from an analysis of intermediate classification results in order to obtain indications about the definition of the MES architecture. The results achieved by using our system are extremely encouraging when compared with those obtained by the other MES.  相似文献   

18.
提出了一种规则和隐马尔可夫模型相结合的音频分层分类算法,首先利用规则将新闻节目中的音频分为静音、语音和音乐三类,然后采用隐马尔可夫模型进一步将语音和音乐细分为男主持人语音、女主持人语音、交替报道、独白语音、现场语音和音乐六类。实验结果表明,男主持人语音、女主持人语音以及音乐的分类效果最好,查准率和查全率均可达90%以上;交替报道的分类性能最差,查准率为57.5%,查全率为79.3%;其他类别的分类性能居中,在70%~90%左右。与同类算法相比,该算法分类性能较高。  相似文献   

19.
In the age of digital information, audio data has become an important part in many modern computer applications. Audio classification and indexing has been becoming a focus in the research of audio processing and pattern recognition. In this paper, we propose effective algorithms to automatically classify audio clips into one of six classes: music, news, sports, advertisement, cartoon and movie. For these categories a number of acoustic features that include linear predictive coefficients, linear predictive cepstral coefficients and mel-frequency cepstral coefficients are extracted to characterize the audio content. The autoassociative neural network model (AANN) is used to capture the distribution of the acoustic feature vectors. Then the proposed method uses a Gaussian mixture model (GMM)-based classifier where the feature vectors from each class were used to train the GMM models for those classes. During testing, the likelihood of a test sample belonging to each model is computed and the sample is assigned to the class whose model produces the highest likelihood. Audio clip extraction, feature extraction, creation of index, and retrieval of the query clip are the major issues in automatic audio indexing and retrieval. A method for indexing the classified audio using LPCC features and k-means clustering algorithm is proposed.  相似文献   

20.
音乐类型分类主要包括两个阶段:特征提取和分类。文中在研究小波变换理论基础上,采用连续小波分析方法提取音乐特征参数。支持向量机是专门针对有限样本情况下的一种分类方法。它是建立在统计学习理论的VC维理论和结构风险最小原理基础上,根据有限的样本信息在模型的复杂性和学习能力之间寻求最佳折衷,以期获得最好的推广能力。采用指数径向基函数(脚)内核,分类正确率可达85%,比传统的混合高斯模型和K近邻分类器,分类性能分别提高了21%和23%。实验结果表明,采用小波和支持向量机方法是一种相当有效的音乐类型分类方法。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号