Similar Documents
20 similar documents found; search took 15 ms.
1.
2.
To address the discontinuity in a single source's pitch sequence caused by mutual interference between sources in polyphonic music, this paper exploits the continuity of pitch salience and the stability of higher harmonics and proposes a pitch-contour creation method based on a static pitch likelihood function and a dynamic pitch-salience likelihood function. Before the melody pitch contour is extracted, to exploit the timbre differences between sources, the method computes mel-frequency cepstral coefficients of each pitch contour as a timbre feature and also derives timbre features from the harmonic amplitudes of the contour. Simulation experiments with the improved algorithm on the RECHSET music dataset show a pitch estimation accuracy of 62.04% and an overall accuracy of 55.08%.
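The pitch-salience idea behind the contour creation can be illustrated with a harmonic-summation sketch: the salience of a candidate F0 is a weighted sum of the spectral-peak magnitudes found near its integer multiples. This is a generic illustration, not the paper's likelihood functions; the `decay` and `tol` parameters are assumptions.

```python
def harmonic_salience(f0, peaks, n_harmonics=5, tol=0.03, decay=0.8):
    """Salience of candidate f0: weighted sum of spectral-peak
    magnitudes lying within a relative tolerance of its harmonics."""
    s = 0.0
    for h in range(1, n_harmonics + 1):
        target = h * f0
        for freq, mag in peaks:
            if abs(freq - target) <= tol * target:
                s += decay ** (h - 1) * mag
    return s

# Spectral peaks of a synthetic 100 Hz tone with three harmonics.
peaks = [(100.0, 1.0), (200.0, 0.6), (300.0, 0.4)]
candidates = [80.0, 100.0, 150.0, 200.0]
best = max(candidates, key=lambda f: harmonic_salience(f, peaks))
```

Because the 200 Hz candidate only collects the single peak at 200 Hz while the 100 Hz candidate collects all three harmonics, the true fundamental wins.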

3.
Vibrato is a slightly tremulous effect imparted to vocal or instrumental tone for added warmth and expressiveness through slight variation in pitch. It corresponds to a periodic fluctuation of the fundamental frequency. It is common for a singer to develop a vibrato function that personalizes his or her singing style. In this paper, we explore the acoustic features that reflect vibrato information in order to identify singers of popular music. We start with an enhanced vocal detection method that allows us to select vocal segments with high confidence. From the selected vocal segments, cepstral coefficients that reflect the vibrato characteristics are computed. These coefficients are derived using bandpass filters, such as parabolic and cascaded bandpass filters, spread according to the octave frequency scale. The strategy of our classifier formulation is to utilize high-level musical knowledge of song structure in singer modeling. Singer identification is validated on a database containing 84 popular songs from commercially available CD recordings of 12 singers. We achieve an average error rate of 16.2% in segment-level identification.
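As background, vibrato is often summarized by two numbers: its rate (fluctuations per second) and its extent (depth of the pitch deviation). The sketch below estimates both from an F0 contour by counting local maxima of the detrended contour; it is a generic illustration, not the paper's bandpass-cepstral method.

```python
import math

def vibrato_params(f0_track, frame_rate):
    """Estimate vibrato rate (Hz) and extent (Hz) from an F0 contour:
    rate = local maxima of the detrended contour per second,
    extent = half the peak-to-peak deviation."""
    mean = sum(f0_track) / len(f0_track)
    d = [x - mean for x in f0_track]
    n_peaks = sum(1 for i in range(1, len(d) - 1)
                  if d[i - 1] < d[i] >= d[i + 1])
    rate = n_peaks / (len(f0_track) / frame_rate)
    extent = (max(d) - min(d)) / 2.0
    return rate, extent

# Synthetic contour: 220 Hz tone with a 6 Hz, +/-3 Hz vibrato,
# sampled at 100 pitch frames per second for one second.
track = [220.0 + 3.0 * math.sin(2 * math.pi * 6 * n / 100)
         for n in range(100)]
rate, extent = vibrato_params(track, 100)
```

The extent is slightly under 3 Hz only because the discrete frames do not land exactly on the sinusoid's peaks.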

4.
5.
6.
In this paper we introduce a robust feature extractor, dubbed robust compressive gammachirp filterbank cepstral coefficients (RCGCC), based on an asymmetric, level-dependent compressive gammachirp filterbank and a sigmoid-shaped weighting rule for enhancing speech spectra in the auditory domain. The goal of this work is to improve the robustness of speech recognition systems in additive noise and real-world reverberant environments. As a post-processing scheme we employ a short-time feature normalization technique called short-time cepstral mean and scale normalization (STCMSN), which, by adjusting the scale and mean of cepstral features, reduces the mismatch between the training and test environments. To evaluate the proposed feature extractor in the context of speech recognition, we use the standard noisy AURORA-2 connected digit corpus, the meeting recorder digits (MRDs) subset of the AURORA-5 corpus, and the AURORA-4 LVCSR corpus, which represent additive noise, reverberant acoustic conditions, and additive noise combined with different microphone channel conditions, respectively. The ETSI advanced front-end (ETSI-AFE), the recently proposed power normalized cepstral coefficients (PNCC), and conventional MFCC and PLP features are used for comparison. Experimental speech recognition results demonstrate that the proposed method is robust in both additive-noise and reverberant environments. It performs comparably to ETSI-AFE and PNCC on the AURORA-2 and AURORA-4 corpora and provides considerable improvements over the other feature extractors on the AURORA-5 corpus.
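The STCMSN post-processing step can be sketched as follows: each cepstral coefficient is centred on a sliding-window mean and divided by a local scale estimate. This is a minimal illustration of the idea; the window size, the mean-absolute-deviation scale estimate, and `eps` are assumptions rather than the paper's exact formulation.

```python
def stcmsn(frames, half_win=2, eps=1e-9):
    """Short-time mean-and-scale normalization: each coefficient is
    centred on its sliding-window mean and divided by the window's
    mean absolute deviation (plus eps to avoid division by zero)."""
    n, dims = len(frames), len(frames[0])
    out = []
    for i in range(n):
        win = frames[max(0, i - half_win): i + half_win + 1]
        row = []
        for d in range(dims):
            xs = [f[d] for f in win]
            mean = sum(xs) / len(xs)
            scale = sum(abs(x - mean) for x in xs) / len(xs)
            row.append((frames[i][d] - mean) / (scale + eps))
        out.append(row)
    return out

# A constant channel (e.g. a fixed channel offset) normalizes to zero;
# a varying channel is centred around zero.
norm = stcmsn([[5.0, 0.0], [5.0, 1.0], [5.0, 0.0], [5.0, 1.0], [5.0, 0.0]])
```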

7.
8.
9.
Timbre distance and similarity are expressions of the phenomenon that some music sounds similar to us while other songs sound very different. The notion of genre is often used to categorize music, but songs from a single genre do not necessarily sound similar, and vice versa. In this work, we analyze and compare a large number of different audio features, and psychoacoustic variants thereof, for the purpose of modeling timbre distance. The sound of polyphonic music is commonly described by extracting audio features on short time windows during which the sound is assumed to be stationary. The resulting downsampled time series are aggregated to form a high-level feature vector describing the music. We generated high-level features by systematically applying static and temporal statistics for aggregation; the temporal structure of the features in particular has previously been largely neglected. A novel supervised feature selection method is applied to the huge set of candidate features. The distances between the selected features correspond to timbre differences in music. The features show few redundancies and have high potential for explaining possible clusters. They outperform seven previously proposed feature sets on several datasets with respect to the separation of known groups of timbrally different music.
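The aggregation of a framewise feature time series into one high-level vector can be sketched with static statistics (mean, standard deviation) plus a simple temporal statistic (mean absolute frame-to-frame difference). The particular statistics chosen here are illustrative, not the paper's full systematic set.

```python
import math

def aggregate(frames):
    """Collapse a framewise feature time series into one vector:
    per dimension, append mean, standard deviation (a static view)
    and mean absolute frame-to-frame difference (a temporal view)."""
    dims = len(frames[0])
    out = []
    for d in range(dims):
        xs = [f[d] for f in frames]
        mean = sum(xs) / len(xs)
        std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
        delta = sum(abs(xs[i] - xs[i - 1])
                    for i in range(1, len(xs))) / (len(xs) - 1)
        out.extend([mean, std, delta])
    return out

# A feature that alternates between 0 and 1 every frame: same mean and
# std as a slowly varying one, but a large temporal delta statistic.
vec = aggregate([[0.0], [1.0], [0.0], [1.0]])
```

The temporal statistic is exactly what distinguishes fast-fluctuating features from slowly drifting ones with identical static statistics.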

10.
Analysis of a trapezoidal-wave timbre model for computer instruments (Cited by: 1; self-citations: 1; others: 0)
To describe the timbre of computer instruments, build software instruments, and allow timbre to be changed by adjusting parameters, a computer-instrument timbre model based on a trapezoidal wave is established. The model comprises a vibration sub-model and an amplitude-envelope sub-model. With the amplitude-envelope parameters held fixed, the timbre-control parameters are defined as a time coefficient, an asymmetry degree, a trapezoid degree, and a triangle degree. In a VC++ environment, the one-dimensional discrete cosine transform is used to obtain the model's spectrum under different parameter settings. By analysing how the timbre parameters affect the spectrum, the rules by which these parameters shape instrument timbre are found. The analysis shows that the fundamental is always the strongest component of the model's spectrum and that an isosceles trapezoid produces no even harmonics; moreover, the further the asymmetry degree departs from 1, the larger the trapezoid degree, or the further the triangle degree departs from 0.25, the richer the high-frequency content of the spectrum and the brighter the timbre, and conversely the duller the timbre. The model is of practical and foundational value for computer-instrument timbre synthesis and algorithmic composition.
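The claimed harmonic structure can be checked numerically: a symmetric (isosceles) trapezoid wave has a dominant fundamental and no even harmonics. The sketch below builds such a wave by clipping a scaled triangle wave (`slope` is an assumed shaping parameter, not one of the paper's four parameters) and inspects its spectrum with a plain DFT rather than the paper's one-dimensional DCT.

```python
import cmath

def trapezoid(n, period, slope=2.0):
    """Sample n of a symmetric trapezoid wave, built by clipping a
    scaled triangle wave to [-1, 1] (`slope` widens the flat top)."""
    phase = (n % period) / period
    tri = 1.0 - 4.0 * abs(phase - 0.5)  # triangle wave in [-1, 1]
    return max(-1.0, min(1.0, slope * tri))

def dft_magnitudes(samples):
    """Magnitudes of the DFT bins of one period of a waveform."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(samples)))
            for k in range(n // 2)]

period = 64
mags = dft_magnitudes([trapezoid(n, period) for n in range(period)])
```

The symmetric trapezoid is half-wave antisymmetric, which is exactly why the even bins cancel; breaking the symmetry (the paper's asymmetry degree) would reintroduce them.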

11.
Li  Juan  Luo  Jing  Ding  Jianhang  Zhao  Xi  Yang  Xinyu 《Multimedia Tools and Applications》2019,78(9):11563-11584

Music regional classification, an important branch of automatic music classification, aims at classifying folk songs according to their regional styles. Chinese folk songs have developed various regional musical styles over the course of their evolution. Regional classification of Chinese folk songs can advance music recommendation systems, which recommend a suitable style of music to users, and improve the efficiency of music retrieval systems. However, the accuracy of existing music regional classification systems is not high enough, because most methods consider the temporal characteristics of music neither for feature extraction nor for classification. In this paper, we propose an approach based on conditional random fields (CRF) that fully exploits the temporal characteristics of musical audio features for regional classification. Considering the continuity, high dimensionality, and large size of the audio feature data, we employ two ways of computing the label sequence of musical audio features in the CRF: a Gaussian mixture model (GMM) and a restricted Boltzmann machine (RBM). The experimental results demonstrate that the proposed CRF-RBM method outperforms other existing music regional classifiers, with a best accuracy of 84.71% on a Chinese folk song dataset. Moreover, when the proposed methods were applied to a Greek folk song dataset, the CRF-RBM model also performed best.
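The temporal smoothing that a linear-chain CRF provides at decoding time can be illustrated with a plain Viterbi decoder over per-frame emission scores and pairwise transition scores. In the paper these scores come from trained GMM or RBM models; the toy scores below are assumptions for illustration.

```python
def viterbi(emission, transition):
    """Most likely label sequence given per-frame emission log-scores
    and pairwise transition log-scores (linear-chain decoding)."""
    n_labels = len(emission[0])
    score = list(emission[0])
    back = []
    for t in range(1, len(emission)):
        new, ptrs = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transition[i][j])
            new.append(score[best_i] + transition[best_i][j] + emission[t][j])
            ptrs.append(best_i)
        score = new
        back.append(ptrs)
    path = [max(range(n_labels), key=lambda j: score[j])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]

# Emissions mostly favour label 0; frame 2 weakly favours label 1.
emission = [[2.0, 0.0], [2.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 0.0]]
transition = [[0.0, -3.0], [-3.0, 0.0]]  # strong penalty for switching
labels = viterbi(emission, transition)
```

With the switching penalty, the weakly divergent frame 2 is smoothed over; with zero transition scores the decoder degenerates to per-frame argmax and the outlier survives.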


12.
This paper addresses the problem of parameterization for speech/music discrimination. Current successful parameterizations based on cepstral coefficients use the Fourier transform (FT), which is well suited to stationary signals. To take the non-stationarity of speech and music signals into account, this work studies wavelet-based signal decomposition instead of the FT. Three wavelet families and several numbers of vanishing moments were evaluated, along with different types of energy calculated for each frequency band obtained from the wavelet decomposition. Static, dynamic, and long-term parameters were evaluated. The proposed parameterization is integrated into two class/non-class classifiers: one for speech/non-speech and one for music/non-music. Experiments on realistic corpora covering different styles of speech and music (Broadcast News, Entertainment, Scheirer) illustrate the performance of the proposed parameterization, especially for music/non-music discrimination. It yielded a significant reduction of the error rate: more than 30% relative improvement over MFCC parameterization on the envisaged tasks.
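The per-band energies used as parameters can be sketched with the simplest wavelet, the Haar decomposition: each level splits the signal into approximation and detail coefficients, and the energy of each detail band is recorded. The paper evaluates several wavelet families; Haar is used here only because it is easy to write down.

```python
def haar_step(x):
    """One level of the Haar wavelet transform: returns the
    (approximation, detail) coefficient sequences at half length."""
    r2 = 2 ** 0.5
    a = [(x[2 * i] + x[2 * i + 1]) / r2 for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / r2 for i in range(len(x) // 2)]
    return a, d

def band_energies(x, levels):
    """Energy per wavelet band, finest detail band first; the last
    entry is the energy of the coarsest approximation."""
    energies = []
    for _ in range(levels):
        x, d = haar_step(x)
        energies.append(sum(v * v for v in d))
    energies.append(sum(v * v for v in x))
    return energies

# A fast alternation puts all its energy in the finest band; a
# constant signal keeps it all in the coarse approximation.
fast = band_energies([1.0, -1.0] * 4, 2)
flat = band_energies([1.0] * 8, 2)
```

Because the Haar transform is orthonormal, the band energies always sum to the signal energy (here 8), so the energy split itself is the discriminative information.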

13.
In the age of digital information, audio data has become an important part of many modern computer applications, and audio classification has become a focus of research in audio processing and pattern recognition. Automatic audio classification is very useful for audio indexing, content-based audio retrieval, and online audio distribution, but it is a challenge to extract the most common and salient themes from unstructured raw audio data. In this paper, we propose effective algorithms to automatically classify audio clips into one of six classes: music, news, sports, advertisement, cartoon, and movie. For these categories, a number of acoustic features, including linear predictive coefficients, linear predictive cepstral coefficients, and mel-frequency cepstral coefficients, are extracted to characterize the audio content. Support vector machines are applied to classify audio into the respective classes by learning from training data. The proposed method then extends the approach with a radial basis function neural network (RBFNN), which applies a nonlinear transformation into a higher-dimensional hidden space followed by a linear transformation. Experiments on different genres across the various categories show that the classification results are accurate and effective.
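Of the features listed, linear predictive coefficients are easy to sketch end to end with the textbook algorithm: the autocorrelation method plus the Levinson-Durbin recursion. This is the standard derivation, not necessarily the exact configuration used in the paper.

```python
def lpc(signal, order):
    """Linear predictive coefficients via the autocorrelation method
    and the Levinson-Durbin recursion. Returns (a, err) with a[0]=1,
    where a are the prediction-error filter coefficients."""
    n = len(signal)
    r = [sum(signal[i] * signal[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err                      # reflection coefficient
        a = [a[j] + k * a[m - j] for j in range(m + 1)] + a[m + 1:]
        err *= 1.0 - k * k
    return a, err

# Impulse response of the AR(1) filter x[n] = 0.5 x[n-1] + delta[n];
# the recovered first-order predictor coefficient should be near -0.5.
sig = [0.5 ** t for t in range(50)]
a, err = lpc(sig, 2)
```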

14.
In this paper, we present a new method for the analysis of musical structure that captures local prediction and global repetition properties of audio signals in one information-processing framework. The method is motivated by recent work in music perception in which machine features were shown to correspond to human judgments of familiarity and emotional force when listening to music. Using a notion of information rate in a model-based framework, we develop a measure of mutual information between past and present in a time signal and show that it consists of two factors: a prediction property related to data statistics within an individual block of signal features, and a repetition property based on differences in model likelihood across blocks. The first factor, when applied to a spectral representation of audio signals, is known as spectral anticipation, and the second is known as recurrence analysis. We present algorithms for estimating these measures and create a visualization that displays their temporal structure in musical recordings. Treating these features as a measure of the amount of information processing that a listening system performs on a signal, information rate is used to detect interest points in music. Several musical works, in different performances, are analyzed, and their structure and interest points are displayed and discussed. Extensions of this approach towards a general framework for characterizing the machine listening experience are suggested.
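The repetition factor can be illustrated with the simplest form of recurrence analysis, a self-similarity matrix over feature frames. In the paper the comparison is based on model likelihoods across blocks; this sketch uses plain cosine similarity as a stand-in.

```python
import math

def self_similarity(frames):
    """Cosine-similarity matrix between all pairs of feature frames;
    high off-diagonal values mark repeated (recurrent) material."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)
    return [[cos(u, v) for v in frames] for u in frames]

# Frames 0 and 2 repeat the same feature pattern; frame 1 differs,
# so recurrence shows up off the main diagonal at (0, 2).
sim = self_similarity([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
```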

15.
Musical expressivity can be defined as the deviation from a musical standard when a score is performed by a musician. This deviation is made in terms of intrinsic note attributes such as pitch, timbre, timing, and dynamics. Advances in computational power and digital sound synthesis have enabled real-time control of synthesized sounds, and expressive control has therefore become an area of great interest in the sound and music computing field. Musical expressivity can be approached from different perspectives. One approach is the musicological analysis of music and the study of the different stylistic schools, which provides a valuable understanding of musical expressivity. Another is the computational modelling of music performance by means of automatic analysis of recordings. Music performance is a complex activity that involves complementary aspects from other disciplines, such as psychology and acoustics; it requires creativity and, eventually, some manual abilities, and is a hard task even for humans. Using machines for it is therefore a very interesting and fascinating issue. In this paper, we present an overall view of the work many researchers have done so far in the field of expressive music performance, with special attention to the computational approach.
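The definition of expressivity as deviation from a standard suggests a minimal measurement: subtract score onset times from performed onset times and remove the global offset, leaving per-note timing deviations. This is a sketch of the general idea only, not any specific model surveyed in the paper.

```python
def timing_deviations(score_onsets, performed_onsets):
    """Per-note timing deviation (seconds) of a performance from the
    score, after removing the performance's global (mean) offset."""
    offsets = [p - s for s, p in zip(score_onsets, performed_onsets)]
    mean = sum(offsets) / len(offsets)
    return [o - mean for o in offsets]

# Four score onsets played slightly unevenly: note 2 late, note 4 early.
dev = timing_deviations([0.0, 0.5, 1.0, 1.5], [0.1, 0.62, 1.1, 1.58])
```

The same subtraction scheme applies to dynamics (performed minus notated loudness) once both are on a common scale.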

16.
A computer-instrument timbre model based on a double full-sine wave is proposed, so that instrument timbre can be changed by adjusting parameters. The model comprises a vibration sub-model and an amplitude-envelope sub-model. With the amplitude-envelope parameters held fixed, the timbre-control parameters are defined as an amplitude coefficient, a period coefficient, and a variance coefficient. In a VC++ environment, the one-dimensional discrete cosine transform is used to obtain the model's spectrum under different parameter settings, and the rules by which the timbre parameters shape instrument timbre are found by analysing their effect on the spectrum. Experiments show that when the amplitude coefficient and the period coefficient are equal, the pitch is raised by an octave as a whole; when one of them differs, a larger variance value yields a richer spectrum and a brighter timbre, and conversely a duller one. The model is simple and easy to popularize, and is of scientific and practical value for computer-instrument timbre synthesis and computer-generated music.
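Both timbre models here pair a vibration sub-model with an amplitude-envelope sub-model. The abstract does not spell out the envelope's form, so the sketch below uses a generic linear ADSR (attack-decay-sustain-release) envelope as a stand-in; the fractions `a`, `d`, `r` and the sustain level `s` are assumed parameters, not the paper's.

```python
def adsr(n, length, a=0.1, d=0.1, s=0.7, r=0.2):
    """Linear ADSR amplitude envelope at sample n of `length` total
    samples; a/d/r are attack/decay/release fractions of the note
    duration and s is the sustain level."""
    t = n / length
    if t < a:                       # attack: ramp 0 -> 1
        return t / a
    if t < a + d:                   # decay: ramp 1 -> s
        return 1.0 - (1.0 - s) * (t - a) / d
    if t < 1.0 - r:                 # sustain: hold s
        return s
    return s * (1.0 - t) / r        # release: ramp s -> 0

env_start = adsr(0, 100)
env_mid = adsr(50, 100)
```

Multiplying such an envelope sample-by-sample against the vibration sub-model's waveform yields the final instrument tone.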

17.
In the age of digital information, audio data has become an important part of many modern computer applications, and audio classification and indexing has become a focus of research in audio processing and pattern recognition. In this paper, we propose effective algorithms to automatically classify audio clips into one of six classes: music, news, sports, advertisement, cartoon, and movie. For these categories, a number of acoustic features, including linear predictive coefficients, linear predictive cepstral coefficients, and mel-frequency cepstral coefficients, are extracted to characterize the audio content. An autoassociative neural network (AANN) model is used to capture the distribution of the acoustic feature vectors. The proposed method then uses a Gaussian mixture model (GMM)-based classifier in which the feature vectors from each class are used to train a GMM for that class. During testing, the likelihood of a test sample under each model is computed, and the sample is assigned to the class whose model produces the highest likelihood. Audio clip extraction, feature extraction, index creation, and retrieval of the query clip are the major issues in automatic audio indexing and retrieval; a method for indexing the classified audio using LPCC features and the k-means clustering algorithm is also proposed.
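The GMM classification step, assigning a sample to the class whose model gives the highest likelihood, can be sketched with diagonal-covariance Gaussians and a log-sum-exp for numerical stability. The toy one-dimensional class models below are assumptions for illustration only.

```python
import math

def diag_gauss_logpdf(x, mean, var):
    """Log-density of vector x under a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of x under a Gaussian mixture (log-sum-exp)."""
    logs = [math.log(w) + diag_gauss_logpdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

def classify(x, class_models):
    """Assign x to the class whose GMM yields the highest likelihood."""
    return max(class_models, key=lambda c: gmm_loglik(x, *class_models[c]))

# Toy 1-D models: a two-component "music" class around 0 and a
# single-component "news" class around 5 (assumed values).
models = {
    "music": ([0.5, 0.5], [[-1.0], [1.0]], [[1.0], [1.0]]),
    "news":  ([1.0], [[5.0]], [[1.0]]),
}
label = classify([0.2], models)
```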

18.
Speech/music discrimination is an important step in audio processing and analysis tasks such as efficient audio coding, audio retrieval, and automatic speech recognition. This paper proposes a novel speech/music segmentation and classification method: change points in the audio are first detected from the mean-square energy difference between adjacent frames, yielding a segmentation; eight-dimensional features, including the low-band energy-variance ratio, cepstral energy modulation, and entropy modulation, are then extracted from each segment and classified with an artificial neural network. Experimental results show that the proposed algorithm and features achieve high segmentation and classification accuracy.
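The segmentation step, detecting change points from the mean-square energy difference between adjacent frames, can be sketched directly; the frame length and threshold below are assumed values, not those of the paper.

```python
def frame_energies(signal, frame_len):
    """Mean-square energy of consecutive non-overlapping frames."""
    return [sum(v * v for v in signal[i:i + frame_len]) / frame_len
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def change_points(energies, threshold):
    """Indices of frames whose energy differs from the previous
    frame's by more than `threshold`."""
    return [i for i in range(1, len(energies))
            if abs(energies[i] - energies[i - 1]) > threshold]

# A quiet first half followed by a loud second half: the boundary
# appears as a single energy change point at frame 4.
sig = [0.1] * 40 + [1.0] * 40
e = frame_energies(sig, 10)
cps = change_points(e, 0.5)
```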

19.
Automatic mood detection and tracking of music audio signals (Cited by: 2; self-citations: 0; others: 2)
Music mood describes the inherent emotional expression of a music clip. It is helpful in music understanding, music retrieval, and other music-related applications. In this paper, a hierarchical framework, following music-psychological theories of Western cultures, is presented to automate the task of mood detection from acoustic music data. The hierarchical framework has the advantage of emphasizing the most suitable features in each detection task. Three feature sets, covering intensity, timbre, and rhythm, are extracted to represent the characteristics of a music clip: the intensity feature set is represented by the energy in each subband; the timbre feature set is composed of spectral shape and spectral contrast features; and the rhythm feature set captures three aspects closely related to an individual's mood response, namely rhythm strength, rhythm regularity, and tempo. Furthermore, since mood usually changes over the course of an entire piece of classical music, the approach is extended from mood detection to mood tracking by dividing a piece into several independent segments, each containing a homogeneous emotional expression. Preliminary evaluations indicate that the proposed algorithms produce satisfactory results: on our testing database of 800 representative music clips, the average accuracy of mood detection reaches 86.3%, and on average 84.1% of the mood boundaries in nine testing pieces are recalled.
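Of the rhythm features, tempo can be sketched by autocorrelating an onset envelope and converting the strongest lag to beats per minute. The onset envelope here is an idealized impulse train and the lag range is an assumption; the paper's actual rhythm-feature extraction is more involved.

```python
def tempo_bpm(onset_env, frame_rate, min_lag=10, max_lag=100):
    """Tempo estimate: the lag (in frames) with the largest
    onset-envelope autocorrelation, converted to beats per minute."""
    n = len(onset_env)
    def ac(lag):
        return sum(onset_env[i] * onset_env[i + lag] for i in range(n - lag))
    best = max(range(min_lag, max_lag + 1), key=ac)
    return 60.0 * frame_rate / best

# Idealized onset envelope: an impulse every 50 frames at 100 frames/s,
# i.e. two beats per second.
env = [1.0 if i % 50 == 0 else 0.0 for i in range(500)]
bpm = tempo_bpm(env, 100)
```

The height of the chosen autocorrelation peak relative to its neighbours would serve as a simple rhythm-strength/regularity proxy in the same spirit.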

20.
In this paper, autoassociative neural network (AANN) models are explored for segmenting and indexing films (movies) using audio features. A two-stage method is proposed for segmenting a film into a sequence of scenes and then indexing them appropriately. In the first stage, music and speech-plus-music segments of the film are separated, and music segments are labelled as title or fighting scenes based on their position. In the second stage, speech-plus-music segments are classified into normal, emotional, comedy, and song scenes. Mel-frequency cepstral coefficients (MFCCs), zero-crossing rate, and intensity are used as audio features for segmentation and indexing. The proposed method is evaluated on manually segmented Hindi films. From the evaluation results, it is observed that title, fighting, and song scenes are segmented and indexed without any errors, while most of the errors occur in discriminating comedy from normal scenes. The performance of the proposed AANN models is also compared with hidden Markov models, Gaussian mixture models, and support vector machines.
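Among the listed features, the zero-crossing rate is the simplest to state precisely: the fraction of consecutive sample pairs whose signs differ.

```python
def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ
    (zero samples count as non-negative)."""
    return sum(1 for i in range(1, len(frame))
               if (frame[i - 1] < 0) != (frame[i] < 0)) / (len(frame) - 1)

# An alternating frame crosses zero at every step; a ramp never does.
zcr_alt = zero_crossing_rate([1.0, -1.0, 1.0, -1.0, 1.0])
zcr_ramp = zero_crossing_rate([1.0, 2.0, 3.0])
```

Noisy, unvoiced, or percussive audio tends to give high ZCR, which is one reason it complements MFCCs and intensity in scene classification.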


Copyright © Beijing Qinyun Technology Development Co., Ltd.    京ICP备09084417号-23
