Similar literature
 19 similar documents found (search time: 109 ms)
1.
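This paper studies a formant-based method for classifying six pairs of similar-sounding Korean and Chinese monophthongs. First, the first three formants (F1, F2, F3) are extracted from the audio files. Next, the differences in formant distribution within each of the six pairs are analyzed, and for each pair a formant frequency parameter, or a combination of such parameters, is selected as the classification feature. Finally, classification thresholds are determined by information gain and used to classify the Korean and Chinese monophthongs. Experimental results show that Korean monophthongs are distinguishable from similar-sounding Chinese monophthongs, and that the method is computationally simple and classifies well.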
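
The threshold-selection step in this abstract can be sketched as follows. This is a minimal illustration, assuming a single scalar formant feature and two classes; the function names and the F2 values are hypothetical and are not taken from the paper.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def best_threshold(values, labels):
    """Return (threshold, information_gain) of the best single split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best_t, best_gain = None, -1.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between equal values
        t = (lo + hi) / 2.0  # candidate: midpoint of neighbouring values
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Hypothetical F2 values in Hz for one vowel pair (illustrative only).
f2 = [1100.0, 1150.0, 1200.0, 1250.0, 1400.0, 1450.0, 1500.0, 1550.0]
lang = ["ko", "ko", "ko", "ko", "zh", "zh", "zh", "zh"]
threshold, gain = best_threshold(f2, lang)
print(threshold, gain)
```

A new vowel token would then be assigned to one language or the other by comparing its feature value with `threshold`. Combined features such as F2 - F1 could be passed in as `values` in the same way.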

2.
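To evaluate the performance of vowel cepstral features in forensic speaker recognition, a likelihood-ratio-based forensic speaker recognition method is proposed that uses Mel-frequency cepstral coefficients (MFCC) from the steady-state segments of vowels as recognition features. It was tested on samples of the vowel /a/ taken from telephone conversation recordings of 45 speakers. Experimental results show that the method not only identifies the speaker correctly but also, based on the difference between the suspect sample and the questioned speech sample, quantifies the strength of the speech sample as evidence, giving the court a scientifically sound evaluation of the evidence. Compared with manual formant extraction, automatic feature extraction improves working efficiency and substantially improves the performance of the recognition system.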
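
The likelihood-ratio idea can be illustrated with a deliberately simplified sketch. It assumes one scalar feature per frame, a single Gaussian for the suspect and one for the background population, and independent frames; the abstract does not state the models actually used, and all names and numbers below are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def log_likelihood_ratio(questioned, suspect_model, background_model):
    """Sum of per-frame log LR: same-speaker vs. different-speaker hypothesis.

    Each model is a (mean, std) tuple. Frames are assumed independent,
    which is a simplification.
    """
    llr = 0.0
    for x in questioned:
        llr += math.log(gaussian_pdf(x, *suspect_model))
        llr -= math.log(gaussian_pdf(x, *background_model))
    return llr

# Hypothetical models and feature values (illustrative only).
suspect = (1.0, 0.5)      # narrow: one speaker
background = (0.0, 2.0)   # wide: relevant population
llr_close = log_likelihood_ratio([0.9, 1.1, 1.0], suspect, background)
llr_far = log_likelihood_ratio([-3.0, -2.5], suspect, background)
print(llr_close, llr_far)
```

A positive log likelihood ratio supports the same-speaker hypothesis and a negative one supports the different-speaker hypothesis; its magnitude expresses the strength of the evidence.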

3.
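Statistical tests are applied to kernel function evaluation. The kernel function, normalized kernel function, centered kernel function and kernel distance are used as measures of the geometric relationship between samples in feature space. Seven statistical tests, including the t-test and F-test, are used to test the difference between the distribution of these measures for same-class sample pairs and for different-class sample pairs, which reflects the difference between within-class cohesion and between-class separation in feature space. Kernel selection experiments on 11 UCI datasets show that the test-based kernel measures match or exceed methods such as kernel alignment and the feature-space kernel measure criterion, and are suitable for kernel function evaluation. The experiments also find that the difference between the two distributions lies mainly in their variances. In addition, processing the kernel function (normalization or centering) changes the feature space and distorts the measurement.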
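
As a rough sketch of this idea, the kernel values of same-class pairs and different-class pairs can be compared with a two-sample t statistic. The RBF kernel, the Welch form of the t statistic and the toy data are assumptions made for illustration; the abstract does not specify which variants were used.

```python
import math
import statistics

def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel between two points given as sequences."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * d2)

def welch_t(a, b):
    """Welch two-sample t statistic (unequal variances)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        va / len(a) + vb / len(b))

def kernel_separation_score(points, labels, gamma):
    """t statistic of same-class vs. different-class kernel values."""
    same, diff = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            k = rbf_kernel(points[i], points[j], gamma)
            (same if labels[i] == labels[j] else diff).append(k)
    return welch_t(same, diff)

# Toy two-class data (hypothetical).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (3.0, 3.0), (3.1, 3.0), (3.0, 3.1)]
labels = [0, 0, 0, 1, 1, 1]
score = kernel_separation_score(points, labels, gamma=0.5)
print(score)
```

A larger score means same-class pairs are more similar, relative to different-class pairs, under that kernel. Candidate kernel parameters could be ranked by this score.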

4.
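Formant frequencies of speech are estimated with a segmented linear prediction algorithm. A multichannel filter bank divides the speech into frequency bands; a suitable inverse filter is then chosen to approximate the short-time spectrum of each band; finally, the formant frequencies are estimated from that inverse filter. Experimental results show that, compared with traditional methods, this method improves the resolution and accuracy of formant frequency estimation and is less affected by noise.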

5.
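To address the shortcomings of linear prediction in extracting formant frequencies from speech signals, a formant frequency estimation algorithm based on formant enhancement is proposed, which extracts formant frequencies more accurately without increasing the computational cost. Experimental results show that the algorithm performs well in extracting the first five formants below 5 kHz.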
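
For context, the conventional linear prediction baseline that this abstract sets out to improve can be sketched as below. It uses the autocorrelation method with the Levinson-Durbin recursion and picks peaks of the LP spectral envelope. It is not the enhanced algorithm proposed in the paper; the synthetic test signal, the function names and all parameter values are assumptions for illustration.

```python
import cmath
import math

def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of a real sequence."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Coefficients [1, a1, ..., ap] of A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def lpc_envelope_peaks(a, fs, step_hz=5):
    """Frequencies (Hz) of local maxima of the envelope 1 / |A(e^jw)|."""
    freqs = list(range(0, int(fs / 2) + 1, step_hz))
    mags = []
    for f in freqs:
        w = 2.0 * math.pi * f / fs
        resp = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        mags.append(1.0 / abs(resp))
    return [freqs[i] for i in range(1, len(mags) - 1)
            if mags[i - 1] < mags[i] > mags[i + 1]]

def resonator_impulse_response(formants, fs, length):
    """Impulse response of cascaded two-pole resonators (test signal only)."""
    denom = [1.0]
    for freq, bw in formants:
        r = math.exp(-math.pi * bw / fs)
        section = [1.0, -2.0 * r * math.cos(2.0 * math.pi * freq / fs), r * r]
        new = [0.0] * (len(denom) + 2)
        for i, d in enumerate(denom):
            for j, s in enumerate(section):
                new[i + j] += d * s
        denom = new
    y = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for k in range(1, len(denom)):
            if n - k >= 0:
                acc -= denom[k] * y[n - k]
        y.append(acc)
    return y

fs = 8000  # assumed sampling rate in Hz
frame = resonator_impulse_response([(500.0, 80.0), (1500.0, 100.0)], fs, 400)
coeffs = levinson_durbin(autocorrelation(frame, 4), 4)
peaks = lpc_envelope_peaks(coeffs, fs)
print(peaks)
```

On real speech a window and pre-emphasis are normally applied first, and a higher LP order is used. Closely spaced or weak formants can merge in the envelope, which is the kind of shortcoming that motivates enhanced methods.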

6.
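杨亚楠, 夏斌, 赵磊, 袁文浩. 《计算机应用》 (Journal of Computer Applications), 2019, 39(5): 1421-1424
To address the problem that non-line-of-sight (NLOS) state identification requires the channel type to be known in advance, a channel environment classification algorithm based on a convolutional neural network (CNN) is proposed. First, ultra-wideband (UWB) channels are sampled to build a sample set. The sample set is then used to train the CNN, which extracts the features of different channel scenarios. Finally, the UWB channel environment is classified. Experimental results show that the overall model accuracy of the classification method is about 93.40%, so it can effectively classify and identify channel environments.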

7.
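With the proliferation of digital recordings and the continuing improvement of the evidence system, authenticity examination of digital recordings has become a prerequisite for admitting recorded material as evidence. Drawing on how such examination is currently applied in real cases, this work identifies editing and tampering in case recordings. Several examination methods are used, namely recording parameter examination, auditory examination, and acoustic examination based on software spectrographic analysis, to analyze the trace features left after a recording has been edited or tampered with and to investigate what causes each type of feature. Experimental results show that editing features vary with the editing software used, and that combining multiple examination methods to find editing traces in digital recording files is currently a feasible approach to authenticity examination of digital recordings.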

8.
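Comparison of formant features is an important method for establishing identity in forensic voice identification. Speech samples in forensic examination mostly come from electronic devices such as digital voice recorders, and the emergence of network speech poses a challenge to conventional voice identification. To keep bandwidth usage low during transmission, instant messaging software compresses speech at a high ratio, so the formant features of network speech produced by instant messaging software differ from those of the original speech. Five instant messaging applications commonly used in social networking were selected, and experiments were conducted to analyze the differences in formant features between the network speech they produce and the original speech. The results show that different instant messaging applications produce different changes in the formant features of different vowels, and that summarizing these changes can improve the accuracy of voice identification for network speech from instant messaging software.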

9.
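The Nakagami channel can simulate different fading environments through different values of the fading factor m, and its simulated data agree well with actual measurements, so it is widely used in channel simulation. However, there is little research on the credibility of Nakagami channel models, and a scientific method for comparison and verification is lacking. Based on the statistical property that the first-order envelope sequence of a typical Nakagami channel follows the Nakagami distribution, a goodness-of-fit test method based on the Cramer-von Mises (CvM) algorithm is proposed. A Nakagami fading channel model is built with the "Gaussian + Rayleigh + DC" combination method, and the envelope sequence is extracted from the channel output sequence. A two-sample CvM test is then applied to the theoretical and actual distributions of the envelope sequence to evaluate the credibility of the Nakagami channel model. Hardware-in-the-loop simulation results show that, compared with the K-S test, the chi-square test, and a fused Z-test + chi-square test, CvM achieves good recognition performance for Nakagami fading channels with different m and also has advantages in reliability and complexity. Its recognition accuracy for Nakagami fading channels reaches 92.6% at false alarm probabilities below 0.01, and its average recognition accuracy reaches 96.4% for sample lengths above 300 000; when the channel under test is another type of channel, no misidentification occurs.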
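
The two-sample CvM statistic at the core of this method can be sketched as follows. Two points are assumptions made for illustration: the envelope samples are drawn as the square root of a gamma variate rather than with the paper's "Gaussian + Rayleigh + DC" model, and no decision threshold is applied. Function names, seed and sample sizes are hypothetical.

```python
import bisect
import math
import random

def nakagami_samples(m, omega, n, rng):
    """Draw n Nakagami(m, omega) envelope samples.

    Uses the fact that if G ~ Gamma(shape=m, scale=omega/m),
    then sqrt(G) follows a Nakagami(m, omega) distribution.
    """
    return [math.sqrt(rng.gammavariate(m, omega / m)) for _ in range(n)]

def cvm_two_sample(x, y):
    """Two-sample Cramer-von Mises statistic.

    T = N*M / (N+M)^2 * sum over the pooled sample of (F_N(z) - G_M(z))^2,
    where F_N and G_M are the empirical CDFs of x and y.
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    total = 0.0
    for z in xs + ys:
        fx = bisect.bisect_right(xs, z) / n
        gy = bisect.bisect_right(ys, z) / m
        total += (fx - gy) ** 2
    return n * m / (n + m) ** 2 * total

rng = random.Random(0)  # fixed seed so the run is repeatable
reference = nakagami_samples(1.0, 1.0, 2000, rng)   # reference envelope, m = 1
matching = nakagami_samples(1.0, 1.0, 2000, rng)    # channel with the same m
mismatched = nakagami_samples(4.0, 1.0, 2000, rng)  # channel with m = 4
t_same = cvm_two_sample(reference, matching)
t_diff = cvm_two_sample(reference, mismatched)
print(t_same, t_diff)
```

A small statistic indicates that the envelope under test is consistent with the reference distribution. In practice the statistic is compared with a critical value chosen for the desired false alarm probability.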

10.
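Formant extraction scheme for cochlear implants based on auditory perception   (Cited by: 1; self-citations: 1; citations by others: 0)
An auditory-perception-based wavelet transform is used to extract formant parameters for cochlear implants. The original speech signal is first decomposed and reconstructed with the auditory-perception-based wavelet transform; formants are then extracted from both the synthesized and the original speech signals using the autocorrelation method and the lattice method. Experimental results demonstrate that extracting formant parameters with the auditory-perception-based wavelet transform is feasible and that the synthesized speech signal represents the features of the original speech signal well. They also confirm that, in a cochlear implant speech processor, the lattice method extracts formant parameters more accurately than the autocorrelation method.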

11.
This paper presents a method for the estimation and mapping of parametric models of speech resonance at formants for voice conversion. The spectral features at formants that contribute to voice characteristics are the trajectories of the frequencies, the bandwidths and intensities of the resonance at formants. The formant features are extracted from the poles of a linear prediction (LP) model of speech. The statistical distributions of formants are modelled by a two-dimensional hidden Markov model (HMM) spanning the time and frequency dimensions. Experimental results are presented which show a close match between HMM-based formant models and the histograms of formants. For voice conversion two alternative methods are explored for mapping the formants of a source speaker to those of a target speaker. The first method is based on an adaptive formant-tracking warping of the frequency response of the LP model and the second method is based on the rotation of the poles of the LP model of speech. Both methods transform all spectral parameters of the resonance at formants of the source speaker towards those of the target speaker. In addition, the issues affecting the selection of the warping ratios for the mapping functions are investigated. Experimental results of formant estimation and perceptual evaluation of voice morphing based on parametric formant models are presented.
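
The relation between an LP pole and a formant, and the effect of rotating a pole, can be sketched with the standard conversion below. A pole at radius r and angle theta corresponds to frequency theta * fs / (2 pi) and bandwidth -ln(r) * fs / pi. The function names, the sampling rate and the example pole are assumptions for illustration, and this is not the mapping procedure of the paper.

```python
import cmath
import math

def pole_to_formant(pole, fs):
    """Return (frequency_hz, bandwidth_hz) for one complex LP pole."""
    freq = cmath.phase(pole) * fs / (2.0 * math.pi)
    bandwidth = -math.log(abs(pole)) * fs / math.pi
    return freq, bandwidth

def rotate_pole(pole, delta_hz, fs):
    """Shift a pole's frequency by delta_hz while keeping its radius."""
    return pole * cmath.exp(1j * 2.0 * math.pi * delta_hz / fs)

fs = 8000.0  # assumed sampling rate in Hz
source_pole = 0.95 * cmath.exp(1j * 2.0 * math.pi * 700.0 / fs)
f_src, bw_src = pole_to_formant(source_pole, fs)
f_new, bw_new = pole_to_formant(rotate_pole(source_pole, 100.0, fs), fs)
print(f_src, bw_src, f_new, bw_new)
```

Rotation changes only the pole angle, so the formant frequency moves while the bandwidth, which depends on the radius, stays the same. Each complex-conjugate partner pole would be rotated by the opposite angle to keep the filter real.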

12.
13.
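An algorithm is proposed that extracts speech formants using empirical mode decomposition (EMD) and weighted Mel-cepstrum coefficients (WMCEP). The speech signal is decomposed by EMD, the intrinsic mode functions (IMFs) that contain formants are identified, and these are recombined into a new reconstructed speech signal. Weighted Mel-cepstrum analysis of the reconstructed signal yields weighted Mel-cepstrum coefficients that capture the main spectral components. A discrete cosine smoothing algorithm then derives the spectral envelope from these coefficients, and candidate formants are taken from the peak positions of the envelope. Finally, formant estimates are selected from the candidates according to formant continuity constraints and frequency ranges. Experimental results show that the algorithm gives smaller formant errors than WMCEP alone and still extracts formants accurately at signal-to-noise ratios below 20 dB.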

14.
Energy bands and spectral cues for Arabic vowels recognition   (Cited by: 1; self-citations: 0; citations by others: 1)
The present study examines the short and long Arabic vowels (/a/, /a:/, /i/, /i:/, /u/ and /u:/) with a new approach based on three methods: formant frequency extraction, spectral moments and energy bands. One characteristic that distinguishes Arabic from other languages is its long vowels, which can be pronounced with different durations. Formant frequencies are the features most often used to characterize vowels in different languages; nevertheless, using formants alone was not very effective for vowel identification, especially as production duration increases. Our approach therefore broadens previous studies and presents new tools for characterizing long vowels in comparison with short ones.

15.
This paper presents a formant tracking linear prediction (LP) model for speech processing in noise. The main focus of this work is on the utilization of the correlation of the energy contours of speech, along the formant tracks, for improved formant and LP model estimation in noise. The approach proposed in this paper provides a systematic framework for modelling and utilizing the inter-frame correlation of speech parameters across successive speech frames; the within-frame correlations are modelled by the LP parameters. The formant tracking LP model estimation is composed of three stages: (1) a pre-cleaning spectral amplitude estimation stage where an initial estimate of the LP model of speech for each frame is obtained, (2) a formant classification and estimation stage using probability models of formants and Viterbi decoders and (3) an inter-frame formant de-noising and smoothing stage where Kalman filters are used to model the formant trajectories and reduce the effect of residual noise on formants. The adverse effects of car and train noise on estimates of formant tracks and LP models are investigated. The evaluation results for the estimation of the formant tracking LP model demonstrate that the proposed combination of the initial noise reduction stage with formant tracking and Kalman smoothing stages results in a significant reduction in errors and distortions.
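
The third stage, Kalman filtering of a formant trajectory, can be illustrated with a scalar filter. The random-walk state model, the variance values and the noisy track below are assumptions for illustration; the abstract does not give the state model used in the paper.

```python
def kalman_track(measurements, process_var, meas_var):
    """Filter a noisy formant track with a scalar random-walk Kalman filter.

    State model (assumed): f[t] = f[t-1] + w,  w ~ N(0, process_var)
    Measurement model:     z[t] = f[t] + v,    v ~ N(0, meas_var)
    """
    estimate = measurements[0]
    p = meas_var  # variance of the initial estimate
    track = [estimate]
    for z in measurements[1:]:
        p = p + process_var                        # predict
        gain = p / (p + meas_var)                  # Kalman gain
        estimate = estimate + gain * (z - estimate)  # update with residual
        p = (1.0 - gain) * p
        track.append(estimate)
    return track

# Hypothetical F1 track in Hz with one noisy frame (illustrative only).
noisy_f1 = [500.0, 505.0, 495.0, 800.0, 500.0, 498.0, 502.0]
smoothed = kalman_track(noisy_f1, process_var=100.0, meas_var=2500.0)
print(smoothed)
```

A small `process_var` relative to `meas_var` makes the track smoother but slower to follow genuine formant movement. The two variances would normally be tuned to the noise conditions.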

16.
In this paper, we propose a new approach for dynamic speech spectrum representation and tracking of vocal tract resonance (VTR) frequencies. The method represents the spectral density of the speech signal as a mixture of Gaussians with an unknown number of components, for which a time-varying Dirichlet process mixture (DPM) model is utilized. In the resulting representation, the number of formants is allowed to vary in time. The paper first presents an analysis of the continuity of the formants in the spectrum during the speech utterance. The analysis is based on a new state space representation of the concatenated tube model. We show that the number of formants which appear in the spectrum is directly related to the location of the constriction of the vocal tract (i.e., the location of the excitation). Moreover, the disappearance of the formants in the spectrum is explained by "uncontrollable modes" of the state space model. Under the assumption that a varying number of formants exists in the spectrum, we propose the use of a DPM model based multi-target tracking algorithm for tracking an unknown number of formants. The tracking algorithm defines a hierarchical Bayesian model for the unknown formant states and the inference is done via a Rao–Blackwellized particle filter.

17.
Little is known about the perceptual processes of speaker identification and their relationship to the acoustic features of the speaker's voice. A study of speaker perception and identification by psychoacoustic experiments was carried out. Twenty male speakers were recorded and thirty listeners participated in the experiments. Statistical analysis of the results suggests that the prototype model is appropriate for explaining the process of speaker identification. The most important features for speaker identification were the fundamental frequency, the third and fourth formants, and the closing phase of the glottal wave. For different listeners, different sets of features were found to be significant for coding speaker identity.

18.
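Whispered speech is excited by a noise source; compared with normal speech, its formants are shifted and their bandwidths are wider. As a result, the traditional linear prediction method produces spurious peaks when extracting formants from whispered speech. An improved algorithm is proposed based on analysis of the power spectrum: following the principle that pole power is invariant, a pole interaction factor is used to correct the formant bandwidths, so that the formants of whispered speech are extracted accurately. Simulation experiments on Mandarin Chinese monophthong phonemes demonstrate the effectiveness of the algorithm.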

19.
A joint source-channel coding (JSCC) scheme for robust progressive image transmission over broadband wireless channels using orthogonal frequency division multiplexing (OFDM) systems with spatial diversity is proposed for application environments where no feedback channel is available, such as broadcasting services. Most current research on JSCC focuses on either binary symmetric channels (BSC) or additive white Gaussian noise (AWGN) channels. To deal with fading channels, most previous methods model the fading channel as a two-state Gilbert-Elliott channel and normally design the JSCC for the BER of the bad channel state, which is not optimal when the channel is in the good state. By using diversity techniques and OFDM, the frequency-selective fading effects in broadband wireless channels can be significantly decreased, and we show that subchannels in OFDM systems approach Gaussian noisy channels when the diversity gain gets large; as a result, the system performance can be improved in terms of throughput and channel coding efficiency. After analyzing the channel properties of OFDM systems with spatial diversity, a practical JSCC scheme for OFDM systems is proposed. Simulation results are presented for transmit diversity with different numbers of antennas and different multipath delay and Doppler spread. The simulations show that performance can be improved by more than 4 dB in terms of the peak signal-to-noise ratio (PSNR) of the received image Lena, and that performance is not very sensitive to different multipath spread and Doppler frequency.
