Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
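Endpoint detection is an important component of a speech recognition system; in noisy environments in particular, its accuracy directly affects the performance of the recognition system. A speech endpoint detection method based on wavelet sub-band cepstral coefficients (SBC) is proposed: the wavelet transform divides the frequency band by scale, and the wavelet sub-band cepstral energy is used to detect speech endpoints. Simulation comparisons with MFCC and extensive experimental analysis show that wavelet sub-band cepstral features give better recognition performance in speech endpoint detection.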

2.
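A new method is proposed for fundamental frequency (pitch) detection in noisy speech. The noisy speech is first denoised with wavelets and then preprocessed; a normalized AMDF algorithm extracts the fundamental frequency, and the resulting pitch track is afterwards smoothed with a search-and-trial method. Experiments show that the method is more robust than traditional methods, especially at low signal-to-noise ratios.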
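The AMDF step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `amdf_pitch`, the search range `fmin`/`fmax`, and the choice of normalizing by the number of overlapping samples are all assumptions made here for illustration, and the wavelet denoising and smoothing stages are omitted.

```python
import numpy as np

def amdf_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the pitch (Hz) of one voiced frame with a length-normalized AMDF.

    fmin and fmax are hypothetical search limits, not values from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), n - 1)
    lags = np.arange(lag_min, lag_max + 1)
    # Average |x[i] - x[i + k]| over the overlapping samples, so that long
    # lags are not favoured merely because fewer terms are summed.
    amdf = np.array([np.mean(np.abs(frame[:n - k] - frame[k:])) for k in lags])
    # The lag with the deepest valley is taken as the pitch period.
    best_lag = lags[np.argmin(amdf)]
    return fs / best_lag

# Synthetic check: a 100 Hz sine sampled at 8 kHz, 40 ms frame.
fs = 8000
t = np.arange(320) / fs
estimate = amdf_pitch(np.sin(2 * np.pi * 100 * t), fs)
```

Picking the global minimum can lock onto a multiple of the true period when several valleys are equally deep, which is one reason a post-processing step such as the smoothing described in the abstract is useful.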

3.
4.
Speech recognizers achieve high recognition accuracy in quiet acoustic environments, but their performance degrades drastically when they are deployed in real environments, where the speech is degraded by additive ambient noise. This paper advocates a two-phase approach for robust speech recognition in such environments. First, a front-end subband speech enhancement stage with adaptive noise estimation (ANE) is used to filter the noisy speech. The whole noisy speech spectrum is partitioned into eighteen dissimilar subbands based on the Bark scale, and the noise power in each subband is estimated by the ANE approach, which does not require speech pause detection. Second, the filtered speech spectrum is processed by a nonparametric frequency-domain algorithm based on human perception, together with a back end that builds a robust classifier to recognize the utterance. A suite of experiments is conducted to evaluate the performance of the speech recognizer in a variety of real environments, with and without the front-end speech enhancement stage. Recognition accuracy is evaluated at the word level and over a wide range of signal-to-noise ratios for real-world noises. Experimental evaluations show that the proposed algorithm attains good recognition performance when the signal-to-noise ratio is lower than 5 dB.

5.
Measures based on the group delay of the LPC residual have been used by a number of authors to identify the time instants of glottal closure in voiced speech. In this paper, we discuss the theoretical properties of three such measures and we also present a new measure having useful properties. We give a quantitative assessment of each measure's ability to detect glottal closure instants evaluated using a speech database that includes a direct measurement of glottal activity from a Laryngograph/EGG signal. We find that when using a fixed-length analysis window, the best measures can detect the instant of glottal closure in 97% of larynx cycles with a standard deviation of 0.6 ms and that in 9% of these cycles an additional excitation instant is found that normally corresponds to glottal opening. We show that some improvement in detection rate may be obtained if the analysis window length is adapted to the speech pitch. If the measures are applied to the preemphasized speech instead of to the LPC residual, we find that the timing accuracy worsens but the detection rate improves slightly. We assess the computational cost of evaluating the measures and we present new recursive algorithms that give a substantial reduction in computation in all cases.

6.
In this paper, we consider the robust interpretation of Metric Temporal Logic (MTL) formulas over signals that take values in metric spaces. For such signals, which are generated by systems whose states are equipped with non-trivial metrics, for example continuous or hybrid, robustness is not only natural, but also a critical measure of system performance. Thus, we propose multi-valued semantics for MTL formulas, which capture not only the usual Boolean satisfiability of the formula, but also topological information regarding the distance, ε, from unsatisfiability. We prove that any other signal that remains ε-close to the initial one also satisfies the same MTL specification under the usual Boolean semantics. Finally, our framework is applied to the problem of testing formulas of two fragments of MTL, namely Metric Interval Temporal Logic (MITL) and closed Metric Temporal Logic (clMTL), over continuous-time signals using only discrete-time analysis. The motivating idea behind our approach is that if the continuous-time signal fulfills certain conditions and the discrete-time signal robustly satisfies the temporal logic specification, then the corresponding continuous-time signal should also satisfy the same temporal logic specification.

7.
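To address the poor reconstruction obtained when compressed sensing (CS) is applied to speech corrupted by white noise, a method is proposed that applies wavelet threshold denoising to the observation sequence obtained with a row-echelon CS projection matrix. When the noise is small, the method's estimate of the noise standard deviation in the observation sequences of some frames has too large an error, which severely degrades reconstruction performance. To address this, an improved scheme based on first-order smoothed noise estimation is proposed, which improves reconstruction at high signal-to-noise ratios. Simulations show that the method lowers the sampling rate through compressed sensing while also achieving speech enhancement, and that it outperforms traditional speech denoising methods.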

8.
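Traditional speech evaluation measures such as SNR correlate poorly with speech intelligibility. Studies have shown that different parts of speech contribute differently to intelligibility, and that voiced onset segments have a comparatively large effect on it. A speech evaluation algorithm with a relatively higher correlation with intelligibility is proposed: before the segmental SNR is computed, the speech segments are screened and the onset segments are selected. The intelligibility values computed by the proposed method are compared with subjective scores. Experimental results show that combining the measure with a speech onset detection algorithm raises the correlation between computed intelligibility and subjective evaluation by 0.11 (consonants) and 0.06 (sentences), which also indirectly supports the finding that speech onsets have a large effect on intelligibility.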
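As a rough illustration of computing a segmental SNR over selected frames only, the sketch below averages per-frame SNR values in dB and accepts an optional boolean mask. The function name `segmental_snr`, the frame length, and the `keep` mask are assumptions introduced here; the mask stands in for the output of an onset detector, which the abstract does not specify and which is not implemented.

```python
import numpy as np

def segmental_snr(clean, processed, frame_len=256, keep=None, eps=1e-12):
    """Mean per-frame SNR in dB, optionally restricted to frames where keep[i] is True.

    keep is a hypothetical per-frame mask (e.g. from an onset detector).
    """
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    n_frames = min(len(clean), len(processed)) // frame_len
    snrs = []
    for i in range(n_frames):
        if keep is not None and not keep[i]:
            continue  # skip frames that were not selected
        s = clean[i * frame_len:(i + 1) * frame_len]
        e = s - processed[i * frame_len:(i + 1) * frame_len]
        # eps guards against division by zero and log of zero in silent frames.
        snrs.append(10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(e ** 2) + eps)))
    return float(np.mean(snrs)) if snrs else float("nan")

# Two frames: a small error in the first, a large error in the second.
clean = np.ones(512)
processed = np.concatenate([np.full(256, 0.9), np.zeros(256)])
all_frames = segmental_snr(clean, processed)
first_only = segmental_snr(clean, processed, keep=[True, False])
```

Restricting the average to selected frames changes the score whenever distortion is unevenly distributed over the utterance, which is the effect the abstract exploits.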

9.
10.
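Research on endpoint detection methods for noisy speech   Total citations: 2, self-citations: 0, citations by others: 2
朴春俊, 马静霞, 徐鹏. 《计算机应用》 (Journal of Computer Applications), 2006, 26(11): 2685-2686
A key factor affecting speech recognition performance is the accuracy of endpoint detection. In practical applications the signal-to-noise ratio is low, so some detection algorithms that perform well at high SNR no longer work effectively, which lowers the recognition rate of the system. A speech endpoint detection algorithm based on the sum of time-frequency variances is proposed. Experiments show that the algorithm can accurately detect the speech signal at low SNR. A comparison of three different endpoint detection algorithms finds that the algorithm based on the sum of time-frequency variances has the higher endpoint detection accuracy.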

11.
Based on the log-normal assumption, parallel model combination (PMC) provides an effective method to adapt the cepstral means and variances of speech models for noisy speech recognition. In addition, the log-add method has been derived to adapt the mean by ignoring the cepstral variance during the process of PMC. This method is efficient for speech recognition in a high signal-to-noise ratio (SNR) environment. In this paper, a new interpretation of the log-add method is proposed. This leads to a modified scheme for performing the adaptation procedure in PMC. This modified method is shown to be efficient in improving recognition accuracy in low SNR. Based on this modified PMC method, we derive a direct adaptation procedure for the variance of speech models in the cepstral domain. The proposed method is a fast algorithm because the computation for the transformation of the covariance matrix is no longer required. Three recognition tasks are conducted to evaluate the proposed method. Experimental results show that the proposed technique not only requires lower computational cost but also outperforms the original PMC technique in noisy environments.
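The standard log-add approximation referred to in this abstract combines speech and noise means in the log-spectral domain as log(exp(mu_speech) + exp(mu_noise)), ignoring the variances. The sketch below shows only that combination step; the function name `log_add_mean` is an assumption, the transform between the cepstral and log-spectral domains is omitted, and the paper's modified scheme is not reproduced.

```python
import numpy as np

def log_add_mean(mu_speech_log, mu_noise_log):
    """Log-add combination of log-spectral means (variances are ignored)."""
    # np.logaddexp computes log(exp(a) + exp(b)) element-wise in a
    # numerically stable way.
    return np.logaddexp(mu_speech_log, mu_noise_log)

# Linear-domain means 3 and 1 for speech, 1 and 1 for noise.
combined = log_add_mean(np.log([3.0, 1.0]), np.log([1.0, 1.0]))
```

In the linear spectral domain this is simply the sum of the speech and noise means, which is why the approximation works best when the variances are small relative to the means.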

12.
Most speech enhancement algorithms are based on the assumption that speech and noise are both Gaussian in the discrete cosine transform (DCT) domain. For further enhancement of noisy speech in the DCT domain, we consider multiple statistical distributions (i.e., Gaussian, Laplacian and Gamma) as a set of candidates to model the noise and speech. We first use the goodness-of-fit (GOF) test in order to measure how far the assumed model deviates from the actual distribution for each DCT component of noisy speech. Our evaluations illustrate that the best candidate is assigned to each frequency bin depending on the Signal-to-Noise-Ratio (SNR) and the Power Spectral Flatness Measure (PSFM). In particular, since the PSFM exhibits a strong relation with the best statistical fit we employ a simple recursive estimation of the PSFM in the model selection. The proposed speech enhancement algorithm employs a soft estimate of the speech absence probability (SAP) separately for each frequency bin according to the selected distribution. Both objective and subjective tests are performed for the evaluation of the proposed algorithms on a large speech database, for various SNR values and types of background noise. Our evaluations show that the proposed soft decision scheme based on multiple statistical modeling or the PSFM provides further speech quality enhancement compared with recent methods through a number of subjective and objective tests.
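A spectral flatness measure is commonly defined as the ratio of the geometric mean to the arithmetic mean of a power spectrum. The sketch below uses that common definition together with a first-order recursive smoother; whether the paper's PSFM and its recursive estimate take exactly this form is an assumption, and the names `spectral_flatness`, `recursive_update`, and the smoothing factor `alpha` are hypothetical.

```python
import numpy as np

def spectral_flatness(power, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a set of power values."""
    power = np.asarray(power, dtype=float) + eps  # eps avoids log(0)
    geometric_mean = np.exp(np.mean(np.log(power)))
    return float(geometric_mean / np.mean(power))

def recursive_update(previous, current, alpha=0.9):
    """First-order recursive smoothing; alpha is a hypothetical constant."""
    return alpha * previous + (1.0 - alpha) * current

flat = spectral_flatness([2.0, 2.0, 2.0, 2.0])   # flat spectrum
peaky = spectral_flatness([1.0, 4.0])            # geometric mean 2, arithmetic mean 2.5
smoothed = recursive_update(flat, peaky)
```

Values near 1 indicate a noise-like, flat spectrum, and values near 0 indicate a peaky, tonal one.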

13.
In this paper, a modification of the statistical method for detecting noisy signals is studied; this modification gives an increased signal-to-noise ratio. Corresponding computational formulas are obtained.

14.
A predictor-based controller for time-varying delay systems is presented in this paper and its robustness properties for different uncertainties are analyzed. First, a time-varying delay dependent stability condition is expressed in terms of LMIs. Then, uncertainties in the knowledge of all plant-model parameters are considered and the resulting closed-loop system is shown to be robust with respect to these uncertainties. A significant improvement with respect to the same control strategy without predictor is achieved. The scheme is applicable to open-loop unstable plants and it has been tested in a real-time application to control the roll angle of a quad-rotor helicopter prototype. The experimental results show good performance and robustness of the proposed scheme even in the presence of long delay uncertainties.

15.
Blind source extraction (BSE) is widely used to solve signal mixture problems where there are only a few desired signals. To improve signal extraction performance and expand its application, we develop an adaptive BSE algorithm with an additive noise model. We first present an improved normalized kurtosis as an objective function, which caters for the effect of noise. By combining the objective function and the Lagrange multiplier method, we further propose a robust algorithm that can extract the desired signal as the first output signal. Simulations on both synthetic and real biomedical signals demonstrate that this combination improves the extraction performance and has better robustness to the estimation error of the normalized kurtosis value in the presence of noise.

16.
17.
This paper proposes and describes a complete system for Blind Source Extraction (BSE). The goal is to extract a target signal source in order to recognize spoken commands uttered in reverberant and noisy environments, and acquired by a microphone array. The architecture of the BSE system is based on multiple stages: (a) TDOA estimation, (b) mixing system identification for the target source, (c) on-line semi-blind source separation and (d) source extraction. All the stages are effectively combined, allowing the estimation of the target signal with limited distortion. While a generalization of the BSE framework is described, here the proposed system is evaluated on the data provided for the CHiME Pascal 2011 competition, i.e. binaural recordings made in a real-world domestic environment. The CHiME mixtures are processed with the BSE and the recovered target signal is fed to a recognizer, which uses noise robust features based on Gammatone Frequency Cepstral Coefficients. Moreover, acoustic model adaptation is applied to further reduce the mismatch between training and testing data and improve the overall performance. A detailed comparison between different models and algorithmic settings is reported, showing that the approach is promising and the resulting system gives a significant reduction of the error rate.

18.
This paper introduces a speech encryption approach, which is based on permutation of speech segments using the chaotic Baker map and substitution using masks in both time and transform domains. Two parameters are extracted from the main key used in the generation of the mask. Either the Discrete Cosine Transform (DCT) or the Discrete Sine Transform (DST) can be used in the proposed cryptosystem to remove the residual intelligibility resulting from permutation and masking in the time domain. Substitution with masks is used in this cryptosystem to fill the silent periods within speech conversation and destroy formant and pitch information. Permutation with the chaotic Baker map is used to maximize the benefits of the permutation process in encryption by using large-size blocks to allow more audio segments to be permuted. The proposed cryptosystem has a low complexity, small delay, and high degree of security. Simulation results prove that the proposed cryptosystem is robust to the presence of noise.

19.
20.
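An endpoint detection method based on EVRC is proposed. In environments where the background noise changes, the method maps speech into a speech matrix based on a psychoacoustic model; by tracking the noise, the resulting speech matrix and parameters can adapt to different background noise environments. Taking the characteristics of Chinese speech into account, the method accurately detects speech endpoints without any change to the threshold.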
