An unsupervised approach for co-channel speech separation using Hilbert–Huang transform and Fuzzy C-Means clustering
Authors: M K Prasanna Kumar, R Kumaraswamy
Affiliation: 1. BMS College of Engineering, Bangalore, India; 2. Siddaganga Institute of Technology, Tumkur, India
Abstract: In this paper we discuss an unsupervised approach for co-channel speech separation, where two speakers are speaking simultaneously over the same channel. We propose a two-stage separation process in which the initial stage is based on empirical mode decomposition (EMD) and the Hilbert transform, together generally known as the Hilbert–Huang transform. EMD decomposes the mixed signal into oscillatory functions known as intrinsic mode functions. The Hilbert transform is applied to find the instantaneous amplitudes, and Fuzzy C-Means clustering is applied to group the speakers at the initial stage. In the second stage of separation, the speaker groups are transformed into the time–frequency domain using the short-time Fourier transform (STFT). Time–frequency ratios are computed by dividing the STFT matrix of the mixed speech signal by the STFT matrices of the stage-1 recovered speech signals. The histogram of the ratios obtained can be used to estimate the ideal binary mask for each speaker. These masks are applied to the speech mixture to estimate the underlying speakers. The masks are estimated from the speech mixture and help in imputing the values that are missing after the stage-1 grouping of speakers. The results show significant improvement in objective measures over other existing single-channel speech separation methods.
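The second-stage masking idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the ratio histogram is turned into a mask, so this sketch swaps in a direct per-bin comparison of the two ratios. The helper names `stft` and `ratio_masks`, the frame length, the hop size, and the synthetic two-tone "speakers" are all assumptions made for illustration.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Minimal STFT (assumed helper): Hann-windowed frames, one-sided FFT.
    Returns a complex matrix of shape (frequency bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def ratio_masks(mix_tf, est1_tf, est2_tf, eps=1e-12):
    """Binary masks from time-frequency ratios |mixture| / |stage-1 estimate|.
    Assumption: a bin goes to the speaker with the smaller ratio, i.e. the
    speaker whose stage-1 estimate explains more of the mixture in that bin.
    This comparison stands in for the histogram-based step in the abstract."""
    r1 = np.abs(mix_tf) / (np.abs(est1_tf) + eps)
    r2 = np.abs(mix_tf) / (np.abs(est2_tf) + eps)
    mask1 = r1 <= r2
    return mask1, ~mask1

# Synthetic stand-ins for two speakers: pure tones at 500 Hz and 2000 Hz.
fs = 8000
t = np.arange(fs) / fs
s1 = np.sin(2 * np.pi * 500 * t)
s2 = np.sin(2 * np.pi * 2000 * t)
mixture = s1 + s2

# Imperfect stage-1 groupings: each estimate leaks part of the other source.
est1 = s1 + 0.3 * s2
est2 = s2 + 0.3 * s1

mix_tf = stft(mixture)
mask1, mask2 = ratio_masks(mix_tf, stft(est1), stft(est2))

# Masks are applied to the mixture STFT, not to the stage-1 estimates.
speaker1_tf = mask1 * mix_tf
speaker2_tf = mask2 * mix_tf
```

Because the masks are applied to the mixture itself, any bin assigned to a speaker carries the full mixture value for that bin, which is one way to read the abstract's remark about imputing values missing after the stage-1 grouping.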
This article is indexed in SpringerLink and other databases.