

Source cell-phone recognition from recorded speech using non-speech segments
Affiliation: 1. Department of Electrical-Electronic Engineering, Bursa Technical University, 16190, Bursa, Turkey; 2. University of Eastern Finland, FI-80101, Joensuu, Finland
Abstract: In a recent study, we introduced the problem of identifying cell-phones from recorded speech and showed that speech signals convey information about the source device, making it possible to identify the source with some accuracy. In this paper, we consider recognizing source cell-phone microphones using the non-speech segments of recorded speech. Taking an information-theoretic approach, we use a Gaussian mixture model (GMM) trained with maximum mutual information (MMI) to represent device-specific features. Experimental results using Mel-frequency and linear-frequency cepstral coefficients (MFCC and LFCC) show that features extracted from the non-speech segments contain higher mutual information and yield higher recognition rates than features extracted from the speech portions or from the whole utterance. When features are extracted from the non-speech parts, the identification rate improves from 96.42% to 98.39% and the equal error rate (EER) drops from 1.20% to 0.47%. Recognition results are reported for classical GMMs trained with both the maximum likelihood (ML) and MMI criteria, as well as for support vector machines (SVMs). Identification under additive noise is also considered, and identification rates are shown to drop dramatically in that case.
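The pipeline outlined in the abstract has four steps:

1. Keep only the non-speech frames.
2. Extract cepstral features from them.
3. Fit one generative model per device.
4. Pick the device whose model gives the highest likelihood.

The following NumPy sketch shows these steps under explicit assumptions. None of the choices below are taken from the paper:

* The energy-threshold detector (`non_speech_frames`, `margin_db`) is hypothetical.
* The 25 ms / 10 ms framing is an assumed setting.
* The LFCC filterbank layout is an assumed setting.
* Each device is modelled by a single diagonal Gaussian, i.e. a one-component GMM fit by ML. This stands in for the multi-component ML/MMI-trained GMMs and SVMs the paper evaluates; no MMI training is shown here.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (assumed 25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def non_speech_frames(frames, margin_db=30.0):
    """Hypothetical energy-based detector: frames whose log energy lies more than
    margin_db below the loudest frame are treated as non-speech and kept."""
    energy_db = 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    return frames[energy_db < energy_db.max() - margin_db]

def lfcc(frames, n_fft=512, n_filters=20, n_ceps=13):
    """Linear-frequency cepstral coefficients: DCT-II of log filterbank energies,
    with triangular filters spaced uniformly (linearly) in frequency."""
    window = np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, n=n_fft)) ** 2   # (T, n_fft//2 + 1)
    n_bins = power.shape[1]
    edges = np.linspace(0, n_bins - 1, n_filters + 2)            # linear spacing
    bins = np.arange(n_bins)
    fbank = np.zeros((n_filters, n_bins))
    for m in range(n_filters):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (bins - left) / (center - left)
        falling = (right - bins) / (right - center)
        fbank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    log_e = np.log(power @ fbank.T + 1e-12)                      # (T, n_filters)
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))      # DCT-II basis
    return log_e @ dct.T                                          # (T, n_ceps)

def train_device_model(feats):
    """One-component, diagonal-covariance GMM (a diagonal Gaussian) fit by ML."""
    return feats.mean(axis=0), feats.var(axis=0) + 1e-6

def avg_log_likelihood(feats, model):
    """Average per-frame log-likelihood of feats under a diagonal Gaussian."""
    mean, var = model
    ll = -0.5 * (np.log(2 * np.pi * var) + (feats - mean) ** 2 / var)
    return ll.sum(axis=1).mean()

def identify(feats, models):
    """Closed-set identification: return the device label with the highest score."""
    return max(models, key=lambda name: avg_log_likelihood(feats, models[name]))
```

To use it, call `identify(lfcc(non_speech_frames(frame_signal(x))), models)` on a test recording `x`. Here `models` maps device labels to `train_device_model(...)` outputs computed the same way from training recordings. Scoring by the average frame log-likelihood keeps utterances of different lengths comparable.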
Keywords: Source cell-phone recognition; Mel-frequency cepstrum coefficients; Mutual information; Source microphone identification; Gaussian mixture model
This article is indexed in ScienceDirect and other databases.