
1.

Gaussian Mixture Clustering and Language Adaptation for the Development of a New Language Speech Recognition System

Nikos Chatzichrisafis, Vassilios Diakoloukas, Vassilios Digalakis, Costas Harizakis 《IEEE Transactions on Audio, Speech, and Language Processing》2007,15(3):928-938

The porting of a speech recognition system to a new language is usually a time-consuming and expensive process since it requires collecting, transcribing, and processing a large amount of language-specific training sentences. This work presents techniques for improved cross-language transfer of speech recognition systems to new target languages. Such techniques are particularly useful for target languages where minimal amounts of training data are available. We describe a novel method to produce a language-independent system by combining acoustic models from a number of source languages. This intermediate language-independent acoustic model is used to bootstrap a target-language system by applying language adaptation. For our experiments, we use acoustic models of seven source languages to develop a target Greek acoustic model. We show that our technique significantly outperforms a system trained from scratch when less than 8 h of read speech is available.

2.

A Configurable Logic Based Architecture for Real-Time Continuous Speech Recognition Using Hidden Markov Models

Panagiotis Stogiannos, Apostolos Dollas, Vassilis Digalakis 《The Journal of VLSI Signal Processing》2000,24(2-3):223-240

An architecture is presented for real-time continuous speech recognition based on a modified hidden Markov model. The algorithm is adapted to the needs of continuous speech recognition by efficient encoding of the state space, and logarithmic encoding of the weights so that products can be computed as sums. The paper presents the algorithm and its application-related modifications, the mapping of the algorithm to a special-purpose architecture, and the detailed design of this architecture using configurable logic. Emphasis is placed on how the attributes of the algorithm are exploited in a configurable-logic-based design. A concrete design example is presented with a coprocessor engine having one large FPGA, 64 Mbytes of synchronous DRAM (SDRAM), a small FPGA as an SDRAM controller, and 2 Mbytes of SRAM. This engine, operating at 66 MHz, performs roughly nine times as fast as a high-end personal computer running a fully optimized version of the same algorithm.
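The remark about logarithmic encoding can be illustrated with a generic log-domain Viterbi decoder. This is only a minimal sketch of the general idea, not the modified algorithm or the hardware mapping from the paper; the function names and the two-state toy model are assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")


def to_log(p):
    """Convert a probability to its natural log (zero maps to -inf)."""
    return math.log(p) if p > 0 else NEG_INF


def log_viterbi(log_init, log_trans, log_emit, observations):
    """Return (best_log_prob, best_path) for a discrete HMM.

    All weights are log-probabilities, so every product of
    probabilities in the textbook recursion becomes a sum.
    """
    n_states = len(log_init)
    # scores[s] = best log-probability of any path ending in state s
    scores = [log_init[s] + log_emit[s][observations[0]] for s in range(n_states)]
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = [], []
        for s in range(n_states):
            # Pick the predecessor maximizing score + transition weight.
            best_prev = max(range(n_states), key=lambda p: scores[p] + log_trans[p][s])
            new_scores.append(scores[best_prev] + log_trans[best_prev][s] + log_emit[s][obs])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(range(n_states), key=lambda s: scores[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return scores[last], path


# Hypothetical two-state, three-symbol toy model (not from the paper).
init = [to_log(p) for p in (0.6, 0.4)]
trans = [[to_log(p) for p in row] for row in ((0.7, 0.3), (0.4, 0.6))]
emit = [[to_log(p) for p in row] for row in ((0.5, 0.4, 0.1), (0.1, 0.3, 0.6))]

best_log_prob, best_path = log_viterbi(init, trans, emit, [0, 1, 2])
print(best_path, math.exp(best_log_prob))
```

Working with sums rather than products also avoids numerical underflow on long observation sequences, and additions are cheaper than multiplications in fixed-point hardware.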

3.

Fast algorithms for phone classification and recognition using segment-based models

Digalakis V.V., Ostendorf M., Rohlicek J.R. 《Signal Processing, IEEE Transactions on》1992,40(12):2885-2896

Methods for reducing the computation requirements of joint segmentation and recognition of phones using the stochastic segment model are presented. The approach uses a fast segment classification method that reduces computation by a factor of two to four, depending on the confidence of choosing the most probable model. A split-and-merge segmentation algorithm is proposed as an alternative to the typical dynamic programming solution of the segmentation and recognition problem, with computation savings increasing proportionally with model complexity. Although the current recognizer uses context-independent phone models, the results reported for the TIMIT database for speaker-independent joint segmentation and recognition are comparable to those of systems that use context information.

4.

Quantization of cepstral parameters for speech recognition over the World Wide Web

Digalakis V.V., Neumeyer L.G., Perakakis M. 《Selected Areas in Communications, IEEE Journal on》1999,17(1):82-90

We examine alternative architectures for a client-server model of speech-enabled applications over the World Wide Web (WWW). We compare a server-only processing model where the client encodes and transmits the speech signal to the server, to a model where the recognition front end runs locally at the client and encodes and transmits the cepstral coefficients to the recognition server over the Internet. We follow a novel encoding paradigm, trying to maximize recognition performance instead of perceptual reproduction, and we find that by transmitting the cepstral coefficients we can achieve significantly higher recognition performance at a fraction of the bit rate required when encoding the speech signal directly. We find that the required bit rate to achieve the recognition performance of high-quality unquantized speech is just 2000 bits per second.
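To give a feel for the bit-rate figure, the sketch below quantizes each cepstral coefficient with a plain uniform scalar quantizer and adds up the resulting rate. The abstract does not state the quantizer design, frame rate, or bit allocation, so everything here is an assumption: 100 frames per second, 13 values per frame, and a hypothetical allocation of 20 bits per frame chosen only so that the total matches 2000 bits per second.

```python
def bit_rate(bits_per_coefficient, frames_per_second):
    """Total bits per second for a given per-coefficient bit allocation."""
    return sum(bits_per_coefficient) * frames_per_second


def quantize(value, lo, hi, bits):
    """Map value in [lo, hi] to the index of a uniform quantizer cell."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    index = int((value - lo) / step)
    # Clamp so that out-of-range inputs fall into the edge cells.
    return max(0, min(levels - 1, index))


def dequantize(index, lo, hi, bits):
    """Reconstruct the midpoint of the quantizer cell with this index."""
    step = (hi - lo) / (2 ** bits)
    return lo + (index + 0.5) * step


# Hypothetical allocation (an assumption, not taken from the paper):
# more bits for low-order coefficients, fewer for high-order ones.
allocation = [3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1]
frames_per_second = 100  # assumed 10 ms frame step

print(bit_rate(allocation, frames_per_second))

index = quantize(0.3, -1.0, 1.0, 3)
print(index, dequantize(index, -1.0, 1.0, 3))
```

For a uniform quantizer the reconstruction error inside the range is at most half a step, so each extra bit halves the worst-case error for that coefficient.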

5.

Three-dimensional linear prediction and its application to digital angiography

Vassilios V. Digalakis, Vinay K. Ingle, Dimitris G. Manolakis 《Multidimensional Systems and Signal Processing》1993,4(4):307-329

In this article, we apply a three-dimensional (3-D) linear least-squares (LS) prediction technique to the processing of digital subtraction angiography (DSA) image sequences. The main goal of this processing is the cancellation of motion artifacts, a form of visually structured noise that appears in current DSA images. We address two important issues with this new technique: first, the misregistration between the mask and the contrast image and, second, the temporal filtering of the DSA image sequence. Instead of treating these two issues separately, as conventional DSA methods do, we combine them into a 3-D LS prediction problem. Based on this approach, we develop a new efficient algorithm for the solution of the normal equations. The algorithm is based on a new property of Tⁿ (Toeplitz to the n) matrices that we prove. In order to match the physical characteristics of the image sequence, we further optimize practical parameters of this algorithm. Actual patient data is used for the evaluation of this new technique. Results show a significant improvement over the existing methods.