Similar Documents
20 similar documents found (search time: 31 ms)
1.
Statistical approaches in speech technology, whether used for statistical language models, trees, hidden Markov models or neural networks, represent the driving forces for the creation of language resources (LR), e.g., text corpora, pronunciation and morphology lexicons, and speech databases. This paper presents a system architecture for the rapid construction of morphological and phonetic lexicons, two of the most important written language resources for the development of ASR (automatic speech recognition) and TTS (text-to-speech) systems. The presented architecture is modular and is particularly suitable for the development of written language resources for inflectional languages. In this paper an implementation is presented for the Slovenian language. The integrated graphical user interface focuses on the morphological and phonetic aspects of language and allows experts to work efficiently during analysis. In multilingual TTS systems, many extensive external written language resources are used, especially in the text processing part. It is therefore very important that the representation of these resources is time and space efficient, and that language resources for new languages can be easily incorporated into the system without modifying the common algorithms developed for multiple languages. In this regard, the use of large external language resources (e.g., morphology and phonetic lexicons) presents a significant problem because of the required space and slow look-up time. This paper presents a method, and its results, for compiling large lexicons into corresponding finite-state transducers (FSTs), using as examples the German phonetic and morphology lexicons (CISLEX) and the Slovenian phonetic (SIflex) and morphology (SImlex) lexicons. The German lexicons consisted of about 300,000 words, SIflex consisted of about 60,000 words and SImlex of about 600,000 words (of which 40,000 were used for the finite-state-transducer representation). Representing large lexicons as finite-state transducers is mainly motivated by considerations of space and time efficiency. A great reduction in size and optimal access time were achieved for all lexicons. The starting size was 12.53 MB for the German phonetic lexicon and 18.49 MB for the German morphology lexicon; for the Slovenian lexicons, the starting sizes were 1.8 MB (phonetic) and 1.4 MB (morphology). The final size of the corresponding FSTs was 2.78 MB for the German phonetic lexicon, 6.33 MB for the German morphology lexicon, 253 KB for SIflex and 662 KB for SImlex. The achieved look-up time is optimal, since it depends only on the length of the input word and not on the size of the lexicon. With such representations, integrating lexicons for new languages into the multilingual TTS system is easy and requires no changes to the algorithms that use them.
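A minimal sketch of the look-up property this abstract claims — cost proportional to the length of the input word, independent of lexicon size — using a plain prefix trie; this is an illustration only, not the paper's FST compiler (a minimal FST additionally shares suffixes, which is where much of the reported size reduction comes from). All names and entries below are invented.

```python
class TrieLexicon:
    """Maps words to outputs (e.g., phonetic transcriptions) via a trie."""

    def __init__(self):
        self.root = {}          # char -> child node
        self.OUTPUT = object()  # sentinel key for stored transcriptions

    def add(self, word, transcription):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[self.OUTPUT] = transcription

    def lookup(self, word):
        node = self.root
        for ch in word:         # one step per input character,
            if ch not in node:  # regardless of lexicon size
                return None
            node = node[ch]
        return node.get(self.OUTPUT)

lex = TrieLexicon()
lex.add("hallo", "h a l o:")    # toy German-style entry
print(lex.lookup("hallo"))      # -> "h a l o:"
```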

2.
3.
Coding techniques for handling failures in large disk arrays (cited: 9; self-citations: 0; by others: 9)
A crucial issue in the design of very large disk arrays is the protection of data against catastrophic disk failures. Although single disks are highly reliable today, when a disk array consists of 100 or 1000 disks, the probability that at least one disk will fail within a day or a week is high. In this paper we address the problem of designing erasure-correcting binary linear codes that protect against the loss of data caused by disk failures in large disk arrays. We describe how such codes can be used to encode data in disk arrays, and give a simple method for data reconstruction. We discuss important reliability and performance constraints of these codes, and show how these constraints relate to properties of the parity check matrices of the codes. In so doing, we transform code design problems into combinatorial problems. Using this combinatorial framework, we present codes and prove they are optimal with respect to various reliability and performance constraints. This paper is a revised and expanded version of material that appeared at the Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS III), Boston, MA, March 1989. The work here was supported in part by the National Science Foundation under Grant Numbers MIP-8715235 and CCR-8411954, as well as an AT&T Bell Labs GRPW grant, a Siemens Corporation grant, and an IBM graduate fellowship.
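For intuition, the simplest member of this code family is a single XOR parity disk (as in RAID-4/5), which tolerates one disk failure. The sketch below shows encoding and reconstruction under that assumption; the paper's codes are more general binary linear codes.

```python
from functools import reduce

def make_parity(data_disks):
    """Parity block = XOR of the corresponding bytes on all data disks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*data_disks))

def reconstruct(surviving_disks):
    """Any single missing disk is the XOR of all surviving ones (incl. parity)."""
    return make_parity(surviving_disks)

d0, d1, d2 = b"\x01\x02", b"\x0f\x00", b"\xf0\xff"
p = make_parity([d0, d1, d2])
assert reconstruct([d1, d2, p]) == d0   # recover d0 after its disk fails
```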

4.
In order to determine priorities for the improvement of timing in synthetic speech, this study looks at the role of segmental duration prediction and the role of the phonological symbolic representation in the perceptual quality of a text-to-speech system. In perception experiments using German speech synthesis, two standard duration models (Klatt rules and CART) were tested. The input to these models consisted of a symbolic representation derived either from a database or from a text-to-speech system. Results of the perception experiments show that different duration models can only be distinguished when the symbolic representation is appropriate. Considering the relative importance of the symbolic representation, post-lexical segmental rules were investigated, with the outcome that listeners differ in their preferences regarding the degree of segmental reduction. In conclusion, before fine-tuning the duration prediction, it is important to derive an appropriate phonological symbolic representation in order to improve timing in synthetic speech.
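As a hedged illustration of what a CART duration model of the kind compared here looks like, the sketch below fits a regression tree from symbolic features to segment durations; the features, data, and tool choice (scikit-learn) are all invented stand-ins, not the study's setup.

```python
from sklearn.tree import DecisionTreeRegressor

# Each row encodes a symbolic context: [is_vowel, stressed, phrase_final, syl_pos]
X = [[1, 1, 0, 0], [1, 0, 0, 1], [0, 0, 0, 1],
     [1, 1, 1, 2], [0, 0, 1, 2], [1, 0, 1, 0]]
y = [95, 60, 45, 160, 80, 120]   # hand-made segment durations in ms

cart = DecisionTreeRegressor(max_depth=3).fit(X, y)
# Predicted duration for a stressed, phrase-final vowel:
print(cart.predict([[1, 1, 1, 2]]))
```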

5.
6.
In this paper we present a speech-to-speech (S2S) translation system called BBN TransTalk that enables two-way communication between speakers of English and speakers who do not understand or speak English. BBN TransTalk has been configured for several languages including Iraqi Arabic, Pashto, Dari, Farsi, Malay, Indonesian, and Levantine Arabic. We describe the key components of our system: automatic speech recognition (ASR), machine translation (MT), text-to-speech (TTS), the dialog manager, and the user interface (UI). In addition, we present novel techniques for overcoming specific challenges in developing high-performing S2S systems. For ASR, we present techniques for dealing with the lack of pronunciation and linguistic resources and for effectively modeling ambiguity in the pronunciations of words in these languages. For MT, we describe techniques for dealing with data sparsity as well as for modeling context. We also present and compare different user confirmation techniques for detecting errors that can cause the dialog to drift or stall.
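A rough architecture-only sketch of such an S2S chain, with a user confirmation step of the kind the paper compares placed between ASR and MT. Every function is a stub; none of this reflects BBN's actual components or interfaces.

```python
def asr(audio):
    """Speech -> source-language text (stub)."""
    return "where is the clinic"

def mt(text):
    """Source-language text -> target-language text (stub)."""
    return f"<target-language rendering of: {text}>"

def tts(text):
    """Target-language text -> audio (stubbed as a tagged string)."""
    return f"AUDIO[{text}]"

def confirm(hypothesis):
    """User confirmation: catch ASR errors before they derail the dialog.
    Stubbed to always accept; a real UI would ask the speaker."""
    return True

def translate_turn(audio):
    text = asr(audio)
    if not confirm(text):   # rejected: re-prompt instead of translating
        return None
    return tts(mt(text))

print(translate_turn(b"...raw audio bytes..."))
```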

7.
In this paper we investigate two purely syntactical notions of circularity, which we call "self-application" and "self-inclusion." A language containing self-application allows linguistic items to be applied to themselves. In a language allowing for self-inclusion there are expressions which include themselves as a proper part. We introduce axiomatic systems of syntax which include identity criteria and existence axioms for such expressions. The consistency of these axiom systems will be shown by providing a variety of different models – these models being our circular languages. Finally we will show what a possible semantics for these circular languages could look like.

8.
New Products     
《Computer》1980,13(8):80-83
HCR/Basic, an implementation of Basic which conforms completely to ANSI Standard X3.60-1978, allows PDP-11 users who run the Bell Labs Unix timesharing system to run or develop programs written in standard Basic.

9.
The quality of text-to-speech systems can be effectively assessed only on the basis of reliable and valid listening tests of overall system performance. A mean opinion scale (MOS) has been the recommended measure of synthesized speech quality [ITU-T Recommendation P.85, 1994. Telephone transmission quality subjective opinion tests. A method for subjective performance assessment of the quality of speech voice output devices]. We assessed this MOS scale and developed and tested a modified measure of speech quality with new items specific to text-to-speech systems. Our research was motivated by the lack of clear evidence about both the conceptual content and the psychometric properties of the MOS scale. We present conceptual arguments and empirical evidence for the reliability and validity of the modified scale. Moreover, we employ state-of-the-art psychometric techniques, such as confirmatory factor analysis, to provide strong tests of its psychometric properties. The modified scale is better suited to appraising synthesis systems since it includes items specific to the artifacts found in synthesized speech. We believe the speech synthesis research community will find it a better fit for listening tests that assess synthesized speech.
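As a toy illustration of the psychometric side, the snippet below computes per-item mean opinion scores and Cronbach's alpha, a basic internal-consistency estimate, from invented 5-point ratings. The paper itself uses confirmatory factor analysis; alpha is shown only as the simplest related reliability statistic.

```python
import numpy as np

ratings = np.array([   # listeners x scale items, invented 5-point data
    [4, 4, 3, 4],
    [3, 3, 3, 2],
    [5, 4, 4, 4],
    [2, 3, 2, 2],
])

mos_per_item = ratings.mean(axis=0)     # mean opinion score per scale item

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)
k = ratings.shape[1]
item_vars = ratings.var(axis=0, ddof=1).sum()
total_var = ratings.sum(axis=1).var(ddof=1)
alpha = k / (k - 1) * (1 - item_vars / total_var)

print(mos_per_item, round(alpha, 2))
```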

10.
Speech translation is a technology that helps people communicate across different languages. The most commonly used speech translation model is composed of automatic speech recognition, machine translation and text-to-speech synthesis components, which share information only at the text level. However, spoken communication differs from written communication in that it uses rich acoustic cues, such as prosody, to transmit additional information through non-verbal channels. This paper is concerned with speech-to-speech translation that is sensitive to this paralinguistic information. Our long-term goal is a system that allows users to speak a foreign language with the same expressiveness as if they were speaking their own. Our method works by reconstructing input acoustic features in the target language. Of the many possible paralinguistic features, in this paper we choose duration and power as a first step, proposing a method that translates these features from the input speech to the output speech in continuous space. This is done in a simple and language-independent fashion by training an end-to-end model that maps source-language duration and power information into the target language. Two approaches are investigated: linear regression and neural network models. We evaluate the proposed methods and show that paralinguistic information in the input speech of the source language can be reflected in the output speech of the target language.
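A hedged sketch of the linear-regression variant: source-side duration/power features are mapped directly to target-side duration/power in continuous space. The data below is random stand-in material, not the paper's corpus.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_src = rng.normal(size=(200, 4))   # toy [duration, power] x 2 source segments
W = rng.normal(size=(4, 2))
Y_tgt = X_src @ W + 0.1 * rng.normal(size=(200, 2))   # target duration/power

# Fit the continuous source->target mapping and translate one utterance's features
model = LinearRegression().fit(X_src, Y_tgt)
print(model.predict(X_src[:1]))     # predicted target-language duration/power
```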

11.
This article relates highlights from the digital computer development activities at Bell Telephone Laboratories for roughly the period 1937-1958. The history begins with a researcher using relays to build a binary adder on his home kitchen table, continues with relay computers designed for military use, and culminates with computers developed after Bell Labs invented the transistor.

12.
With the rapid proliferation of video cameras in public places, the ability to identify and track people and other objects creates tremendous opportunities for business and security applications. This paper presents the Multiple Camera Indoor Surveillance project, which is devoted to using multiple cameras, agent-based technology and knowledge-based techniques to identify and track people and summarize their activities. We also describe a people localization system, which identifies and localizes people in an indoor environment. The system uses low-level color features – a color histogram and average vertical color – for building people models, and a Bayesian decision-making approach for people localization. The results of a pilot experiment that used 32 h of data (4 days × 8 h) showed average recall and precision values of 68% and 59%, respectively. Augmenting the system with domain knowledge, such as the locations of working places in cubicles, doors and passages, increased the average recall to 87% and precision to 73%. Valery A. Petrushin is a senior researcher at the Accenture Technology Labs in Chicago, Illinois, USA. He received his M.Sc. in Applied Mathematics from Kharkov State University, Kharkov, Ukraine, and his Ph.D. in Computer Science from the Glushkov Institute for Cybernetics, Kiev, Ukraine. He worked as Director of Intelligent Tutoring Systems at the Glushkov Institute for Cybernetics, Ukraine, and as a researcher at the EduTech Institute at Georgia Tech, GA, USA. His research interests include multimedia data mining, processing, annotation and retrieval. He is the author of two books and more than 130 publications in the fields of computer science, computer-based education, data mining and signal processing. Gang Wei received his Ph.D. in Computer Science from Wayne State University, Detroit, Michigan, in May 2001, after which he joined the Accenture Technology Labs in Chicago, Illinois, USA. He also worked as a summer research intern at Philips Research Labs in New York State in 1999 and 2000. His research focuses on multimedia annotation and retrieval, image and video processing, and sensor intelligence. He has published two book chapters and over 20 papers in this area. Anatole V. Gershman is the global director of technology research for Accenture Technology Labs. He received his Ph.D. in Computer Science from Yale University in 1979. He worked at Bell Labs, the Schlumberger Research Centre in Connecticut, Cognitive Systems Inc., and Coopers & Lybrand before joining Accenture (Andersen Consulting) in 1989. His research interests are in the fields of artificial intelligence, intelligent sensor networks and ubiquitous computing.
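A minimal sketch of the low-level matching idea: per-person smoothed color histograms as models, with a maximum-likelihood (naive Bayes-style) decision among them. The real system's average-vertical-color feature and layout-based priors are omitted, and all data here is synthetic.

```python
import numpy as np

def color_hist(pixels, bins=8):
    """Smoothed RGB color-histogram model from an (N, 3) pixel array."""
    h, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=[(0, 256)] * 3)
    return (h.ravel() + 1) / (h.sum() + h.size)   # add-one smoothing

rng = np.random.default_rng(1)
models = {
    "alice": color_hist(rng.integers(0, 128, (500, 3))),    # darker clothing
    "bob":   color_hist(rng.integers(128, 256, (500, 3))),  # lighter clothing
}

def identify(pixels):
    """Pick the person model maximizing the log-likelihood of observed pixels."""
    h, _ = np.histogramdd(pixels, bins=(8,) * 3, range=[(0, 256)] * 3)
    counts = h.ravel()
    return max(models, key=lambda m: counts @ np.log(models[m]))

print(identify(rng.integers(0, 128, (300, 3))))   # -> "alice"
```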

13.
This paper reports on a cooperative international evaluation of grapheme-to-phoneme (GP) conversion for text-to-speech synthesis in French. The test methodology and test corpora are described, and the results for eight systems are provided and analysed in some detail. The contribution of this paper is twofold: on the one hand, it gives an accurate picture of the state of the art in GP conversion for French and points out the problems still to be solved; on the other hand, much room is devoted to a discussion of methodological issues for this task. We hope this can help future evaluations of similar systems in other languages.
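Evaluations of this kind typically score systems by phoneme error rate, i.e., edit distance between the system's output and a reference transcription. The generic sketch below shows that computation; the toy transcriptions are invented and this is not necessarily the campaign's exact metric definition.

```python
def edit_distance(a, b):
    """Levenshtein distance over phoneme sequences (single-row DP)."""
    d = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, y in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (x != y))
    return d[len(b)]

ref = "b o~ Z u R".split()   # toy reference for French "bonjour"
hyp = "b o~ j u R".split()   # system output with one substitution
print(edit_distance(hyp, ref) / len(ref))   # phoneme error rate -> 0.2
```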

14.
Semiconductor manufacturing data consist of the processes and the machines involved in the production of batches of semiconductor circuit wafers. Wafer quality depends on the status of the manufacturing line and is measured at the end of the line. We have developed a knowledge discovery system that is intended to help the yield analysis expert by learning the tentative causes of low-quality wafers from vast amounts of manufacturing data. Using the knowledge discovered, the yield analysis expert decides which corrective actions to perform on the manufacturing process. This paper discusses the transformations carried out on the data, from raw data to discovered knowledge, as well as the two main tasks performed by the system. The features of the inductive algorithm performing those tasks are also described. Yield analysis experts at Lucent Technologies, Bell Labs Innovations in Spain are currently using this knowledge discovery application.

15.
The language development of a multilingual text-to-speech system requires contributions from linguists and native speakers of a given language. Text normalization, including number expansion, is one of the language-specific processing steps. Most available solutions do not support inflections and are not simple enough to be practical for non-technical developers. This paper presents a novel solution for expressing number expansion rules. The rule framework is fast, easy to use without a technical background, and truly multilingual, supporting gender-specific inflections of numerals. The rules require only a small amount of memory and are conveniently stored as software-independent language data. The same rule framework can be extended to carry out other text-normalization tasks, including the processing of context-dependent abbreviations and the interpretation of formatted text such as date and time expressions. The framework has been successfully used to create number, unit and time conversion rules for 42 languages. The created rules supported cardinal numbers from 0 to 999999 and 13 units such as m, km, h and min. Professional translators without a technical background generated the rules for most of the languages. The average numbers of rule lines for number, unit and time rules were 87, 49 and 13, respectively. The average development time for a full rule set was seven hours per language. The most complex rule sets were in Slavonic languages, whereas the simplest ones were in Sino-Tibetan languages.
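In the same spirit, a heavily simplified sketch of rule-table-driven number expansion, where non-technical developers would edit only the data table: the toy German rules below are illustrative and bear no relation to the product's actual rule format.

```python
RULES_DE = {   # editable language data: digit/tens words
    0: "null", 1: "eins", 2: "zwei", 3: "drei", 4: "vier", 5: "fünf",
    6: "sechs", 7: "sieben", 8: "acht", 9: "neun", 10: "zehn",
    20: "zwanzig", 30: "dreißig", 40: "vierzig", 50: "fünfzig",
}

def expand_de(n):
    """Expand a toy subset of German cardinals (units precede tens)."""
    if n in RULES_DE:
        return RULES_DE[n]
    unit, tens = n % 10, n - n % 10
    if unit > 0 and tens >= 20 and tens in RULES_DE:
        unit_word = "ein" if unit == 1 else RULES_DE[unit]   # "eins" -> "ein"
        return f"{unit_word}und{RULES_DE[tens]}"
    raise ValueError("outside the toy rule set")

print(expand_de(21))   # -> "einundzwanzig"
print(expand_de(54))   # -> "vierundfünfzig"
```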

16.
The computation language of a DNA-based system consists of all the words (DNA strands) that can appear in any computation step of the system. In this work we define properties of languages which ensure that the words of such languages will not form undesirable bonds when used in DNA computations. We give several characterizations of the desired properties and provide methods for obtaining languages with such properties. The decidability of these properties is addressed as well. As an application we consider splicing systems whose computation language is free of certain undesirable bonds and is generated by nearly optimal comma-free codes.
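For reference, the comma-free property mentioned at the end can be checked directly: a fixed-length code is comma-free if no codeword occurs inside the concatenation of two codewords except at the aligned positions. A generic sketch, not the paper's construction:

```python
def is_comma_free(code):
    """Check comma-freeness of a set of equal-length codewords."""
    n = len(next(iter(code)))               # fixed codeword length assumed
    for u in code:
        for v in code:
            w = u + v
            # offsets 1..n-1 straddle the boundary between u and v
            if any(w[i:i + n] in code for i in range(1, n)):
                return False
    return True

print(is_comma_free({"AAT", "CGC"}))   # toy DNA-alphabet code -> True
```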

17.
Motivated by packet filtering in firewall systems for Internet applications, we study the fault detection problem in general rule-based software systems. We discuss algorithms for the detection of conflicts in a given set of rules. We first study a constrained version of the fault detection problem and propose a two-phase algorithm: the first phase performs rule normalization, and the second phase detects conflicting rules. For this constrained version of the fault detection problem, the algorithm takes polynomial time; the general problem is NP-hard. We apply the algorithms to a rule table obtained from one of the firewalls at Bell Labs and report the experimental results.
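A simplified sketch of the second phase: once rules are normalized to ranges, two rules conflict when their match regions intersect but their actions differ. The rule model below is invented for illustration, not the paper's representation.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    src: tuple      # (lo, hi) source-address range, already normalized
    dst: tuple      # (lo, hi) destination-address range
    action: str     # "accept" or "deny"

def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

def conflicts(rules):
    """Return index pairs of rules whose match regions overlap with differing actions."""
    return [(i, j)
            for i, r in enumerate(rules)
            for j, s in enumerate(rules[i + 1:], i + 1)
            if overlaps(r.src, s.src) and overlaps(r.dst, s.dst)
            and r.action != s.action]

rules = [Rule((10, 20), (0, 99), "accept"),
         Rule((15, 30), (50, 60), "deny")]
print(conflicts(rules))   # -> [(0, 1)]
```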

18.
An Improved Cross-Lingual Model Adaptation Method for Speech Synthesis (cited: 1; self-citations: 0; by others: 1)
In statistical parametric speech synthesis, cross-lingual model adaptation is mainly applied when the target speaker's language differs from that of the source model: a small amount of the target speaker's speech data is used to rapidly build a synthesis system in the source-model language that carries the target speaker's voice characteristics. This paper improves the traditional cross-lingual adaptation methods based on phoneme mapping and triphone models: on the one hand, a phoneme-mapping method combined with data selection is used to improve the reliability of the phoneme mapping; on the other hand, cross-lingual mapping of prosodic information is introduced to compensate for the triphone model's weakness in representing prosody. Experimental results on a Chinese-English cross-lingual model adaptation system show that the naturalness and similarity of the speech synthesized by the improved system are clearly better than with the traditional method.
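A loose sketch of the data-selection idea in the improved phoneme mapping: candidate source-language phonemes are ranked per target phoneme, and a mapping is accepted only when enough adaptation data supports it. The distances, counts, and selection criterion here are invented, not the paper's method.

```python
def select_mapping(candidates, min_frames=50):
    """candidates: target phoneme -> list of (source phoneme, distance, frames).
    Keep only data-supported candidates, then pick the acoustically closest."""
    mapping = {}
    for tgt, options in candidates.items():
        supported = [o for o in options if o[2] >= min_frames]
        if supported:
            mapping[tgt] = min(supported, key=lambda o: o[1])[0]
    return mapping

candidates = {"zh": [("j", 0.8, 120), ("z", 0.5, 30)],   # toy values
              "x":  [("sh", 0.6, 200)]}
print(select_mapping(candidates))   # -> {'zh': 'j', 'x': 'sh'}
```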

19.
This paper asks whether it is feasible to use subsets of natural languages as query languages for databases in actual applications, using the question-answering system "USER SPECIALTY LANGUAGES" (USL). Methods for evaluating a natural-language-based information system are discussed. The results (error and language-structure evaluation) suggest how to design the general architecture of application systems that use a subset of German as a query language.

20.
The original UNIX system was designed to be small and intelligible, achieving power by generality rather than by a profusion of features. In this spirit we have designed and implemented IX, a multilevel-secure variant of the Bell Labs research system. IX aims at sound, practical security, suitable for private- and public-sector uses other than critical national-security applications. The major security features are: private paths for safe cooperation among privileged processes, structured management of privilege, and security labels to classify information for purposes of privacy and integrity. The labels of files and processes are checked at every system call that involves data flow and are adjusted dynamically to ensure that labels on outputs reflect labels on inputs.
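The label check described in the last sentence can be sketched as a standard dominance test on a label lattice: information may flow from src to dst only if dst's label dominates src's. This is a generic model for illustration, not IX's actual label format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Label:
    level: int             # e.g., 0 = unclassified .. 3 = top secret
    categories: frozenset  # need-to-know compartments

def dominates(high, low):
    """high dominates low: level at least as high, categories a superset."""
    return high.level >= low.level and high.categories >= low.categories

src = Label(1, frozenset({"payroll"}))
dst = Label(2, frozenset({"payroll", "audit"}))
print(dominates(dst, src))   # True: the flow src -> dst is permitted
```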
