1.

Bilingual LSA-based adaptation for statistical machine translation

Yik-Cheung Tam Ian Lane Tanja Schultz 《Machine Translation》2007,21(4):187-207

We propose a novel approach to cross-lingual language model and translation lexicon adaptation for statistical machine translation (SMT) based on bilingual latent semantic analysis. Bilingual LSA enables latent topic distributions to be efficiently transferred across languages by enforcing a one-to-one topic correspondence during training. Using the proposed bilingual LSA framework, model adaptation can be performed by, first, inferring the topic posterior distribution of the source text and then applying the inferred distribution to an n-gram language model of the target language and translation lexicon via marginal adaptation. The background phrase table is enhanced with the additional phrase scores computed using the adapted translation lexicon. The proposed framework also features rapid bootstrapping of LSA models for new languages based on a source LSA model of another language. Our approach is evaluated on the Chinese–English MT06 test set using the medium-scale SMT system and the GALE SMT system measured in BLEU and NIST scores. Improvement in both scores is observed on both systems when the adapted language model and the adapted translation lexicon are applied individually. When the adapted language model and the adapted translation lexicon are applied simultaneously, the gain is additive. At the 95% confidence interval of the unadapted baseline system, the gain in both scores is statistically significant using the medium-scale SMT system, while the gain in the NIST score is statistically significant using the GALE SMT system.

2.

A syntactically informed reordering model for statistical machine translation

Saeed Farzi Shahram Khadivi 《Journal of Experimental & Theoretical Artificial Intelligence》2013,25(4):449-469

Word reordering is one of the challenging problems of machine translation. It is an important factor in the quality and efficiency of machine translation systems. In this paper, we introduce a novel reordering model based on an innovative structure named the phrasal dependency tree. The phrasal dependency tree is a modern syntactic structure which is based on dependency relationships between contiguous non-syntactic phrases. The proposed model integrates syntactic and statistical information in the context of a log-linear model, with the aim of dealing with reordering problems. It benefits from phrase dependencies, translation directions (orientations) and translation discontinuity between translated phrases. In comparison with well-known and popular reordering models such as distortion, lexicalised and hierarchical models, the experimental study demonstrates the superiority of our model in terms of translation quality. Performance is evaluated for Persian → English and English → German translation tasks using the Tehran parallel corpus and WMT07 benchmarks, respectively. The results report 1.54/1.7 and 1.98/3.01 point improvements over the baseline in terms of BLEU/TER metrics on Persian → English and German → English translation tasks, respectively. On average, our model had a significant impact on precision, with comparable recall, relative to the lexicalised and distortion models.

3.

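Translation rule extraction in statistical machine translation

Ying Liu Wei Jiang 《Computer Engineering and Applications》2012,48(32):98-101,146

Aligned phrases are the core module determining the quality of a statistical machine translation system. This paper proposes a hierarchical phrase model based on phrase structure trees, which extends the hierarchical phrase model using ideas from the string-to-tree model. The model extracts translation rules by combining bilingual aligned phrases with English phrase structure trees, and uses heuristic strategies to obtain extended syntactic labels for the translation rules. The statistical machine translation system using these translation rules produces stable translation results on different data sets, and its average BLEU scores on the training and test sets are higher than those of the phrase model and the hierarchical phrase model.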

4.

Example-based machine translation: a review and commentary

John Hutchins 《Machine Translation》2005,19(3-4):197-211

In the last decade the dominant models of MT have been data-driven or corpus-based. Of the two main trends, statistical machine translation and example-based machine translation (EBMT), the latter is much less clearly defined. In a review of the recently published collection edited by Michael Carl and Andy Way, this essay surveys the basic processes, methods, main problems and tasks of EBMT, and attempts to provide a definition of the essence of EBMT in comparison with statistical MT and traditional rule-based MT. Recent Advances in Example-based Machine Translation. Edited by Michael Carl and Andy Way. Dordrecht: Kluwer Academic Publishers, 2003. xxxi, 482pp. (Text, Speech and Language Technology, vol. 21) ISBN: 1-4020-1400-7 (hardback), 1-4020-1401-5 (paperback).

5.

Pivot language approach for phrase-based statistical machine translation

Hua Wu Haifeng Wang 《Machine Translation》2007,21(3):165-181

This paper proposes a novel method for phrase-based statistical machine translation based on the use of a pivot language. To translate between languages L_s and L_t with limited bilingual resources, we bring in a third language, L_p, called the pivot language. For the language pairs L_s − L_p and L_p − L_t, there exist large bilingual corpora. Using only L_s − L_p and L_p − L_t bilingual corpora, we can build a translation model for L_s − L_t. The advantage of this method lies in the fact that we can perform translation between L_s and L_t even if there is no bilingual corpus available for this language pair. Using BLEU as a metric, our pivot language approach significantly outperforms the standard model trained on a small bilingual corpus. Moreover, with a small L_s − L_t bilingual corpus available, our method can further improve translation quality by using the additional L_s − L_p and L_p − L_t bilingual corpora.
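
The abstract does not spell out how the two pivot-side models are combined. One common way to realize the idea, shown here only as a minimal sketch and not necessarily the authors' formulation, is to marginalize over pivot phrases under the assumption that the target phrase depends on the source phrase only through the pivot: p(t | s) ≈ Σ_p p(t | p) · p(p | s). The function name `triangulate` and the toy phrase tables are hypothetical.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Combine p(pivot | src) and p(tgt | pivot) into p(tgt | src).

    Assumes the target phrase is conditionally independent of the
    source phrase given the pivot phrase (an illustrative assumption).
    """
    src_to_tgt = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_to_pivot.items():
        for pivot, p_pivot_given_src in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for tgt, p_tgt_given_pivot in pivot_to_tgt.get(pivot, {}).items():
                src_to_tgt[src][tgt] += p_pivot_given_src * p_tgt_given_pivot
    return {src: dict(tgts) for src, tgts in src_to_tgt.items()}


# Toy phrase tables (hypothetical values, for illustration only).
src_to_pivot = {"maison": {"house": 0.8, "home": 0.2}}
pivot_to_tgt = {
    "house": {"Haus": 0.9, "Gebäude": 0.1},
    "home": {"Heim": 0.6, "Haus": 0.4},
}

table = triangulate(src_to_pivot, pivot_to_tgt)
for tgt, prob in sorted(table["maison"].items(), key=lambda kv: -kv[1]):
    print(f"{tgt}\t{prob:.2f}")
```

In this toy case "Haus" collects probability mass through both pivot phrases, which is the point of summing over the pivot rather than picking a single pivot translation.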

6.

《Journal of Web Semantics》2016


7.

Example-based machine translation based on tree–string correspondence and statistical generation

Zhanyi Liu Haifeng Wang Hua Wu 《Machine Translation》2006,20(1):25-41

This paper describes an example-based machine translation (EBMT) method based on tree–string correspondence (TSC) and statistical generation. In this method, the translation example is represented as a TSC, which is a triple consisting of a parse tree in the source language, a string in the target language, and the correspondence between the leaf node of the source-language tree and the substring of the target-language string. For an input sentence to be translated, it is first parsed into a tree. Then the TSC forest which best matches the input tree is searched for. Finally the translation is generated using a statistical generation model to combine the target-language strings of the TSCs. The generation model consists of three features: the semantic similarity between the tree in the TSC and the input tree, the translation probability of translating the source word into the target word, and the language-model probability for the target-language string. Based on the above method, we build an English-to-Chinese MT system. Experimental results indicate that the performance of our system is comparable with phrase-based statistical MT systems.

8.

Semi-supervised statistical region refinement for color image segmentation

Richard Nock Frank Nielsen 《Pattern recognition》2005,38(6):835-846

Some authors have recently devised adaptations of spectral grouping algorithms to integrate prior knowledge, as constrained eigenvalue problems. In this paper, we improve and adapt a recent statistical region merging approach to this task, as a non-parametric mixture model estimation problem. The approach appears to be attractive both for its theoretical benefits and its experimental results, as slight bias brings dramatic improvements over unbiased approaches on challenging digital pictures.

9.

On the use of different loss functions in statistical pattern recognition applied to machine translation

J. Andrés-Ferrer D. Ortiz-Martínez I. García-Varea F. Casacuberta 《Pattern recognition letters》2008,29(8):1072

In pattern recognition, an elegant and powerful way to deal with classification problems is based on the minimisation of the classification risk. The risk function is defined in terms of loss functions that measure the penalty for wrong decisions. However, in practice a trivial loss function is usually adopted (the so-called 0–1 loss function) that does not make the most of this framework. This work is focused on the study of different loss functions, and especially on those loss functions that do not depend on the class proposed by the system. Loss functions of this kind have allowed us to theoretically explain heuristics that are successfully used with very complex pattern recognition problems, such as (statistical) machine translation. A comparative experimental work has also been carried out to compare different proposals of loss functions in the practical scenario of machine translation.
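
As a reminder of why the 0–1 loss is called trivial, the standard Bayes decision theory argument can be written out as follows. The notation (class c, observation x, loss ℓ, conditional risk R) is assumed here for illustration and is not taken from the paper.

```latex
\begin{align}
R(c \mid x) &= \sum_{c'} \ell(c, c')\, p(c' \mid x), \\
\ell_{0\text{-}1}(c, c') &=
  \begin{cases} 0 & \text{if } c = c' \\ 1 & \text{if } c \neq c' \end{cases}
  \;\Longrightarrow\; R(c \mid x) = 1 - p(c \mid x), \\
\hat{c}(x) &= \arg\min_{c} R(c \mid x) = \arg\max_{c} p(c \mid x).
\end{align}
```

Under the 0–1 loss, minimising the risk therefore reduces to picking the class with maximum posterior probability; other loss functions can yield different decision rules.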

10.

Dependency treelet translation: the convergence of statistical and example-based machine-translation?

Christopher Quirk Arul Menezes 《Machine Translation》2006,20(1):43-65

We describe a novel approach to MT that combines the strengths of the two leading corpus-based approaches: Phrasal SMT and EBMT. We use a syntactically informed decoder and reordering model based on the source dependency tree, in combination with conventional SMT models to incorporate the power of phrasal SMT with the linguistic generality available in a parser. We show that this approach significantly outperforms a leading string-based Phrasal SMT decoder and an EBMT system. We present results from two radically different language pairs, and investigate the sensitivity of this approach to parse quality by using two distinct parsers and oracle experiments. We also validate our automated BLEU scores with a small human evaluation.

11.

The scaling problem in the pattern recognition approach to machine translation

D. Ortiz-Martínez I. García-Varea F. Casacuberta 《Pattern recognition letters》2008,29(8):1145

Statistical machine translation (SMT) has proven to be an interesting pattern recognition framework for automatically building machine translation systems from available parallel corpora. In the last few years, research in SMT has been characterized by two significant advances. First, the popularization of the so-called phrase-based statistical translation models, which allow local contextual information to be incorporated into the translation models. Second, the availability of larger and larger parallel corpora, which are composed of millions of sentence pairs, and tens of millions of running words. Since phrase-based models basically consist of statistical dictionaries of phrase pairs, their estimation from very large corpora is a very costly task that yields a huge number of parameters which are to be stored in memory. The handling of millions of model parameters and a similar number of training samples has become a bottleneck in the field of SMT, as well as in other well-known pattern recognition tasks such as speech recognition or handwriting recognition, just to name a few. In this paper, we propose a general framework that deals with the scaling problem in SMT without introducing significant time overhead by means of the combination of different scaling techniques. This new framework is based on the use of counts instead of probabilities, and on the concept of cache memory.

12.

MT model space: statistical versus compositional versus example-based machine translation

Dekai Wu 《Machine Translation》2005,19(3-4):213-227

We offer a perspective on EBMT from a statistical MT standpoint, by developing a three-dimensional MT model space based on three pairs of definitions: (1) logical versus statistical MT, (2) schema-based versus example-based MT, and (3) lexical versus compositional MT. Within this space we consider the interplay of three key ideas in the evolution of transfer, example-based, and statistical approaches to MT. We depict how all translation models face these issues in one way or another, regardless of the school of thought, and suggest where the real questions for the future may lie.

13.

Knowledge-based disambiguation for machine translation

Joachim Quantz Birte Schmitz 《Minds and Machines》1994,4(1):39-57


14.

Knowledge-based machine translation

Sergei Nirenburg 《Machine Translation》1989,4(1):5-24

This paper provides an overview of the KBMT-89 project at Carnegie Mellon University's Center for Machine Translation, and thus also of the special issue of this journal, which reports on the project. The knowledge-based approach to machine translation is presented and defended in a historical context. Various components of the system, key parts of which are described in subsequent papers of the issue, are introduced and paired with their computational motivations.

15.

Constructive machine translation evaluation (citations: 1 total, 0 self-citations, 1 by others)

Stephen Minnis 《Machine Translation》1993,8(1-2):67-75

When surveying the many methods currently employed in MT evaluation, it is not immediately obvious that the methods used serve to increase the knowledge of the properties being measured. This report describes a constructive machine translation evaluation method, aimed at addressing this issue. Edited version of a presentation given to the International Working Group on the Evaluation of Machine Translation Systems, Vaud, Switzerland, April 1991.

16.

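Bilingual co-training for relation extraction based on machine translation

Yanan Hu Haotian Hui Longhua Qian Qiaoming Zhu 《Application Research of Computers》2015,32(3)

Traditional research on weakly supervised relation extraction has focused mainly on a single language. To make full use of the complementarity between languages and reduce the need for large-scale training data, a bilingual co-training method for relation classification is proposed. Given a small labeled corpus and a moderately sized unlabeled corpus, bilingual views of relation instances are generated through machine translation and entity alignment, and co-training is then used to obtain classification models for both languages. Experiments on the ACE RDC 2005 Chinese and English corpora show that bilingual co-training improves relation classification performance for both Chinese and English while reducing the amount of labeled training data required.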

17.

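Application of a generalized language model in Chinese-Uyghur machine translation

Xiang Li Jiang Nan Yating Yang Xi Zhou Chenggang Mi 《Application Research of Computers》2014,31(10)

To address the long-distance dependencies of Uyghur and the data sparsity of language models in Chinese-Uyghur statistical machine translation, a generalization-based Uyghur language model is proposed. Using the text generated during training of the Uyghur language model together with a string similarity algorithm, the model takes similar Uyghur strings, normalizes them to extract rules, and computes the parameter values of the rules. The rules are then used to rescore the n-best translations generated for the test set during decoding, and the highest-scoring translation is selected as the best translation. Experimental results show that the generalized language model reduces storage space and that appropriate use of the rules effectively improves translation quality.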

18.

Semi-supervised learning for character recognition in historical archive documents

Jan Richarz Szilard Vajda Rene Grzeszick Gernot A. Fink 《Pattern recognition》2014

Training recognizers for handwritten characters is still a very time-consuming task involving tremendous amounts of manual annotations by experts. In this paper we present semi-supervised labeling strategies that are able to considerably reduce the human effort. We propose two different methods to label and later recognize characters in collections of historical archive documents. The first one is based on clustering of different feature representations and the second one incorporates a simultaneous retrieval on different representations. Hence, both approaches are based on multi-view learning and later apply a voting procedure for reliably propagating annotations to unlabeled data. We evaluate our methods on the MNIST database of handwritten digits and introduce a realistic application in the form of a database of handwritten historical weather reports. The experiments show that our method is able to significantly reduce the human effort that is required to build a character recognizer for the data collection considered while still achieving recognition rates that are close to a supervised classification experiment.

19.

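An example-based Chinese-English machine translation strategy (citations: 3 total, 0 self-citations, 3 by others)

Guoquan Hu Jiajun Chen Xinyu Dai Cunyan Yin 《Computer Engineering and Design》2005,26(4):900-903,906

This paper introduces an example-based Chinese-English machine translation strategy, focusing on the design of the Chinese-English bilingual corpus and the algorithm for matching Chinese sentences against that corpus. When matching Chinese sentences, Chinese characters are matched directly, in keeping with the characteristics of Chinese, without word segmentation. Determining the boundaries of matched fragments is also one of the difficulties of example-based machine translation, and a corresponding solution is adopted. The joining and assembly of translated sentences is not studied in depth, because the strategy is intended for a multi-engine translation system in which it is used together with other translation strategies to improve the accuracy of the results. Example-based machine translation needs a large bilingual corpus as the basis for translation, and building a large corpus by hand is time-consuming and laborious, so the automatic construction of a Chinese-English bilingual corpus by computer is attempted, including document-level alignment and word-level alignment.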

20.

A Semi-supervised Learning Algorithm on Gaussian Mixture with Automatic Model Selection

Zhiwu Lu Yuxin Peng 《Neural Processing Letters》2008,27(1):57-66

In Gaussian mixture modeling, it is crucial to select the number of Gaussians for a sample set, which becomes much more difficult when the overlap in the mixture is larger. Under regularization theory, we aim to solve this problem using a semi-supervised learning algorithm that incorporates pairwise constraints into entropy regularized likelihood (ERL) learning, which can perform automatic model selection for Gaussian mixtures. The simulation experiments further demonstrate that the presented semi-supervised learning algorithm (i.e., the constrained ERL learning algorithm) can automatically detect the number of Gaussians with good parameter estimation, even when two or more actual Gaussians in the mixture overlap to a high degree. Moreover, the constrained ERL learning algorithm leads to some promising results when applied to iris data classification and image database categorization.