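Query Expansion Based on a Statistical Machine Translation Model   Total citations: 1, self-citations: 0, citations by others: 1
In practical information retrieval applications such as search engines, user queries usually contain only a few keywords. This causes a term mismatch between relevant documents and the user's query, which seriously degrades retrieval performance. Building on an analysis of the query generation model, this paper proposes a new query expansion method based on statistical machine translation. A statistical machine translation model is used to extract words in the document collection that are associated with the query terms, and these words are then used to expand the query. Experiments on TREC data sets show that the translation-based query expansion method consistently improves on the unexpanded language model approach by 12% to 17%, and achieves average precision comparable to pseudo-relevance feedback, a popular query expansion method.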
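The general idea of expanding a query with translation probabilities can be sketched as follows. This is a minimal illustration, not the paper's model: the table `TRANSLATION_PROBS`, its values, the uniform weighting of query terms, and the function `expand_query` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical translation table: TRANSLATION_PROBS[q][w] stands for t(w | q),
# the probability that query word q "translates" into document word w.
# The numbers are invented for illustration only.
TRANSLATION_PROBS = {
    "car": {"automobile": 0.4, "vehicle": 0.3, "engine": 0.1},
    "price": {"cost": 0.6, "vehicle": 0.2},
}


def expand_query(query_terms, table, k=2):
    """Return the k highest-scoring expansion terms not already in the query."""
    scores = defaultdict(float)
    for q in query_terms:
        for w, p in table.get(q, {}).items():
            if w not in query_terms:
                # Assumption: every query term carries the same weight.
                scores[w] += p / len(query_terms)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [w for w, _ in ranked[:k]]


expanded = expand_query(["car", "price"], TRANSLATION_PROBS)
```

Here `expanded` holds the candidate words that would be appended to the original query before retrieval.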

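A Concept-Based Query Expansion Method for Information Retrieval   Total citations: 8, self-citations: 2, citations by others: 6
To address the problem of "expression differences" in query terms in information retrieval, this paper proposes a concept-based query expansion method. On the one hand, the words or phrases used in the user's query are linked to concepts extracted from documents and added to the original query; the concepts serving as expansion terms are also queried by category, and an integrated ranking algorithm adjusts the results. On the other hand, a concept graph is introduced that users can adjust manually to expand the query, with the aim of optimizing it. Experimental results show that the method is suitable for improving information retrieval on the Web and substantially improves query precision compared with unexpanded queries.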

This paper describes experiments with Korean-to-Vietnamese statistical machine translation (SMT). Korean is a morphologically complex language without clear optimal word boundaries, which causes a major problem when translating into or from Korean. To solve this problem, we present a method for Korean morphological analysis that uses a pre-analyzed partial word-phrase dictionary (PWD). In addition, we build a Korean-Vietnamese parallel corpus for training SMT models by collecting text from multilingual magazines. We then apply this morphological analysis to the Korean sentences in the collected parallel corpus as a preprocessing step. The experimental results demonstrate a remarkable improvement in Korean-to-Vietnamese translation quality in terms of the bilingual evaluation understudy (BLEU) score.

Lexicalized reordering models are very important components of phrase-based translation systems. By examining the reordering relationships between adjacent phrases, conventional methods learn these models from the word-aligned bilingual corpus while ignoring the effect of the number of adjacent bilingual phrases. In this paper, we propose a method that takes the number of adjacent phrases into account for better estimation of reordering models. Instead of just checking whether there is one phrase adjacent to a given phrase, our method first uses a compact structure named a reordering graph to represent all phrase segmentations of a parallel sentence. The effect of the number of adjacent phrases is then quantified in a forward-backward fashion and finally incorporated into the estimation of reordering models. Experimental results on the NIST Chinese-English and WMT French-Spanish data sets show that our approach significantly outperforms the baseline method.

In this paper, we propose a classification-based approach for hybridizing statistical machine translation and rule-based machine translation. Both the training dataset used to learn our proposed classifier and our feature extraction method affect the hybridization quality. To create such a training dataset, a previous approach used automatic evaluation metrics to determine, by comparison, which of a set of component machine translation (MT) systems gave the more accurate translation. The most accurate translation was then labelled to indicate the MT system from which it came. In this previous approach, when the metric evaluation scores were low, there was a high level of uncertainty as to which of the component MT systems was actually producing the better translation. To reduce such uncertainty or error in classification, we propose an alternative approach to labeling: a cut-off method. In our experiments, using this cut-off method in our proposed classifier, we achieved a translation accuracy of 81.5%, a 5.0% improvement over existing methods.

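An Object Retrieval Method Based on Randomized Visual Dictionaries and Query Expansion   Total citations: 1, self-citations: 0, citations by others: 1
In object retrieval, the current mainstream solution is the Bag of Visual Words (BoVW) approach. However, traditional BoVW methods suffer from low time efficiency, high memory consumption, and the synonymy and ambiguity of visual words. To address these problems, this paper proposes an object retrieval method based on a group of randomized visual dictionaries and query expansion. First, the method uses Exact Euclidean Locality Sensitive Hashing (E2LSH) to cluster the local feature points of the training image set, generating a group of randomized visual dictionaries that supports dynamic expansion. Then, visual word distribution histograms and index files are built from these dictionaries. Finally, a query expansion strategy is introduced to complete the retrieval. Experimental results show that, compared with traditional methods, the proposed method effectively improves the distinguishability of target objects, considerably raises retrieval precision, and scales well to large databases.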
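For readers unfamiliar with E2LSH, the sketch below shows the standard p-stable hash family it builds on, h(v) = floor((a · v + b) / w), where a has Gaussian entries and b is an offset drawn from [0, w]. It is an illustration under assumptions rather than the paper's implementation: the function name `make_e2lsh_hash`, the parameter values, and the use of the resulting tuple as a "visual word" bucket key are choices made for this example.

```python
import math
import random


def make_e2lsh_hash(dim, num_projections, bucket_width, seed=0):
    """Build one E2LSH-style hash function mapping a vector to a tuple of ints."""
    rng = random.Random(seed)
    # Each projection is a Gaussian direction a plus a random offset b.
    projections = [
        (
            [rng.gauss(0.0, 1.0) for _ in range(dim)],
            rng.uniform(0.0, bucket_width),
        )
        for _ in range(num_projections)
    ]

    def hash_vector(v):
        # h(v) = floor((a . v + b) / w) for every projection; the tuple of
        # results identifies the bucket the vector falls into.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / bucket_width)
            for a, b in projections
        )

    return hash_vector


hash_fn = make_e2lsh_hash(dim=4, num_projections=3, bucket_width=4.0)
key_a = hash_fn([1.0, 2.0, 3.0, 4.0])
key_b = hash_fn([1.0, 2.0, 3.0, 4.0])
```

Feature vectors that are close in Euclidean distance tend to share a bucket key. Building several such functions with different seeds would give several randomized partitions of the feature space, which is one plausible reading of the "group of dictionaries" in the abstract.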

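Medical machine translation is of great value for applications such as cross-border medical care and the translation of medical literature. Chinese-English neural machine translation has made considerable progress thanks to the strong modeling capacity of deep learning and large-scale bilingual parallel data. Neural machine translation usually relies on large numbers of parallel sentence pairs to train the translation model. At present, Chinese-English translation data come mainly from domains such as news and policy, and medical-domain data are scarce, so Chinese-English machine translation performs poorly in the medical domain. To address the shortage of training data for machine translation in the medical vertical domain, this paper proposes using paraphrase generation to augment Chinese-English medical machine translation data and thus enlarge the training set. Experiments with several mainstream neural machine translation models show that augmenting the data through paraphrase generation effectively improves translation performance: BLEU improves by more than 6 points on mainstream models including RNNSearch and Transformer, which confirms the effectiveness of paraphrase augmentation for domain-specific machine translation. In addition, building on large-scale pre-trained language models such as MT5 further improves translation performance.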

With the rapid development of information technology, short texts arising from socialized human interaction are gradually becoming predominant in network information streams. Growing demand requires the industry to provide more effective classification of these brief texts. However, when faced with short text documents, each of which contains only a few words, traditional document classification models run into difficulty. Aggressive document expansion works remarkably well in many cases but suffers from the assumption of independent, identically distributed observations. We formalize a view of classification using Bayesian decision theory, treat each short text as observations from a probabilistic model, called a statistical language model, and encode classification preferences with a loss function defined by the language models and the external reference document. According to Vapnik's method of structural risk minimization (SRM), the optimal classification action is the one that minimizes the structural risk, which provides a result that allows one to trade off errors on the training sample against improved generalization performance. We conduct experiments on several corpora of microblog-like data and analyze the experimental results. Compared with established baselines, the results show that applying our proposed document expansion method gives a better chance of achieving improved classification performance.

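Query Expansion Term Recommendation Based on Concept Lattices   Total citations: 1, self-citations: 0, citations by others: 1
A concept lattice is a mathematical tool well suited to describing hierarchical relationships and is widely used in rule extraction and data analysis. This paper introduces concept lattice theory to build a mathematical model of the page-concept formal context and, on the basis of the concept lattice, proposes an algorithm for generating query expansion terms. Using the Hasse diagram of the concept lattice together with the confidence of association rules, the algorithm efficiently generates expansion terms that serve as keywords for a second search, improving the search results. The algorithm has been implemented in the Diggol intelligent meta-search engine with good results.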
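The role of association rule confidence in choosing expansion terms can be illustrated with a small sketch. It computes conf(A → B) = support(A ∪ B) / support(A) directly over a toy page-concept context and does not build the concept lattice or its Hasse diagram, so it is a simplification and not the paper's algorithm. The context `CONTEXT`, the threshold `min_conf`, and the function names are hypothetical.

```python
# Hypothetical formal context: each page maps to the set of concepts it contains.
CONTEXT = {
    "page1": {"python", "programming", "tutorial"},
    "page2": {"python", "programming"},
    "page3": {"python", "snake"},
    "page4": {"programming", "java"},
}


def confidence(context, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent over the context."""
    covering = [attrs for attrs in context.values() if antecedent <= attrs]
    if not covering:
        return 0.0
    matching = sum(1 for attrs in covering if consequent <= attrs)
    return matching / len(covering)


def recommend(context, query, min_conf=0.5):
    """Return concepts implied by the query with confidence >= min_conf."""
    candidates = set().union(*context.values()) - query
    scored = {c: confidence(context, query, {c}) for c in candidates}
    kept = [c for c, s in scored.items() if s >= min_conf]
    # Highest confidence first, ties broken alphabetically.
    return sorted(kept, key=lambda c: (-scored[c], c))


suggestions = recommend(CONTEXT, {"python"})
```

In this toy context, two of the three pages containing "python" also contain "programming", so that concept passes the 0.5 threshold while the others do not.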

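This paper presents the properties of tree isomorphism from the perspective of graph isomorphism and describes the relationship between structural non-isomorphism and structural alignment. On this basis, in order to establish structural mappings and incorporate syntactic structure information into the translation process, it proposes the concepts of meta-structures and inter-translatable structure groups, together with a multi-level structural alignment framework. Finally, using a log-linear model, it gives a statistical machine translation model based on meta-structure alignment. During translation, the source-language syntax tree is decomposed into meta-structures, which are converted into a sequence of target-language syntax tree structures using the mapping knowledge of the inter-translatable structure groups; the target language is then reordered and the translation generated according to the structural model information. Experimental results show that the model outperforms phrase-based statistical machine translation in both the generalization of translation knowledge and translation quality.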
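For reference, the log-linear framework commonly used in statistical machine translation takes the following general form. The abstract does not list the paper's feature functions, so the symbols here are generic assumptions: f is the source sentence, e a candidate translation, h_m the feature functions (for example, structure mapping or language model scores), and λ_m their weights.

```latex
\hat{e} = \arg\max_{e} \; \sum_{m=1}^{M} \lambda_m \, h_m(e, f)
```

In a model of this kind, scores derived from meta-structure alignment would presumably enter as some of the h_m terms alongside conventional features.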
