1.
Gareth Jones, Nigel Collier, Tetsuya Sakai, Kazuo Sumita, Hideki Hirakawa 《Computers and the Humanities》2001,35(4):371-388
Internet search engines allow access to online information from all over the world. However, there is currently a general assumption that users are fluent in the languages of all documents that they might search for. This has for historical reasons usually been a choice between English and the locally supported language. Given the rapidly growing size of the Internet, it is likely that future users will need to access information in languages in which they are not fluent or have no knowledge of at all. This paper shows how information retrieval and machine translation can be combined in a cross-language information access framework to help overcome the language barrier. We present encouraging preliminary experimental results using English queries to retrieve documents from the standard Japanese language BMIR-J2 retrieval test collection. We outline the scope and purpose of cross-language information access and provide an example application to suggest that technology already exists to provide effective and potentially useful applications.
2.
In this article we illustrate a methodology for building a cross-language search engine. A synergistic combination of the thesaurus-based approach and the corpus-based approach is proposed. First, a bilingual ontology thesaurus is designed for two languages, English and Spanish, in which a simple bilingual listing of terms, phrases, concepts, and subconcepts is built. Second, term vector translation is used, a statistical multilingual text retrieval technique that maps statistical information about term use between languages (ontology co-learning). These techniques map sets of tf-idf term weights from one language to another. We also applied a query translation method to retrieve multilingual documents, with an expansion technique for phrasal translation. Finally, we present our findings.
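The idea of mapping tf-idf term weights from one language to another can be sketched as follows. This is a minimal illustration, not the article's method: the thesaurus entries, the translation shares, and the function name `translate_term_vector` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical bilingual thesaurus: English term -> weighted Spanish translations.
# The entries and shares are illustrative and are not taken from the article.
THESAURUS = {
    "cat": {"gato": 1.0},
    "house": {"casa": 0.75, "hogar": 0.25},
}


def translate_term_vector(weights, thesaurus):
    """Map a source-language tf-idf vector into the target language.

    Each source weight is split across the term's translations in
    proportion to the translation shares; terms without a thesaurus
    entry are dropped.
    """
    target = defaultdict(float)
    for term, weight in weights.items():
        for translation, share in thesaurus.get(term, {}).items():
            target[translation] += weight * share
    return dict(target)


# "garden" has no thesaurus entry, so it does not appear in the result.
query_vector = {"cat": 2.0, "house": 4.0, "garden": 1.0}
translated = translate_term_vector(query_vector, THESAURUS)
```

The translated vector can then be compared against target-language document vectors with any monolingual similarity measure.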
3.
4.
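Beyond machine translation, parallel corpora play an important role in research areas such as information retrieval, information extraction, and knowledge acquisition. However, traditional parallel corpora are aligned only at the sentence level, which limits their usefulness for cross-language natural language processing research. To address this, a high-quality Chinese-English parallel corpus for information extraction was built on top of the OntoNotes Chinese-English parallel corpus, using a combination of automatic extraction, automatic mapping, and manual annotation. The corpus not only contains Chinese and English entities and the relations between them, but also aligns the two languages at the entity and relation levels. It should therefore support comparative studies of Chinese and English information extraction, help reveal how the two languages differ in semantic expression, and provide a valuable platform for research on cross-language information extraction.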
5.
6.
Kernel Canonical Correlation Analysis (KCCA) is a method for finding linear correlations between two variables in a kernel-defined feature space. A machine learning algorithm based on KCCA is studied for cross-language information retrieval. We apply the algorithm to Japanese–English cross-language information retrieval. The results are quite encouraging and are significantly better than those obtained by other state-of-the-art methods. Computational complexity is an important issue when applying KCCA to large datasets, as in information retrieval. We experimentally evaluate several methods to alleviate the problem of applying KCCA to large datasets. We also investigate cross-language document classification using KCCA as well as other methods. Our results show that it is feasible to use a classifier learned in one language to classify documents in other languages.
7.
8.
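Research on Cross-Language Information Retrieval  Total citations: 1, self-citations: 0, citations by others: 1
The models and special technical requirements of cross-language information retrieval are analyzed comprehensively, and an experimental system based on the "Mammals" section of the Encyclopaedia Britannica is developed. The system retrieves English information using Chinese queries and presents the results in Chinese, which validates the feasibility of these techniques reasonably well.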
9.
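As Web service technology has matured, a large number of public Web services have appeared on the Internet. When software systems are developed with Web services, textual descriptions (such as introductions and usage instructions) help service consumers identify, understand, and use a service intuitively and effectively. Most existing work obtains such information from a service's WSDL file for service discovery or retrieval, but a survey found that the WSDL files of most Web services on the Internet contain little or none of it. This paper therefore proposes an approach, based on Web information search, that enriches the textual descriptions of Web services from sources outside the WSDL file. Web pages containing the identifying features of a target Web service are collected from the Internet; information fragments are extracted from those pages; information retrieval techniques are used to compute the relevance of each fragment to the target service; and the most relevant fragments are selected to enrich the service's textual description. Experiments on real Internet data show that relevant Web pages can be obtained for about 51% of the Web services on the Internet, and that textual descriptions can be enriched for about 88% of those services. The collected Web services and their textual descriptions have been publicly released.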
10.
This paper proposes an effective query-translation approach that enables a cross-language information retrieval (CLIR) service to be more easily supported in digital library systems that only contain monolingual content. A query-translation engine called LiveTrans is used to process the translation requests of cross-lingual queries from connected digital library systems. To automatically extract translations not covered by standard dictionaries, the engine is developed based on a novel integration of dictionary resources and Web mining approaches, including anchor-text and search-result methods. The engine exploits a broad range of multilingual Web resources used as live bilingual corpora to alleviate translation difficulties. It is shown to be particularly effective for extracting multilingual translation equivalents of query terms containing proper names or new terminology. The obtained results show the feasibility of and great potential for creating English-Chinese CLIR services in existing digital libraries and new applications in cross-language Web searching, although difficulties still remain that need to be investigated further.
11.
A mass of heterogeneous, distributed, and dynamic information on the World Wide Web (the Web) has resulted in "information overload". Providing users with effective information retrieval services on the Web is an important and urgent research issue. Web search engines attempt to solve this problem, yet their effect is far from satisfactory. In this paper, a distributed and cooperative strategy for information retrieval on the Web is proposed to replace the centralized mode adopted by current search engines. A new information retrieval system model, IRSM, is then presented, which supports the retrieval of metadata about Web documents and uses the Z39.50 standard protocol to unify the heterogeneous interfaces of different systems. Based on that, a distributed and cooperative information retrieval framework, called DCIRF, is designed to help users retrieve information on the Web quickly and effectively.
12.
13.
14.
15.
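This paper presents a Web-based Uyghur information retrieval system that searches designated websites according to topics set by the user. The system applies the vector space model, which has been very successful in Western-language information retrieval, to Uyghur information retrieval. It explores feature term extraction from Uyghur documents, term weighting, similarity computation, and model construction, and proposes an approach to Web-based Uyghur information processing (for example, downloading Uyghur Web pages, storing page content, and Uyghur retrieval). The paper discusses the system's design ideas, the related algorithms, and the implementation techniques.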
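The vector space model mentioned in this abstract (term weighting followed by similarity computation against a user-set topic) can be sketched as below. This is a generic tf-idf and cosine-similarity illustration under stated assumptions: the documents are already tokenized, the tokens are English placeholders rather than Uyghur feature terms, and the function names and the smoothed idf formula are choices made for the example, not details from the paper.

```python
import math
from collections import Counter


def build_idf(docs):
    """Inverse document frequency, with +1 so that terms occurring in
    every document still carry some weight."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / df[term]) + 1.0 for term in df}


def weigh(tokens, idf):
    """tf-idf vector for one token list; terms unseen in the collection are ignored."""
    tf = Counter(tokens)
    return {term: tf[term] * idf[term] for term in tf if term in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def rank(topic, docs):
    """Return document indices ordered by similarity to the topic."""
    idf = build_idf(docs)
    topic_vec = weigh(topic, idf)
    scored = [(cosine(topic_vec, weigh(doc, idf)), i) for i, doc in enumerate(docs)]
    return [i for _, i in sorted(scored, key=lambda pair: -pair[0])]


# Placeholder tokens standing in for extracted feature terms.
docs = [
    ["news", "sport", "football"],
    ["news", "economy", "market"],
    ["music", "concert"],
]
ranking = rank(["economy", "market"], docs)
```

In a real system the tokenization step would be replaced by language-specific feature extraction, which the abstract names but does not detail.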
16.
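Research on Hidden Web Retrieval Based on Tag-Tree Object Extraction  Total citations: 6, self-citations: 0, citations by others: 6
Current standard search engines can retrieve only the small portion of World Wide Web information known as the indexable Web. A large amount of Hidden Web information (estimated at 500 times the size of the indexable Web) is invisible to these search engines. This information is hidden behind search forms on Web pages and stored in large dynamic databases. This paper proposes a method for retrieving Hidden Web information, presents the framework of the system, and discusses the key implementation techniques in detail. The system uses a new Tag-Tree-based Object Extraction method to automatically extract Hidden Web information from Web pages, and on that basis gives a structured query algorithm for Hidden Web information. The paper closes with a discussion of the experimental results.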
17.
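The keyword extraction module of a sentence-based Chinese-English cross-language retrieval system is designed and implemented. The module consists of two parts, Chinese keyword extraction and translation, and supplies the input for the subsequent retrieval module, so its performance and efficiency have an important effect on the whole cross-language retrieval system. First, an improved version of Dijkstra's algorithm is used to find the shortest path for word segmentation, which segments the Chinese query sentence into words. Then, using a Chinese-English bilingual dictionary, the Chinese keywords obtained after grammatical extraction are translated into English. Finally, the extracted keywords are passed on for retrieval. Experimental results show that keywords extracted with this method meet the retrieval requirements.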
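Shortest-path word segmentation followed by dictionary translation can be sketched as below. The abstract does not describe its improvement to Dijkstra's algorithm, so this sketch uses the plain algorithm with every word costing 1 (the shortest path is then the segmentation with the fewest words). The lexicon, the bilingual dictionary, and the function names are hypothetical, and the grammatical keyword-filtering step is not shown.

```python
import heapq

# Hypothetical segmentation lexicon and Chinese-English dictionary.
LEXICON = {"信息", "检索", "系统", "信息检索"}
BILINGUAL = {"信息检索": "information retrieval", "系统": "system"}


def segment(sentence, lexicon, max_len=4):
    """Segment a sentence along the path with the fewest words.

    Nodes are character positions 0..len(sentence); an edge i -> j exists
    when sentence[i:j] is a lexicon word or a single character.
    """
    n = len(sentence)
    dist = [float("inf")] * (n + 1)
    prev = [None] * (n + 1)
    dist[0] = 0
    heap = [(0, 0)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale queue entry
        for j in range(i + 1, min(n, i + max_len) + 1):
            # Single characters are always allowed so the end stays reachable.
            if j - i == 1 or sentence[i:j] in lexicon:
                if d + 1 < dist[j]:
                    dist[j] = d + 1
                    prev[j] = i
                    heapq.heappush(heap, (d + 1, j))
    # Walk the predecessor links back from the end of the sentence.
    words, j = [], n
    while j > 0:
        i = prev[j]
        words.append(sentence[i:j])
        j = i
    return words[::-1]


def translate(words, dictionary):
    """Translate the words that have a dictionary entry; skip the rest."""
    return [dictionary[w] for w in words if w in dictionary]


words = segment("信息检索系统", LEXICON)
keywords = translate(words, BILINGUAL)
```

With unit edge costs a breadth-first search would give the same result; Dijkstra's algorithm becomes useful once edges carry word-frequency or probability costs.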
18.
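Research on an Ontology-Based Semantic Web Service Discovery Method for Telemedicine  Total citations: 1, self-citations: 0, citations by others: 1
刘冰月 (Liu Bingyue) 《计算机工程与应用》 (Computer Engineering and Applications) 2010,46(20):230-233
Because existing medical information systems are heterogeneous, information cannot be exchanged properly between them. SOA-based telemedicine diagnosis systems can effectively solve the problem of integrating heterogeneous systems. However, current Web service registration and discovery protocols such as UDDI rely on keyword-based retrieval and cannot describe the semantics of Web services, so discovery algorithms built on these traditional protocols fail to automatically match some services that do meet the requirements. To solve this problem, an ontology-based semantic Web service model is proposed following SOA principles, together with a corresponding Web service discovery algorithm. The model describes Web services more accurately and effectively, allowing computers to match services automatically and improving recall and precision.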
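The difference between keyword-based discovery and ontology-based discovery can be illustrated with a small sketch. The abstract does not specify its matching algorithm, so this example assumes a simple subsumption rule: a service matches when its advertised output concept is the requested concept or one of its descendants. The concept hierarchy, service names, and function names are all hypothetical, and the hierarchy is assumed to be acyclic.

```python
# Hypothetical ontology fragment: child concept -> parent concept.
ONTOLOGY = {
    "ECGDiagnosis": "CardiacDiagnosis",
    "CardiacDiagnosis": "Diagnosis",
    "RadiologyReport": "Report",
}

# Hypothetical registry: service name -> advertised output concept.
SERVICES = {
    "ecg_reader": "ECGDiagnosis",
    "heart_check": "CardiacDiagnosis",
    "xray_report": "RadiologyReport",
}


def is_a(concept, ancestor, ontology):
    """True if `concept` equals `ancestor` or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ontology.get(concept)
    return False


def keyword_discover(requested, services):
    """Keyword-style lookup: only exact concept names match."""
    return [name for name, output in services.items() if output == requested]


def semantic_discover(requested, services, ontology):
    """Subsumption-based lookup: more specific concepts also match."""
    return [
        name for name, output in services.items() if is_a(output, requested, ontology)
    ]


exact = keyword_discover("CardiacDiagnosis", SERVICES)
semantic = semantic_discover("CardiacDiagnosis", SERVICES, ONTOLOGY)
```

Here the keyword lookup misses `ecg_reader` even though an ECG diagnosis is a kind of cardiac diagnosis, which is the recall gap the abstract attributes to keyword-based protocols.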
19.
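Distributed and Cooperative Web Information Retrieval Based on Metadata and Z39.50  Total citations: 21, self-citations: 0, citations by others: 21
The large amount of heterogeneous, distributed, and dynamic information on the Web has caused "information overload". How to provide users with effective Web information retrieval has become an important research topic. Web search engines partly solve the information retrieval problem, but their results are far from satisfactory. A distributed and cooperative strategy for Web information retrieval is proposed to replace the traditional centralized approach. A new Web information retrieval system model is presented, which supports retrieval over the metadata of Web documents and adopts the Z39.50 protocol as the interface standard to overcome the heterogeneity of access across different information retrieval systems. On this basis, a distributed and cooperative Web information retrieval framework is designed to help users retrieve Web information effectively.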
20.
Hemant Jain, Cheng Thao, Huimin Zhao 《Information Systems and E-Business Management》2012,10(2):165-181
There are currently many active movements towards computerizing patient healthcare information. As Electronic Medical Record (EMR) systems are increasingly adopted in healthcare facilities, however, effectively utilizing this massive information source has become a big challenge. It is very time-consuming for healthcare providers to dig into the voluminous medical records of a patient to find the few that are indeed relevant to the patient's current problem. Due to the complex semantic relationships among medical concepts and the use of many synonyms, antonyms, and hypernyms/hyponyms, simple word-based information retrieval does not produce satisfactory results. In this paper, we propose an EMR retrieval system that leverages semantic query expansion to retrieve medical records that are relevant to the patient's current symptom/problem. The proposed framework integrates various technologies, including information retrieval, domain ontologies, automatic semantic relationship learning, as well as a body of domain knowledge elicited from healthcare experts. Knowledge of semantic relationships among medical concepts, such as symptoms, exams and tests, diagnoses, and treatments, as well as knowledge of synonyms and hypernyms/hyponyms, is used to expand and enhance initial queries posed by a user. We have implemented a preliminary prototype and conducted a pilot test using sample nursing notes drawn from the EMR system of a community health center.
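Semantic query expansion of the kind described above can be sketched as follows. This is an illustration only: the synonym and relationship tables, the sample notes, and the function names are hypothetical and carry no clinical authority, and the substring matching stands in for whatever retrieval engine the prototype actually uses.

```python
# Hypothetical knowledge tables (illustrative only).
SYNONYMS = {"heart attack": ["myocardial infarction"]}
RELATED_EXAMS = {"chest pain": ["ecg", "troponin test"]}


def expand_query(terms, *relation_maps):
    """Append terms related to each query term, keeping order and dropping duplicates."""
    expanded = list(terms)
    for term in terms:
        for relations in relation_maps:
            for extra in relations.get(term, []):
                if extra not in expanded:
                    expanded.append(extra)
    return expanded


def search(notes, terms):
    """Indices of notes that contain at least one term (case-insensitive substring)."""
    return [
        i for i, note in enumerate(notes) if any(term in note.lower() for term in terms)
    ]


notes = [
    "Patient reports chest pain after exercise.",
    "ECG ordered; results pending.",
    "Follow-up for seasonal allergies.",
]
query = ["chest pain"]
plain_hits = search(notes, query)
expanded_terms = expand_query(query, SYNONYMS, RELATED_EXAMS)
expanded_hits = search(notes, expanded_terms)
```

The plain query finds only the note that mentions the symptom, while the expanded query also reaches the note about the related exam.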