首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 968 毫秒
1.
Technology in the field of digital media generates huge amounts of nontextual information, audio, video, and images, along with more familiar textual information. The potential for exchange and retrieval of information is vast and daunting. The key problem in achieving efficient and user-friendly retrieval is the development of a search mechanism to guarantee delivery of minimal irrelevant information (high precision) while insuring relevant information is not overlooked (high recall). The traditional solution employs keyword-based search. The only documents retrieved are those containing user-specified keywords. But many documents convey desired semantic information without containing these keywords. This limitation is frequently addressed through query expansion mechanisms based on the statistical co-occurrence of terms. Recall is increased, but at the expense of deteriorating precision. One can overcome this problem by indexing documents according to context and meaning rather than keywords, although this requires a method of converting words to meanings and the creation of a meaning-based index structure. We have solved the problem of an index structure through the design and implementation of a concept-based model using domain-dependent ontologies. An ontology is a collection of concepts and their interrelationships that provide an abstract view of an application domain. With regard to converting words to meaning, the key issue is to identify appropriate concepts that both describe and identify documents as well as language employed in user requests. This paper describes an automatic mechanism for selecting these concepts. An important novelty is a scalable disambiguation algorithm that prunes irrelevant concepts and allows relevant ones to associate with documents and participate in query generation. We also propose an automatic query expansion mechanism that deals with user requests expressed in natural language. This mechanism generates database queries with appropriate and relevant expansion through knowledge encoded in ontology form. Focusing on audio data, we have constructed a demonstration prototype. We have experimentally and analytically shown that our model, compared to keyword search, achieves a significantly higher degree of precision and recall. The techniques employed can be applied to the problem of information selection in all media types.Received: 7 October 2002, Accepted: 20 May 2003, Published online: 30 September 2003Edited by: E. LochovskyThis research has been funded [or funded in part] by the Integrated Media Systems Center, a National Science Foundation Engineering Research Center, Cooperative Agreement No. EEC-9529152.  相似文献   

2.
Query expansion is an information retrieval technique in which new query terms are selected to improve search performance. Although useful terms can be extracted from documents whose relevance is already known, it is difficult to get enough of such feedback from a user in actual use. We propose a query expansion method that performs well even if a user makes practically minimum effort, that is, chooses only a single relevant document. To improve searches in these conditions, we made two refinements to a well-known query expansion method. One uses transductive learning to obtain pseudorelevant documents, thereby increasing the total number of source documents from which expansion terms can be extracted. The other is a modified parameter estimation method that aggregates the predictions of multiple learning trials to sort candidate terms for expansion by importance. Experimental results show that our method outperforms traditional methods and is comparable to a state-of-the-art method.  相似文献   

3.
基于权重标准化SimRank方法的查询扩展技术研究   总被引:1,自引:0,他引:1  
查询扩展是信息检索中的一项重要技术。传统的局部分析查询扩展方法利用伪相关文档作为候选词集合,然而部分伪相关文档并不具有很高的相关性。该文利用真实的搜索引擎查询日志,建立了查询点击图,经过多次图结构的转化得到能够反映词之间关联程度的词项关系图,并在图结构的相似度算法SimRank的基础上,提出了一种基于权重标准化的改进SimRank方法,该方法利用词项关系图中词项的全局和间接关系,能够有效挖掘与原始查询相关联的扩展词。同时,为降低SimRank算法的计算复杂度,该文采用了剪枝等策略进行优化,使得计算效率有大幅提高。在TREC标准数据集上的实验表明,该文的方法可以有效地选择相关扩展词。MAP指标较局部分析查询扩展方法提高了1.81%,在P@10和P@20指标评价中效果分别提高了5.44%和3.73%。  相似文献   

4.
针对信息检索中存在的词不匹配问题,提出一种基于频繁项集和相关性的局部反馈查询扩展算法。设计查询扩展模型和扩展词权重计算方法,从前列n篇初检文档中,挖掘同时含有查询词项、非查询词项的频繁项集,在该频繁项集中提取非查询词项作为候选扩展词,计算每个候选扩展词与整个查询的相关性,并根据该相关性得到最终的扩展词,以此实现查询扩展。实验结果表明,该算法能有效提高信息检索的性能。  相似文献   

5.
传统的查询扩展方法由于忽略了词之间的语义关系,在不规范的短小关键字上补充扩展的词已经无法达到预期目标。Linked Data技术利用资源描述框架(RDF)图模型形成Linked Open Data Cloud,能提供更多语义信息。针对查询扩展忽略语义的问题,提出了一种基于语义属性特征图的查询扩展方法。该方法将语义网与图的思想融合,利用以DBpedia资源为顶点的属性图加以扩展。首先,通过有监督的学习训练出15种语义属性特征的权重,用于表达扩展资源的有用性;然后,在整个DBpedia图上通过标签属性实现查询关键字到DBpedia匹配资源的映射;再根据属性特征广度搜索出邻接点,并将其作为扩展候选词,最后筛选出词相关行分值最高的作为最终扩展词。实验表明,与LOD Keyword Expansion方法相比,基于语义属性特征图的扩展方法召回率达到0.89,平均逆排序(MRR)提高4个百分点,与用户查询更匹配。  相似文献   

6.

The World Wide Web(WWW) comprises a wide range of information, and it is mainly operated on the principles of keyword matching which often reduces accurate information retrieval. Automatic query expansion is one of the primary methods for information retrieval, and it handles the vocabulary mismatch problem often faced by the information retrieval systems to retrieve an appropriate document using the keywords. This paper proposed a novel approach of hybrid COOT-based Cat and Mouse Optimization (CMO) algorithm named as hybrid COOT-CMO for the appropriate selection of optimal candidate terms in the automatic query expansion process. To improve the accuracy of the Cat and Mouse Optimization (CMO) algorithm, the parameters are tuned with the help of the Coot algorithm. The best suitable expanded query is identified from the available expanded query sets also known as candidate query pools. All feasible combinations in this candidate query pool should be obtained from the top retrieved documents. Benchmark datasets such as the GOV2 Test Collection, the Cranfield Collections, and the NTCIR Test Collection are utilized to assess the performance of the proposed hybrid COOT-CMO method for automatic query expansion. This proposed method surpasses the existing state-of-the-art techniques using many performance measures such as F-score, precision, and mean average precision (MAP).

  相似文献   

7.
查询扩展可以有效地消除查询歧义,提高信息检索的准确率和召回率.通过挖掘用户日志中查询词和相关文档的连接关系,构造关联查询,并在此基础上提出一种从关联查询中提取查询扩展词的查询扩展方法.同时,还提出一种查询歧义的判别方法,该方法可以对查询词所表达的检索意图的模糊程度进行有效度量,也可以对查询词的检索性能进行预先估计.通过对查询歧义的度量来动态调整扩展词的长度,提高查询扩展模型的灵活性和适应能力.  相似文献   

8.
在垃圾短信检索中所使用的关键词与短信文本集中的词不匹配,从而影响检索效果。为此,提出一种基于上下文查询词扩展的检索方法,该方法根据关键词出现的上下文信息进行查询词扩展选择,同时考虑查询扩展词与整个查询语句及查询词的位置关系。选取3 000条短信文本进行实验,结果表明该方法能提高平均查准率。  相似文献   

9.
由于自然语言本身的歧义性和多样性,少数几个关键词难以表达真实的信息需求。查询扩展技术通过挖掘原始查询项的潜在信息,有效地增强了检索系统的理解能力。该文在上下文分析方法计算公式中加入了句子权重概念,即假设由原始查询项返回的句子越重要,则其中出现的词与查询项越相关。同时进一步假设,句中的词与查询项的位置关系与依赖关系也是选取扩展词的重要依据。为此,该文分别提出基于句子权重与位置上下文分析方法(Sentence Weight&Position-based Context Analysis,SWPCA),以及基于句子权重与依赖关系上下文分析方法(Sentence Weight&Dependency-based Context Analysis,SWDCA)。并将这两种查询扩展技术应用于TREC的定义类问题回答,数据显示这两种方法均取得不错成绩,而SWDCA性能更好。  相似文献   

10.
The problem of word mismatch in information retrieval (IR) occurs because users often use different words to describe concepts in their queries than authors use to describe the same concepts in their documents. Query expansion is used to deal with the mismatch between author and user vocabularies. To support query expansion, indices on words related by lexical semantics and syntactical co-occurrence need to be maintained. Two issues become paramount in supporting query expansion: the size of index tables and the query processing overhead. In this paper, we propose to use the notion of multi-granularity for more efficient indexing and query processing while the same degrees of precision and recall are maintained. We also describes extensions of this technique to handle: (1) query relaxation to handle words with multiple senses and with other semantic relationships; (2) progressive processing of queries with top N results and (3) progressive processing of queries with specification of the importance of each keyword.  相似文献   

11.
基于文档与搜索结果上下文的查询扩展方法   总被引:1,自引:0,他引:1  
蒋辉  阳小华 《计算机应用》2009,29(3):852-853
在查询扩展方法中,如果通过查询结果中关键词的上下文来计算候选关键词的权重,将权重大的词作为查询扩展词,其候选关键词来源于文档中关键词的上下文,这种方法存在主题漂移的问题。为了解决这个问题,提出一种将初始查询结果过滤,只选择与源文档语境相似的搜索结果,来帮助选择查询扩展词的方法。实验结果表明该方法能获得更合适的查询扩展词。  相似文献   

12.
Query expansion is a well-known method for improving average effectiveness in information retrieval. The most effective query expansion methods rely on retrieving documents which are used as a source of expansion terms. Retrieving those documents is costly. We examine the bottlenecks of a conventional approach and investigate alternative methods aimed at reducing query evaluation time. We propose a new method that draws candidate terms from brief document summaries that are held in memory for each document. While approximately maintaining the effectiveness of the conventional approach, this method significantly reduces the time required for query expansion by a factor of 5–10.  相似文献   

13.
《Computers in Industry》2014,65(6):937-951
Passage retrieval is usually defined as the task of searching for passages which may contain the answer for a given query. While these approaches are very efficient when dealing with texts, applied to log files (i.e. semi-structured data containing both numerical and symbolic information) they usually provide irrelevant or useless results. Nevertheless one appealing way for improving the results could be to consider query expansions that aim at adding automatically or semi-automatically additional information in the query to improve the reliability and accuracy of the returned results. In this paper, we present a new approach for enhancing the relevancy of queries during a passage retrieval in log files. It is based on two relevance feedback steps. In the first one, we determine the explicit relevance feedback by identifying the context of the requested information within a learning process. The second step is a new kind of pseudo relevance feedback. Based on a novel term weighting measure it aims at assigning a weight to terms according to their relatedness to queries. This measure, called TRQ (Term Relatedness to Query), is used to identify the most relevant expansion terms.The main advantage of our approach is that is can be applied both on log files and documents from general domains. Experiments conducted on real data from logs and documents show that our query expansion protocol enables retrieval of relevant passages.  相似文献   

14.
基于上下文的查询扩展   总被引:5,自引:0,他引:5  
针对信息检索查询所使用的词可能与文档集中使用的词不匹配从而影响检索效果这一信息检索关键问题,提出了一种基于上下文的查询扩展方法,该方法根据查询的上下文信息对扩展词进行选择,同时考虑到扩展词与整个查询句以及与查询词的位置关系.在TREC信息检索测试集上进行的实验表明,相对于通常简单的语言模型,方法取得了5%~19%的提高.与流行的基于伪反馈的查询扩展方法相比,提出的方法也具有相当的平均准确率.  相似文献   

15.
基于权重查询词的XML结构查询扩展   总被引:9,自引:0,他引:9  
万常选  鲁远 《软件学报》2008,19(10):2611-2619
文本文档信息检索中检索质量不高的一个主要原因是用户难以提出准确的描述查询意图的查询表达式. 而XML文档除了具有文本文档的内容特征外,还具有结构特征,导致用户更难以提出准确的查询表达式.为了解决这一问题,提出一种基于相关反馈的查询扩展方法,可以帮助用户构建满足查询意图的"内容 结构"的查询表达式.该方法首先进行查询词扩展,找到最能代表用户查询意图的权重扩展查询词;然后在扩展查询词的基础上进行结构查询扩展;最终形成完整的"内容 结构"的查询扩展表达式.实验结果表明,与未进行查询扩展相比,扩展后prec@10和prec@20的平均准确率提高30%以上.  相似文献   

16.
以关键词抽取为核心的文摘句选择策略   总被引:3,自引:0,他引:3  
针对面向查询的多文档自动文摘,该文提出了一种以关键词抽取为核心的文摘句选择策略。通过查询扩展的相关技术得到相关多文档集中词语的查询相关性特征,利用最大似然估计法得到语料中词语的话题相关性特征,并将这两个特征值进行特征融合得到词语的重要度以确定关键词。然后通过关键词的重要度来给候选句打分,进一步利用改进的MMR(Maximal Marginal Relevance)技术来调整候选句的得分,最后生成文摘。该文将特征融合引入到词语层面,在DUC2005的语料中测试取得了较好的效果。  相似文献   

17.
查询扩展是提高检索效果的有效方法,传统的查询扩展方法大都以单个查询词的相关性来扩展查询词,没有充分考虑词项之间、文档之间以及查询之间的相关性,使得扩展效果不佳。针对此问题,该文首先通过分别构造词项子空间和文档子空间的Markov网络,用于提取出最大词团和最大文档团,然后根据词团与文档团的映射关系将词团分为文档依赖和非文档依赖词团,并构建基于文档团依赖的Markov网络检索模型做初次检索,从返回的检索结果集合中构造出查询子空间的Markov网络,用于提取出最大查询团,最后,采用迭代的方法计算文档与查询的相关概率,并构建出最终的基于迭代方法的多层Markov网络信息检索模型。实验结果表明 该文的模型能较好地提高检索效果。  相似文献   

18.
The steady growth in the size of textual document collections is a key progress-driver for modern information retrieval techniques whose effectiveness and efficiency are constantly challenged. Given a user query, the number of retrieved documents can be overwhelmingly large, hampering their efficient exploitation by the user. In addition, retaining only relevant documents in a query answer is of paramount importance for an effective meeting of the user needs. In this situation, the query expansion technique offers an interesting solution for obtaining a complete answer while preserving the quality of retained documents. This mainly relies on an accurate choice of the added terms to an initial query. Interestingly enough, query expansion takes advantage of large text volumes by extracting statistical information about index terms co-occurrences and using it to make user queries better fit the real information needs. In this respect, a promising track consists in the application of data mining methods to the extraction of dependencies between terms. In this paper, we present a novel approach for mining knowledge supporting query expansion that is based on association rules. The key feature of our approach is a better trade-off between the size of the mining result and the conveyed knowledge. Thus, our association rules mining method implements results from Galois connection theory and compact representations of rules sets in order to reduce the huge number of potentially useful associations. An experimental study has examined the application of our approach to some real collections, whereby automatic query expansion has been performed. The results of the study show a significant improvement in the performances of the information retrieval system, both in terms of recall and precision, as highlighted by the carried out significance testing using the Wilcoxon?test.  相似文献   

19.
基于概念图的信息检索的查询扩展模型   总被引:1,自引:0,他引:1  
针对传统的基于关键词匹配的信息检索存在的查全率和精确率不高的问题,提出一种基于概念图匹配的查询扩展方法:一方面通过知网对用户查询的词或者句子进行扩展后,将用户查询和文档生成概念图;另一方面利用概念图的不完全匹配和语义相似度的计算方法计算概念图的相似度,以提高检索效果。实验结果表明该方法取得了良好的效果。  相似文献   

20.
基于查询扩展词条加权的文本检索研究   总被引:1,自引:1,他引:0  
本文分析了关键词检索文本,由于其查询词没有扩展导致检全率低;而概念检索文本虽然部分有检索词扩展,但是查询词权重与原查询词没有区分.为此,本文利用词条间的语义相似度,提出一种查询扩展词条权重计算方法--展开减小法,并将查询词以及扩展词经展开减小法计算权重后构建向量空间模型检索文本.实验表明,构建的检索模型检索文本,其综合...  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号