首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 301 毫秒
1.
识别搜索引擎用户的查询意图在信息检索领域是备受关注的研究内容。文中提出一种融合多类特征识别Web查询意图的方法。将Web查询意图识别作为一个分类问题,并从不同类型的资源包括查询文本、搜索引擎返回内容及Web查询日志中抽取出有效的分类特征。在人工标注的真实Web查询语料上采用文中方法进行查询意图识别实验,实验结果显示文中采用的各类特征对于提高查询意图识别的效果皆有一定帮助,综合使用这些特征进行查询意图识别,88。5%的测试查询获得准确的意图识别结果。  相似文献   

2.
Traditional search engines have become the most useful tools to search the World Wide Web. Even though they are good for certain search tasks, they may be less effective for others, such as satisfying ambiguous or synonym queries. In this paper, we propose an algorithm that, with the help of Wikipedia and collaborative semantic annotations, improves the quality of web search engines in the ranking of returned results. Our work is supported by (1) the logs generated after query searching, (2) semantic annotations of queries and (3) semantic annotations of web pages. The algorithm makes use of this information to elaborate an appropriate ranking. To validate our approach we have implemented a system that can apply the algorithm to a particular search engine. Evaluation results show that the number of relevant web resources obtained after executing a query with the algorithm is higher than the one obtained without it.  相似文献   

3.
The general public is increasingly using search engines to seek information on risks and threats. Based on a search log from a large search engine, spanning three months, this study explores user patterns of query submission and subsequent clicks in sessions, for two important risk related topics, healthcare and information security, and compares them to other randomly sampled sessions. We investigate two session-level metrics reflecting users' interactivity with a search engine: session length and query click rate. Drawing from information foraging theory, we find that session length can be characterized well by the Inverse Gaussian distribution. Among three types of sessions on different topics (healthcare, information security, and other randomly sampled sessions), we find that healthcare sessions have the most queries and the highest query click rate, and information security sessions have the lowest query click rate. In addition, sessions initiated by the users with greater search engine activity level tend to have more queries and higher query click rates. Among three types of sessions, search engine activity level shows the strongest effect on query click rate for information security sessions and weakest for healthcare sessions. We discuss theoretical and practical implications of the study.  相似文献   

4.
在计算广告学中,为用户查询返回相关的广告一直是研究的热点。然而用户的查询一般比较简短,广告的表示也局限在简短的创意和一些竞价词上,返回符合用户查询意图的广告十分困难。为了解决这个问题,该文提出利用多特征融合的方法进行广告查询扩展,先将查询输入到搜索引擎中,获得Top-k网页查询结果,将它们作为获取扩展词的外部资源,由于采用一般的特征选取方法获取扩展词采用的特征比较单一,缺乏语义信息,容易产生主题漂移现象,该文通过计算扩展词和查询词在网页查询结果中的共现度,并融合传统的TF特征和词性信息,获得与原始查询语义相关的扩展词。在真实的广告语料上的实验结果显示,基于多特征融合的选择广告扩展词的方法能有效地提高返回广告的相关性。  相似文献   

5.
How to automatically understand and answer users' questions (eg, queries issued to a search engine) expressed with natural language has become an important yet difficult problem across the research fields of information retrieval and artificial intelligence. In a typical interactive Web search scenario, namely, session search, to obtain relevant information, the user usually interacts with the search engine for several rounds in the forms of, eg, query reformulations, clicks, and skips. These interactions are usually mixed and intertwined with each other in a complex way. For the ideal goal, an intelligent search engine can be seen as an artificial intelligence agent that is able to infer what information the user needs from these interactions. However, there still exists a big gap between the current state of the art and this goal. In this paper, in order to bridge the gap, we propose a Markov random field–based approach to capture dependence relations among interactions, queries, and clicked documents for automatic query expansion (as a way of inferring the information needs of the user). An extensive empirical evaluation is conducted on large‐scale web search data sets, and the results demonstrate the effectiveness of our proposed models.  相似文献   

6.
随着移动互联网的迅速发展,移动搜索用户大规模增加,移动搜索引擎用户行为分析对改进搜索引擎性能,提高用户体验具有重要意义。该文选取某移动搜索引擎2011年6月第一周的日志,对移动互联网用户搜索行为进行分析和研究。我们从查询词分析、会话分析以及用户点击分析3个角度出发,对查询词长度和频度、问题式查询和网址查询比例、会话内查询个数、查询词修改方式以及用户点击位置进行研究,并与互联网搜索引擎相应指标进行对比。相关分析结论对于移动搜索引擎算法改进与系统优化具有一定参考意义。  相似文献   

7.
8.
针对通用搜索引擎缺乏对网页内容的时态表达式的准确抽取及语义查询支持,提出时态语义相关度算法(TSRR)。在通用搜索引擎基础上添加了时态信息抽取和时态信息排序功能,通过引入时态正则表达式规则,抽取查询关键词和网页文档中的时态点或时态区间等时态表达式,综合计算网页内容的文本相关度和时态语义相关度,从而得到网页的最终排序评分。实验表明,应用TSRR算法可以准确而有效地匹配与时态表达式相关的关键词查询。  相似文献   

9.
Thousands of users issue keyword queries to the Web search engines to find information on a number of topics. Since the users may have diverse backgrounds and may have different expectations for a given query, some search engines try to personalize their results to better match the overall interests of an individual user. This task involves two great challenges. First the search engines need to be able to effectively identify the user interests and build a profile for every individual user. Second, once such a profile is available, the search engines need to rank the results in a way that matches the interests of a given user. In this article, we present our work towards a personalized Web search engine and we discuss how we addressed each of these challenges. Since users are typically not willing to provide information on their personal preferences, for the first challenge, we attempt to determine such preferences by examining the click history of each user. In particular, we leverage a topical ontology for estimating a user’s topic preferences based on her past searches, i.e. previously issued queries and pages visited for those queries. We then explore the semantic similarity between the user’s current query and the query-matching pages, in order to identify the user’s current topic preference. For the second challenge, we have developed a ranking function that uses the learned past and current topic preferences in order to rank the search results to better match the preferences of a given user. Our experimental evaluation on the Google query-stream of human subjects over a period of 1 month shows that user preferences can be learned accurately through the use of our topical ontology and that our ranking function which takes into account the learned user preferences yields significant improvements in the quality of the search results.  相似文献   

10.
网络信息检索在当前互联网社会得到了广泛应用,但是其检索准确性却不容乐观,究其原因是割裂了检索关键词之间的概念联系。从一类限定领域的用户需求入手,以搜索引擎作为网络语料资源的访问接口,综合利用规则与统计的方法,生成查询需求的语义概念图。可将其作为需求分析的结果,导引后续的语义检索过程,提高用户查询与返回结果的相关性。实验结果表明,生成方法是有效可行的,对基于概念图的语义检索有一定的探索意义。  相似文献   

11.
12.
When classifying search queries into a set of target categories, machine learning based conventional approaches usually make use of external sources of information to obtain additional features for search queries and training data for target categories. Unfortunately, these approaches rely on large amount of training data for high classification precision. Moreover, they are known to suffer from inability to adapt to different target categories which may be caused by the dynamic changes observed in both Web topic taxonomy and Web content. In this paper, we propose a feature-free classification approach using semantic distance. We analyze queries and categories themselves and utilizes the number of Web pages containing both a query and a category as a semantic distance to determine their similarity. The most attractive feature of our approach is that it only utilizes the Web page counts estimated by a search engine to provide the search query classification with respectable accuracy. In addition, it can be easily adaptive to the changes in the target categories, since machine learning based approaches require extensive updating process, e.g., re-labeling outdated training data, re-training classifiers, to name a few, which is time consuming and high-cost. We conduct experimental study on the effectiveness of our approach using a set of rank measures and show that our approach performs competitively to some popular state-of-the-art solutions which, however, frequently use external sources and are inherently insufficient in flexibility.  相似文献   

13.
搜索引擎性能评估是信息检索界一个重要课题.长查询具有较为丰富的信息内容,能更加准确地描述用户的信息需求.在此基础上文中提出长查询用户满意度分析的整体框架,定义用户满意度的概念,并在用户日志中提取相关用户行为特征,应用决策树和SVM两种分类算法评测用户满意度.在大规模商业搜索引擎日志上完成的实验结果证明了这套评价体系的有效性.结果表明,用户对于查询满意和不满意的分类准确率分别达到86%和70%.  相似文献   

14.
网络搜索分析在优化搜索引擎方面具有举足轻重的作用,而且对用户个人搜索特性进行分析能够提高搜索引擎的精准度。目前,大多数已有模型(比如点击图模型及其变体),注重研究用户群体的共同特点。然而,关于如何做到既可以获取用户群体共同特点又可以获取用户个人特点方面的研究却非常少。本文研究了基于个人用户网络搜索分析新问题,即通过研究用户搜索的突发性现象,获取个人用户搜索查询的主题分布情况。提出了两个搜索主题模型,即搜索突发性模型(SBM)和耦合敏感搜索突发性模型(CS-SBM)。SBM假设查询词和URL主题是无关的,CS-SBM假设查询词和URL之间是有主题关联的,得到的主题分布信息存储在偏Dirichlet先验中,采用Beta分布刻画用户搜索的时间特性。实验结果表明,每一个用户的网络搜索轨迹都有多种基于用户的独有特点。同时,在使用大量真实用户查询日志数据情况下,与LDA、DCMLDA、TOT相比,本文提出的模型具有明显的泛化性能优势,并且有效地描绘了用户搜索查询主题在时间上的变化过程。  相似文献   

15.
Semantic Web search is a new application of recent advances in information retrieval (IR), natural language processing, artificial intelligence, and other fields. The Powerset group in Microsoft develops a semantic search engine that aims to answer queries not only by matching keywords, but by actually matching meaning in queries to meaning in Web documents. Compared to typical keyword search, semantic search can pose additional engineering challenges for the back-end and infrastructure designs. Of these, the main challenge addressed in this paper is how to lower query latencies to acceptable, interactive levels. Index-based semantic search requires more data processing, such as numerous synonyms, hypernyms, multiple linguistic readings, and other semantic information, both on queries and in the index. In addition, some of the algorithms can be super-linear, such as matching co-references across a document. Consequently, many semantic queries can run significantly slower than the same keyword query. Users, however, have grown to expect Web search engines to provide near-instantaneous results, and a slow search engine could be deemed unusable even if it provides highly relevant results. It is therefore imperative for any search engine to meet its users’ interactivity expectations, or risk losing them. Our approach to tackle this challenge is to exploit data parallelism in slow search queries to reduce their latency in multi-core systems. Although all search engines are designed to exploit parallelism, at the single-node level this usually translates to throughput-oriented task parallelism. This paper focuses on the engineering of two latency-oriented approaches (coarse- and fine-grained) and compares them to the task-parallel approach. We use Powerset’s deployed search engine to evaluate the various factors that affect parallel performance: workload, overhead, load balancing, and resource contention. We also discuss heuristics to selectively control the degree of parallelism and consequent overhead on a query-by-query level. Our experimental results show that using fine-grained parallelism with these dynamic heuristics can significantly reduce query latencies compared to fixed, coarse-granularity parallelization schemes. Although these results were obtained on, and optimized for, Powerset’s semantic search, they can be readily generalized to a wide class of inverted-index search engines.  相似文献   

16.
One of the useful tools offered by existing web search engines is query suggestion (QS), which assists users in formulating keyword queries by suggesting keywords that are unfamiliar to users, offering alternative queries that deviate from the original ones, and even correcting spelling errors. The design goal of QS is to enrich the web search experience of users and avoid the frustrating process of choosing controlled keywords to specify their special information needs, which releases their burden on creating web queries. Unfortunately, the algorithms or design methodologies of the QS module developed by Google, the most popular web search engine these days, is not made publicly available, which means that they cannot be duplicated by software developers to build the tool for specifically-design software systems for enterprise search, desktop search, or vertical search, to name a few. Keyword suggested by Yahoo! and Bing, another two well-known web search engines, however, are mostly popular currently-searched words, which might not meet the specific information needs of the users. These problems can be solved by WebQS, our proposed web QS approach, which provides the same mechanism offered by Google, Yahoo!, and Bing to support users in formulating keyword queries that improve the precision and recall of search results. WebQS relies on frequency of occurrence, keyword similarity measures, and modification patterns of queries in user query logs, which capture information on millions of searches conducted by millions of users, to suggest useful queries/query keywords during the user query construction process and achieve the design goal of QS. Experimental results show that WebQS performs as well as Yahoo! and Bing in terms of effectiveness and efficiency and is comparable to Google in terms of query suggestion time.  相似文献   

17.
In order to understand user intents behind their queries, many researchers study similar query finding. Recently, the click graph has shown its utility in describing the relationship between queries and URLs. The previous approaches mainly either generate related terms or find relevant queries based on the co-clicked URLs. However, these approaches may suffer from the complexity of natural language processing and click-through data sparseness. In this paper, we tackle this problem through three query probability distribution representation models: Click Model, Term Model, and Semantic Model. The Click Model extracts credible transition probability from queries to URLs, and describes a query without considering web contents. The Term Model focuses on representing a query via term distribution over its main entities and purposes, which can better capture information needs behind short and ambiguous keyword queries. The Semantic Model learns potential intent distribution of queries to distinguish user intents behind a query. Among the three models, we apply pairwise similarity metrics and graph-based personalized pagerank to find similar queries. Compared to traditional representation models, our representation models are verified to be effective and efficient, especially for long tail queries.  相似文献   

18.
To avoid returning irrelevant web pages for search engine results, technologies that match user queries to web pages have been widely developed. In this study, web pages for search engine results are classified as low-adjacence (each web page includes all query keywords) or high-adjacence (each web page includes some of the query keywords) sets. To match user queries with web pages using formal concept analysis (FCA), a concept lattice of the low-adjacence set is defined and the non-redundancy association rules defined by Zaki for the concept lattice are extended. OR- and AND-RULEs between non-query and query keywords are proposed and an algorithm and mining method for these rules are proposed for the concept lattice. The time complexity of the algorithm is polynomial. An example illustrates the basic steps of the algorithm. Experimental and real application results demonstrate that the algorithm is effective.  相似文献   

19.
We present a new text-to-image re-ranking approach for improving the relevancy rate in searches. In particular, we focus on the fundamental semantic gap that exists between the low-level visual features of the image and high-level textual queries by dynamically maintaining a connected hierarchy in the form of a concept database. For each textual query, we take the results from popular search engines as an initial retrieval, followed by a semantic analysis to map the textual query to higher level concepts. In order to do this, we design a two-layer scoring system which can identify the relationship between the query and the concepts automatically. We then calculate the image feature vectors and compare them with the classifier for each related concept. An image is relevant only when it is related to the query both semantically and content-wise. The second feature of this work is that we loosen the requirement for query accuracy from the user, which makes it possible to perform well on users’ queries containing less relevant information. Thirdly, the concept database can be dynamically maintained to satisfy the variations in user queries, which eliminates the need for human labor in building a sophisticated initial concept database. We designed our experiment using complex queries (based on five scenarios) to demonstrate how our retrieval results are a significant improvement over those obtained from current state-of-the-art image search engines.  相似文献   

20.
信息检索的效果很大程度上取决于用户能否输入恰当的查询来描述自身信息需求。很多查询通常简短而模糊,甚至包含噪音。查询推荐技术可以帮助用户提炼查询、准确描述信息需求。为了获得高质量的查询推荐,在大规模“查询-链接”二部图上采用随机漫步方法产生候选集合。利用摘要点击信息对候选列表进行重排序,使得体现用户意图的查询排在比较高的位置。最终采用基于学习的算法对推荐查询中可能存在的噪声进行过滤。基于真实用户行为数据的实验表明该方法取得了较好的效果。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号