Similar literature
 Found 19 similar documents; search time 598 ms
1.
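Many resources on the Internet embody human collective intelligence. Web directories organize websites manually by topic. Using the topic-labeled URLs in a web directory, the authors design a URL topic classifier and combine it with pseudo-relevance feedback and search engine query logs to propose an automatic, fast, and effective query topic classification method. Specifically, the method combines two strategies. Strategy 1 predicts the query topic by computing the topic distribution of the URLs in the search results. Strategy 2 uses click relations in the query log, together with topic-labeled URLs, to label queries, and trains a statistical classifier on the resulting data to predict the query topic. Experiments show that the method achieves better accuracy and better online processing efficiency than the current best algorithm, and that it can acquire training data automatically from query logs, which gives it good scalability.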
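
Strategy 1 above can be sketched roughly as a vote over the labeled URLs returned for a query. This is only a minimal illustration, not the authors' classifier: the `DIRECTORY` table, the host-level matching, and the function names are all assumptions introduced here.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical stand-in for a topic-labeled web directory (host -> topic).
DIRECTORY = {
    "espn.com": "sports",
    "nba.com": "sports",
    "imdb.com": "movies",
}

def url_topic(url):
    """Return the directory topic for a URL's host, or None if unlabeled."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return DIRECTORY.get(host)

def predict_query_topic(result_urls):
    """Return (top topic, topic distribution) over the labeled result URLs."""
    counts = Counter(t for t in map(url_topic, result_urls) if t is not None)
    total = sum(counts.values())
    if total == 0:
        return None, {}  # no labeled URL among the results
    dist = {topic: c / total for topic, c in counts.items()}
    return max(dist, key=dist.get), dist
```

Given the result URLs of a query, `predict_query_topic` returns the most frequent topic among the labeled ones; unlabeled URLs are simply ignored in this sketch.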

2.
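Query expansion aims to better understand the user's search intent so that the search engine returns more accurate information. Existing methods mainly study how to find words similar to the query terms; however, similar words do not necessarily reflect the user's intent. Here, candidate expansion terms are extracted from online knowledge bases and ranked using a general-purpose search engine. This query expansion method makes full use of the collective intelligence of the Web and brings the expansion terms closer to the user's search expectations. Comparative experiments show that the method gives good results.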

3.
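Research on a Hive-based massive search log analysis system   Total citations: 2, self-citations: 0, citations by others: 2
Zhao Long (赵龙), Jiang Rong'an (江荣安). 《计算机应用研究》 (Application Research of Computers), 2013, 30(11): 3343-3345
To address the limited scalability of traditional distributed models and the difficulty of writing parallel programs when processing massive logs in parallel, a Hive-based mechanism for analyzing massive Web search logs is proposed. HQL, the Hadoop Distributed File System (HDFS), and the MapReduce programming model are used to analyze and process massive search logs and to study user search behavior. The analysis results on hot query topics, user click counts and URL rankings, and query sessions offer some guidance for search engine ranking algorithms and system optimization.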

4.
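In structured short-text search for the electric power domain, differences between knowledge bases lead to a high rate of query modification by users, so user search needs are not met. To address this, a structured short-text power search engine technique based on intention decision support is proposed. Based on the link relations between Web pages, the collected data are processed using the computed relevance between each URL and the topic. Intention decision support is used to determine the user's intention decision result. A scenario-setting knowledge base and a scenario-analysis knowledge base are built as the operating basis of the power search engine. The similarity between the query target vector and the power search records is then computed. Euclidean semantic distance is used to compute the similarity of power search target vectors, and the query result with the highest similarity is taken as the output of the structured short-text search engine algorithm. Experimental results show that, compared with conventional techniques, the proposed power search engine technique reduces the average query modification rate by 23% and 19%.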

5.
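Research on the search strategy of web crawlers in topic-specific search engines   Total citations: 2, self-citations: 0, citations by others: 2
This paper analyzes in detail the search strategy of web spiders in topic-specific search engines. Building on an in-depth analysis of how topical pages are distributed on the Web and of topic relevance judgment algorithms, it proposes a web spider model for topic-oriented search and describes the model's organizational structure in detail. As the core of a topical web spider's search strategy, the topic relevance judgment algorithm is the key to enabling the spider to perform focused retrieval around a given topic. Analysis of link text and related link attributes is introduced into the judgment of URL topic relevance, and a novel URL topic relevance algorithm, the EPR algorithm, is proposed.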

6.
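《计算机科学与探索》 (Journal of Frontiers of Computer Science and Technology), 2016, (9): 1290-1298
Traditional query recommendation algorithms recommend queries to users by mining query logs. Existing models usually consider only the relation between the original query and the recommended query (for example, semantic similarity or relevance) and ignore the user's satisfaction during the search process. Targeting the different satisfaction states that users show while searching, a basic hypothesis for query recommendation is proposed and verified through an online user questionnaire survey. Based on this hypothesis, an adaptive query recommendation model driven by the user's search satisfaction state is proposed, which can intelligently recommend different kinds of queries. When the user is satisfied with the search results, the model offers more novel recommendations; when the user is not satisfied, it offers queries that strengthen the expression of the information need. Large-scale log experiments show that the proposed model significantly outperforms the traditional query-flow graph model, demonstrating its effectiveness.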

7.
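Queries are the main way Web users express their information needs when searching, and the related terms suggested by the system are an effective tool for improving queries. This paper takes both as its objects of study and characterizes and analyzes them starting from users' usage behavior. Log mining is first used to give an overall quantitative description of queries. The high-frequency words in queries are then divided by qualitative classification into two categories, subject words and auxiliary words. Compared against the results of a questionnaire survey, the analysis finds that Web users make heavy use of auxiliary words when searching, that the content of subject words is relatively concentrated, and that queries are short and structurally simple. In the study of related terms, the combined results of the questionnaire survey and comparative experiments show that participants agree strongly with the related terms suggested by search engines but seldom use them. The paper provides empirical results for understanding Web users' language use in search and offers some reference for improving search engine indexing.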

8.
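With the rapid development of the mobile Internet, the number of mobile search users has grown substantially, and analyzing the behavior of mobile search engine users is important for improving search engine performance and user experience. This paper takes the logs of a mobile search engine from the first week of June 2011 and analyzes the search behavior of mobile Internet users. From three perspectives (query analysis, session analysis, and click analysis), it studies query length and frequency, the proportions of question-style queries and URL queries, the number of queries per session, the ways queries are modified, and the positions users click, and compares these with the corresponding metrics for Internet search engines. The conclusions offer some reference for improving mobile search engine algorithms and optimizing systems.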

9.
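To overcome the slow query speed and poor independence of meta search engines, this paper designs a result processing model for a meta search engine. Taking the characteristics of meta search engines into account, the model uses a four-level result set structure, which improves the efficiency of result processing. For result extraction, an algorithm that adjusts weights automatically according to feedback information (FBWM) is proposed; without manual intervention, it monitors performance changes in each independent search engine and adjusts their weights dynamically. For result ranking, an improved position/full-text sorting method (IPFTS) is proposed. By introducing the concept of term matching levels, the algorithm not only improves the precision of the relevance between search results and the query string but also ensures the validity of the URLs of the top-ranked results.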

10.
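To address the limited scalability of traditional distributed models and the difficulty of writing parallel programs when processing massive logs in parallel, a data-warehouse-based architecture for a massive search log analysis system is proposed. The Hadoop Distributed File System (HDFS) stores the massive search logs, which are then cleaned. Impala processes the data at high speed, the resulting statistics are loaded into the data warehouse, and Pentaho BI is used for multidimensional analysis and statistical reports. Six analysis topics are obtained: keyword analysis, query frequency, hot-word ranking, distribution of query terms over time, website ranking, and user statistics. The analysis results offer some guidance for search engine ranking algorithms and system optimization.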

11.
One of the useful tools offered by existing web search engines is query suggestion (QS), which assists users in formulating keyword queries by suggesting keywords that are unfamiliar to users, offering alternative queries that deviate from the original ones, and even correcting spelling errors. The design goal of QS is to enrich the web search experience of users and avoid the frustrating process of choosing controlled keywords to specify their special information needs, which relieves their burden of creating web queries. Unfortunately, the algorithms or design methodologies of the QS module developed by Google, the most popular web search engine these days, are not made publicly available, which means that they cannot be duplicated by software developers to build the tool for specifically designed software systems for enterprise search, desktop search, or vertical search, to name a few. Keywords suggested by Yahoo! and Bing, another two well-known web search engines, however, are mostly popular currently-searched words, which might not meet the specific information needs of the users. These problems can be solved by WebQS, our proposed web QS approach, which provides the same mechanism offered by Google, Yahoo!, and Bing to support users in formulating keyword queries that improve the precision and recall of search results. WebQS relies on frequency of occurrence, keyword similarity measures, and modification patterns of queries in user query logs, which capture information on millions of searches conducted by millions of users, to suggest useful queries/query keywords during the user query construction process and achieve the design goal of QS. Experimental results show that WebQS performs as well as Yahoo! and Bing in terms of effectiveness and efficiency and is comparable to Google in terms of query suggestion time.

12.
Web search engine: Characteristics of user behaviors and their implication   Total citations: 5, self-citations: 0, citations by others: 5
This paper first studies the distribution characteristics of user behaviors based on log data from a massive web search engine. Analysis shows that the stochastic distribution of user queries accords with the characteristics of a power-law function and exhibits strong similarity, and that users' queries and clicked URLs present dramatic locality, which implies that a query cache and a 'hot click' cache can be employed to improve system performance. Then three typical cache replacement policies are compared, including LRU, FIFO, and LFU with attenuation. In addition, the distribution characteristics of web information are also analyzed, which demonstrates that the link popularity and replica popularity of a URL have a positive influence on its importance. Finally, the variance between link popularity and user popularity, and the variance between replica popularity and user popularity, are analyzed, which gives some important insight that helps improve the ranking algorithms in a search engine.
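
As a rough illustration of why query locality makes a cache worthwhile, the sketch below replays a query stream against a fixed-size cache and reports the hit ratio under LRU or FIFO replacement. It is a toy comparison under assumed names (`hit_ratio`, `policy`), not the paper's experimental setup, and it leaves out LFU with attenuation.

```python
from collections import OrderedDict

def hit_ratio(queries, capacity, policy="lru"):
    """Replay a query stream and return the cache hit ratio.

    policy="lru" refreshes an entry on every hit; policy="fifo" evicts in
    insertion order regardless of hits.
    """
    cache = OrderedDict()  # oldest entry first
    hits = 0
    for q in queries:
        if q in cache:
            hits += 1
            if policy == "lru":
                cache.move_to_end(q)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the oldest entry
            cache[q] = True
    return hits / len(queries) if queries else 0.0
```

On a stream in which a few queries recur often, the LRU variant tends to keep those hot queries resident, which is the effect the locality finding above points to.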

13.
Keyword-based Web search is a widely used approach for locating information on the Web. However, Web users usually suffer from the difficulties of organizing and formulating appropriate input queries due to the lack of sufficient domain knowledge, which greatly affects the search performance. An effective tool to meet the information needs of a search engine user is to suggest Web queries that are topically related to their initial inquiry. Accurately computing query-to-query similarity scores is key to improving the quality of these suggestions. Because of the short lengths of queries, traditional pseudo-relevance or implicit-relevance based approaches expand the expression of the queries for the similarity computation. They explicitly use a search engine as a complementary source and directly extract additional features (such as terms or URLs) from the top-listed or clicked search results. In this paper, we propose a novel approach by utilizing the hidden topic as an expandable feature. This has two steps. In the offline model-learning step, a hidden topic model is trained, and for each candidate query, its posterior distribution over the hidden topic space is determined to re-express the query instead of the lexical expression. In the online query suggestion step, after inferring the topic distribution for an input query in a similar way, we then calculate the similarity between candidate queries and the input query in terms of their corresponding topic distributions, and produce a suggestion list of candidate queries based on the similarity scores. Our experimental results on two real data sets show that the hidden topic based suggestion is much more efficient than the traditional term or URL based approach, and is effective in finding topically related queries for suggestion.
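
The online suggestion step can be sketched as ranking candidate queries by the similarity of their topic distributions to that of the input query. The abstract does not name the similarity measure, so cosine similarity is an assumption here, as are the function names and the idea that the distributions have already been inferred offline.

```python
import math

def cosine(p, q):
    """Cosine similarity between two topic distributions of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def suggest(input_dist, candidates, k=2):
    """Return the k candidate queries whose topic distributions are closest.

    candidates maps each candidate query to its (precomputed) distribution.
    """
    ranked = sorted(
        candidates,
        key=lambda name: cosine(input_dist, candidates[name]),
        reverse=True,
    )
    return ranked[:k]
```

Because the comparison happens in the topic space rather than on surface terms, two queries with no words in common can still be ranked as closely related.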

14.
15.
An interactive agent-based system for concept-based web search   Total citations: 1, self-citations: 0, citations by others: 1
Search engines are useful tools for finding information on the Internet. However, due to the difficulties of specifying appropriate queries and the problems of keyword-based similarity ranking presently encountered by search engines, general users are still not satisfied with the results retrieved. To remedy the above difficulties and problems, in this paper we present a multi-agent framework in which an interactive approach is proposed to iteratively collect a user's feedback from the pages he has identified. By analyzing the pages gathered, the system can then gradually formulate queries to efficiently describe the content a user is looking for. In our framework, evolution strategies are employed to evolve critical feature words for concept modeling in query formulation. The experimental results show that the framework developed is efficient and useful for enhancing the quality of web search, and that concept-based semantic search can thus be achieved.

16.
Abstract: Content analysis of search engine user queries is an important task, since successful exploitation of the content of queries can result in the design of efficient information retrieval algorithms for more efficient search engines. Identification of topic changes within a user search session is a key issue in content analysis of search engine user queries. This study proposes an artificial neural network application in the area of search engine research to automatically identify topic changes in a user session by using statistical characteristics of queries, such as time intervals and query reformulation patterns. Sample data logs from the FAST and Excite search engines are selected to train the neural network and then the neural network is used to identify topic changes in the data log. As a result, almost all the performance measures yielded favourable results.

17.
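This paper studies the use of deep learning techniques to mine content of interest related to users' search topics. Deep mining algorithms analyze users' search records, query history, and related documents of interest, which are treated as the sources of search topic data, in order to mine topics of interest. The mining model is mainly based on the vector space model, which represents the user search topic model as a user search topic vector and forms a network of relations between topics and user interests. The user search topic vector is constructed as follows: a set of user query terms is selected and classified by deep mining, and the terms are then used to build the user search topic feature vector, from which the user's points of interest are analyzed. Because users change over time, search terms that fall out of use and irrelevant noise terms are removed and interest degrees are adjusted; the user search topic therefore needs an update-and-learn mechanism that dynamically tracks the trend of changes in user interest. The approach overcomes shortcomings such as data sparsity, category bias, and poor scalability. Experimental results show that the model identifies user search topics with good accuracy.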

18.
Identifying and interpreting user intent are fundamental to semantic search. In this paper, we investigate the association of intent with individual words of a search query. We propose that words in queries can be classified as either content or intent, where content words represent the central topic of the query, while users add intent words to make their requirements more explicit. We argue that intelligent processing of intent words can be vital to improving the result quality, and in this work we focus on intent word discovery and understanding. Our approach towards intent word detection is motivated by the hypothesis that query intent words satisfy certain distributional properties in large query logs similar to function words in natural language corpora. Following this idea, we first prove the effectiveness of our corpus distributional features, namely, word co-occurrence counts and entropies, towards function word detection for five natural languages. Next, we show that reliable detection of intent words in queries is possible using these same features computed from query logs. To make the distinction between content and intent words more tangible, we additionally provide operational definitions of content and intent words as those words that should match, and those that need not match, respectively, in the text of relevant documents. In addition to a standard evaluation against human annotations, we also provide an alternative validation of our ideas using clickthrough data. Concordance of the two orthogonal evaluation approaches provides further support to our original hypothesis of the existence of two distinct word classes in search queries. Finally, we provide a taxonomy of intent words derived through rigorous manual analysis of large query logs.
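
The distributional features mentioned above (word co-occurrence counts and entropies) can be illustrated with a small sketch over a toy query log. The exact feature definitions in the paper are not given in the abstract, so the choices here (co-occurrence within the same query, Shannon entropy in bits, the name `cooccurrence_features`) are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_features(queries):
    """Map each word to (number of distinct co-occurring words, entropy in bits).

    Two words co-occur when they appear in the same query. A word that pairs
    with many different words roughly evenly gets a high entropy.
    """
    neighbors = defaultdict(Counter)
    for query in queries:
        words = query.lower().split()
        for i, word in enumerate(words):
            for j, other in enumerate(words):
                if i != j:
                    neighbors[word][other] += 1
    features = {}
    for word, counts in neighbors.items():
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        features[word] = (len(counts), entropy)
    return features
```

In a log such as "adele lyrics", "beatles lyrics", "queen lyrics", a word like "lyrics" attaches to many different content words and therefore scores high on both features, which matches the intuition that intent words behave like function words.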

19.
Web Search is increasingly entity centric; as a large fraction of common queries target specific entities, search results get progressively augmented with semi-structured and multimedia information about those entities. However, search over personal web browsing history still mostly revolves around keyword search. In this paper, we present a novel approach to answer queries over web browsing logs that takes into account entities appearing in the web pages, user activities, as well as temporal information. Our system, B-hist, aims at providing web users with an effective tool for searching and accessing information they previously looked up on the web by supporting multiple ways of filtering results using clustering and entity-centric search. In the following, we present our system and motivate our User Interface (UI) design choices by detailing the results of a survey on web browsing and history search. In addition, we present an empirical evaluation of our entity-based approach used to cluster web pages.
