1.

Measuring semantic similarity between words by removing noise and redundancy in web snippets

Zheng Xu, Xiangfeng Luo, Jie Yu, Weimin Xu 《Concurrency and Computation》2011,23(18):2496-2510

Semantic similarity measures play important roles in many Web-related tasks such as Web browsing and query suggestion. Because taxonomy-based methods cannot deal with continually emerging words, Web-based methods have recently been proposed to solve this problem. Because of the noise and redundancy hidden in Web data, robustness and accuracy are still challenges. In this paper, we propose a method integrating page counts and snippets returned by Web search engines. The semantic snippets and the number of search results are then used to remove noise and redundancy in the Web snippets (a 'Web snippet' includes the title, summary, and URL of a Web page returned by a search engine). After that, a method integrating page counts, semantic snippets, and the number of already displayed search results is proposed. The proposed method does not need any human-annotated knowledge (e.g., ontologies) and can easily be applied to Web-related tasks (e.g., query suggestion). A correlation coefficient of 0.851 against the Rubenstein–Goodenough benchmark dataset shows that the proposed method outperforms the existing Web-based methods by a wide margin. Moreover, the proposed semantic similarity measure significantly improves the quality of query suggestion compared with some page-count-based methods. Copyright © 2011 John Wiley & Sons, Ltd.
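For orientation, the kind of page-count-based baseline the abstract compares against can be sketched as follows. This is not the authors' method: it is a Jaccard-style coefficient over search-engine hit counts, shown only to illustrate what "page counts" contribute. The function name and the hit counts in the usage note are hypothetical, and no real search API is called.

```python
def web_jaccard(count_p: int, count_q: int, count_both: int) -> float:
    """Jaccard-style similarity from page counts (illustrative baseline only).

    count_p    -- hits for the query "P"
    count_q    -- hits for the query "Q"
    count_both -- hits for the conjunctive query "P AND Q"
    """
    if count_both == 0:
        # No co-occurrence at all: treat the words as unrelated.
        return 0.0
    # Pages mentioning both words, divided by pages mentioning either word.
    return count_both / (count_p + count_q - count_both)
```

With made-up counts such as `web_jaccard(100, 50, 25)`, 25 of the 125 pages mentioning either word mention both. A measure like this sees only co-occurrence counts, which is why the abstract combines page counts with snippet text.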

2.

Effective rank aggregation for metasearching

Leonidas Akritidis, Dimitrios Katsaros, Panayiotis Bozanis 《Journal of Systems and Software》2011,84(1):130-143

Nowadays, mashup services and especially metasearch engines play an increasingly important role on the Web. Most users use them directly or indirectly to access and aggregate information from more than one data source. As with other search systems, the effectiveness of a metasearch engine is mainly determined by the quality of the results it returns in response to user queries. Since these services do not maintain their own document index, they exploit multiple search engines, using a rank aggregation method to classify the collected results. However, the rank aggregation methods proposed until now utilize a very limited set of parameters regarding these results, such as the total number of exploited resources and the rankings they receive from each individual resource. In this paper we present QuadRank, a new rank aggregation method that takes into consideration additional information regarding the query terms, the collected results, and the data correlated to each of these results (title, textual snippet, URL, individual ranking, and others). We have implemented and tested QuadRank in a real-world metasearch engine, QuadSearch, a system developed as a testbed for algorithms related to the broad problem of metasearching. The name QuadSearch is related to the current number of exploited engines (four). We have exhaustively tested QuadRank for both effectiveness and efficiency in the real-world search environment of QuadSearch and also using a task from the recent TREC-2009 conference. The results we present in our experiments reveal that in most cases QuadRank outperformed all component engines, another metasearch engine (Dogpile), and two successful rank aggregation methods, Borda Count and the Outranking Approach.
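To make the baseline concrete, here is a minimal sketch of Borda Count, one of the two rank aggregation methods the abstract compares against. It is not QuadRank, whose scoring details the abstract does not give. The function name, the tie-breaking rule, and the choice to give zero points to items an engine did not return are assumptions of this sketch.

```python
from collections import defaultdict

def borda_count(rankings):
    """Merge several best-first result lists into one list using Borda Count."""
    # The candidate pool is every distinct item returned by any engine.
    pool = {item for ranking in rankings for item in ranking}
    n = len(pool)
    scores = defaultdict(float)
    for ranking in rankings:
        # An item at 0-based position p in a list earns n - p points.
        # Items missing from a list earn nothing from it (a simplification).
        for position, item in enumerate(ranking):
            scores[item] += n - position
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(pool, key=lambda item: (-scores[item], item))
```

For example, merging `["a", "b", "c"]`, `["b", "a", "d"]`, and `["b", "c", "a"]` gives `b` 11 points, `a` 9, `c` 5, and `d` 2. The method uses only rank positions, which is the limitation the abstract points to when it argues for using titles, snippets, and URLs as well.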

3.

An interactive agent-based system for concept-based web search (Total citations: 1; self-citations: 0; citations by others: 1)

Wei-Po Lee, Tsung-Che Tsai 《Expert systems with applications》2003,24(4):365-373

Search engines are useful tools for finding information on the Internet. However, due to the difficulty of specifying appropriate queries and the problems of keyword-based similarity ranking presently encountered by search engines, general users are still not satisfied with the results retrieved. To remedy these difficulties and problems, in this paper we present a multi-agent framework in which an interactive approach is proposed to iteratively collect a user's feedback from the pages the user has identified. By analyzing the pages gathered, the system can then gradually formulate queries to efficiently describe the content a user is looking for. In our framework, evolution strategies are employed to evolve critical feature words for concept modeling in query formulation. The experimental results show that the framework developed is efficient and useful for enhancing the quality of web search, and that concept-based semantic search can thus be achieved.

4.

Impact of training standard complexity on inspection performance

Pradeep Rao, Shannon R. Bowling, Mohammad T. Khasawneh, Anand K. Gramopadhye, Brian J. Melloy 《Human Factors and Ergonomics in Manufacturing》2006,16(2):109-132

Research in the area of visual inspection has shown that various factors influence inspection performance. Task factors have been identified as one of the primary classes of factors influencing the complexity of inspection tasks. If inspection task complexity is to be reduced, it is essential to understand the influence of various task factors and prescribe interventions based on the impact of these factors. Moreover, historical work in this area has shown that the greater the difficulty of a vigilance task, the more engaged operators may become. Therefore, this research studies the influence of the following task factors: number of defect types, defect standard complexity, defect probability, and defect distribution on both the visual search and decision-making components of a contact lens inspection task. This study was conducted using a computer simulation of a real-world contact lens inspection task with 28 student subjects. Performance was measured on both the visual search and decision-making components of the task. The results revealed a negative influence of defect standard complexity and a positive influence of defect probability on both the visual search and decision-making components of the inspection task. © 2006 Wiley Periodicals, Inc. Hum Factors Man 16: 109–132, 2006.

5.

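Research on crawler search strategies in topic-specific search engines

Shi Baoming, He Yuanxiang, Wu Chongzheng 《Computer Engineering and Applications》2014,(2):116-119,128

This paper addresses the low efficiency of traditional topic-focused crawlers. A traditional focused crawler visits the links it judges most valuable, but it computes the relevance of each link in isolation and ignores the relationships among the URLs awaiting analysis, which lowers crawling efficiency. A relevance discrimination algorithm based on a link model is proposed. It uses both labeled seed URLs and unlabeled candidate URLs to judge the relevance of the unlabeled URLs, and the result is shown to be insensitive to the choice of initial values for the iteration. Experimental results show that the proposed method is more efficient than the relevance discrimination methods of traditional web crawler algorithms.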

6.

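Integration of search engines and user rating systems

LIU Ling 《Digital Community & Smart Home》2008,(7)

To provide more credible search results and higher user satisfaction, one feasible approach is to introduce a user rating system into the existing search engine architecture. When ranking search results, the improved search engine should consider not only inbound-link factors but also user rating factors. The user rating system should be open to all users, its interface should be as simple as possible, and different ratings should be given different weights.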

7.

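Integration of search engines and user rating systems

Liu Ling 《Digital Community & Smart Home》2008,(3):1207-1208

To provide more credible search results and higher user satisfaction, one feasible approach is to introduce a user rating system into the existing search engine architecture. When ranking search results, the improved search engine should consider not only inbound-link factors but also user rating factors. The user rating system should be open to all users, its interface should be as simple as possible, and different ratings should be given different weights.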

8.

Want to See the Sites? Better Find a Better Guide

《Internet Reference Services Quarterly》2013,18(3):85-96

ABSTRACT

This paper presents the results of a study of the utility of several popular search engines and of two newer search engines with respect to librarian-selected lists of Web resources and Internet searching behaviors. The study addresses whether these resources are returned where Internet searchers could reasonably be expected to find them and whether the search engines employed serve as acceptable substitutes for the expert advice of librarians. Search engines included in the study were Google, http://www.MSN.com, Yahoo, Lycos, AskJeeves, Icerocket, and Acoona. Searches for the study were based on the topics/titles of the "Internet Resources" columns from College & Research Libraries News for 2004. Finally, the paper addresses methodological concerns and proposes possible directions for further research.

9.

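A fusion algorithm for search engine page ranking

Wu Wenzhao 《Computer Engineering and Design》2010,31(8)

The PageRank algorithm focuses on hyperlink analysis and pays little attention to page content, so the pages a user actually needs are often not ranked near the top. To address this, a fusion algorithm for search engine page ranking is proposed. The algorithm considers three main aspects (term weights, link analysis, and user preferences) to obtain a weight score for each URL, so that every page awaiting collection has its own score. The hyperlink selection program then picks the URL, or batch of URLs, with the highest scores to collect, with the aim of achieving precise retrieval.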
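The abstract above does not say how the three signals are combined, so the following is only a sketch under a stated assumption: a weighted linear sum of a term-weight score, a link-analysis score, and a user-preference score, each already normalized to [0, 1]. The function names, the weights, and the sample URLs are all hypothetical.

```python
import heapq

def fused_score(term_weight, link_score, user_pref, weights=(0.4, 0.4, 0.2)):
    """Combine three per-URL signals into one score (assumed linear fusion)."""
    w_term, w_link, w_user = weights
    return w_term * term_weight + w_link * link_score + w_user * user_pref

def select_urls(candidates, k=1):
    """Return the k candidate URLs with the highest fused scores.

    candidates maps each URL to a (term_weight, link_score, user_pref) tuple.
    """
    return heapq.nlargest(
        k, candidates, key=lambda url: fused_score(*candidates[url])
    )

# Hypothetical candidates awaiting collection.
frontier = {
    "http://example.com/a": (0.9, 0.1, 0.2),
    "http://example.com/b": (0.5, 0.8, 0.5),
    "http://example.com/c": (0.2, 0.3, 0.9),
}
best_two = select_urls(frontier, k=2)
```

Here `select_urls` plays the role of the hyperlink selection program described in the abstract: it picks one URL or a batch, depending on `k`. A linear sum is the simplest choice. The paper may use a different combination rule.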

10.

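A GRASP algorithm with a parameter selection mechanism for the heterogeneous-fleet school bus routing problem

Hou Yan'e, Dang Lanxue, Kong Yunfeng, Xie Yi 《Computer Science》2016,43(8):233-239

Taking into account the differences in capacity and cost among vehicle types in school bus route planning, a model of the heterogeneous-fleet school bus routing problem (SBRP) is established, and a greedy randomized adaptive search procedure (GRASP) with a parameter selection mechanism is proposed to solve it. In the initial solution construction phase, a set of threshold parameters controls the size of the restricted candidate list (RCL), and a threshold parameter is chosen by roulette-wheel selection. After the initial solution is constructed, variable neighborhood search (VNS) is used to improve it, and the chosen parameter and the objective value of the solution are recorded. All threshold parameters start with equal selection probabilities. After every several iterations, the performance of each threshold parameter is evaluated and its selection probability is adjusted, so that the algorithm obtains better average solutions. Tests on benchmark instances compare the basic GRASP with the designed GRASP, and with existing algorithms for the heterogeneous-fleet SBRP. The experimental results show that the designed algorithm is effective.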
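The parameter selection mechanism described above can be sketched as two small functions: roulette-wheel selection of a threshold, and a periodic update of the selection probabilities. The abstract does not give the update formula, so this sketch assumes a rule in the style of reactive GRASP, where each threshold is weighted by the best objective value found so far divided by the average objective value obtained with that threshold (for minimization, with positive objective values). All names are hypothetical, and the route construction and VNS steps are omitted.

```python
import random

def roulette_pick(values, probs, rng):
    """Roulette-wheel selection: return values[i] with probability probs[i]."""
    r = rng.random()
    cumulative = 0.0
    for value, p in zip(values, probs):
        cumulative += p
        if r < cumulative:
            return value
    return values[-1]  # guard against floating-point round-off

def update_probs(thresholds, history, best):
    """Recompute selection probabilities from recorded objective values.

    history maps a threshold to the objective values obtained with it.
    Quality is best / average (assumed rule); untried thresholds get 1.0.
    """
    qualities = []
    for t in thresholds:
        runs = history.get(t, [])
        qualities.append(best / (sum(runs) / len(runs)) if runs else 1.0)
    total = sum(qualities)
    return [q / total for q in qualities]

thresholds = [0.1, 0.5]
probs = [0.5, 0.5]                        # equal probabilities at the start
history = {0.1: [10.0, 10.0], 0.5: [20.0]}  # made-up recorded objective values
probs = update_probs(thresholds, history, best=10.0)
chosen = roulette_pick(thresholds, probs, random.Random(0))
```

With these made-up values, the threshold 0.1 reaches the best cost on average and ends up with twice the selection probability of 0.5, so later iterations favor it while still occasionally trying the alternative.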