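A Method for Constructing Training Sets for Text Search Ranking
Cite this article: WANG Li, SHUAI Jian-Mei. A Method for Constructing Training Sets for Text Search Ranking[J]. Computer Systems & Applications, 2010, 19(10): 199-202.
Authors: WANG Li, SHUAI Jian-Mei
Affiliation: Department of Automation, University of Science and Technology of China, Hefei, Anhui 230027
Funding: National High Technology Research and Development Program of China (863 Program) (2006AA01Z449)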
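Abstract: In text search, building ranking models with learning to rank has become increasingly common. The performance of a ranking model depends heavily on its training set. Each training sample requires a human to label the relevance of a document to a given query. In text search the number of possible queries is practically unlimited, while human labeling is time-consuming and laborious, so it is worthwhile to select a subset of informative queries for labeling. This paper proposes a greedy algorithm that simultaneously considers query difficulty, density, and diversity to select informative queries for labeling from a massive pool of queries. Experimental results on LETOR and on data from a Web search engine database show that the proposed method can construct a small yet effective training set.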

Keywords: information retrieval; learning to rank; training set construction
Received: 2010/1/18
Revised: 2010/2/26

Construct Training Set for Learning to Rank in Web Search
WANG Li and SHUAI Jian-Mei. Construct Training Set for Learning to Rank in Web Search[J]. Computer Systems & Applications, 2010, 19(10): 199-202.
Authors: WANG Li and SHUAI Jian-Mei
Affiliation: Department of Automation, University of Science and Technology of China, Hefei 230027, China
Abstract: Learning to rank has become a popular way to build ranking models for Web search. For a given ranking algorithm, the performance of the resulting model depends on the training set. Each training sample is constructed by having a human label the relevance of a document to a given query. However, the number of queries in Web search is nearly infinite, and human labeling is expensive. It is therefore necessary to select a subset of queries from which to construct an efficient training set. In this paper, an algorithm is developed that selects queries by simultaneously taking query difficulty, density, and diversity into consideration. Experimental results on LETOR and on a collected Web search dataset show that the proposed method leads to a more efficient training set.
Keywords: information retrieval; learning to rank; construct training set
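
The abstract names three selection criteria (query difficulty, density, and diversity) but does not define them, so the following is only an illustrative sketch of how a greedy selection over those criteria might look, not the authors' algorithm. Everything concrete here is an assumption: the function name `select_queries`, the representation of each query as a feature vector, the externally supplied `difficulty` scores, density as average cosine similarity to the rest of the pool, diversity as one minus the highest similarity to any already-selected query, and the product used to combine the three.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_queries(vectors, difficulty, k):
    """Greedily pick up to k query indices to send for human labeling.

    vectors    -- one feature vector per candidate query (assumed representation)
    difficulty -- one score per query, higher = harder (assumed to be given)
    """
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    # Assumed density: average similarity to the other queries in the pool.
    density = [(sum(sim[i]) - sim[i][i]) / max(n - 1, 1) for i in range(n)]

    selected = []
    while len(selected) < min(k, n):
        best, best_score = None, -math.inf
        for i in range(n):
            if i in selected:
                continue
            # Assumed diversity: dissimilarity to the closest selected query.
            diversity = 1.0 - max((sim[i][j] for j in selected), default=0.0)
            # Assumed combination: a plain product of the three criteria.
            score = difficulty[i] * density[i] * diversity
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

For example, `select_queries([[1, 0], [1, 0.1], [0, 1]], [0.9, 0.8, 0.5], 2)` returns `[0, 2]`: query 1 scores nearly as high as query 0 in the first round, but once query 0 is chosen its near-duplicate is penalized by the diversity term, so the dissimilar query 2 is picked next.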
This article is indexed in VIP, Wanfang Data, and other databases.