A Method for Constructing Training Sets for Text Search Ranking

Cite as:	WANG Li, SHUAI Jian-Mei. A Method for Constructing Training Sets for Text Search Ranking[J]. Computer Systems & Applications, 2010, 19(10): 199-202.

Authors:	WANG Li, SHUAI Jian-Mei

Affiliation:	Department of Automation, University of Science and Technology of China, Hefei, Anhui 230027, China

Funding:	National High Technology Research and Development Program of China (863 Program) (2006AA01Z449)

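Abstract:	In text search, building ranking models with learning to rank is increasingly common. The performance of a ranking model depends heavily on its training set. Each training sample requires a human to label how relevant a document is to a given query. In text search the space of queries is practically unbounded, while manual labeling is slow and laborious, so it is worthwhile to select only a subset of informative queries to label. This paper proposes a greedy algorithm that simultaneously considers query difficulty, density, and diversity in order to select informative queries for labeling from a massive pool of queries. Experimental results on LETOR and on data drawn from a Web search engine database show that the proposed method can construct a training set that is small yet effective.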
Keywords:	information retrieval; learning to rank; training set construction
Received:	2010-01-18
Revised:	2010-02-26
Construct Training Set for Learning to Rank in Web Search

WANG Li and SHUAI Jian-Mei. Construct Training Set for Learning to Rank in Web Search[J]. Computer Systems & Applications, 2010, 19(10): 199-202.

Authors:	WANG Li and SHUAI Jian-Mei

Affiliation:	(Department of Automation, University of Science and Technology of China, Hefei 230027, China)

Abstract:	Learning to rank has become a popular way to build ranking models for Web search. For a given ranking algorithm, the performance of the resulting model depends on the training set. Each training sample is constructed by having a human label the relevance of a document to a given query. However, the number of queries in Web search is nearly infinite, and human labeling is expensive. It is therefore necessary to select a subset of queries from which to construct an efficient training set. In this paper, an algorithm is developed that selects queries by simultaneously taking query difficulty, density, and diversity into consideration. Experimental results on LETOR and on a collected Web search dataset show that the proposed method leads to a more efficient training set.

Keywords:	information retrieval; learning to rank; construct training set
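
The abstract names the three signals used for query selection (difficulty, density, diversity) but does not give the scoring formula. The sketch below is therefore only an illustration of what a greedy selection over those three signals might look like, not the authors' algorithm. Everything in it is an assumption: the function name `greedy_select`, the multiplicative score, density measured as the fraction of queries within a `radius`, and diversity measured as the distance to the nearest already-selected query.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_select(vectors, difficulty, k, radius=1.0):
    """Greedily pick up to k query indices to send for labeling.

    ASSUMED scoring (not taken from the paper):
        score(i) = difficulty[i] * density[i] * diversity(i)
    where density is the fraction of queries within `radius` of query i,
    and diversity is the distance from query i to the nearest query
    already selected (1.0 while nothing has been selected yet).
    """
    n = len(vectors)
    # Density: share of the pool (including the query itself) that lies nearby.
    density = [
        sum(1 for j in range(n) if euclidean(vectors[i], vectors[j]) <= radius) / n
        for i in range(n)
    ]
    selected = []
    while len(selected) < min(k, n):
        best, best_score = None, -1.0
        for i in range(n):
            if i in selected:
                continue
            # Diversity: penalize candidates close to anything already chosen.
            diversity = min(
                (euclidean(vectors[i], vectors[j]) for j in selected),
                default=1.0,
            )
            score = difficulty[i] * density[i] * diversity
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected


# Hypothetical toy pool: three near-duplicate queries and one outlier.
vectors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
difficulty = [0.9, 0.8, 0.7, 0.6]
print(greedy_select(vectors, difficulty, k=2, radius=0.5))  # prints [0, 3]
```

In this toy pool the first pick is the hardest query in the dense cluster; the second pick is the outlier, because the diversity term suppresses the remaining near-duplicates. A real system would need difficulty estimates and query feature vectors from its own data, which the abstract does not specify.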
This article is indexed in databases including VIP and Wanfang Data.