A Stratified Sampling Approach Based on Matching Degree Evaluation Between User Query and Sample Set

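Cite this article:	Wu Zhigang, Jing Yi’nan, He Zhenying, Wang Xiaoyang. A stratified sampling approach based on matching degree evaluation between user query and sample set [J]. Computer Applications and Software, 2019, 36(8): 196-202.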

Authors:	Wu Zhigang, Jing Yi’nan, He Zhenying, Wang Xiaoyang

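Affiliation:	School of Computer Science, Fudan University, Shanghai 201203, China; Shanghai Key Laboratory of Data Science (Fudan University), Shanghai 200433, China; Shanghai Institute of Intelligent Electronics and Systems, Shanghai 200433, China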

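Funding:	National Natural Science Foundation of China; National Key Research and Development Program of China; Shanghai Science and Technology Innovation Action Plan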

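Abstract:	In exploratory data analysis, users tend to rely on sampling systems, accepting approximate query results in exchange for faster queries. Existing sampling systems usually assume that a user's historical query log represents future queries well, and therefore generate a sampling strategy tailored to specific query characteristics. In practice, however, users' exploration intent varies widely, and the assumption that query characteristics are stable usually cannot be guaranteed. To address this problem, the paper proposes a method for evaluating the matching degree between an arbitrary user query and a sample. Multiple sample sets are generated by offline training, and when a specific query arrives, the best-matching sample set is selected automatically to compute the approximate result. The offline sample sets are generated with the training objective of minimizing the total expected matching-degree loss over all possible user queries. Experimental results show that, on a real dataset, the sampling system improves the accuracy of approximate results by 26.3% compared with existing methods.
Keywords:	sampling system; approximate query processing; stratified sampling; optimization problem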
A STRATIFIED SAMPLING APPROACH BASED ON MATCHING DEGREE EVALUATION BETWEEN USER QUERY AND SAMPLE SET

Wu Zhigang, Jing Yi’nan, He Zhenying, Wang Xiaoyang. A STRATIFIED SAMPLING APPROACH BASED ON MATCHING DEGREE EVALUATION BETWEEN USER QUERY AND SAMPLE SET [J]. Computer Applications and Software, 2019, 36(8): 196-202.

Authors:	Wu Zhigang, Jing Yi’nan, He Zhenying, Wang Xiaoyang

Affiliation:	School of Computer Science, Fudan University, Shanghai 201203, China; Shanghai Key Laboratory of Data Science, Fudan University, Shanghai 200433, China; Shanghai Institute of Intelligent Electronics and Systems, Shanghai 200433, China

Abstract:	In data exploration tasks, users often prefer a sampling system that returns an approximate answer over enduring high query latency. Existing sampling systems usually assume that the historical query workload closely represents the pattern of future queries, and they design a sampling strategy tailored to that pattern. In real use, however, users' exploration intentions keep changing, so the stability of the query pattern cannot be guaranteed. To address this problem, this paper proposes a method for evaluating the matching degree between any user query and a sample set. The system generates multiple sample sets offline; when a query arrives, it automatically chooses the best-matching sample set and computes the approximate answer from it. The offline sample sets are trained to minimize the expected total matching-degree loss over all possible user queries. Experimental results show that, compared with existing methods, the accuracy of the approximate results improves by 26.3% on a real dataset.

Keywords:	Sampling system; Approximate query processing; Stratified sampling; Optimization problem
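
The query-time step described in the abstract (keep several stratified sample sets, score each against the incoming query, answer from the best match) can be illustrated with a minimal Python sketch. The abstract does not define the matching degree or the offline training objective, so everything here is an assumption made for illustration: the score used below is simply the number of sampled rows the query touches, the per-stratum allocations are fixed by hand rather than trained, and the names `stratified_sample`, `matching_degree`, `choose_sample`, and `approx_sum` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, allocation, rng):
    """Draw allocation[stratum] rows from each stratum of `key`.

    Returns (row, weight) pairs, where weight = stratum size / rows drawn,
    so weighted sums scale back up to the full table.
    """
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for stratum, members in strata.items():
        k = min(allocation.get(stratum, 0), len(members))
        if k == 0:
            continue  # this sample set holds nothing for the stratum
        weight = len(members) / k
        sample.extend((row, weight) for row in rng.sample(members, k))
    return sample


def matching_degree(sample, predicate):
    """ASSUMED score (not the paper's): sampled rows the query touches."""
    return sum(1 for row, _ in sample if predicate(row))


def choose_sample(sample_sets, predicate):
    """Pick the sample set with the highest matching degree for this query."""
    return max(sample_sets, key=lambda s: matching_degree(s, predicate))


def approx_sum(sample, predicate, column):
    """Weighted estimate of SUM(column) over rows satisfying the predicate."""
    return sum(weight * row[column] for row, weight in sample if predicate(row))


# Toy table: 100 "east" rows and 20 "west" rows.
rows = [{"region": "east", "sales": 2} for _ in range(100)]
rows += [{"region": "west", "sales": 5} for _ in range(20)]

# Two hand-picked allocations stand in for the offline-trained sample sets.
rng = random.Random(0)
sample_a = stratified_sample(rows, "region", {"east": 10, "west": 2}, rng)
sample_b = stratified_sample(rows, "region", {"east": 2, "west": 10}, rng)


def is_west(row):
    return row["region"] == "west"


best = choose_sample([sample_a, sample_b], is_west)
estimate = approx_sum(best, is_west, "sales")
```

For the query "total sales in the west region", `sample_b` holds ten matching rows against two in `sample_a`, so it is selected. Because the toy values are constant within each stratum, the weighted estimate equals the exact sum of 100; with real data the estimate would vary with the rows drawn.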
This article is indexed in databases including VIP and Wanfang Data.
