
Study on Keyword Retrieval Based on Keyword Density for XML Data (基于关键字密度的XML关键字检索)

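Cite this article:	QIN Zun-Yue, TANG Yong, XU Hong-Zhi, HUANG Yun. Study on Keyword Retrieval Based on Keyword Density for XML Data [J]. Journal of Software (软件学报), 2019, 30(4): 1062-1077.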

Authors:	QIN Zun-Yue (覃遵跃), TANG Yong (汤庸), XU Hong-Zhi (徐洪智), HUANG Yun (黄云)

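Affiliations (in author order):	QIN Zun-Yue: School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510006; School of Software, Jishou University, Zhangjiajie, Hunan 427000
	TANG Yong: School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510006; School of Computer Science, South China Normal University, Guangzhou, Guangdong 510631
	XU Hong-Zhi: School of Software, Jishou University, Zhangjiajie, Hunan 427000
	HUANG Yun: School of Software, Jishou University, Zhangjiajie, Hunan 427000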

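Funding:	National High-Tech Research and Development Program of China (863 Program) (2013AA01A212); National Natural Science Foundation of China (61772211, 60970044, 61272067, 61363073); Guangdong Natural Science Foundation Team Research Project (2014B010116002, 2015B010109003, 2013B090800024, S2012030006242, 2015B010129009)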

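Abstract:	Keyword search offers a friendly user experience and has been applied widely and in depth in text information retrieval. Applying keyword search to XML data is currently a hot research topic. XML keyword search methods based on query semantics suffer from two problems: they return a large number of fragments unrelated to the user's query intention, or they miss fragments that match it. To address these problems, this work considers both the horizontal and vertical dimensions of the LCA (lowest common ancestor) and proposes two rules relating the user's query intention to LCA relevance. Based on the two rules, it defines the edge density and path density of an LCA and establishes a combined scoring formula for LCA nodes. Finally, the TopLCA-K algorithm is designed to rank LCAs, and a center location index (CI) is used to improve the efficiency of TopLCA-K. Experimental results show that the nodes returned by the proposed method better match user needs.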
Keywords:	XML keyword retrieval; edge density; path density; TopLCA-K algorithm
Received:	2016-07-22
Revised:	2017-06-09
Study on Keyword Retrieval Based on Keyword Density for XML Data

QIN Zun-Yue, TANG Yong, XU Hong-Zhi, and HUANG Yun. Study on Keyword Retrieval Based on Keyword Density for XML Data [J]. Journal of Software, 2019, 30(4): 1062-1077.

Authors:	QIN Zun-Yue, TANG Yong, XU Hong-Zhi, and HUANG Yun

Affiliation:	QIN Zun-Yue: School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510275, China; School of Software, Jishou University, Zhangjiajie 427000, China
	TANG Yong: School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510275, China; School of Computer Science, South China Normal University, Guangzhou 510631, China
	XU Hong-Zhi: School of Software, Jishou University, Zhangjiajie 427000, China
	HUANG Yun: School of Software, Jishou University, Zhangjiajie 427000, China

Abstract:	Keyword search offers a friendly user experience and has been widely used in text information retrieval. Keyword search on XML data is currently a hot research topic. XML keyword search methods based on query semantics have two problems: (1) they return a large number of query fragments that are unrelated to the user's query intention; (2) they miss fragments that are consistent with the user's query intention. To address these problems, two rules relating user query intention to LCA relevance are proposed on the basis of the two dimensions (horizontal and vertical) of the LCA. The edge density and path density of an LCA are defined according to the two rules, and a comprehensive scoring formula for LCA nodes is established. Finally, the TopLCA-K algorithm is designed to rank LCAs, and a center location index is designed to improve the efficiency of the algorithm. Experimental results show that the nodes returned by this method are more in line with the needs of users.

Keywords:	XML keyword retrieval; edge density; path density; TopLCA-K algorithm
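
The abstract describes ranking LCA nodes and returning the best K, but gives no formulas. As a rough illustration only, the sketch below shows the two generic building blocks such a pipeline needs: finding the lowest common ancestor of keyword-matching nodes, and selecting the top K candidates by a score. It assumes nodes carry Dewey labels (a common labeling in XML keyword search, not confirmed by this abstract), and the names `lca`, `top_k`, and the depth-based score are hypothetical stand-ins. The paper's edge density, path density, scoring formula, and CI index are not reproduced here.

```python
from heapq import nlargest
from typing import Callable, Iterable, Sequence

# Assumption: each XML node is identified by a Dewey label, i.e. the
# sequence of child positions on the path from the root to the node.
Dewey = tuple[int, ...]


def lca(labels: Sequence[Dewey]) -> Dewey:
    """Return the Dewey label of the lowest common ancestor.

    With Dewey labels, the LCA of a set of nodes is the longest
    common prefix of their labels.
    """
    if not labels:
        raise ValueError("need at least one label")
    prefix = labels[0]
    for label in labels[1:]:
        n = 0
        for a, b in zip(prefix, label):
            if a != b:
                break
            n += 1
        prefix = prefix[:n]
    return prefix


def top_k(candidates: Iterable[Dewey],
          score: Callable[[Dewey], float],
          k: int) -> list[Dewey]:
    """Return the k highest-scoring candidates, best first."""
    return nlargest(k, candidates, key=score)


# Hypothetical matches: one keyword occurs at two nodes, the other at one.
first_keyword_nodes = [(0, 0, 1), (0, 1, 0, 2)]
second_keyword_node = (0, 0, 2)

candidates = [lca([node, second_keyword_node]) for node in first_keyword_nodes]

# Placeholder score: prefer deeper LCAs (tighter fragments). This is NOT
# the paper's density-based score, only a stand-in to make the sketch run.
best = top_k(candidates, score=len, k=1)
```

Any real scoring function can be passed as `score`; the ranking step itself does not depend on how the score is computed.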

