
XML Keyword Retrieval Based on Keyword Density
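Citation: QIN Zun-Yue, TANG Yong, XU Hong-Zhi, HUANG Yun. XML keyword retrieval based on keyword density[J]. Journal of Software (软件学报), 2019, 30(4): 1062-1077.
Authors: QIN Zun-Yue, TANG Yong, XU Hong-Zhi, HUANG Yun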
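Affiliations:
QIN Zun-Yue: School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510006; School of Software, Jishou University, Zhangjiajie, Hunan 427000
TANG Yong: School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510006; School of Computer Science, South China Normal University, Guangzhou, Guangdong 510631
XU Hong-Zhi: School of Software, Jishou University, Zhangjiajie, Hunan 427000
HUANG Yun: School of Software, Jishou University, Zhangjiajie, Hunan 427000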
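Funding: National High Technology Research and Development Program of China (863 Program) (2013AA01A212); National Natural Science Foundation of China (61772211, 60970044, 61272067, 61363073); Team Research Project of the Natural Science Foundation of Guangdong Province (2014B010116002, 2015B010109003, 2013B090800024, S2012030006242, 2015B010129009)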
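Abstract: Keyword search offers a user-friendly query experience and has been widely and deeply applied in text information retrieval. Applying keyword search to XML data is currently an active research topic. XML keyword search methods based on query semantics suffer from two problems: they either return many query fragments unrelated to the user's query intention or miss fragments that match it. To address these problems, two rules relating the user's query intention to LCA (lowest common ancestor) relevance are proposed, taking into account both the horizontal and the vertical dimension of the LCA. Based on these two rules, the edge density and path density of an LCA are defined and a combined scoring formula for LCA nodes is built. Finally, the TopLCA-K algorithm is designed to rank LCAs, and a center location index (CI) is used to improve its efficiency. Experimental results show that the nodes returned by the proposed method better match user needs.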

Keywords: XML keyword retrieval; edge density; path density; TopLCA-K algorithm
Received: 2016/7/22
Revised: 2017/6/9

Study on Keyword Retrieval Based on Keyword Density for XML Data
QIN Zun-Yue, TANG Yong, XU Hong-Zhi and HUANG Yun. Study on Keyword Retrieval Based on Keyword Density for XML Data[J]. Journal of Software, 2019, 30(4): 1062-1077.
Authors: QIN Zun-Yue, TANG Yong, XU Hong-Zhi and HUANG Yun
Affiliations:
QIN Zun-Yue: School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510275, China; School of Software, Jishou University, Zhangjiajie 427000, China
TANG Yong: School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510275, China; School of Computer Science, South China Normal University, Guangzhou 510631, China
XU Hong-Zhi: School of Software, Jishou University, Zhangjiajie 427000, China
HUANG Yun: School of Software, Jishou University, Zhangjiajie 427000, China
Abstract: Keyword search offers a user-friendly experience and has been widely used in text information retrieval. Keyword search over XML data is currently an active research topic. Existing XML keyword search methods based on query semantics have two problems: (1) they return a large number of query fragments unrelated to the user's query intention; (2) they miss fragments that match the user's query intention. To address these problems, two rules relating user query intention to LCA (lowest common ancestor) relevance are proposed, based on both the horizontal and the vertical dimension of the LCA. The edge density and path density of an LCA are defined according to these two rules, and a comprehensive scoring formula for LCA nodes is established. Finally, the TopLCA-K algorithm is designed to rank LCAs, and a center location index (CI) is designed to improve its efficiency. Experimental results show that the nodes returned by this method better match users' needs.
Keywords: XML keyword retrieval; edge density; path density; TopLCA-K algorithm
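The abstract relies on LCA semantics and top-K ranking without spelling them out. Below is a minimal Python sketch of that general pipeline, written under stated assumptions and not taken from the paper. It assumes Dewey labels, a labeling scheme common in XML keyword search. Under that scheme, the LCA of one match per keyword is the longest common prefix of their labels, and candidates are ranked with `heapq.nlargest`. The scoring function `placeholder_score` is hypothetical: the abstract does not give the definitions of edge density, path density, or the combined formula, so they are not reproduced here. All function names and the sample document are illustrative.

```python
import heapq
import itertools
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Sequence, Set, Tuple

# A Dewey label is the path of child positions from the root, e.g. (0, 2, 1).
Dewey = Tuple[int, ...]


def dewey_index(root: ET.Element) -> Dict[Dewey, ET.Element]:
    """Assign a Dewey label to every element by a depth-first traversal."""
    labels: Dict[Dewey, ET.Element] = {}

    def visit(node: ET.Element, label: Dewey) -> None:
        labels[label] = node
        for i, child in enumerate(node):
            visit(child, label + (i,))

    visit(root, (0,))
    return labels


def keyword_matches(labels: Dict[Dewey, ET.Element], keyword: str) -> List[Dewey]:
    """Labels of elements whose own text contains the keyword as a whole word."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return sorted(
        label for label, node in labels.items()
        if node.text and pattern.search(node.text)
    )


def lca(nodes: Sequence[Dewey]) -> Dewey:
    """The LCA of several nodes is the longest common prefix of their Dewey labels."""
    prefix: List[int] = []
    for parts in zip(*nodes):
        if any(p != parts[0] for p in parts):
            break
        prefix.append(parts[0])
    return tuple(prefix)


def all_lcas(match_lists: List[List[Dewey]]) -> Set[Dewey]:
    """LCA semantics: lca(v1, ..., vk) for every choice of one match per keyword.
    Enumerates the full cross product, so it only suits small inputs."""
    return {lca(combo) for combo in itertools.product(*match_lists)}


def placeholder_score(node: Dewey, match_lists: List[List[Dewey]]) -> float:
    """HYPOTHETICAL score, NOT the paper's edge/path density formula:
    keyword matches inside the node's subtree divided by
    1 + their total depth below the node (more, closer matches score higher)."""
    inside = [m for ms in match_lists for m in ms if m[: len(node)] == node]
    distance = sum(len(m) - len(node) for m in inside)
    return len(inside) / (1 + distance)


def top_k_lcas(xml_text: str, keywords: List[str], k: int) -> List[Tuple[str, Dewey, float]]:
    """Return up to k (tag, Dewey label, score) triples, best score first."""
    labels = dewey_index(ET.fromstring(xml_text))
    match_lists = [keyword_matches(labels, kw) for kw in keywords]
    if any(not ms for ms in match_lists):
        return []  # a keyword that never occurs means no LCA exists
    scored = [(placeholder_score(c, match_lists), c) for c in all_lcas(match_lists)]
    return [(labels[c].tag, c, round(s, 3)) for s, c in heapq.nlargest(k, scored)]


# Hypothetical sample document for illustration only.
SAMPLE = (
    "<bib>"
    "<book><title>XML keyword search</title><author>Qin</author></book>"
    "<book><title>Graph search</title><author>Tang</author></book>"
    "<article><title>XML indexing</title><author>Tang</author></article>"
    "</bib>"
)

print(top_k_lcas(SAMPLE, ["XML", "Tang"], k=2))
# [('article', (0, 2), 0.667), ('bib', (0,), 0.444)]
```

Here the `article` node ranks first because both of its matches are its own children. The root ranks lower because it spans all four matches at a greater depth. A practical system would avoid enumerating the full cross product and would use the paper's own scoring formula. The abstract also mentions a center location index (CI) for speeding up TopLCA-K, but it does not describe the index's design.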