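Clustering-Based Approximate Join Method on XML Documents
Cite this article: Han Zhe, Wang Hongzhi, Gao Hong, Li Jianzhong, Luo Jizhou. Clustering-Based Approximate Join Method on XML Documents[J]. Journal of Computer Research and Development, 2009, 46(Z2).
Authors: Han Zhe  Wang Hongzhi  Gao Hong  Li Jianzhong  Luo Jizhou
Affiliation: School of Computer Science, Harbin Institute of Technology, Harbin 150001
Funding: National Basic Research Program of China (973 Program); Key Program of the National Natural Science Foundation of China; National Natural Science Foundation of China; Heilongjiang Province Youth Science and Technology Special Fund; National High Technology Research and Development Program of China (863 Program); NSFC/RGC Joint Research Fund
Abstract: The approximate join of XML documents finds similar XML documents across two XML document collections, and is widely used in systems such as XML-based information integration and XML data cleaning. However, a notable problem with current approximate join operations on XML documents is that, when the documents differ greatly from one another, a large amount of redundant computation is performed, which lowers processing efficiency. To address this problem, a clustering-based approximate join method for XML documents is proposed. The basic idea is to build an index for each XML document; if the indexes of several documents in the two data sets are sufficiently similar, those documents are grouped into a cluster, and the approximate join is then performed within each cluster. Documents that do not belong to any cluster require no further computation. Experimental results show that the proposed method is efficient while preserving accuracy.

Keywords: approximate join  clustering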

Clustering-Based Approximate Join Method on XML Documents
Han Zhe, Wang Hongzhi, Gao Hong, Li Jianzhong, Luo Jizhou. Clustering-Based Approximate Join Method on XML Documents[J]. Journal of Computer Research and Development, 2009, 46(Z2).
Authors: Han Zhe  Wang Hongzhi  Gao Hong  Li Jianzhong  Luo Jizhou
Abstract: Approximate join on XML document sets finds the similar documents between two document sets. It is widely used in applications such as XML-based data integration and XML data cleaning. Several techniques have been proposed to perform approximate join on XML documents. However, a significant problem with current techniques is that large differences among the documents result in many redundant computations. In this paper, a clustering-based approximate join method on XML documents is presented. First, an index is built for each XML document tree. If the indexes of some documents are similar, these document trees are grouped into one cluster. The approximate join is then performed on the documents in each cluster. If a tree is not in any cluster, the computation of its windowed pq-grams is avoided. Experimental results show that the method outperforms existing algorithms in efficiency without loss of accuracy.
Keywords: XML  approximate join  cluster
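
The filter-then-join idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say what the per-document index contains, how clusters are formed, or how the thresholds are set, so everything below is an assumption made for illustration: the index is taken to be the set of element tags, clusters are formed by a simple single-pass leader scheme using Jaccard similarity, and the expensive comparison is a bag of (parent tag, child tag) pairs scored with the Dice coefficient, used here as a stand-in for the windowed pq-grams named in the abstract. All function names and the thresholds `index_tau` and `join_tau` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def label_index(root):
    """Cheap index (assumed): the set of element tags in the document."""
    return frozenset(el.tag for el in root.iter())


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def edge_profile(root):
    """Stand-in for the expensive profile: bag of (parent tag, child tag) pairs.
    This is NOT a windowed pq-gram profile, only a simpler substitute."""
    return Counter((parent.tag, child.tag)
                   for parent in root.iter() for child in parent)


def bag_similarity(p, q):
    """Dice coefficient on two bags (multisets)."""
    total = sum(p.values()) + sum(q.values())
    return 2 * sum((p & q).values()) / total if total else 1.0


def cluster_join(left, right, index_tau=0.5, join_tau=0.6):
    """Return (matching pairs, number of expensive profiles built)."""
    docs = [("L", name, ET.fromstring(xml)) for name, xml in left.items()]
    docs += [("R", name, ET.fromstring(xml)) for name, xml in right.items()]

    # Single-pass leader clustering on the cheap index (an assumed scheme).
    clusters = []
    for side, name, root in docs:
        idx = label_index(root)
        for c in clusters:
            if jaccard(idx, c["leader"]) >= index_tau:
                c[side].append((name, root))
                break
        else:
            clusters.append({"leader": idx, "L": [], "R": []})
            clusters[-1][side].append((name, root))

    results, profiles_built = [], 0
    for c in clusters:
        if not c["L"] or not c["R"]:
            continue  # no possible join partner: skip the expensive profile
        lp = [(n, edge_profile(r)) for n, r in c["L"]]
        rp = [(n, edge_profile(r)) for n, r in c["R"]]
        profiles_built += len(lp) + len(rp)
        for ln, lprof in lp:
            for rn, rprof in rp:
                if bag_similarity(lprof, rprof) >= join_tau:
                    results.append((ln, rn))
    return results, profiles_built


if __name__ == "__main__":
    left = {"a1": "<book><title/><author/><year/></book>",
            "a2": "<movie><director/><cast/></movie>"}
    right = {"b1": "<book><title/><author/><price/></book>",
             "b2": "<song><artist/></song>"}
    print(cluster_join(left, right))  # ([('a1', 'b1')], 2)
```

In this toy run only two of the four documents ever get an expensive profile, because `a2` and `b2` end up in one-sided clusters. That is the saving the abstract describes for documents that fall outside every cluster.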
This article is indexed in Wanfang Data and other databases.