1.

Coding-based Join Algorithms for Structural Queries on Graph-Structured XML Document

Hongzhi Wang Jianzhong Li Wei Wang Xuemin Lin 《World Wide Web》2008,11(4):485-510

In many applications, XML documents need to be modelled as graphs, and query processing over graph-structured XML documents brings new challenges. In this paper, we design a labelling-scheme-based method for processing structural queries on graph-structured XML documents. Each node is assigned labels under a reachability labelling scheme. By extending an interval-based reachability labelling scheme for DAGs by Rakesh et al., we design labelling schemes that support reachability tests on general graphs. Based on these labelling schemes, we design graph structural join algorithms that efficiently answer structural queries containing only ancestor-descendant relationships. For subgraph queries, we design a subgraph join algorithm which, with efficient data structures, can process subgraph queries of various structures efficiently. Experimental results show that our algorithms have good performance and scalability. Supported by the Key Program of the National Natural Science Foundation of China under Grant No. 60533110; the National Grand Fundamental Research 973 Program of China under Grant No. 2006CB303000; and the National Natural Science Foundation of China under Grants No. 60773068 and No. 60773063.
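
A minimal sketch of interval-based reachability labelling on a DAG, the kind of scheme this abstract says it extends. The function names, the example graph, and the decision to keep every inherited interval without merging are assumptions made for illustration; the paper's own schemes for general graphs (with cycles) are not reproduced here.

```python
def label_dag(graph):
    """Assign each node a postorder number and a set of (low, high) intervals.

    graph maps a node to the list of its direct successors and must be acyclic.
    """
    post, intervals = {}, {}
    counter = 0

    def dfs(u):
        nonlocal counter
        low = counter + 1              # smallest postorder number in u's DFS subtree
        for v in graph.get(u, ()):
            if v not in post:          # in a DAG, an unnumbered successor is unvisited
                dfs(v)
        counter += 1
        post[u] = counter
        labels = {(low, counter)}      # interval covering u's own DFS subtree
        for v in graph.get(u, ()):
            labels |= intervals[v]     # inherit intervals along every edge
        intervals[u] = labels

    for node in graph:
        if node not in post:
            dfs(node)
    return post, intervals


def reaches(u, v, post, intervals):
    """True if v is reachable from u (every node reaches itself)."""
    return any(low <= post[v] <= high for low, high in intervals[u])
```

Reachability is then a lookup on labels only: v is reachable from u when v's postorder number falls inside one of u's intervals. A practical scheme would also merge intervals that are contained in one another to keep labels small.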

2.

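An Update-Supporting Encoding Method for Ordered XML Documents

Zhu Changcheng, Liang Pingyuan 《Computer Engineering and Applications》2012,48(25):141-145

Abstract: In XML query processing, a number of encoding schemes have been proposed to quickly determine ancestor-descendant and parent-child relationships between nodes. However, when nodes are inserted into an order-sensitive XML document, existing schemes must re-encode existing nodes or recompute their code values, which makes updates expensive. Building on path-based encoding, this paper proposes a new scheme, BSEPS (Binary String Encoding based on Path Scheme), which supports order-sensitive insertions without re-encoding or recomputation. Experimental results show that BSEPS handles order-sensitive queries and leaf-node/subtree updates effectively.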

3.

Research on Document Encoding Mechanisms for Native XML Databases

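Zhang Peng, Feng Jianhua, Han Xiufeng 《Computer Engineering and Applications》2008,44(12):147-150

Fast queries in a native XML database can be implemented with structural join algorithms based on XML document encoding. Such algorithms require the XML document to be encoded so that ancestor-descendant relationships between nodes of the document tree can be determined quickly. After surveying existing encoding mechanisms, this paper proposes a new XML document encoding mechanism, prefix division (PDIV) encoding. Its codes have a simple form: a single positive integer is enough to fully represent a node's position in the XML document tree. It supports fast ancestor-descendant queries and XML document updates, and its codes are short, with a length of about o(ln(n)).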

4.

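CATI: An Index That Efficiently Supports XML Structural Joins (Total citations: 1, self-citations: 0, citations by others: 1)

Yu Yaxin, Wang Guoren, Zhang Haining, Li Jianxin 《Journal of Computer Research and Development》2007,44(1):111-118

The efficiency of structural joins directly affects XML query performance, and most current XML structural join methods are based on encoding. This paper introduces CATI (compact ancestor tree index), a new tree index that efficiently supports XML structural joins. The basic idea of CATI is as follows: for a given ancestor-descendant query (A-D query) or twig query, traverse the XML document and find all instances of the ancestor A, which form the backbone of CATI; then, for each instance of A, find the instances of its direct descendant D and link them after it. Because the classic Stack-Tree structural join algorithm is efficient and widely used, extensive experiments were run on A-D queries and twig queries to compare the CATI-based structural join with the Stack-Tree-based structural join. The results show that, for general queries, the CATI-based structural join clearly outperforms the Stack-Tree-based one.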
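
A minimal sketch of a stack-based ancestor-descendant structural join in the style of the Stack-Tree baseline named in this abstract. It assumes every element carries a region code (start, end), that both input lists are sorted by start, and that regions in one document are either nested or disjoint. The function name and the tuple representation are illustrative assumptions; this is not the CATI index itself.

```python
def stack_tree_join(ancestors, descendants):
    """Join two lists of (start, end) regions, both sorted by start.

    Returns (ancestor, descendant) pairs ordered by descendant.
    """
    result, stack = [], []
    a = d = 0
    while d < len(descendants):
        # Take the next element in document order from either list.
        take_anc = a < len(ancestors) and ancestors[a][0] < descendants[d][0]
        current = ancestors[a] if take_anc else descendants[d]
        # Drop stacked ancestors whose region ends before the current element.
        while stack and stack[-1][1] < current[0]:
            stack.pop()
        if take_anc:
            stack.append(current)      # the stack holds a chain of nested ancestors
            a += 1
        else:
            # Every region still on the stack encloses the current descendant.
            result.extend((anc, current) for anc in stack)
            d += 1
    return result
```

Each input list is read once, and the stack never holds more elements than the depth of the document tree.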

5.

An encoding scheme based on fractional number for querying and updating XML data

Meghdad Mirabi Hamidah Ibrahim Nur Izura Udzir Ali Mamat 《Journal of Systems and Software》2012,85(8):1831-1851

To facilitate XML query processing, several labeling schemes have been proposed to directly determine the structural relationships between two arbitrary XML nodes without accessing the original XML documents. However, the existing XML labeling schemes have to re-label pre-existing nodes or re-calculate label values when a new node is inserted into the XML document during an update. In this paper, we devise a novel encoding scheme based on fractional numbers to encode the labels of XML nodes. Moreover, we propose a mapping method that converts our fractional-number-based encoding scheme into a bit-string-based encoding scheme in order to minimize the label size and save storage space. By applying our proposed bit string encoding scheme to the range-based labeling scheme and the prefix labeling scheme, re-labeling of pre-existing nodes can be avoided when nodes are inserted as leaf nodes and sibling nodes, without affecting the order of XML nodes. In addition, we propose an algorithm to control the growth of label size when new nodes are inserted frequently at a fixed place in an XML tree. Experimental results show that our proposed bit string encoding scheme efficiently supports XML updating without sacrificing query performance when it is applied to the range-based labeling schemes.
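
A minimal sketch of why fractional labels avoid re-labeling: between any two existing labels there is always room for a new one. Taking the midpoint is an assumption chosen for simplicity, and both function names are hypothetical; the paper's actual encoding and its bit string mapping are not shown here.

```python
from fractions import Fraction


def label_between(left, right):
    """Return a label strictly between two existing sibling labels."""
    if not left < right:
        raise ValueError("left label must be smaller than right label")
    return (left + right) / 2


def insert_repeatedly(left, right, times):
    """Insert `times` nodes, each directly after `left`; return the new labels."""
    added = []
    for _ in range(times):
        right = label_between(left, right)   # new node becomes left's next sibling
        added.append(right)
    return added
```

No existing label changes, but with this midpoint rule the denominator doubles on every insertion at the same place. That growth is the kind of label size increase the abstract's size-control algorithm is meant to address.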

6.

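BSC: An Efficient Dynamic Encoding Scheme for XML Trees (Total citations: 1, self-citations: 0, citations by others: 1)

Wang Chenying, Yuan Xiaojie, Wang Xin, Liu Zhongqi 《Computer Science》2008,35(3):76-78

Determining whether a given structural relationship holds between any two nodes of an XML document is an important part of XML query processing. An XML tree encoding scheme assigns a unique code to each node, so that structural relationships between nodes can be determined quickly and effectively just by comparing node codes, without accessing the original XML document. As XML applications become more widespread, efficient support for updates has become an important topic in research on XML tree encoding schemes. Based on the properties of binary fractions, this paper proposes a new XML tree encoding scheme, BSC, which fully and efficiently supports XML updates without re-encoding. Experimental results show that, compared with existing dynamic encoding schemes, BSC performs well in both static encoding and dynamic updating.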

7.

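An XML Encoding Scheme for Efficient Querying (Total citations: 1, self-citations: 0, citations by others: 1)

Wen Huanan, Liu Xianfeng, Li Wenfeng, Li Lingyong 《Journal of Computer Applications》2010,30(3):831-834

Structural join operations take up a large share of the time in XML data queries. To address this problem, an encoding scheme for efficient querying, LSEQ, is proposed. It decomposes node path information to avoid recording repeated path information, which shortens the codes, and it can represent ancestor-descendant, parent-child, and sibling relationships between nodes. By recording the paths of non-leaf nodes, LSEQ avoids structural join operations during node queries and improves query efficiency. Experiments show that LSEQ improves space utilization and achieves good query speed.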

8.

A New Region-Based Dynamic Encoding Scheme

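Ren Jiadong, Yin Xiaopeng 《Computer Engineering》2006,32(18):79-80,8

To improve query efficiency, many XML document encoding schemes have been proposed, but most of them do not support document updates well. After analyzing and comparing existing encoding schemes, this paper proposes a new dynamic encoding scheme (DNS). The scheme uses real numbers as the codes of nodes in the XML document tree, can use the range between consecutive values to encode newly inserted nodes or subtrees, and can dynamically adjust the codes of some nodes according to how the document is updated.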

9.

Indexing and querying XML using extended Dewey labeling scheme (Total citations: 1, self-citations: 0, citations by others: 1)

Jiaheng Lu Xiaofeng Meng Tok Wang Ling 《Data & Knowledge Engineering》2011,70(1):35-59

Finding all the occurrences of a tree pattern in an XML database is a core operation for efficient evaluation of XML queries. The Dewey labeling scheme is commonly used to label an XML document to facilitate XML query processing by recording information on the path of an element. In order to improve the efficiency of XML tree pattern matching, we introduce a novel labeling scheme, called extended Dewey, which extends the existing Dewey labeling scheme to combine the types and identifiers of elements in a label, and avoids scanning labels for internal query nodes in order to accelerate query processing (in I/O cost). Based on extended Dewey, we propose a series of holistic XML tree pattern matching algorithms. We first present TJFast to answer an XML twig pattern query. To efficiently answer a generalized XML tree pattern, we then propose GTJFast, an optimization that exploits the non-output nodes. In addition, we propose TJFastTL and GTJFastTL, based on the tag + level data partition scheme, to further reduce I/O costs by level pruning. Finally, we report comprehensive experimental results showing that our set of XML tree pattern matching algorithms is superior to existing approaches in terms of the number of elements scanned, the size of intermediate results, and query performance.
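
A minimal sketch of the plain Dewey labeling that this abstract builds on, where a label records the path of child positions from the root. The (tag, children) tuple representation and the function names are assumptions for illustration. Extended Dewey additionally encodes element types into the label, which this sketch does not attempt.

```python
def dewey_labels(tree, prefix=(1,)):
    """Yield (label, tag) pairs in document order; tree is (tag, [children])."""
    tag, children = tree
    yield prefix, tag
    for position, child in enumerate(children, start=1):
        yield from dewey_labels(child, prefix + (position,))


def is_ancestor(a, d):
    """a is a proper ancestor of d when a is a proper prefix of d."""
    return len(a) < len(d) and d[:len(a)] == a


def is_parent(p, c):
    """p is the parent of c when c is p plus exactly one more component."""
    return len(c) == len(p) + 1 and c[:-1] == p
```

Because Python compares tuples lexicographically, sorting these labels also gives document order.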

10.

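CSBTT: An XML Document Encoding Scheme Based on Binary Tree Traversal

Wan Liyong, Chen Ying 《Computer Systems & Applications》2013,22(2):151-154

The encoding scheme for XML document data is the foundation of XML query processing, and a good scheme helps improve query efficiency. To address the low efficiency of XML data queries and the need to support dynamic updates, this paper builds on binary-tree-traversal encoding and introduces the triple-linked-list storage structure of binary trees to encode XML document nodes. The encoding uses natural numbers as code numbers, so codes are short; it introduces a parent pointer for each node, which makes it easy to determine structural relationships between nodes; and nodes are stored in a triple-linked structure, which makes document updates convenient.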

11.

Compact Labeling Scheme for XML Ancestor Queries

Haim Kaplan Tova Milo Ronen Shabo 《Theory of Computing Systems》2007,40(1):55-99

XML documents are often viewed as trees (basically the parse tree of the document), and queries over such documents typically test for ancestor relationships among tree nodes. Search engines process such queries using an index structure summarizing the ancestor relations. In the index, each document item (tree node) is identified using some logical id (node label), such that, given two labels, the engine can determine the ancestor relationship between the corresponding nodes. The length of the labels is a main factor of the index size. Therefore, reducing this length, even by a constant factor, is a critical issue. In this work we consider the following problem. Given a rooted XML tree T, label the nodes of T in the most compact way such that, given the labels of two nodes, one can determine in constant time, by looking at the labels only, whether one node is an ancestor of the other. Labelings currently in use are all variants of the following interval scheme. Number the leaves, say, from left to right, and label each node with a pair consisting of the numbers of its smallest and largest leaf descendants. An ancestor query then amounts to an interval containment test on the labels. The maximum label length using this scheme is 2 log n, where n is the number of nodes in the tree. (All logarithms in this paper are to base 2.) The focus of this work is finding a scheme that works best in practice on real XML data. We suggest an orthogonal prefix-based approach, where the labeling is such that an ancestor query roughly amounts to testing whether one label is a prefix of the other. We present several new labeling schemes based on this approach and analyze their performance both theoretically and empirically.
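
A minimal sketch of the interval scheme as the abstract describes it: leaves are numbered from left to right, and each node is labeled with the smallest and largest leaf numbers below it. The `Node` class and function names are assumptions for illustration.

```python
class Node:
    def __init__(self, tag, children=()):
        self.tag = tag
        self.children = list(children)
        self.label = None              # (smallest leaf number, largest leaf number)


def label_by_leaves(root):
    """Number leaves left to right and label every node with its leaf range."""
    next_leaf = 1

    def visit(node):
        nonlocal next_leaf
        if not node.children:
            node.label = (next_leaf, next_leaf)
            next_leaf += 1
        else:
            for child in node.children:
                visit(child)
            node.label = (node.children[0].label[0], node.children[-1].label[1])

    visit(root)


def contains(a, b):
    """Interval containment test on two labels."""
    return a[0] <= b[0] and b[1] <= a[1]
```

In this bare form, a node with a single child gets the same label as that child, so `contains` alone cannot tell the two apart; the sketch leaves that case unresolved.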

12.

SLS: A numbering scheme for large XML documents

N. A. Aznauryan S. D. Kuznetsov L. G. Novak M. N. Grinev 《Programming and Computer Software》2006,32(1):8-18

In view of the efficiency requirements for query and update processing in XML databases, implementing a robust node labeling (numbering) scheme has become an increasingly important research issue. In order to process XML queries efficiently, it is necessary to detect the ancestor-descendant relationship between nodes and to restore the sequence order of nodes in the document. To solve this problem, the technique of labeling the document nodes is used, which results in a so-called numbering scheme. The nodes of the document are labeled with unique identifiers. By comparing these identifiers, one can restore the sequence order of the nodes and establish the hierarchical relationships. In this paper, we give a survey of the most efficient numbering schemes and introduce a numbering scheme proposed by the authors and employed in the Sedna DBMS [1].

13.

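cTwigStack: An Improved Twig-Pattern-Based Matching Algorithm

Yao Quanzhu, Guo Zhen, Fang Meijun 《Journal of Computer Applications》2011,31(10):2782-2785

Given a twig pattern query, how to quickly find all the information of interest in an XML data set has become a hot research topic. The TwigStack algorithm generates a large number of intermediate results when the query contains parent-child relationships. To address this and related problems, an improved twig pattern matching algorithm, cTwigStack, is proposed, based on the idea of using stacks to buffer non-leaf nodes and delaying the output of leaf nodes. Tests on the Treebank data set show that the algorithm not only achieves optimal accuracy of output results when handling ancestor-descendant nodes, but is also very efficient on parent-child nodes compared with previously proposed algorithms.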

14.

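XML Query Optimization Based on Complex Schema Indexing

Yu Hong, Wang Xiukun, Gao Yanping, Zhang Jianying, Yang Nanhai 《Application Research of Computers》2007,24(8):100-105

This paper analyzes the relationship between XML schemas and XML documents and the characteristics of XML queries, and proposes an XML query optimization method based on complex schema indexing. The method builds an index over the nodes of the XML schema and takes cycles in the XML schema into account at query time. First, the query tree is preprocessed to remove duplicate elements and decomposed into a main path and branch paths; next, the index is used to look up the XML schema numbers of potential target nodes; finally, the corresponding nodes in the XML document are filtered to find the target nodes. The method reduces the number of join operations, improves query efficiency, and can handle fairly complex XML schemas.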

15.

An efficient XML encoding and labeling method for query processing and updating on dynamic XML data

Jun-Ki Min 《Journal of Systems and Software》2009,82(3):503-515

In this paper, we propose an efficient encoding and labeling scheme for XML, called EXEL, which is a variant of the region labeling scheme using ordinal and insert-friendly bit strings. We devise a binary encoding method to generate the ordinal bit strings, and an algorithm to insert a new bit string between existing bit strings without affecting the order of the preexisting ones. This binary encoding method and bit string insertion algorithm form the basis of efficient query processing and the complete avoidance of re-labeling for updates. We present query processing and update processing methods based on EXEL. In addition, the Stack-Tree-Desc algorithm is used for an efficient structural join, and String B-tree indexing is used to improve the join performance. Finally, the experimental results show that EXEL enables complete avoidance of re-labeling for updates while providing fairly reasonable query processing performance.
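
A minimal sketch of inserting a bit string between two existing ones without touching either. The rule below is one simple construction for bit strings that end in "1" and are compared lexicographically; it is an assumption for illustration and not necessarily EXEL's own algorithm. The function name is hypothetical.

```python
def bits_between(left, right):
    """Return a bit string that sorts strictly between left and right.

    Both inputs must end in "1" and satisfy left < right lexicographically.
    The result also ends in "1", so it can be used as an input later.
    """
    if not (left.endswith("1") and right.endswith("1") and left < right):
        raise ValueError("need left < right, both ending in '1'")
    if len(left) >= len(right):
        return left + "1"              # extending left keeps it below right
    return right[:-1] + "01"           # shares right's prefix, then drops below it
```

For example, `bits_between("01", "1")` gives `"011"`, and the two original strings keep their codes and their relative order.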

16.

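An Update Method for Extended Preorder Encoding of XML Data (Total citations: 15, self-citations: 0, citations by others: 15)

Luo Daofeng, Meng Xiaofeng, Jiang Yu 《Journal of Software》2005,16(5):810-818

Most XML query techniques are based on some method of encoding the XML tree. Encoding an XML tree means assigning a unique code to every node of the tree according to some rule, so that whether two nodes are in an ancestor-descendant relationship can be determined directly from their codes. The most commonly used method is the region-based numbering scheme. However, XML data is also subject to updates such as insertions and deletions. Once the data is updated, the region codes must be adjusted accordingly to keep the indexes and query algorithms built on them correct. So far, code updating has received little study. This paper studies the update problem for region encoding. Using the approach of reserving code space, it proposes a complete set of reservation algorithms and code update algorithms for XML data and application environments with different characteristics, and reports extensive experiments that test the effectiveness of these algorithms.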
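
A minimal sketch of region encoding with reserved code space, the general approach this abstract describes. The fixed gap of 10, the (tag, children) tuple representation, the rule of splitting a free range into thirds, and all function names are assumptions for illustration; the paper's reservation and update algorithms are not reproduced here.

```python
def region_encode(tree, gap=10):
    """Return {path: (start, end)} for a (tag, [children]) tree.

    path is the tuple of child positions from the root; the counter advances
    by `gap` each time, leaving unused numbers for later insertions.
    """
    labels = {}
    counter = 0

    def visit(node, path):
        nonlocal counter
        counter += gap
        start = counter
        for position, child in enumerate(node[1]):
            visit(child, path + (position,))
        counter += gap
        labels[path] = (start, counter)

    visit(tree, ())
    return labels


def is_ancestor(a, d):
    """a is an ancestor of d when a's region strictly encloses d's region."""
    return a[0] < d[0] and d[1] < a[1]


def region_in_gap(low, high):
    """Pick a (start, end) strictly inside the unused range (low, high), or None."""
    width = high - low
    if width < 3:
        return None                    # no room left: some re-encoding is needed
    return (low + width // 3, low + 2 * width // 3)
```

To append a new leaf as the last child of a node, call `region_in_gap` with the end of the current last child and the end of the parent. The existing codes stay unchanged as long as the reserved space is not exhausted.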

17.

Structural Join and Staircase Join Algorithms of Sibling Relationship

Chang-Xuan Wan Xi-Ping Liu 《Journal of Computer Science and Technology》2007,22(2):171-181

Processing XML queries can require evaluating various structural relationships. Efficient algorithms for evaluating ancestor-descendant and parent-child relationships have been proposed, whereas the problems of evaluating preceding-sibling/following-sibling and preceding/following relationships are still open. In this paper, we study the structural join and staircase join for the sibling relationship. First, we introduce the idea of using the parent's structural information to filter out and minimize unnecessary reads of elements, which can accelerate structural joins of parent-child and preceding-sibling/following-sibling relationships. Second, we propose two efficient structural join algorithms for the sibling relationship. These algorithms lead to optimal join performance: nodes that do not participate in the join can be identified beforehand and then skipped using a B+-tree index. In addition, each joined element list is scanned sequentially at most once, and the join results are output in document order. We also discuss the staircase join algorithm for sibling axes. Studies show that the staircase join for sibling axes is close to the structural join for sibling axes and shares the same high efficiency. Our experimental results not only demonstrate the effectiveness of our optimizing techniques for sibling axes, but also validate the efficiency of our algorithms. As far as we know, this is the first work that specifically addresses this problem.

18.

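A Twig Pattern Matching Algorithm Based on Ordered Pairs

Wang Rui, Tao Shiqun 《Journal of Computer Research and Development》2009,46(Z2)

As semi-structured data becomes increasingly important in information exchange, researchers have in recent years proposed many algorithms for matching twig queries in XML databases. These algorithms are effective for queries that contain only ancestor-descendant edges, but when a query contains both ancestor-descendant and parent-child edges, they may still produce a large number of intermediate results, especially when the input and output are large. To avoid generating intermediate results, a new algorithm, OPTwig, is proposed. It is based on ordered pairs: it answers queries by matching ordered pairs of nodes in the query tree against those in the document tree, and it requires no merge step. Results show that the algorithm outperforms earlier algorithms.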

19.

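An Encoding Method Supporting XML Insertion Updates

Qin Zunyue, Huang Yun, Cai Guomin, Liang Pingyuan 《Journal of Computer Applications》2012,32(12):3540-3543

Encoding an ordered XML document makes it possible to process XML data without accessing the XML data file. Existing encoding schemes work well for querying XML data, but suffer from low query performance or low update performance during insertion updates. To address this, a new encoding scheme that supports insertion updates, EDL, is proposed. EDL extends prefix encoding: it uses numeric values to express the initial order of nodes and binary strings (BS) to support update computation. Without degrading query performance, EDL completely avoids re-encoding other nodes after an insertion and implements insertion updates of XML documents efficiently. Experimental results show that EDL outperforms other update-supporting encoding schemes of the same type.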

20.

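An Optimal Storage Strategy for Static Path Codes

Chen Ziyang, Zhou Junfeng 《Journal of Computer Research and Development》2011,48(6)

By recording the path from the root of the XML document to the current node, path encoding schemes can quickly determine the various positional relationships between nodes. An efficient code storage strategy improves storage space utilization and reduces the system's I/O cost, which further improves overall system performance. This paper proposes an optimal storage strategy for static path codes. Its basic idea is that, when storing the numbers in a code, the prefix for each number is not fixed in advance; instead, it is assigned according to the total usage frequency of the numbers in the interval that contains the number, so the frequency information of each distinct number can be fully exploited to reduce the storage space required. Finally, experimental results verify the feasibility and effectiveness of the method.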