期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

《Computer Networks》1999,31(11-16):1155-1169

An important application of XML is the interchange of electronic data (EDI) between multiple data sources on the Web. As XML data proliferates on the Web, applications will need to integrate and aggregate data from multiple source and clean and transform data to facilitate exchange. Data extraction, conversion, transformation, and integration are all well-understood database problems, and their solutions rely on a query language. We present a query language for XML, called XML-QL, which we argue is suitable for performing the above tasks. XML-QL is a declarative, `relational complete' query language and is simple enough that it can be optimized. XML-QL can extract data from existing XML documents and construct new XML documents. 相似文献

2.

Benchmark frameworks and τBench

Stephen W. Thomas Richard T. Snodgrass Rui Zhang 《Software》2014,44(9):1047-1075

Software engineering frameworks tame the complexity of large collections of classes by identifying structural invariants, regularizing interfaces, and increasing sharing across the collection. We wish to appropriate these benefits for families of closely related benchmarks, say for evaluating query engine implementation strategies. We introduce the notion of a benchmark framework, an ecosystem of benchmarks that are related in semantically rich ways and enabled by organizing principles. A benchmark framework is realized by iteratively changing one individual benchmark into another, say by modifying the data format, adding schema constraints, or instantiating a different workload. Paramount to our notion of benchmark frameworks are the ease of describing the differences between individual benchmarks and the utility of methods to validate the correctness of each benchmark component by exploiting the overarching ecosystem. As a detailed case study, we introduce τBench, a benchmark framework consisting of ten individual benchmarks, spanning XML, XQuery, XML Schema, and PSM, along with temporal extensions to each. The second case study examines the Mining Unstructured Data benchmark framework, and the third examines the potential benefits of rendering the TPC family as a benchmark framework. Copyright © 2013 John Wiley & Sons, Ltd. 相似文献

3.

Using a relational database for scalable XML search

Rebecca J. Cathey Steven M. Beitzel Eric C. Jensen David Grossman Ophir Frieder 《The Journal of supercomputing》2008,44(2):146-178

XML is a flexible and powerful tool that enables information and security sharing in heterogeneous environments. Scalable technologies are needed to effectively manage the growing volumes of XML data. A wide variety of methods exist for storing and searching XML data; the two most common techniques are conventional tree-based and relational approaches. Tree-based approaches represent XML as a tree and use indexes and path join algorithms to process queries. In contrast, the relational approach utilizes the power of a mature relational database to store and search XML. This method relationally maps XML queries to SQL and reconstructs the XML from the database results. To date, the limited acceptance of the relational approach to XML processing is due to the need to redesign the relational schema each time a new XML hierarchy is defined. We, in contrast, describe a relational approach that is fixed schema eliminating the need for schema redesign at the expense of potentially longer runtimes. We show, however, that these potentially longer runtimes are still significantly shorter than those of the tree approach. We use a popular XML benchmark to compare the scalability of both approaches. We generated large collections of heterogeneous XML documents ranging in size from 500 MB to 8 GB using the XBench benchmark. The scalability of each method was measured by running XML queries that cover a wide range of XML search features on each collection. We measure the scalability of each method over different query features as the collection size increases. In addition, we examine the performance of each method as the result size and the number of predicates increase. Our results show that our relational approach provides a scalable approach to XML retrieval by leveraging existing relational database optimizations. Furthermore, we show that the relational approach typically outperforms the tree-based approach while scaling consistently over all collections studied.

Ophir Frieder (Corresponding author)Email:

相似文献

4.

基于TPC-C的XML数据库测试方案

下载免费PDF全文

司冠南许静《计算机工程》2010,36(19):37-38

为实现XML数据库的性能评测,提出基于TPC-C的XML数据库测试方案。针对XML数据库特性,对其数据结构、查询事务语句进行定制,将原有9张表映射成5个XML Schema文件,按照SQL/XML标准重写负载事务。应用该方案对SQL Server 2005数据库进行测试,结果表明显示的各项事务特征均与TPC-C基准相同。相似文献

5.

Partitioning methods for multi-version XML data warehouses

Laura Irina Rusu Wenny Rahayu David Taniar 《Distributed and Parallel Databases》2009,25(1-2):47-69

Due to an explosive increase of XML documents, it is imperative to manage XML data in an XML data warehouse. XML warehousing imposes challenges, which are not found in the relational data warehouses. In this paper, we firstly present a framework to build an XML data warehouse schema. For the purpose of scalability due to the increase of data volume, we propose a number of partitioning techniques for multi-version XML data warehouses, including document based partitioning, schema based partitioning, and cascaded (mixed) partitioning model. Finally, we formulate cost models to evaluate various types of queries for an XML data warehouse. 相似文献

6.

Consistent data for inconsistent XML document

《Information and Software Technology》2007,49(9-10):947-959

XML document may contain inconsistencies that violate predefined integrity constraints, which causes the data inconsistency problem. In this paper, we consider how to get the consistent data from an inconsistent XML document. There are two basic concepts for this problem: Repair is the data consistent with the integrity constraints, and also minimally differs from the original one. Consistent data is the data common for every possible repair. First we give a general constraint model for XML, which can express the commonly discussed integrity constraints, including functional dependencies, keys and multivalued dependencies. Next we provide a repair framework for inconsistent XML document with three basic update operations: node insertion, node deletion and node value modification. Following this approach, we introduce the concept of repair for inconsistent XML document, discuss the chase method to generate repairs, and prove some important properties of the chase. Finally we give a method to obtain the greatest lower bound of all possible repairs, which is sufficient for consistent data. We also implement prototypes of our method, and evaluate our framework and algorithms in the experiment. 相似文献

7.

Analysis of imperative XML programs

Christoph Reichenbach Michael G. Burke Igor Peshansky Mukund Raghavachari 《Information Systems》2009

The widespread adoption of XML has led to programming languages that support XML as a first class construct. In this paper, we present a method for analyzing and optimizing imperative XML processing programs. In particular, we present a program analysis, based on a flow-sensitive type system, for detecting both redundant computations and redundant traversals in such programs. The analysis handles imperative loops that traverse XML values explicitly and declarative queries over XML data in a uniform framework. We describe two optimizations that take advantage of our analysis: one merges queries that traverse the same set of XML nodes, and the other replaces an XPath expression by a previously computed result. We demonstrate performance improvements for selected XMark benchmark queries and XLinq sample queries. 相似文献

8.

Integrating and querying distributed XML data via XLink

Wolfgang May Erik Behrends Oliver Fritzen 《Information Systems》2008

XML instances are not necessarily self-contained but may have connections to remote XML data residing on other servers. In this paper, we show that—in spite of its minor support and use in the XML world—the XLink language provides a powerful mechanism for expressing such links both from the modeling point of view and for actually querying interlinked XML data: in our dbxlink approach, the links are not seen as explicit links (where the users must be aware of the links and traverse them explicitly in their queries), but define views that combine into a logical, transparent XML model which serves as an external schema and can be queried by XPath/XQuery. We motivate the underlying modeling and give a concise and declarative specification as an XML-to-XML mapping. We also describe the implementation of the model as an extension of the eXist [eXist: an Open Source Native XML Database, http://exist-db.org/] XML database system. The approach can be applied both for distribution of data and for integration of data from autonomous sources. 相似文献

9.

Propagating XML constraints to relations

《Journal of Computer and System Sciences》2007,73(3):316-361

We present a technique for refining the design of relational storage for XML data. The technique is based on XML key propagation: given a set of keys on XML data and a mapping (transformation) from the XML data to relations, what functional dependencies must hold on the relations produced by the mapping? With the functional dependencies one can then convert the relational design into, e.g. 3NF, BCNF, and thus develop efficient relational storage for XML data. We provide several algorithms for computing XML key propagation. One algorithm is to check whether a functional dependency is propagated from a set of XML keys via a predefined mapping; this allows one to determine whether or not the relational design is in a normal form. The others are to compute a minimum cover for all functional dependencies that are propagated from a set of XML keys and hold on a universal relation; these provide guidance for how to design a relational schema for storing XML data. These algorithms show that XML key propagation and its associated minimum cover can be computed in polynomial time. Our experimental results verify that these algorithms are efficient in practice. We also investigate the complexity of propagating other XML constraints to relations. The ability to compute XML key propagation is a first step toward establishing a connection between XML data and its relational representation at the semantic level. 相似文献

10.

XML stream transformer generation through program composition and dependency analysis

《Science of Computer Programming》2005,54(2-3):257-290

XML stream transformation, which sequentially processes the input XML data on the fly, makes it possible to process large sized data within a limited amount of memory. Though being efficient in memory-use, stream transformation requires stateful programming, which is error-prone and hard to manage.This paper proposes a scheme for generating XML stream transformers. Given an attribute grammar definition of transformation over an XML tree structure, we systematically derive a stream transformer in two steps. First, an attribute grammar definition of the XML stream transformation is inferred by applying a program composition method. Second, a finite state transition machine is constructed through a dependency analysis. Due to the closure property of the program composition method, our scheme also allows modular construction of XML stream transformers.We have implemented a prototype XML stream transformer generator, called altSAX. The experimental results show that the generated transformers are efficient in memory consumption as well as in execution time. 相似文献

11.

An indexing method for wireless broadcast XML data

《Information Sciences》2007,177(9):1931-1953

The paper considers a wireless information system, wherein various pieces of information represented in XML are broadcast via wireless channels, and mobile clients access the broadcast stream using energy-restricted portable devices.In this paper, we propose a wireless XML streaming method designed to provide energy-efficient access to a wireless stream. We construct two hierarchical structures to represent the XML data and their index information, called the XML data tree and XML index tree, respectively. The wireless XML stream is generated by traversing these two structures with some replications. We design three data/index replication strategies (PP, TT, and TP) in the streaming method. We compare the proposed streaming method with a naı¨ve method called the (1, X) method both analytically and experimentally. Also, based on our analysis results, we determine the optimal method of replication. 相似文献

12.

Efficient processing of top-k twig queries over probabilistic XML data

Bo Ning Chengfei Liu Jeffrey Xu Yu 《World Wide Web》2013,16(3):299-323

The flexibility of XML data model allows a more natural representation of uncertain data compared with the relational model. Matching twig pattern against XML data is a fundamental problem in querying information from XML documents. For a probabilistic XML document, each twig answer has a probabilistic value because of the uncertainty of data. The twig answers that have small probabilistic value are useless to the users, and usually users only want to get the answers with the k largest probabilistic values. To this end, existing algorithms for ordinary XML documents cannot be directly applicable due to the need for handling probability distributional nodes and efficient calculation of top-k probabilities of answers in probabilistic XML. In this paper, we address the problem of finding twig answers with top-k probabilistic values against probabilistic XML documents directly. We propose a new encoding scheme called PEDewey for probabilistic XML in this paper. Based on this encoding scheme, we then design two algorithms for finding answers of top-k probabilities for twig queries. One is called ProTJFast, to process probabilistic XML data based on element streams in document order, and the other is called PTopKTwig, based on the element streams ordered by the path probability values. Experiments have been conducted to study the performance of these algorithms. 相似文献

13.

Indexing and querying XML using extended Dewey labeling scheme 总被引：1，自引：0，他引：1

Jiaheng LuAuthor Vitae Xiaofeng MengAuthor VitaeTok Wang LingAuthor Vitae 《Data & Knowledge Engineering》2011,70(1):35-59

Finding all the occurrences of a tree pattern in an XML database is a core operation for efficient evaluation of XML queries. The Dewey labeling scheme is commonly used to label an XML document to facilitate XML query processing by recording information on the path of an element. In order to improve the efficiency of XML tree pattern matching, we introduce a novel labeling scheme, called extended Dewey, which effectively extends the existing Dewey labeling scheme to combine the types and identifiers of elements in a label, and to avoid the scan of labels for internal query nodes to accelerate query processing (in I/O cost). Based on extended Dewey, we propose a series of holistic XML tree pattern matching algorithms. We first present TJFast to answer an XML twig pattern query. To efficiently answer a generalized XML tree pattern, we then propose GTJFast, an optimization that exploits the non-output nodes. In addition, we propose TJFastTL and GTJFastTL based on the tag + level data partition scheme to further reduce I/O costs by level pruning. Finally, we report our comprehensive experimental results to show that our set of XML tree pattern matching algorithms are superior to existing approaches in terms of the number of elements scanned, the size of intermediate results and query performance. 相似文献

14.

Adaptive relaxation for querying heterogeneous XML data sources

Chengfei Liu Jianxin Li Jeffrey Xu Yu Rui Zhou 《Information Systems》2010

Searching XML data with a structured XML query can improve the precision of results compared with a keyword search. However, the structural heterogeneity of the large number of XML data sources makes it difficult to answer the structured query exactly. As such, query relaxation is necessary. Previous work on XML query relaxation poses the problem of unnecessary computation of a big number of unqualified relaxed queries. To address this issue, we propose an adaptive relaxation approach which relaxes a query against different data sources differently based on their conformed schemas. In this paper, we present a set of techniques that supports this approach, which includes schema-aware relaxation rules for relaxing a query adaptively, a weighted model for ranking relaxed queries, and algorithms for adaptive relaxation of a query and top-k query processing. We discuss results from a comprehensive set of experiments that show the effectiveness and the efficiency of our approach. 相似文献

15.

XML compression techniques: A survey and comparison

Sherif Sakr 《Journal of Computer and System Sciences》2009,75(5):303-322

XML has been acknowledged as the defacto standard for data representation and exchange over the World Wide Web. Being self describing grants XML its great flexibility and wide acceptance but on the other hand it is the cause of its main drawback that of being huge in size. The huge document size means that the amount of information that has to be transmitted, processed, stored, and queried is often larger than that of other data formats. Several XML compression techniques has been introduced to deal with these problems.In this paper, we provide a complete survey over the state-of-the-art of XML compression techniques. In addition, we present an extensive experimental study of the available implementations of these techniques. We report the behavior of nine XML compressors using a large corpus of XML documents which covers the different natures and scales of XML documents. In addition to assessing and comparing the performance characteristics of the evaluated XML compression tools, the study also tries to assess the effectiveness and practicality of using these tools in the real world. Finally, we provide some guidelines and recommendations which are useful for helping developers and users for making an effective decision towards selecting the most suitable XML compression tool for their needs. 相似文献

16.

XML structural delta mining: Issues and challenges

《Data & Knowledge Engineering》2007,60(3):627-651

Recently, there is an increasing research efforts in XML data mining. These research efforts largely assumed that XML documents are static. However, in reality, the documents are rarely static. In this paper, we propose a novel research problem called XML structural delta mining. The objective of XML structural delta mining is to discover knowledge by analyzing structural evolution pattern (also called structural delta) of history of XML documents. Unlike existing approaches, XML structural delta mining focuses on the dynamic and temporal features of XML data. Furthermore, the data source for this novel mining technique is a sequence of historical versions of an XML document rather than a set of snapshot XML documents. Such mining technique can be useful in many applications such as change detection for very large XML documents, efficient XML indexing, XML search engine, etc. Our aim in this paper is not to provide a specific solution to a particular mining problem. Rather, we present the vision of the mining framework and present the issues and challenges for three types of XML structural delta mining: identifying various interesting structures, discovering association rules from structural deltas, and structural change pattern-based classification. 相似文献

17.

Representing and Reasoning About XML with Ontologies

Fu Zhang Z. M. Ma 《Applied Intelligence》2014,40(1):74-106

The eXtensible Markup Language (XML) has reached a wide acceptance as the relevant standardization for representing and exchanging data on the Web. Unfortunately, XML covers the syntactic level but lacks semantics, and thus cannot be directly used for the Semantic Web. Currently, finding a way to utilize XML data for the Semantic Web is challenging research. As we have known that ontology can formally represent shared domain knowledge and enable semantics interoperability. Therefore, in this paper, we investigate how to represent and reason about XML with ontologies. Firstly, we give formalized representations of XML data sources, including Document Type Definitions (DTDs), XML Schemas, and XML documents. On this basis, we propose formal approaches for transforming the XML data sources into ontologies, and we also discuss the correctness of the transformations and provide several transformation examples. Furthermore, following the proposed approaches, we implement a prototype tool that can automatically transform XML into ontologies. Finally, we apply the transformed ontologies for reasoning about XML, so that some reasoning problems of XML may be checked by the existing ontology reasoners. 相似文献

18.

Bottom-up discovery of frequent rooted unordered subtrees

Yijun Bei Gang Chen Lidan Shou Xiaoyan Li Jinxiang Dong 《Information Sciences》2009,179(1-2):70-88

In the past decade, XML has emerged as the standard language for information exchanging over the Internet. Due to its tree-structure paradigm, XML is superior for its capability of storing, querying, and manipulating complex data. Therefore, discovering frequent tree patterns over tree-structured data has become an interesting topic for XML data management. In this paper, we propose a tree mining algorithm, named BUXMiner, for finding a special class of frequent trees, called rooted unordered trees, from a tree-structured database. BUXMiner employs an efficient bottom-up approach to enumerate all candidate trees over a compact global tree guide and computes the frequent trees based on the tree guide. In addition to BUXMiner, we also propose a mining approach called BUMXMiner to discover the maximal frequent rooted unordered trees. We compare BUXMiner with previous tree-structure mining algorithms, namely XQPMinerTID and FastXMiner, which were also proposed to discover rooted unordered trees. The experimental results show that our algorithm outperforms XQPMinerTID and FastXMiner in terms of efficiency. The performance results from real-world applications also indicate the usefulness of our proposed tree mining algorithms in a variety of web applications, such as analysis of web page access patterns and mining frequent XML query patterns for caching. 相似文献

19.

XFlat: Query-friendly encrypted XML view publishing

Jun Gao Tengjiao Wang 《Information Sciences》2008,178(3):774-787

The security of published XML data receives exceptional attention due to its sensitive nature in many applications. This paper proposes an XML view publishing method called XFlat. Compared with other methods, XFlat focuses on query performance over the published XML view while simultaneously protecting the sensitive data via encryption techniques. XFlat decomposes an XML tree into a set of sub-trees, in each of which multiple users have the same accessibility to all nodes, and may encrypt and store each sub-tree in a flat, sequential manner. This storage strategy can avoid the nested encryption cost in view construction and the nested decryption cost in query evaluation. In addition, we discuss how to generate a user-specific schema and how to minimize the total space cost of the published XML view when considering the overhead of the relationships among the sub-trees. We also propose an XML schema index to enhance query performance over the final XML view. The experimental results demonstrate the effectiveness and efficiency of the proposed XFlat method. 相似文献

20.

Reasoning about XML update constraints

Bogdan Cautis Serge Abiteboul Tova Milo 《Journal of Computer and System Sciences》2009,75(6):336-358

We introduce in this paper a class of constraints for describing how an XML document can evolve, namely XML update constraints. For these constraints, we study the implication problem, giving algorithms and complexity results for constraints of varying expressive power. Besides classical constraint implication, we also consider an instance-based approach in which we take into account data. More precisely, we study implication with respect to a current tree instance, resulting from a series of unknown updates. The main motivation of our work is reasoning about data integrity under update restrictions in contexts where owners may lose control over their data, such as in publishing or exchange. 相似文献