期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Schema mapping and query translation in heterogeneous P2P XML databases

Angela Bonifati Elaine Chang Terence Ho Laks V. S. Lakshmanan Rachel Pottinger Yongik Chung 《The VLDB Journal The International Journal on Very Large Data Bases》2010,19(2):231-256

Peers in a peer-to-peer data management system often have heterogeneous schemas and no mediated global schema. To translate queries across peers, we assume each peer provides correspondences between its schema and a small number of other peer schemas. We focus on query reformulation in the presence of heterogeneous XML schemas, including data–metadata conflicts. We develop an algorithm for inferring precise mapping rules from informal schema correspondences. We define the semantics of query answering in this setting and develop query translation algorithm. Our translation handles an expressive fragment of XQuery and works both along and against the direction of mapping rules. We describe the HePToX heterogeneous P2P XML data management system which incorporates our results. We report the results of extensive experiments on HePToX on both synthetic and real datasets. We demonstrate our system utility and scalability on different P2P distributions. 相似文献

2.

Normalization and optimization of schema mappings

Georg Gottlob Reinhard Pichler Vadim Savenkov 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(2):277-302

Schema mappings are high-level specifications that describe the relationship between database schemas. They are an important tool in several areas of database research, notably in data integration and data exchange. However, a concrete theory of schema mapping optimization including the formulation of optimality criteria and the construction of algorithms for computing optimal schema mappings is completely lacking to date. The goal of this work is to fill this gap. We start by presenting a system of rewrite rules to minimize sets of source-to-target tuple-generating dependencies. Moreover, we show that the result of this minimization is unique up to variable renaming. Hence, our optimization also yields a schema mapping normalization. By appropriately extending our rewrite rule system, we also provide a normalization of schema mappings containing equality-generating target dependencies. An important application of such a normalization is in the area of defining the semantics of query answering in data exchange, since several definitions in this area depend on the concrete syntactic representation of the mappings. This is, in particular, the case for queries with negated atoms and for aggregate queries. The normalization of schema mappings allows us to eliminate the effect of the concrete syntactic representation of the mapping from the semantics of query answering. We discuss in detail how our results can be fruitfully applied to aggregate queries. 相似文献

3.

XMin: Minimizing Tree Pattern Queries with Minimality Guarantee 总被引：1，自引：0，他引：1

Ki-Hoon Lee Kyu-Young Whang Wook-Shin Han 《World Wide Web》2010,13(3):343-371

Due to wide use of XPath, the problem of efficiently processing XPath queries has recently received a lot of attention. In particular, a considerable effort has been devoted to minimizing XPath queries since the efficiency of query processing greatly depends on the size of the query. Research work in this area can be classified into two categories: constraint-independent minimization and constraint-dependent minimization. The former minimizes queries in the absence of integrity constraints while the latter in the presence of them. For a linear path query, which is an XPath query without branching predicates, existing constraint-independent minimization methods are generally known to be unable to minimize the query without processing the query itself. Most recently, however, by using the DataGuide, a representative structural summary of XML data, a constraint-independent method that minimizes linear path queries in a top-down fashion has been proposed. Nevertheless, this method can fail to find a minimal query since it minimizes a query by merely erasing labels from the original query whereas a minimal query could include labels that are not present in the original query. In this paper, we propose a bottom-up approach called XMin that guarantees finding a minimal query for a given tree pattern query by using the DataGuide without processing the query itself. For the linear path query, we first show that the sequence of labels occurring in the minimal query is a subsequence of every schema label sequence that matches the original query. Here, the schema label sequence for a node is the sequence of labels from the root of XML data to the node. We then propose iterative subsequence generation that iteratively generates subsequences from the shortest schema label sequence matching the original query in a bottom-up fashion and tests query equivalence. Using iterative subsequence generation, we can always find a minimal query and we formally prove this guarantee. We also propose an extended algorithm that guarantees the minimality for the tree pattern query, which is a linear path query with branching predicates. These methods have been prototyped in a full-fledged object-relational DBMS. The experimental results using real and synthetic data sets show the practicality of our method. 相似文献

4.

Efficient query processing framework for big data warehouse: an almost join-free approach

Huiju WANG Xiongpai QIN Xuan ZHOU Furong LI Zuoyan QIN Qing ZHU Shan WANG 《Frontiers of Computer Science》2015,9(2):224

The rapidly increasing scale of data warehouses is challenging today’s data analytical technologies. A conventional data analytical platform processes data warehouse queries using a star schema — it normalizes the data into a fact table and a number of dimension tables, and during query processing it selectively joins the tables according to users’ demands. This model is space economical. However, it faces two problems when applied to big data. First, join is an expensive operation, which prohibits a parallel database or a MapReduce-based system from achieving efficiency and scalability simultaneously. Second, join operations have to be executed repeatedly, while numerous join results can actually be reused by different queries. In this paper, we propose a new query processing framework for data warehouses. It pushes the join operations partially to the pre-processing phase and partially to the postprocessing phase, so that data warehouse queries can be transformed into massive parallelized filter-aggregation operations on the fact table. In contrast to the conventional query processing models, our approach is efficient, scalable and stable despite of the large number of tables involved in the join. It is especially suitable for a large-scale parallel data warehouse. Our empirical evaluation on Hadoop shows that our framework exhibits linear scalability and outperforms some existing approaches by an order of magnitude. 相似文献

5.

Containment of partially specified tree-pattern queries in the presence of dimension graphs

Dimitri Theodoratos Pawel Placek Theodore Dalamagas Stefanos Souldatos Timos Sellis 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(1):233-254

Nowadays, huge volumes of data are organized or exported in tree-structured form. Querying capabilities are provided through tree-pattern queries. The need for querying tree-structured data sources when their structure is not fully known, and the need to integrate multiple data sources with different tree structures have driven, recently, the suggestion of query languages that relax the complete specification of a tree pattern. In this paper, we consider a query language that allows the partial specification of a tree pattern. Queries in this language range from structureless keyword-based queries to completely specified tree patterns. To support the evaluation of partially specified queries, we use semantically rich constructs, called dimension graphs, which abstract structural information of the tree-structured data. We address the problem of query containment in the presence of dimension graphs and we provide necessary and sufficient conditions for query containment. As checking query containment can be expensive, we suggest two heuristic approaches for query containment in the presence of dimension graphs. Our approaches are based on extracting structural information from the dimension graph that can be added to the queries while preserving equivalence with respect to the dimension graph. We considered both cases: extracting and storing different types of structural information in advance, and extracting information on-the-fly (at query time). Both approaches are implemented, validated, and compared through experimental evaluation. 相似文献

6.

Data integration with uncertainty 总被引：1，自引：0，他引：1

Xin Luna Dong Alon Halevy Cong Yu 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(2):469-500

This paper reports our first set of results on managing uncertainty in data integration. We posit that data-integration systems need to handle uncertainty at three levels and do so in a principled fashion. First, the semantic mappings between the data sources and the mediated schema may be approximate because there may be too many of them to be created and maintained or because in some domains (e.g., bioinformatics) it is not clear what the mappings should be. Second, the data from the sources may be extracted using information extraction techniques and so may yield erroneous data. Third, queries to the system may be posed with keywords rather than in a structured form. As a first step to building such a system, we introduce the concept of probabilistic schema mappings and analyze their formal foundations. We show that there are two possible semantics for such mappings: by-table semantics assumes that there exists a correct mapping but we do not know what it is; by-tuple semantics assumes that the correct mapping may depend on the particular tuple in the source data. We present the query complexity and algorithms for answering queries in the presence of probabilistic schema mappings, and we describe an algorithm for efficiently computing the top-k answers to queries in such a setting. Finally, we consider using probabilistic mappings in the scenario of data exchange. 相似文献

7.

A syntactic approach to twig-query matching on XML streams

Chien-Ping Chou Author VitaeKuen-Fang JeaAuthor Vitae Heng-Hsun Liao Author Vitae 《Journal of Systems and Software》2011,84(6):993-1007

Query matching on XML streams is challenging work for querying efficiency when the amount of queried stream data is huge and the data can be streamed in continuously. In this paper, the method Syntactic Twig-Query Matching (STQM) is proposed to process queries on an XML stream and return the query results continuously and immediately. STQM matches twig queries on the XML stream in a syntactic manner by using a lexical analyzer and a parser, both of which are built from our lexical-rules and grammar-rules generators according to the user's queries and document schema, respectively. For query matching, the lexical analyzer scans the incoming XML stream and the parser recognizes XML structures for retrieving every twig-query result from the XML stream. Moreover, STQM obtains query results without a post-phase for excluding false positives, which are common in many streaming query methods. Through the experimental results, we found that STQM matches the twig query efficiently and also has good scalability both in the queried data size and the branch degree of the twig query. The proposed method takes less execution time than that of a sequence-based approach, which is widely accepted as a proper solution to the XML stream query. 相似文献

8.

Semantics preserving SPARQL-to-SQL translation 总被引：2，自引：0，他引：2

Artem Shiyong Farshad 《Data & Knowledge Engineering》2009,68(10):973-1000

Most existing RDF stores, which serve as metadata repositories on the Semantic Web, use an RDBMS as a backend to manage RDF data. This motivates us to study the problem of translating SPARQL queries into equivalent SQL queries, which further can be optimized and evaluated by the relational query engine and their results can be returned as SPARQL query solutions. The main contributions of our research are: (i) We formalize a relational algebra based semantics of SPARQL, which bridges the gap between SPARQL and SQL query languages, and prove that our semantics is equivalent to the mapping-based semantics of SPARQL; (ii) Based on this semantics, we propose the first provably semantics preserving SPARQL-to-SQL translation for SPARQL triple patterns, basic graph patterns, optional graph patterns, alternative graph patterns, and value constraints; (iii) Our translation algorithm is generic and can be directly applied to existing RDBMS-based RDF stores; and (iv) We outline a number of simplifications for the SPARQL-to-SQL translation to generate simpler and more efficient SQL queries and extend our defined semantics and translation to support the bag semantics of a SPARQL query solution. The experimental study showed that our proposed generic translation can serve as a good alternative to existing schema dependent translations in terms of efficient query evaluation and/or ensured query result correctness. 相似文献

9.

从面向对象数据库模式到关系数据库模式的转换 总被引：14，自引：1，他引：14

王学荣曾晓勤《计算机工程与科学》2003,25(5):100-107

本文提出了一种从面向对象数据库模式到关系数据库模式的映射及基于该映射的模式转换算法。以查询为例,说明了面向对象数据库中的特有语义仍能保留在转换后的关系模式中,而且从面向对象数据库到关系数据库的基于该模式转换中操纵运算的转换也是切实可行的。所得的模式转换结果可应用于面向对象数据库和关系数据库之间的互操作。相似文献

10.

Answering queries using views: A survey 总被引：25，自引：0，他引：25

Alon Y. Halevy 《The VLDB Journal The International Journal on Very Large Data Bases》2001,10(4):270-294

The problem of answering queries using views is to find efficient methods of answering a query using a set of previously defined materialized views over the database, rather than accessing the database relations. The problem has recently received significant attention because of its relevance to a wide variety of data management problems. In query optimization, finding a rewriting of a query using a set of materialized views can yield a more efficient query execution plan. To support the separation of the logical and physical views of data, a storage schema can be described using views over the logical schema. As a result, finding a query execution plan that accesses the storage amounts to solving the problem of answering queries using views. Finally, the problem arises in data integration systems, where data sources can be described as precomputed views over a mediated schema. This article surveys the state of the art on the problem of answering queries using views, and synthesizes the disparate works into a coherent framework. We describe the different applications of the problem, the algorithms proposed to solve it and the relevant theoretical results. Received: 1 August 1999 / Accepted: 23 March 2001 Published online: 6 September 2001 相似文献

11.

基于模式集成语义的查询处理 总被引：1，自引：0，他引：1

石祥滨张斌于戈郑怀远《软件学报》1998,9(5):321-326

在采用面向对象模型作为公共数据模型的多数据库系统中，基于模式集成语义的查询处理不仅要实现针对集成模式查询到针对输出模式查询的转换，而且要从语义上尽可能减少回答用户查询所需数据，保证对象引用的正确性.为了达到这个目标,提出了一些新的概念及基于模式集成语义的查询处理规则和路径表达式的查询处理方法. 相似文献

12.

Data exchange and schema mappings in open and closed worlds

Leonid Libkin Cristina Sirangelo 《Journal of Computer and System Sciences》2011,77(3):542-571

In the study of data exchange one usually assumes an open-world semantics, making it possible to extend instances of target schemas. An alternative closed-world semantics only moves ‘as much data as needed’ from the source to the target to satisfy constraints of a schema mapping. It avoids some of the problems exhibited by the open-world semantics, but limits the expressivity of schema mappings. Here we propose a mixed approach: one can designate different attributes of target schemas as open or closed, to combine the additional expressivity of the open-world semantics with the better behavior of query answering in closed worlds. We define such schema mappings, and show that they cover a large space of data exchange solutions with two extremes being the known open and closed-world semantics. We investigate the problems of query answering and schema mapping composition, and prove two trichotomy theorems, classifying their complexity based on the number of open attributes. We find conditions under which schema mappings compose, extending known results to a wide range of closed-world mappings. We also provide results for restricted classes of queries and mappings guaranteeing lower complexity. 相似文献

13.

基于Shrex的XQuery-SQL查询转换模型的设计与实现

龙冠卿潘荫荣胡幼华《计算机应用与软件》2008,25(6):120-123

目前还没有一个不依赖于模式映射、且支持复杂嵌套XQuery到SQL的查询转换解决方案.针对现状,设计并实现了一个将XQuery转换为等价SQL的查询转换模型(EXSM).该模型基于Shrex框架,以简洁的方式解决了模式依赖问题,并采用树形中间结构,使之支持复杂的嵌套XQuery.因此,该模型有效地涵盖解决了目前存在的两个问题. 相似文献

14.

Efficient schema-based XML-to-Relational data mapping

Mustafa Atay Artem Chebotko Dapeng Liu Shiyong Lu Farshad Fotouhi 《Information Systems》2007

Storing and querying XML documents using a RDBMS is a challenging problem since one needs to resolve the conflict between the hierarchical, ordered nature of the XML data model and the flat, unordered nature of the relational data model. This conflict can be resolved by the following XML-to-Relational mappings: schema mapping, data mapping and query mapping. In this paper, we propose: (i) a lossless schema mapping algorithm to generate a database schema from a DTD, which makes several improvements over existing algorithms, (ii) two linear data mapping algorithms based on DOM and SAX, respectively, to map ordered XML data to relational data. To our best knowledge, there is no published linear schema-based data mapping algorithm for mapping ordered XML data to relational data. Experimental results are presented to show that our algorithms are efficient and scalable. 相似文献

15.

Producing Interoperable Queries for Relational and Object-Oriented Databases

Ya-Hui Chang Louiqa Raschid 《Journal of Intelligent Information Systems》2000,14(1):51-75

In this paper, we develop techniques to produce interoperable queries with object and relational databases. A user poses a local query in a local query language, against a local object or relational schema. We transparently produce appropriate queries with respect to a remote target object or relational schema, corresponding to some remote database which contains data relevant to the user's query. Mapping knowledge to resolve representational heterogeneities in local and remote schemas is expressed in a canonical representation, CRmapping, and is independent of the particular data model. A canonical representation CRquery is also used to resolve heterogeneities of query languages. A set of heterogeneous transformation algorithms define the appropriate transformations from the local queries to the remote queries. The use of canonical representations (CR) allows us to represent queries independent of the particular query language, and to resolve representational conflicts in a uniform manner, independent of models and query languages. 相似文献

16.

Views in a large-scale XML repository

Vincent Aguilera Sophie Cluet Tova Milo Pierangelo Veltri Dan Vodislav 《The VLDB Journal The International Journal on Very Large Data Bases》2002,11(3):238-255

We are interested in defining and querying views in a huge and highly heterogeneous XML repository (Web scale). In this context, view definitions are very large, involving lots of sources, and there is no apparent limitation to their size. This raises interesting problems that we address in the paper: (i) how to distribute views over several machines without having a negative impact on the query translation process; (ii) how to quickly select the relevant part of a view given a query; (iii) how to minimize the cost of communicating potentially large queries to the machines where they will be evaluated. The solution that we propose is based on a simple view definition language that allows for automatic generation of views. The language maps paths in the view abstract DTD to paths in the concrete source DTDs. It enables a distributed implementation of the view system that is scalable both in terms of data and load. In particular, the query translation algorithm is shown to have a good (linear) complexity. Received: November 1, 2001 / Accepted: March 2, 2002 Published online: September 25, 2002 相似文献

17.

MAT learners for tree series: an abstract data type and two realizations

Frank Drewes Johanna Högberg Andreas Maletti 《Acta Informatica》2011,48(3):165-189

We propose abstract observation tables, an abstract data type for learning deterministic weighted tree automata in Angluin’s minimal adequate teacher (MAT) model, and show that every correct implementation of abstract observation tables yields a correct MAT learner. Besides the “classical” observation table, we show that abstract observation tables can also be implemented by observation trees. The advantage of the latter is that they often require fewer queries to the teacher. 相似文献

18.

An inductive database system based on virtual mining views

Hendrik Blockeel Toon Calders élisa Fromont Bart Goethals Adriana Prado Céline Robardet 《Data mining and knowledge discovery》2012,24(1):247-287

Inductive databases integrate database querying with database mining. In this article, we present an inductive database system that does not rely on a new data mining query language, but on plain SQL. We propose an intuitive and elegant framework based on virtual mining views, which are relational tables that virtually contain the complete output of data mining algorithms executed over a given data table. We show that several types of patterns and models that are implicitly present in the data, such as itemsets, association rules, and decision trees, can be represented and queried with SQL using a unifying framework. As a proof of concept, we illustrate a complete data mining scenario with SQL queries over the mining views, which is executed in our system. 相似文献

19.

Handling redundant processing in OBDA query execution over relational sources

《Journal of Web Semantics》2021

Redundant processing is a key problem in the translation of initial queries posed over an ontology into SQL queries, through mappings, as it is performed by ontology-based data access systems. Examples of such processing are duplicate answers obtained during query evaluation, which must finally be discarded, or common expressions evaluated multiple times from different parts of the same complex query. Many optimizations that aim to minimize this problem have been proposed and implemented, mostly based on semantic query optimization techniques, by exploiting ontological axioms and constraints defined in the database schema. However, data operations that introduce redundant processing are still generated in many practical settings, and this is a factor that impacts query execution. In this work we propose a cost-based method for query translation, which starts from an initial result and uses information about redundant processing in order to come up with an equivalent, more efficient translation. The method operates in a number of steps, by relying on certain heuristics indicating that we obtain a more efficient query in each step. Through experimental evaluation using the Ontop system for ontology-based data access, we exhibit the benefits of our method. 相似文献

20.

Evaluation of RDF queries via equivalence

Weiwei Ni Zhihong Chong Hu Shu Jiajia Bao Aoying Zhou 《Frontiers of Computer Science》2013,7(1):20-33

相似文献