
1.

Normalization and optimization of schema mappings

Georg Gottlob Reinhard Pichler Vadim Savenkov 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(2):277-302

Schema mappings are high-level specifications that describe the relationship between database schemas. They are an important tool in several areas of database research, notably in data integration and data exchange. However, a concrete theory of schema mapping optimization including the formulation of optimality criteria and the construction of algorithms for computing optimal schema mappings is completely lacking to date. The goal of this work is to fill this gap. We start by presenting a system of rewrite rules to minimize sets of source-to-target tuple-generating dependencies. Moreover, we show that the result of this minimization is unique up to variable renaming. Hence, our optimization also yields a schema mapping normalization. By appropriately extending our rewrite rule system, we also provide a normalization of schema mappings containing equality-generating target dependencies. An important application of such a normalization is in the area of defining the semantics of query answering in data exchange, since several definitions in this area depend on the concrete syntactic representation of the mappings. This is, in particular, the case for queries with negated atoms and for aggregate queries. The normalization of schema mappings allows us to eliminate the effect of the concrete syntactic representation of the mapping from the semantics of query answering. We discuss in detail how our results can be fruitfully applied to aggregate queries.

2.

Tractable XML data exchange via relations

Rada CHIRKOVA Leonid LIBKIN Juan L. REUTTER 《Frontiers of Computer Science》2012,6(3):243-263

We consider data exchange for XML documents: given source and target schemas, a mapping between them, and a document conforming to the source schema, construct a target document and answer target queries in a way that is consistent with the source information. The problem has primarily been studied in the relational context, in which data-exchange systems have also been built. Since many XML documents are stored in relations, it is natural to consider using a relational system for XML data exchange. However, there is a complexity mismatch between query answering in relational and in XML data exchange. This indicates that to make the use of relational systems possible, restrictions have to be imposed on XML schemas and mappings, as well as on XML shredding schemes. We isolate a set of five requirements that must be fulfilled in order to have a faithful representation of the XML data-exchange problem by a relational translation. We then demonstrate that these requirements naturally suggest the in-lining technique for data-exchange tasks. Our key contribution is to provide shredding algorithms for schemas, documents, mappings and queries, and demonstrate that they enable us to correctly perform XML data-exchange tasks using a relational system.

3.

Schema mediation for large-scale semantic data sharing

Alon Y. Halevy Zachary G. Ives Dan Suciu Igor Tatarinov 《The VLDB Journal The International Journal on Very Large Data Bases》2005,14(1):68-83

Intuitively, data management and data integration tools should be well suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: they typically require a common and comprehensive schema design before they can be used to store or share information, and they are difficult to extend because schema evolution is heavyweight and may break backward compatibility. As a result, many large-scale data sharing tasks are more easily facilitated by non-database-oriented tools that have little support for semantics. The goal of the peer data management system (PDMS) is to address this need: we propose the use of a decentralized, easily extensible data management architecture in which any user can contribute new data, schema information, or even mappings between other peers' schemas. PDMSs represent a natural step beyond data integration systems, replacing their single logical schema with an interlinked collection of semantic mappings between peers' individual schemas. This paper considers the problem of schema mediation in a PDMS. Our first contribution is a flexible language for mediating between peer schemas that extends known data integration formalisms to our more complex architecture. We precisely characterize the complexity of query answering for our language. Next, we describe a reformulation algorithm for our language that generalizes both global-as-view and local-as-view query answering algorithms. Then we describe several methods for optimizing the reformulation algorithm and an initial set of experiments studying its performance. Finally, we define and consider several global problems in managing semantic mappings in a PDMS. Received: 16 December 2002, Accepted: 14 April 2003, Published online: 12 December 2003. Edited by: V. Atluri

4.

Preserving mapping consistency under schema changes (total citations: 1; self-citations: 0; citations by others: 1)

Yannis Velegrakis Renée J. Miller Lucian Popa 《The VLDB Journal The International Journal on Very Large Data Bases》2004,13(3):274-293

In dynamic environments like the Web, data sources may change not only their data but also their schemas, their semantics, and their query capabilities. When a mapping is left inconsistent by a schema change, it has to be detected and updated. We present a novel framework and a tool (ToMAS) for automatically adapting (rewriting) mappings as schemas evolve. Our approach considers not only local changes to a schema but also changes that may affect and transform many components of a schema. Our algorithm detects mappings affected by structural or constraint changes and generates all the rewritings that are consistent with the semantics of the changed schemas. Our approach explicitly models mapping choices made by a user and maintains these choices, whenever possible, as the schemas and mappings evolve. When there is more than one candidate rewriting, the algorithm may rank them based on how close they are to the semantics of the existing mappings. Received: 13 January 2004, Accepted: 26 March 2004, Published online: 12 August 2004. Edited by: M. Carey

5.

Schema mapping and query translation in heterogeneous P2P XML databases

Angela Bonifati Elaine Chang Terence Ho Laks V. S. Lakshmanan Rachel Pottinger Yongik Chung 《The VLDB Journal The International Journal on Very Large Data Bases》2010,19(2):231-256

Peers in a peer-to-peer data management system often have heterogeneous schemas and no mediated global schema. To translate queries across peers, we assume each peer provides correspondences between its schema and a small number of other peer schemas. We focus on query reformulation in the presence of heterogeneous XML schemas, including data–metadata conflicts. We develop an algorithm for inferring precise mapping rules from informal schema correspondences. We define the semantics of query answering in this setting and develop a query translation algorithm. Our translation handles an expressive fragment of XQuery and works both along and against the direction of mapping rules. We describe the HePToX heterogeneous P2P XML data management system, which incorporates our results. We report the results of extensive experiments on HePToX on both synthetic and real datasets. We demonstrate our system's utility and scalability on different P2P distributions.

6.

Data integration with uncertainty (total citations: 1; self-citations: 0; citations by others: 1)

Xin Luna Dong Alon Halevy Cong Yu 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(2):469-500

This paper reports our first set of results on managing uncertainty in data integration. We posit that data-integration systems need to handle uncertainty at three levels and do so in a principled fashion. First, the semantic mappings between the data sources and the mediated schema may be approximate because there may be too many of them to be created and maintained or because in some domains (e.g., bioinformatics) it is not clear what the mappings should be. Second, the data from the sources may be extracted using information extraction techniques and so may yield erroneous data. Third, queries to the system may be posed with keywords rather than in a structured form. As a first step to building such a system, we introduce the concept of probabilistic schema mappings and analyze their formal foundations. We show that there are two possible semantics for such mappings: by-table semantics assumes that there exists a correct mapping but we do not know what it is; by-tuple semantics assumes that the correct mapping may depend on the particular tuple in the source data. We present the query complexity and algorithms for answering queries in the presence of probabilistic schema mappings, and we describe an algorithm for efficiently computing the top-k answers to queries in such a setting. Finally, we consider using probabilistic mappings in the scenario of data exchange.
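
The difference between the by-table and by-tuple semantics described in this abstract can be illustrated with a small sketch. Everything below is a hypothetical toy setup rather than the authors' algorithm: the relation, the attribute names, the two candidate mappings and their probabilities are all invented for illustration, and the query is evaluated directly on the mapped tuples instead of computing certain answers over all consistent target instances. The by-tuple function enumerates every mapping sequence, which is exponential in the number of source tuples and only suitable for tiny inputs.

```python
from itertools import product

# Hypothetical source relation S(name, home_phone, office_phone).
SOURCE = [
    {"name": "Alice", "home_phone": "111", "office_phone": "222"},
    {"name": "Bob", "home_phone": "222", "office_phone": "111"},
]

# Hypothetical p-mapping into target T(name, phone): each candidate maps
# target attributes to source attributes and carries a probability.
P_MAPPING = [
    ({"name": "name", "phone": "home_phone"}, 0.6),
    ({"name": "name", "phone": "office_phone"}, 0.4),
]


def project(row, mapping, attrs):
    """Translate one source row through a mapping and project onto attrs."""
    return tuple(row[mapping[a]] for a in attrs)


def by_table(source, p_mapping, attrs):
    """One mapping is correct for the whole table; sum over candidates."""
    probs = {}
    for mapping, p in p_mapping:
        answers = {project(row, mapping, attrs) for row in source}
        for t in answers:
            probs[t] = probs.get(t, 0.0) + p
    return probs


def by_tuple(source, p_mapping, attrs):
    """Each source tuple may follow its own mapping; sum over sequences."""
    probs = {}
    for seq in product(p_mapping, repeat=len(source)):
        weight = 1.0
        answers = set()
        for row, (mapping, p) in zip(source, seq):
            weight *= p
            answers.add(project(row, mapping, attrs))
        for t in answers:
            probs[t] = probs.get(t, 0.0) + weight
    return probs


if __name__ == "__main__":
    print(by_table(SOURCE, P_MAPPING, ["phone"]))
    print(by_tuple(SOURCE, P_MAPPING, ["phone"]))
```

For the projection onto `phone`, the by-table semantics gives each of the two phone values probability 1.0, because either mapping applied to the whole table yields both values. Under the by-tuple semantics each value gets roughly 0.76, because sequences that map both rows to the same phone value produce only that one answer.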

7.

Uninterpreted Schema Matching with Embedded Value Mapping under Opaque Column Names and Data Values (total citations: 1; self-citations: 0; citations by others: 1)

Jaiswal Anuj Miller David J. Mitra Prasenjit 《Knowledge and Data Engineering, IEEE Transactions on》2010,22(2):291-304

Schema matching and value mapping across two heterogeneous information sources are critical tasks in applications involving data integration, data warehousing, and federation of databases. Before data can be integrated from multiple tables, the columns and the values appearing in the tables must be matched. The complexity of the problem grows quickly with the number of data attributes/columns to be matched and due to multiple semantics of data values. Traditional research has tackled schema matching and value mapping independently. We propose a novel method that optimizes embedded value mappings to enhance schema matching in the presence of opaque data values and column names. In this approach, the fitness objective for matching a pair of attributes from two schemas depends on the value mapping function for each of the two attributes. Suitable fitness objectives include the Euclidean distance measure, which we use in our experimental study, as well as relative (cross) entropy. We propose a heuristic local descent optimization strategy that uses sorting and two-opt switching to jointly optimize value mappings and attribute matches. Our experiments show that our proposed technique outperforms earlier uninterpreted schema matching methods, and thus should form a useful addition to a suite of (semi) automated tools for resolving structural heterogeneity.

8.

The Piazza peer data management system (total citations: 5; self-citations: 0; citations by others: 5)

Halevy A.Y. Ives Z.G. Jayant Madhavan Mork P. Suciu D. Tatarinov I. 《Knowledge and Data Engineering, IEEE Transactions on》2004,16(7):787-798

Intuitively, data management and data integration tools are well-suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: They typically require a comprehensive schema design before they can be used to store or share information and they are difficult to extend because schema evolution is heavyweight and may break backward compatibility. As a result, many small-scale data sharing tasks are more easily facilitated by nondatabase-oriented tools that have little support for semantics. The goal of the peer data management system (PDMS) is to address this need: We propose the use of a decentralized, easily extensible data management architecture in which any user can contribute new data, schema information, or even mappings between other peers' schemas. PDMSs represent a natural step beyond data integration systems, replacing their single logical schema with an interlinked collection of semantic mappings between peers' individual schemas. This paper describes several aspects of the Piazza PDMS, including the schema mediation formalism, query answering and optimization algorithms, and the relevance of PDMSs to the semantic Web.

9.

MapMerge: correlating independent schema mappings

Bogdan Alexe Mauricio Hernández Lucian Popa Wang-Chiew Tan 《The VLDB Journal The International Journal on Very Large Data Bases》2012,21(2):191-211

One of the main steps toward integration or exchange of data is to design the mappings that describe the (often complex) relationships between the source schemas or formats and the desired target schema. In this paper, we introduce a new operator, called MapMerge, that can be used to correlate multiple, independently designed schema mappings of smaller scope into larger schema mappings. This allows a more modular construction of complex mappings from various types of smaller mappings such as schema correspondences produced by a schema matcher or pre-existing mappings that were designed by either a human user or via mapping tools. In particular, the new operator also enables a new “divide-and-merge” paradigm for mapping creation, where the design is divided (on purpose) into smaller components that are easier to create and understand and where MapMerge is used to automatically generate a meaningful overall mapping. We describe our MapMerge algorithm and demonstrate the feasibility of our implementation on several real and synthetic mapping scenarios. In our experiments, we make use of a novel similarity measure between two database instances with different schemas that quantifies the preservation of data associations. We show experimentally that MapMerge improves the quality of the schema mappings, by significantly increasing the similarity between the input source instance and the generated target instance. Finally, we provide a new algorithm that combines MapMerge with schema mapping composition to correlate flows of schema mappings.

10.

An MDE-based methodology for closed-world integrity constraint checking in the semantic web

《Journal of Web Semantics》2022

Ontology-based data-centric systems support open-world reasoning. Therefore, for these systems, Web Ontology Language (OWL) and Semantic Web Rule Language (SWRL) are not suitable for expressing integrity constraints based on the closed-world assumption. Thus, the requirement of integrating the open-world assumption of OWL/SWRL with closed-world integrity constraint checking is inevitable. SPARQL, recommended by the World Wide Web Consortium (W3C), is a query language for RDF graphs, and many research studies have shown that it is a perfect candidate for closed-world constraint checking for ontology-based data-centric applications. In this regard, many research studies have been performed to transform integrity constraints into SPARQL queries, where some studies have shown the limitations of partial expressivity of knowledge bases while performing the indirect transformations, whereas others are limited to a platform-specific implementation. To address these issues, this paper presents a flexible and formal methodology that employs Model-Driven Engineering (MDE) to model closed-world integrity constraints for open-world reasoning. The proposed approach offers semantic validation of data by expressing integrity constraints at both the model level and the code level. Moreover, straightforward transformations from OWL/SWRL to SPARQL can be performed. Finally, the methodology is demonstrated via a real-world case study of water observations data.

11.

A query plan generation algorithm based on schema mapping

李由 刘东波 张维明 《计算机科学》2006,33(3):125-128

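The rapid growth of the Internet has made the integration of multiple data sources increasingly important. However, heterogeneity in data structure and semantics across data sources makes data integration quite difficult. This paper proposes a query plan generation algorithm based on schema mappings. Given correctly defined mapping rules, the algorithm automatically constructs query plans according to the different query conditions and data source schemas, and guarantees that the result data satisfy the structure of the target schema and its referential integrity requirements.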

12.

Data exchange: semantics and query answering

《Theoretical computer science》2005,336(1):89-124

Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. In this paper, we address foundational and algorithmic issues related to the semantics of data exchange and to the query answering problem in the context of data exchange. These issues arise because, given a source instance, there may be many target instances that satisfy the constraints of the data exchange problem. We give an algebraic specification that selects, among all solutions to the data exchange problem, a special class of solutions that we call universal. We show that a universal solution has no more and no less data than required for data exchange and that it represents the entire space of possible solutions. We then identify fairly general, yet practical, conditions that guarantee the existence of a universal solution and yield algorithms to compute a canonical universal solution efficiently. We adopt the notion of the “certain answers” in indefinite databases for the semantics for query answering in data exchange. We investigate the computational complexity of computing the certain answers in this context and also address other algorithmic issues that arise in data exchange. In particular, we study the problem of computing the certain answers of target queries by simply evaluating them on a canonical universal solution, and we explore the boundary of what queries can and cannot be answered this way, in a data exchange setting.
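
The idea of evaluating a target query on a canonical universal solution can be sketched for a single dependency. This is a minimal illustration under stated assumptions, not the paper's algorithm: the source-to-target dependency Emp(n) → ∃m Mgr(n, m), the relation names, and the helper functions are all hypothetical. The chase step invents one fresh labeled null per source tuple, and certain answers to a conjunctive query are obtained by evaluating it naively and discarding any answer that contains a null.

```python
from itertools import count


class Null:
    """A labeled null standing for an unknown value in the target."""

    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return f"N{self.label}"


def chase(emp):
    """Apply the hypothetical tgd Emp(n) -> exists m. Mgr(n, m).

    Each source tuple yields one target tuple whose existential
    position is filled with a fresh labeled null.
    """
    fresh = count(1)
    return [(name, Null(next(fresh))) for name in emp]


def certain_answers(rows):
    """Keep only the answer tuples that contain no labeled nulls."""
    return {row for row in rows if not any(isinstance(v, Null) for v in row)}


if __name__ == "__main__":
    mgr = chase(["alice", "bob"])
    print(mgr)
    # q1(x) :- Mgr(x, y): employee names are known for certain.
    print(certain_answers([(n,) for n, _m in mgr]))
    # q2(y) :- Mgr(x, y): manager values are nulls, so nothing is certain.
    print(certain_answers([(m,) for _n, m in mgr]))
```

The null-discarding step is sound only for the query classes the abstract alludes to, such as unions of conjunctive queries; queries with negation or inequalities generally cannot be answered this way.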

13.

Object-oriented query language access to relational databases: A semantic framework for query translation

Susan D. Urban Taoufik Ben Abdellatif 《Journal of Systems Integration》1995,5(2):123-156

This research investigates an approach to query processing in a multidatabase system that uses an object-oriented model to capture the semantics of other data models. The object-oriented model is used to construct a global schema, defining an integrated view of the different schemas in the environment. The model is also used as a self-describing model to build a meta-database for storing information about the global schema. A unique aspect of this work is that the object-oriented model is used to describe the different data models of the multidatabase environment, thereby extending the meta-database with semantic information about the local schemas. With the global and local schemas all represented in an object-oriented form, structural mappings between the global schema and each local schema are then easily supported. An object algebra then provides a query language for expressing global queries, using the structural mappings to translate object algebra queries into SQL queries over local relational schemas. The advantage of using an object algebra is that the object-oriented database can be viewed as a blackboard for temporary storage of local data and for establishing relationships between different databases. The object algebra can be used to directly retrieve temporarily stored data from the object-oriented database or to transparently retrieve data from local sources using the translation process described in this paper.

14.

Temporal data exchange

《Information Systems》2020

Data exchange is the problem of transforming data that is structured under a source schema into data structured under another schema, called the target schema, so that both the source and target data satisfy the relationship between the schemas. Many applications, such as planning, scheduling, medical, and fraud detection systems, require data exchange in the context of temporal data. Even though the formal framework of data exchange for relational database systems is well-established, it does not immediately carry over to the settings of temporal data, which necessitates reasoning over unbounded periods of time. In this work, we study data exchange for temporal data. We first motivate the need for two views of temporal data: the concrete view, which depicts how temporal data is compactly represented and on which the implementations are based, and the abstract view, which defines the semantics of temporal data as a sequence of snapshots. We first extend the chase procedure for the abstract view to have a conceptual basis for the data exchange for temporal databases. Considering non-temporal source-to-target tuple generating dependencies and equality generating dependencies, the chase algorithm can be applied on each snapshot independently. Then we define a chase procedure (called c-chase) on concrete instances and show that the result of c-chase on a concrete instance is semantically aligned with the result of chase on the corresponding abstract instance. In order to interpret intervals as constants while checking if a dependency or a query is satisfied by a concrete database, we will normalize the instance with respect to the dependency or the query. To obtain the semantic alignment, the nulls (which are introduced by data exchange and model incompleteness) in the concrete view are annotated with temporal information. Furthermore, we show that the result of the concrete chase provides a foundation for query answering. We define naïve evaluation on the result of the c-chase and show it produces certain answers.

15.

On the finite controllability of conjunctive query answering in databases under open-world assumption

Riccardo Rosati 《Journal of Computer and System Sciences》2011,77(3):572-594

In this paper we study queries over relational databases with integrity constraints (ICs). The main problem we analyze is OWA query answering, i.e., query answering over a database with ICs under the open-world assumption. The kinds of ICs that we consider are inclusion dependencies and functional dependencies, in particular key dependencies; the query languages we consider are conjunctive queries and unions of conjunctive queries. We present results about the decidability of OWA query answering under ICs. In particular, we study OWA query answering both over finite databases and over unrestricted databases, and identify the cases in which such a problem is finitely controllable, i.e., when OWA query answering over finite databases coincides with OWA query answering over unrestricted databases. Moreover, we are able to easily turn the above results into new results about implication of ICs and query containment under ICs, due to the deep relationship between OWA query answering and these two classical problems in database theory. In particular, we close two long-standing open problems in query containment, since we prove finite controllability of containment of conjunctive queries both under arbitrary inclusion dependencies and under key and foreign key dependencies. The results of our investigation are very relevant in many research areas which have recently dealt with databases under an incomplete information assumption: e.g., data integration, data exchange, view-based information access, ontology-based information systems, and peer data management systems.

16.

Inconsistency-tolerant query answering in ontology-based data access

《Journal of Web Semantics》2015


17.

Tableaux-based optimization of schema mappings for data integration

Md Anisur Rahman Mehedi Masud Iluju Kiringa Abdulmotaleb El Saddik 《Journal of Intelligent Information Systems》2012,38(2):533-554

The task of combining data residing at different sources to provide the user a unified view is known as data integration. Schema mappings act as glue between the global schema and the source schemas of a data integration system. Global-and-local-as-view (GLAV) is one of the approaches for specifying the schema mappings. Tableaux are used for expressing queries and functional dependencies on a single database. We investigate a general technique for expressing a GLAV mapping by a tabular structure called mapping assertion tableaux (MAT). In a similar way, we also express the tuple generating dependency (tgd) and equality generating dependency (egd) constraints by tabular forms, called tabular tgd (TTGD) and tabular egd (TEGD), respectively. A set consisting of the MATs, TTGDs and TEGDs is called schema mapping tableaux (SMT). We present algorithms that use SMT as an operator on an instance of the source schema to produce an instance of the target schema. We show that the target instances computed by the SMT are ‘minimal’ and ‘most general’ in nature. We also define the notion of equivalence between the schema mappings of two data integration systems and present algorithms that optimize schema mappings through the manipulation of the SMT.

18.

Query processing based on schema integration semantics (total citations: 1; self-citations: 0; citations by others: 1)

石祥滨 张斌 于戈 郑怀远 《软件学报》1998,9(5):321-326

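In a multidatabase system that adopts an object-oriented model as its common data model, query processing based on schema integration semantics must not only transform queries against the integrated schema into queries against the export schemas, but also use semantics to reduce as far as possible the data needed to answer user queries and guarantee the correctness of object references. To achieve this goal, the paper proposes several new concepts, together with query processing rules based on schema integration semantics and a query processing method for path expressions.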

19.

A schema mapping method for multidatabase systems

李瑞轩 卢正鼎 肖卫军 李兵 《计算机工程与科学》2004,26(3):65-68

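A multidatabase system generally has a four-level schema architecture. Global users can access only the global schema, while the data must ultimately be obtained from the individual local database systems. A schema mapping for the multidatabase system must therefore be established, representing the transformations by which local schemas are integrated into the global schema via export schemas. This paper presents a schema mapping method for multidatabase systems and uses a schema mapping tree to store and express the mapping.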

20.

Inconsistency tolerance in P2P data integration: An epistemic logic approach

Diego Calvanese Giuseppe De Giacomo Domenico Lembo Maurizio Lenzerini Riccardo Rosati 《Information Systems》2008,33(4-5):360-384

We study peer-to-peer (P2P) data integration, where each peer models an autonomous system that exports data in terms of its own schema, and data interoperation is achieved by means of mappings among the peer schemas, rather than through a unique global schema. We propose a multi-modal epistemic logical formalization based on the idea that each peer is conceived as a rational agent that exchanges knowledge/belief with other peers, thus nicely modeling the modular structure of the system. We then address the issue of dealing with possible inconsistencies, and distinguish between two types of inconsistencies, called local and P2P, respectively. We define a nonmonotonic extension of our logic that is able to reason on the beliefs of peers under both local and P2P inconsistency tolerance. Tolerance to local inconsistency essentially means that the presence of inconsistency within one peer does not affect the consistency of the whole system. Tolerance to P2P inconsistency means being able to resolve inconsistencies arising from the interaction between peers. We study query answering in the new nonmonotonic logic, with the main goal of establishing its decidability and its computational complexity. Indeed, we show that, under reasonable assumptions on peer schemas, query answering is decidable, and is coNP-complete with respect to data complexity, i.e., the size of the data stored at the peers.