Similar Documents
10 similar documents found (search time: 125 ms)
1.
Bottom-up query-answering procedures tend to explore a much larger search space than is strictly needed. Top-down processing methods use the query to perform a more focused search, which can result in more efficient query answering. Given a disjunctive deductive database, DB, and a query, Q, we establish a strong connection between model generation and clause derivability in two different representations of DB and Q. This allows us to use a bottom-up procedure for evaluating Q against DB in a top-down fashion. The approach requires no extensive rewriting of the input theory and introduces no new predicates. Rather, it is based on a certain duality principle for interpreting logical connectives. The duality transformation is achieved by reversing the direction of the implication arrows in the clauses representing both the theory and the negation of the query. Applying a generic bottom-up procedure to the transformed clause set then results in top-down query answering. Under favorable conditions the efficiency gains are substantial, as shown by our preliminary testing. We give the logical meaning of the duality transformation and point to the conditions and sources of the improved efficiency. We show how the duality approach can be used for refined query answering by specifying the minimal conditions (weakest updates) to DB under which Q becomes derivable. This is shown to be useful for view updates in disjunctive deductive databases, as well as for other interesting applications.
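To make the duality idea concrete, the following is a minimal sketch (not the paper's implementation): clauses are represented as assumed (head, body) pairs of atom strings, read as "body implies head", and the transformation simply reverses the implication arrow of every clause, including the negated query.

```python
# Minimal sketch of the duality transformation described above. Clauses are
# (head_atoms, body_atoms) pairs, read as "body -> head" (disjunctive heads,
# conjunctive bodies). Reversing the implication arrow swaps the two sides,
# so a generic bottom-up engine run on the dual set behaves top-down with
# respect to the query. The representation and names are illustrative only.

def dualize(clauses):
    """Reverse the implication arrow of every clause."""
    return [(tuple(body), tuple(head)) for (head, body) in clauses]

# Example theory DB and negated query, in the assumed representation:
#   p(a).          q(X) <- p(X).          query: <- q(a)
db_and_neg_query = [
    (("p(a)",), ()),          # fact: p(a)
    (("q(X)",), ("p(X)",)),   # rule: q(X) <- p(X)
    ((), ("q(a)",)),          # negated query: false <- q(a)
]

for head, body in dualize(db_and_neg_query):
    print(f"{' v '.join(head) or 'false'} <- {' & '.join(body) or 'true'}")
```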

2.
It is foreseen that more and more music objects in symbolic format, as well as multimedia objects such as audio, video, or lyrics integrated with a symbolic music representation (SMR), will be published and broadcast via the Internet. The SMRs of the streamed songs or multimedia objects form a music stream. Many interesting applications based on music streams, such as interactive music tutorials, distance music education, and similar-theme searching, make content-based retrieval over music streams an important research problem. We consider multiple queries with error tolerances over music streams and address the issue of approximate matching in this environment. We propose a novel approach to continuously process multiple queries over music streams and find all the music segments that are similar to the queries. Our approach is based on the concept of n-grams, and two mechanisms are designed to reduce the heavy computation of approximate matching. One mechanism uses the clustering of query n-grams to prune the query n-grams that are irrelevant to the incoming data n-gram. The other records the data n-gram that matches a query n-gram as a partial answer and incrementally merges the partial answers of the same query. We implement a prototype system for experiments in which songs in the MIDI format are continuously broadcast and the user can specify musical segments as queries to monitor the music streams. Experimental results show the effectiveness and efficiency of the proposed approach.
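As a rough illustration of n-gram-based monitoring of a music stream, here is a small sketch under assumed representations (pitch numbers for notes, exact n-gram lookup only); the paper's clustering and incremental-merging mechanisms are not reproduced.

```python
# Sketch of n-gram matching over a pitch stream: query n-grams are indexed up
# front; each incoming data n-gram is looked up and matching positions are
# recorded as partial answers. Illustrative only; not the paper's algorithm.
from collections import defaultdict, deque

def ngrams(seq, n):
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def index_queries(queries, n):
    """Map each query n-gram to the (query id, offset) pairs containing it."""
    index = defaultdict(list)
    for qid, q in enumerate(queries):
        for off, g in enumerate(ngrams(q, n)):
            index[g].append((qid, off))
    return index

def monitor(stream, queries, n=3):
    index = index_queries(queries, n)
    window = deque(maxlen=n)
    partial = defaultdict(list)            # qid -> matched (stream_pos, offset)
    for pos, note in enumerate(stream):
        window.append(note)
        if len(window) == n:
            for qid, off in index.get(tuple(window), []):
                partial[qid].append((pos - n + 1, off))
    return partial

print(monitor([60, 62, 64, 65, 67, 69], [[62, 64, 65], [67, 69, 71]]))
```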

3.
We study fundamental issues involved in top-k query evaluation in probabilistic databases. We consider simple probabilistic databases, in which probabilities are associated with individual tuples, and general probabilistic databases, in which, additionally, exclusivity relationships between tuples can be represented. In contrast to other recent research in this area, we do not limit ourselves to injective scoring functions. We formulate three intuitive postulates for the semantics of top-k queries in probabilistic databases and introduce a new semantics, Global-Topk, that satisfies those postulates to a large degree. We also show how to evaluate queries under the Global-Topk semantics. For simple databases we design dynamic-programming-based algorithms. For general databases we show polynomial-time reductions to the simple cases and provide effective heuristics to speed up the computation in practice. For example, we demonstrate that for a fixed k the time complexity of top-k query evaluation is as low as linear, under the assumption that probabilistic databases are simple and scoring functions are injective. Research partially supported by NSF grant IIS-0307434. An earlier version of some of the results in this paper was presented in [1].
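For the simple-database, injective-scoring case, the kind of dynamic program involved can be sketched as follows; the recurrence here is an illustrative reconstruction under those assumptions, not necessarily the paper's exact algorithm.

```python
# Sketch in the spirit of Global-Topk evaluation on a *simple* probabilistic
# database with an *injective* scoring function: tuples are sorted by score,
# and a tuple belongs to the top-k answer iff it appears and fewer than k
# higher-scored tuples appear. Illustrative reconstruction only.

def global_topk_probs(tuples, k):
    """tuples: list of (score, prob); returns {score: Pr[tuple in top-k]}."""
    tuples = sorted(tuples, key=lambda t: -t[0])
    result = {}
    # q[j] = Pr[exactly j of the already-processed (higher-scored) tuples appear]
    q = [1.0]
    for score, p in tuples:
        result[score] = p * sum(q[:k])          # fewer than k better tuples present
        new_q = [0.0] * (len(q) + 1)            # fold this tuple into q
        for j, pj in enumerate(q):
            new_q[j] += pj * (1 - p)
            new_q[j + 1] += pj * p
        q = new_q
    return result

# Tuple with score 5 is in the top-2 iff it appears and at most one of the
# two higher-scored tuples appears: 0.4 * (1 - 0.5 * 0.9) = 0.22.
print(global_topk_probs([(10, 0.5), (8, 0.9), (5, 0.4)], k=2))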

4.
Many database applications and environments, such as mediation over heterogeneous database sources and data warehousing for decision support, lead to complex queries. Queries are often nested, defined over previously defined views, and may involve unions. There are good reasons why one might want to remove pieces (sub-queries or sub-views) from such queries: some sub-views of a query may be effectively cached from previous queries or may be materialized views; some may be known to evaluate to empty, by reasoning over the integrity constraints; and some may match protected queries, which for security reasons cannot be evaluated for all users. In this paper, we present a new evaluation strategy for queries defined over views, which we call tuple-tagging, that allows for an efficient removal of sub-views from the query. Other approaches rewrite the query so that the sub-views to be removed are effectively gone, and then evaluate the rewritten query. With tuple-tagging evaluation, no rewrite of the original query is necessary. We formally describe a discounted query (a query with sub-views marked as removed), present the tuple-tagging algorithm for evaluating discounted queries, provide an analysis of the algorithm's performance, and present some experimental results. These results strongly support the tuple-tagging algorithm both as an efficient means to remove sub-views from a view query during evaluation and as a viable optimization strategy for certain applications. The experiments also suggest that rewrite techniques for this problem may perform worse than evaluation of the original query, and much worse than the tuple-tagging approach.
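The general flavor of tagging can be illustrated with a toy sketch: tuples produced by a union of sub-views carry the identifier of their originating sub-view, and a discounted query filters by tag instead of being rewritten. All names and the representation here are assumptions for illustration, not the paper's algorithm.

```python
# Illustrative sketch only: tuples flowing out of a union of sub-views are
# tagged with the sub-view they came from; a "discounted" query then skips
# tuples whose tag is in the removed set, with no rewrite of the original query.

def evaluate_view(sub_views):
    """sub_views: {name: iterable of tuples}; yields (tuple, tag) pairs."""
    for name, rows in sub_views.items():
        for row in rows:
            yield row, name

def evaluate_discounted(sub_views, removed):
    """Evaluate the view while treating the sub-views in `removed` as empty."""
    return [row for row, tag in evaluate_view(sub_views) if tag not in removed]

view = {
    "cached_v1": [("alice", 3), ("bob", 5)],
    "protected_v2": [("carol", 7)],        # e.g. must not be evaluated for this user
}
print(evaluate_discounted(view, removed={"protected_v2"}))
```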

5.
Unlike a twig query, an Xtwig query contains selection predicates with reverse axes, namely ancestor or parent. To evaluate such queries in the stream-based context, rewriting rules have been proposed to transform paths with reverse axes into equivalent reverse-axis-free ones. However, this transformation method is expensive because it scans the input streams multiple times and generates unnecessary intermediate results. To solve these problems, a holistic stream-based algorithm, XtwigStack, is proposed for Xtwig queries. Experiments show that XtwigStack is much more efficient than the transformation method.
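To make the kind of rewriting concrete, the short example below shows a path with a reverse (ancestor) axis and a reverse-axis-free equivalent that returns the same nodes; the query, the document, and the use of lxml to check the equivalence are all illustrative choices made here, not material from the paper.

```python
# Illustration of rewriting a reverse-axis path into an equivalent
# reverse-axis-free one (the example is made up; the paper argues such
# rewriting is costly in the streaming setting and proposes XtwigStack instead).
from lxml import etree

doc = etree.fromstring(
    "<book>"
    "<chapter><figure><caption>F1</caption></figure></chapter>"
    "<appendix><figure><caption>F2</caption></figure></appendix>"
    "</book>"
)

with_reverse_axis = doc.xpath("//figure[ancestor::chapter]/caption")
reverse_axis_free = doc.xpath("//chapter//figure/caption")

# Both formulations select only the caption of the figure inside a chapter.
assert [c.text for c in with_reverse_axis] == [c.text for c in reverse_axis_free] == ["F1"]
```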

6.
Indexing and querying XML using extended Dewey labeling scheme
Finding all the occurrences of a tree pattern in an XML database is a core operation for efficient evaluation of XML queries. The Dewey labeling scheme is commonly used to label an XML document to facilitate XML query processing by recording information about the path of an element. In order to improve the efficiency of XML tree pattern matching, we introduce a novel labeling scheme, called extended Dewey, which extends the existing Dewey labeling scheme to combine the types and identifiers of elements in a label and to avoid scanning the labels of internal query nodes, thereby accelerating query processing (in terms of I/O cost). Based on extended Dewey, we propose a series of holistic XML tree pattern matching algorithms. We first present TJFast to answer an XML twig pattern query. To efficiently answer a generalized XML tree pattern, we then propose GTJFast, an optimization that exploits the non-output nodes. In addition, we propose TJFastTL and GTJFastTL, based on the tag+level data partition scheme, to further reduce I/O costs by level pruning. Finally, we report comprehensive experimental results showing that our XML tree pattern matching algorithms are superior to existing approaches in terms of the number of elements scanned, the size of intermediate results, and query performance.
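As background, here is a small sketch of plain Dewey labeling, in which every ancestor's label is a prefix of an element's label and can be derived from the label alone; the extended Dewey scheme of the paper additionally encodes element types into the label components, which this illustration does not reproduce.

```python
# Sketch of plain Dewey labeling: each element's label is its parent's label
# plus its ordinal among siblings, so ancestor labels are prefixes and can be
# derived without touching the document. (Extended Dewey also encodes tags,
# which is not shown here.)

def dewey_labels(node, prefix=()):
    """node: (tag, [children]); yields (label, tag) in document order."""
    yield prefix, node[0]
    for i, child in enumerate(node[1], start=1):
        yield from dewey_labels(child, prefix + (i,))

def ancestors(label):
    """All proper ancestor labels, obtained purely from the label itself."""
    return [label[:i] for i in range(len(label))]

tree = ("dblp", [("article", [("title", []), ("author", [])]),
                 ("article", [("title", [])])])
for label, tag in dewey_labels(tree):
    print(".".join(map(str, label)) or "(root)", tag)
print(ancestors((1, 2)))   # -> [(), (1,)]
```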

7.
This paper investigates the optimization problem of executing a join in a distributed database environment. The minimization of the communication cost for sending data through links is adopted as the optimization criterion. We explore the approach of judiciously using join operations as reducers in distributed query processing. In general, this problem is computationally intractable. Restricting the execution of the join to a pre-defined combinatorial order leads to a polynomial-time solution. An algorithm for chain query computation has been proposed in [21]. The time complexity of that algorithm is O(m^2 n^2 + m^3 n), where n is the number of sites in the network and m is the number of relations (fragments) involved in the join. In this paper, we first present a proof of the intuitively well-understood fact that the eigenorder of a chain join is the best pre-defined combinatorial order in which to apply the algorithm of [21]. Second, we give a sufficient and necessary condition for a chain query with the eigenordering to be a simple query. For processing the class of simple queries, we show a significant reduction of the time complexity from O(m^2 n^2 + m^3 n) to O(mn + m^2). It is encouraging that, in practice, the most frequent queries belong to the category of simple queries.
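To get a feel for the size of the reduction from O(m^2 n^2 + m^3 n) to O(mn + m^2), the following back-of-the-envelope comparison plugs in a few illustrative values of m and n; the constants hidden by the O-notation are ignored.

```python
# Rough comparison of the two asymptotic costs for illustrative sizes.
def general_cost(m, n):   # O(m^2 n^2 + m^3 n), the algorithm of [21]
    return m**2 * n**2 + m**3 * n

def simple_cost(m, n):    # O(mn + m^2), for simple chain queries
    return m * n + m**2

for m, n in [(5, 10), (10, 50), (20, 100)]:
    ratio = general_cost(m, n) / simple_cost(m, n)
    print(f"m={m:2d} n={n:3d}  general={general_cost(m, n):9d}  "
          f"simple={simple_cost(m, n):5d}  ratio~{ratio:8.0f}")
```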

8.
We report on a new, efficient encoding for the data cube, which results in a drastic speed-up of OLAP queries that aggregate along any combination of dimensions over numerical and categorical attributes. We focus on a class of queries called cube queries, which return aggregated values rather than sets of tuples. Our approach, termed CubiST++ (Cubing with Statistics Trees Plus Families), represents a major departure from existing relational (ROLAP) and multi-dimensional (MOLAP) approaches in that it does not use the view lattice to compute and materialize new views from existing views in some heuristic fashion. Instead, CubiST++ encodes all possible aggregate views in the leaves of a new data structure called a statistics tree (ST) during a one-time scan of the detailed data. In order to optimize queries involving constraints on the hierarchy levels of the underlying dimensions, we select and materialize a family of candidate trees, which represent superviews over the different hierarchical levels of the dimensions. Given a query, our query evaluation algorithm selects the smallest tree in the family that can provide the answer. Extensive evaluations of our prototype implementation have demonstrated its superior run-time performance and scalability when compared with existing MOLAP and ROLAP systems.
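The general idea of encoding every aggregate view in one pass can be illustrated with a toy tree that has one level per dimension and a '*' ("all") branch at each level; this is only a sketch of the concept, with counts as the sole aggregate, and is not the ST structure or the CubiST++ algorithms themselves.

```python
# Toy statistics-tree-style structure (illustration only): one level per
# dimension, one branch per value plus a '*' branch, counts at the leaves.
# A single scan of the detailed data populates every aggregate view; a cube
# query is answered by a single root-to-leaf path lookup.
ALL = "*"

def insert(tree, record, count_key="__count__"):
    if not record:
        tree[count_key] = tree.get(count_key, 0) + 1
        return
    value, rest = record[0], record[1:]
    for branch in (value, ALL):
        insert(tree.setdefault(branch, {}), rest)

def query(tree, constraints, count_key="__count__"):
    """constraints: one value per dimension, '*' meaning 'all values'."""
    node = tree
    for c in constraints:
        node = node.get(c, {})
    return node.get(count_key, 0)

st = {}
for rec in [("2023", "east", "laptop"), ("2023", "west", "laptop"),
            ("2024", "east", "phone")]:
    insert(st, rec)

print(query(st, ("2023", ALL, "laptop")))   # -> 2
print(query(st, (ALL, "east", ALL)))        # -> 2
```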

9.
Existing algorithms for mining frequent XML query patterns (XQPs) employ a candidate generate-and-test strategy. They involve expensive candidate enumeration and costly tree-containment checking. Further, most existing methods periodically compute the frequencies of candidate query patterns from scratch by checking the entire transaction database, which consists of XQPs extracted from user query logs. However, it is not straightforward to maintain such discovered frequent patterns in real XML databases, as frequent updates may not only invalidate some existing frequent query patterns but also generate new ones. A drawback of existing methods is therefore that they are rather inefficient under evolution of the transaction database. To address these problems, this paper proposes an efficient algorithm, ESPRIT, to mine frequent XQPs without costly tree-containment checking. ESPRIT transforms XML queries into sequences using a one-to-one mapping technique and mines the frequent sequences to generate frequent XQPs. We propose two efficient incremental algorithms, ESPRIT-i and ESPRIT-i+, to incrementally mine frequent XQPs. We devise several novel optimization techniques for query rewriting, cache lookup, and cache replacement to improve the answerability and the hit rate of caching. We have implemented our algorithms and conducted a set of experimental studies on various datasets. The experimental results demonstrate that our algorithms achieve high efficiency and scalability and significantly outperform state-of-the-art methods.
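One standard way to map a query pattern tree to a sequence, so that sequence mining can replace tree-containment checking, is a depth-labelled preorder encoding; the sketch below uses that encoding as an illustration and does not claim to be ESPRIT's exact one-to-one mapping.

```python
# Sketch of encoding an XML query pattern tree as a sequence (depth-labelled
# preorder), so that frequency counting / sequence mining can be applied.
# ESPRIT's exact one-to-one mapping may differ; this is illustrative only.
from collections import Counter

def tree_to_sequence(node, depth=0):
    """node: (label, [children]) -> ((label, depth), ...) in preorder."""
    seq = [(node[0], depth)]
    for child in node[1]:
        seq.extend(tree_to_sequence(child, depth + 1))
    return tuple(seq)

# Two query patterns, roughly //book/title and //book[author]/title,
# as they might appear in a query log:
q1 = ("book", [("title", [])])
q2 = ("book", [("author", []), ("title", [])])

freq = Counter(tree_to_sequence(q) for q in [q1, q2, q1, q1])
for seq, count in freq.items():
    print(count, seq)
```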

10.
Algorithms for Nearest Neighbor Search on Moving Object Trajectories
Nearest Neighbor (NN) search has been at the core of spatial and spatiotemporal database research during the last decade. The literature on NN query processing algorithms so far deals either with stationary or moving query points over static datasets or with future (predicted) locations over a set of continuously moving points. With the increasing number of Mobile Location Services (MLS), effective k-NN query processing over historical trajectory data has become a vehicle for data analysis that can improve existing services or even suggest new ones. In this paper, we investigate mechanisms to perform NN search on R-tree-like structures storing historical information about moving object trajectories. The proposed (depth-first and best-first) algorithms vary with respect to the type of the query object (stationary or moving point) as well as the type of the query result (historical continuous or not), resulting in four types of NN queries. We also propose novel metrics to support our search-ordering and pruning strategies. Using implementations of the proposed algorithms on two members of the R-tree family for trajectory data (namely, the TB-tree and the 3D-R-tree), we demonstrate their scalability and efficiency through an extensive experimental study using large synthetic and real datasets.
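As background for the best-first style of algorithm mentioned above, here is a generic sketch of best-first NN search over a hierarchy of bounding boxes ordered by MINDIST; the tree layout and names are assumptions, and the paper's trajectory-specific metrics and continuous variants are not reproduced.

```python
# Generic best-first nearest-neighbour search over an R-tree-like hierarchy,
# using a priority queue ordered by MINDIST to the query point. Illustrative
# of the best-first search style only, not the paper's algorithms.
import heapq, itertools, math

def mindist(point, rect):
    """Minimum distance from a 2D point to an axis-aligned rectangle."""
    (x, y), (x1, y1, x2, y2) = point, rect
    dx = max(x1 - x, 0, x - x2)
    dy = max(y1 - y, 0, y - y2)
    return math.hypot(dx, dy)

def best_first_nn(root, q):
    """root: ('node', rect, children) or ('leaf', rect, object_id)."""
    tie = itertools.count()                      # break ties without comparing entries
    heap = [(mindist(q, root[1]), next(tie), root)]
    while heap:
        dist, _, entry = heapq.heappop(heap)
        if entry[0] == "leaf":
            return entry[2], dist                # first leaf popped is the NN
        for child in entry[2]:
            heapq.heappush(heap, (mindist(q, child[1]), next(tie), child))

root = ("node", (0, 0, 10, 10), [
    ("leaf", (1, 1, 2, 2), "traj-A"),
    ("node", (5, 5, 9, 9), [("leaf", (6, 6, 7, 7), "traj-B")]),
])
print(best_first_nn(root, (6.5, 6.5)))           # -> ('traj-B', 0.0)
```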
