首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 437 毫秒
1.
This paper addresses the distributed stream processing of window-based multi-way join queries considering the semijoin as a key join operator. In distributed stream processing, data streams arriving at remote sites need to be shipped to the processing site for query execution. This typically introduces high communication overhead. Our observation is that semijoin, effective in reducing communication overhead in distributed database query processing, can be also effective in distributed stream query processing. The challenge, however, lies in the streaming nature of the tuples, as it requires continuous and incremental processing of an unbounded sequence of tuples instead of one-time processing of a set of stored tuples. This paper describes our comprehensive work done to address the challenge. Specifically, we first propose a distributed stream join processing model that handles the issue of network delays introduced from the shipment of data streams, and allows for efficient batch processing. Then, based on the model, we propose join algorithms in a multi-way join case: first, one-way join algorithms for different combinations of join placement and join method and, then, multi-way join algorithms assuming linear join ordering. Regarding the join method, two distributed join methods are introduced: (1) simple join, in which full tuples are forwarded to the query processing site and (2) semijoin-based join, in which partial tuples are forwarded. A semijoin-based join can be executed with different possible semijoin strategies which incur different communication overheads. We present a complete set of join algorithms considering all possible semijoin strategies, and propose an optimization algorithm. The join algorithms are executed continuously in an incremental manner as tuples arrive, and never ship tuples redundantly. The optimization algorithm constructs an efficient multi-way join plan by using a greedy heuristic which adds to the plan one stream with the minimum join execution cost in each step. Through extensive experiments, we conduct comparative studies of the performance among the proposed one-way join algorithms and the efficiency of the generated plan between the optimization algorithm based on the greedy heuristic and the exhaustive search, respectively.  相似文献   

2.
Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middlewares, and most information retrieval applications. Current relational query processors do not handle ranking queries efficiently, especially when joins are involved. In this paper, we address supporting top-k join queries in relational query processors. We introduce a new rank-join algorithm that makes use of the individual orders of its inputs to produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We introduce two physical query operators based on variants of ripple join that implement the rank-join algorithm. The operators are nonblocking and can be integrated into pipelined execution plans. We also propose an efficient heuristic designed to optimize a top-k join query by choosing the best join order. We address several practical issues and optimization heuristics to integrate the new join operators in practical query processors. We implement the new operators inside a prototype database engine based on PREDATOR. The experimental evaluation of our approach compares recent algorithms for joining ranked inputs and shows superior performance.Received: 23 December 2003, Accepted: 31 March 2004, Published online: 12 August 2004Edited by: S. AbiteboulExtended version of the paper published in the Proceedings of the 29th International Conference on Very Large Databases, VLDB 2003, Berlin, Germany, pp 754-765  相似文献   

3.
Calculating operators of continuously moving objects presents some unique challenges, especially when the operators involve aggregation or the concept of congestion, which happens when the number of moving objects in a changing or dynamic query space exceeds some threshold value. This paper presents the following six d-dimensional moving object operators: (1) MaxCount (or MinCount), which finds the Maximum (or Minimum) number of moving objects simultaneously present in the dynamic query space at any time during the query time interval. (2) CountRange, which finds a count of point objects whose trajectories intersect the dynamic query space during the query time interval. (3) ThresholdRange, which finds the set of time intervals during which the dynamic query space is congested. (4) ThresholdSum, which finds the total length of all the time intervals during which the dynamic query space is congested. (5) ThresholdCount, which finds the number of disjoint time intervals during which the dynamic query space is congested. And (6) ThresholdAverage, which finds the average length of time of all the time intervals when the dynamic query space is congested. For these operators separate algorithms are given to find only estimate or only precise values. Experimental results from more than 7,500 queries indicate that the estimation algorithms produce fast, efficient results with error under 5%.
Peter Revesz (Corresponding author)Email:

Scot Anderson   obtained his Ph.D. degree in Computer Science from the University of Nebraska—Lincoln in 2007. He is currently an assistant professor at Southern Adventist University. His research interests are geographic information systems, moving objects, and spatio-temporal data. Peter Revesz   holds a Ph.D. degree in Computer Science from Brown University and was a postdoctoral fellow at the University of Toronto before joining the University of Nebraska—Lincoln, where he is currently a full professor in the Department of Computer Science and Engineering. He is well-known as a co-inventor of constraint databases in a highly-cited joint paper with Paris Kanellakis and Gabriel Kuper. He is the author of the book “Introduction to Constraint Databases”, which was published by Springer in 2002. His current research interests include geographic information systems and spatio-temporal databases. He has been a visiting professor at the University of Athens in Greece, the University of Hasselt in Belgium and the Max Planck Institute for Computer Science and the University of Freiburg in Germany. He was awarded a Fulbright Award and an Alexander von Humboldt Research Fellowship.   相似文献   

4.
We introduce a new abstract model of database query processing, finite cursor machines, that incorporates certain data streaming aspects. The model describes quite faithfully what happens in so-called “one-pass” and “two-pass query processing”. Technically, the model is described in the framework of abstract state machines. Our main results are upper and lower bounds for processing relational algebra queries in this model, specifically, queries of the semijoin fragment of the relational algebra.  相似文献   

5.
In this paper, we re-examine the results of prior work on methods for computing ad hoc joins. We develop a detailed cost model for predicting join algorithm performance, and we use the model to develop cost formulas for the major ad hoc join methods found in the relational database literature. We show that various pieces of “common wisdom” about join algorithm performance fail to hold up when analyzed carefully, and we use our detailed cost model to derive op timal buffer allocation schemes for each of the join methods examined here. We show that optimizing their buffer allocations can lead to large performance improvements, e.g., as much as a 400% improvement in some cases. We also validate our cost model's predictions by measuring an actual implementation of each join algorithm considered. The results of this work should be directly useful to implementors of relational query optimizers and query processing systems. Edited by M. Adiba. Received May 1993 / Accepted April 1996  相似文献   

6.
In this paper we introduce a general framework for casting fully dynamic transitive closure into the problem of reevaluating polynomials over matrices. With this technique, we improve the best known bounds for fully dynamic transitive closure. In particular, we devise a deterministic algorithm for general directed graphs that achieves O(n 2) amortized time for updates, while preserving unit worst-case cost for queries. In case of deletions only, our algorithm performs updates faster in O(n) amortized time. We observe that fully dynamic transitive closure algorithms with O(1) query time maintain explicitly the transitive closure of the input graph, in order to answer each query with exactly one lookup (on its adjacency matrix). Since an update may change as many as Ω(n 2) entries of this matrix, no better bounds are possible for this class of algorithms. This work has been partially supported by the Sixth Framework Programme of the EU under contract number 507613 (Network of Excellence “EuroNGI: Designing and Engineering of the Next Generation Internet”), and number 001907 (“DELIS: Dynamically Evolving, Large Scale Information Systems”), and by the Italian Ministry of University and Research (Project “ALGO-NEXT: Algorithms for the Next Generation Internet and Web: Methodologies, Design and Experiments”). Portions of this paper have been presented at the 41st Annual Symp. on Foundations of Computer Science, 2000.  相似文献   

7.
《Information Systems》2005,30(3):167-204
Algebraic optimisation is both theoretically and practically important for query processing in complex value databases. In this paper, we consider this issue and investigate some algebraic properties concerning the nested relational operators.The join operation is one of the most time-consuming operations in nested relational query processing. We introduce a new join operator, called P-join, which combines the advantages of Roth's extended natural join and Colby's recursive join for efficient data access. We also investigate some algebraic properties concerning the P-join operator and extended relational operators, which can be used for query optimisation in nested relational databases.We then examine the role of the restructuring operators nest and unnest in their interactions with the extended relational operators proposed by Roth et al. Under certain functional and mutual data dependencies, the six nested relational equations will hold.Finally, we outline the steps of a heuristic optimisation algorithm that utilises algebraic transformation rules developed in this paper and previous related work to transform an initial query to an optimised one that is more efficient to execute.  相似文献   

8.
Liveness temporal properties state that something “good” eventually happens, e.g., every request is eventually granted. In Linear Temporal Logic (LTL), there is no a priori bound on the “wait time” for an eventuality to be fulfilled. That is, F θ asserts that θ holds eventually, but there is no bound on the time when θ will hold. This is troubling, as designers tend to interpret an eventuality F θ as an abstraction of a bounded eventuality F k θ, for an unknown k, and satisfaction of a liveness property is often not acceptable unless we can bound its wait time. We introduce here prompt-LTL, an extension of LTL with the prompt-eventually operator F p . A system S satisfies a prompt-LTL formula φ if there is some bound k on the wait time for all prompt-eventually subformulas of φ in all computations of S. We study various problems related to prompt-LTL, including realizability, model checking, and assume-guarantee model checking, and show that they can be solved by techniques that are quite close to the standard techniques for LTL.  相似文献   

9.
Answering heterogeneous database queries with degrees of uncertainty   总被引:1,自引:0,他引:1  
In heterogeneous database systems,partial values have been used to resolve some schema integration problems. Performing operations on partial values may producemaybe tuples in the query result which cannot be compared. Thus, users have no way to distinguish which maybe tuple is the most possible answer. In this paper, the concept of partial values is generalized toprobabilistic partial values. We propose an approach to resolve the schema integration problems using probabilistic partial values and develop a full set of extended relational operators for manipulating relations containing probabilistic partial values. With this approach, the uncertain answer tuples of a query are associated with degrees of uncertainty (represented by probabilities). That provides users a comparison among maybe tuples and a better understanding on the query results. Besides, extended selection and join are generalized to -selection and -join, respectively, which can be used to filter out maybe tuples with low probabilities — those which have probabilities smaller than .  相似文献   

10.
In this paper, we present a mechanism for detecting and representing changes, given the old and new versions of a set of interlinked Web documents, retrieved in response to a user's query. In particular, we show how to detect and represent Web deltas, i.e., changes in the Web documents that are relevant to a user's query in the context of our Web warehousing system called WHOWEDA (Warehouse of Web Data). In WHOWEDA, Web information is materialized views stored in Web tables in the form of Web tuples. These Web tuples, represented as directed graphs, can be manipulated using a set of Web algebraic operators. In this paper, we present a mechanism to detect relevant Web deltas using Web algebraic operators such as the Web join and the outer Web join. Web join is used to detect identical documents residing in two Web tables, whereas, outer Web join, a derivative of Web join, is used to identify dangling Web tuples. We show how to represent these changes using delta Web tables. We develop formal algorithms for the generation of delta Web tables identifying Web documents which have been added, deleted, or modified since the last query.  相似文献   

11.
Many database applications have the emerging need to support approximate queries that ask for strings that are similar to a given string, such as “name similar to smith” and “telephone number similar to 412-0964”. Query optimization needs the selectivity of such an approximate predicate, i.e., the fraction of records in the database that satisfy the condition. In this paper, we study the problem of estimating selectivities of approximate string predicates. We develop a novel technique, called Sepia, to solve the problem. Given a bag of strings, our technique groups the strings into clusters, builds a histogram structure for each cluster, and constructs a global histogram. It is based on the following intuition: given a query string q, a preselected string p in a cluster, and a string s in the cluster, based on the proximity between q and p, and the proximity between p and s, we can obtain a probability distribution from a global histogram about the similarity between q and s. We give a full specification of the technique using the edit distance metric. We study challenges in adopting this technique, including how to construct the histogram structures, how to use them to do selectivity estimation, and how to alleviate the effect of non-uniform errors in the estimation. We discuss how to extend the techniques to other similarity functions. Our extensive experiments on real data sets show that this technique can accurately estimate selectivities of approximate string predicates. A short version of this article appeared as [21] in the proceedings of the 31st International Conference on Very Large Data Bases (VLDB), August 30 – September 2, 2005, Trondheim, Norway. The source code of our algorithms is available at .  相似文献   

12.
We study the problem of efficiently extracting K entities, in a temporal database, which are most similar to a given search query. This problem is well studied in relational databases, where each entity is represented as a single record and there exist a variety of methods to define the similarity between a record and the search query. However, in temporal databases, each entity is represented as a sequence of historical records. How to properly define the similarity of each entity in the temporal database still remains an open problem. The main challenging is that, when a user issues a search query for an entity, he or she is prone to mix up information of the same entity at different time points. As a result, methods, which are used in relational databases based on record granularity, cannot work any further. Instead, we regard each entity as a set of “virtual records”, where attribute values of a “virtual record” can be from different records of the same entity. In this paper, we propose a novel evaluation model, based on which the similarity between each “virtual record” and the query can be effectively quantified, and the maximum similarity of its “virtual records” is taken as the similarity of an entity. For each entity, as the number of its “virtual records” is exponentially large, calculating the similarity of the entity is challenging. As a result, we further propose a Dominating Tree Algorithm (DTA), which is based on the bounding-pruning-refining strategy, to efficiently extract K entities with greatest similarities. We conduct extensive experiments on both real and synthetic datasets. The encouraging results show that our model for defining the similarity between each entity and the search query is effective, and the proposed DTA can perform at least two orders of magnitude improvement on the performance comparing with the naive approach.  相似文献   

13.
The agent design problem is as follows: given a specification of an environment, together with a specification of a task, is it possible to construct an agent that can be guaranteed to successfully accomplish the task in the environment? In this article, we study the computational complexity of the agent design problem for tasks that are of the form “achieve this state of affairs” or “maintain this state of affairs.” We consider three general formulations of these problems (in both non-deterministic and deterministic environments) that differ in the nature of what is viewed as an “acceptable” solution: in the least restrictive formulation, no limit is placed on the number of actions an agent is allowed to perform in attempting to meet the requirements of its specified task. We show that the resulting decision problems are intractable, in the sense that these are non-recursive (but recursively enumerable) for achievement tasks, and non-recursively enumerable for maintenance tasks. In the second formulation, the decision problem addresses the existence of agents that have satisfied their specified task within some given number of actions. Even in this more restrictive setting the resulting decision problems are either pspace-complete or np-complete. Our final formulation requires the environment to be history independent and bounded. In these cases polynomial time algorithms exist: for deterministic environments the decision problems are nl-complete; in non-deterministic environments, p-complete.  相似文献   

14.
Hájek introduced the logic enriching the logic BL by a unary connective vt which is a formalization of Zadeh’s fuzzy truth value “very true”. algebras, i.e., BL-algebras with unary operations, called vt-operators, which are among others subdiagonal, are an algebraic counterpart of Partially ordered commutative integral residuated monoids (pocrims) are common generalizations of both BL-algebras and Heyting algebras. The aim of our paper is to introduce and study algebraic properties of pocrims endowed by “very-true” and “very-false”-like operators. Research is supported by the Research and Development Council of Czech Government via project MSN 6198959214.  相似文献   

15.
Answering linear optimization queries with an approximate stream index   总被引:2,自引:2,他引:0  
We propose a SAO index to approximately answer arbitrary linear optimization queries in a sliding window of a data stream. It uses limited memory to maintain the most “important” tuples. At any time, for any linear optimization query, we can retrieve the approximate top-K tuples in the sliding window almost instantly. The larger the amount of available memory, the better the quality of the answers is. More importantly, for a given amount of memory, the quality of the answers can be further improved by dynamically allocating a larger portion of the memory to the outer layers of the SAO index.
Philip S. YuEmail:
  相似文献   

16.
In this survey we illustrate basic procedures and methods for application of direct and inverse operators on a composite function and for determination of periodicities. We obtain a simplified regularization of the FFT (fast Fourier transform) on a complex process. For that matter, we have introduced a direct (A) and an inverse (A −1) operator in the FFT and in some other complex processes, where a one-dimensional array serves as an operator. We give some correct theoretical and experimental justification of these operators and discuss a program which uses similar algorithms. We provide experimental results of a complex function after application of direct and inverse operators. Sergey V. Ivanov. Born 1972. Graduated from the St. Petersburg State Electrotechnical University “LETI” (ETU) in 1997. Scientific interests: development of programmatic-algorithmic tools for biomedical investigations and biomedical data processing (mostly psychophysiological) from the viewpoint of games theory and statistical decision theory. Author of more than 25 publications.  相似文献   

17.
We study the problem of evaluating xpath queries over xml data that is stored in an rdbms via schema-based shredding. The interaction between recursion (descendants-axis) in xpath queries and recursion in dtds makes it challenging to answer xpath queries using rdbms. We present a new approach to translating xpath queries into sql queries based on a notion of extended XP ath expressions and a simple least fixpoint (lfp) operator. Extended xpath expressions are a mild extension of xpath, and the lfp operator takes a single input relation and is already supported by most commercial rdbms. We show that extended xpath expressions are capable of capturing both dtd recursion and xpath queries in a uniform framework. Furthermore, they can be translated into an equivalent sequence of sql queries with the lfp operator. We present algorithms for rewriting xpath queries over a (possibly recursive) dtd into extended xpath expressions and for translating extended xpath expressions to sql queries, as well as optimization techniques. The novelty of our approach consists in its capability to answer a large class of xpath queries by means of only low-end rdbms features already available in most rdbms, as well as its flexibility to accommodate existing relational query optimization techniques. In addition, these translation algorithms provide a solution to query answering for certain (possibly recursive) xml views of xml data. Our experimental results verify the effectiveness of our techniques. An extended abstract was presented at the 31st international conference on Very Large Data Bases (VLDB), 2005.  相似文献   

18.
一个基于三元组存储的列式OLAP查询执行引擎   总被引:1,自引:0,他引:1  
朱阅岸  张延松  周烜  王珊 《软件学报》2014,25(4):753-767
大数据与传统的数据仓库技术相结合产生了大数据实时分析处理需要(volume+velocity),它要求大数据背景下的数据仓库不能过多地依赖物化、索引等高存储代价的优化技术,而要提高实时处理能力来应对大数据分析中数据量大、查询分析复杂等特点.这些查询分析操作一般表现为在事实表和维表之间连接操作的基础上对结果集上进行分组聚集等操作.因此,表连接和分组聚集操作是ROLAP(relational OLAP)性能的两个重要决定因素.研究了新硬件平台下针对大规模数据的OLAP查询的性能,设计新的列存储OLAP查询执行引擎CDDTA-MMDB(columnar direct dimensional tuple access-main memory databasequeryexecutionengine,直接维表元组访问的内存数据库查询执行引擎).基于三元组的物化策略,使得CDDTA-MMDB能够减少内存列存储模型上表连接操作访问基表和中间数据结构的次数.首先,CDDTA-MMDB将查询分解为作用在维表和事实表上的子查询,如果只涉及过滤操作,子查询将生成<代理键,布尔值>二元组;否则,子查询生成<代理键,关键字,值>三元组.然后,只需一趟扫描事实表,利用事实表的外键映射函数直接定位相应三元组或者二元组,完成相应的过滤、连接或聚集操作.CDDTA-MMDB充分考虑了内存列存储数据库的设计原则,尽量减少随机内存访问.实验结果表明:CDDTA-MMDB是高效的,与具代表性的列存储数据库相比,比MonetDB 5.5快2.5倍,比C-store的invisible join快5倍;并且,CDDTA-MMDB在多核处理器上具有线性加速比.  相似文献   

19.
Databases with uncertainty and lineage   总被引:2,自引:0,他引:2  
This paper introduces uldbs, an extension of relational databases with simple yet expressive constructs for representing and manipulating both lineage and uncertainty. Uncertain data and data lineage are two important areas of data management that have been considered extensively in isolation, however many applications require the features in tandem. Fundamentally, lineage enables simple and consistent representation of uncertain data, it correlates uncertainty in query results with uncertainty in the input data, and query processing with lineage and uncertainty together presents computational benefits over treating them separately. We show that the uldb representation is complete, and that it permits straightforward implementation of many relational operations. We define two notions of uldb minimality—data-minimal and lineage-minimal—and study minimization of uldb representations under both notions. With lineage, derived relations are no longer self-contained: their uncertainty depends on uncertainty in the base data. We provide an algorithm for the new operation of extracting a database subset in the presence of interconnected uncertainty. We also show how uldbs enable a new approach to query processing in probabilistic databases. Finally, we describe the current state of the Trio system, our implementation of uldbs under development at Stanford. This work was supported by the National Science Foundation under grants IIS-0324431, IIS-1098447, and IIS-9985114, by DARPA Contract #03-000225, and by a grant from the Boeing Corporation.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号