1.
Karl Schnaitter, Joshua Spiegel, Neoklis Polyzotis. The VLDB Journal: The International Journal on Very Large Data Bases, 2009, 18(2): 521-542
A relational ranking query uses a scoring function to limit the results of a conventional query to a small number of the most
relevant answers. The increasing popularity of this query paradigm has led to the introduction of specialized rank join operators
that integrate the selection of top tuples with join processing. These operators access just “enough” of the input in order
to generate just “enough” output and can offer significant speed-ups for query evaluation. The number of input tuples that
an operator accesses is called the input depth of the operator, and this is the driving cost factor in rank join processing. This introduces the important problem of depth estimation, which is crucial for the costing of rank join operators during query compilation and thus for their integration in optimized
physical plans. We introduce an estimation methodology, termed deep, for approximating the input depths of rank join operators in a physical execution plan. At the core of deep lies a general, principled framework that formalizes depth computation in terms of the joint distribution of scores in the
base tables. This framework results in a systematic estimation methodology that takes the characteristics of the data directly
into account and thus enables more accurate estimates. We develop novel estimation algorithms that provide an efficient realization
of the formal deep framework, and describe their integration on top of the statistics module of an existing query optimizer. We validate the
performance of deep with an extensive experimental study on data sets of varying characteristics. The results verify the effectiveness of deep as an estimation method and demonstrate its advantages over previously proposed techniques.
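To make the notion of input depth concrete, the following minimal sketch (all names hypothetical; scoring by sum of scores, round-robin pulls; the paper's operators and the DEEP estimator itself are considerably more sophisticated) runs an HRJN-style rank join over two score-sorted inputs and reports how many tuples each input had to supply before the top-k results were guaranteed:

```python
def rank_join_depth(left, right, k):
    """HRJN-style rank join sketch. left/right: non-empty lists of
    (join_key, score) sorted by descending score; scoring = sum of scores.
    Returns (top-k results, tuples read from left, tuples read from right);
    the last two are the "input depths" that DEEP estimates at compile time."""
    seen_l, seen_r = {}, {}          # join_key -> scores seen so far
    candidates, output = [], []
    dl = dr = 0                      # input depths
    while len(output) < k and (dl < len(left) or dr < len(right)):
        if dl < len(left):           # round-robin pull from the left input
            key, s = left[dl]; dl += 1
            seen_l.setdefault(key, []).append(s)
            candidates += [(s + s2, key) for s2 in seen_r.get(key, [])]
        if dr < len(right):          # round-robin pull from the right input
            key, s = right[dr]; dr += 1
            seen_r.setdefault(key, []).append(s)
            candidates += [(s + s2, key) for s2 in seen_l.get(key, [])]
        # upper bound on the score of any join result not yet formed
        threshold = max(left[0][1] + right[dr - 1][1],
                        left[dl - 1][1] + right[0][1])
        candidates.sort(reverse=True)
        while candidates and len(output) < k and candidates[0][0] >= threshold:
            output.append(candidates.pop(0))
    candidates.sort(reverse=True)    # inputs exhausted: flush the rest
    output += candidates[:k - len(output)]
    return output, dl, dr
```

On skewed inputs the operator stops well before exhausting either side, which is exactly why predicting the stopping depth drives the cost model.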
2.
Shantanu Joshi, Christopher Jermaine. The VLDB Journal: The International Journal on Very Large Data Bases, 2009, 18(1): 181-202
We consider the problem of using sampling to estimate the result of an aggregation operation over a subset-based SQL query,
where a subquery is correlated to an outer query by a NOT EXISTS, NOT IN, EXISTS, or IN clause. We design an estimator for such a query and prove that it is unbiased. We then provide a second, biased
estimator that makes use of the superpopulation concept from statistics to minimize the mean squared error of the resulting
estimate. The two estimators are tested over an extensive set of experiments.
Material in this paper is based upon work supported by the National Science Foundation via grants 0347408 and 0612170.
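As a rough illustration of the flavor of such an estimator (not the paper's construction; all names are made up), the sketch below samples the outer relation of a NOT EXISTS aggregate at rate p and scales the sum by 1/p, which is unbiased by the usual Horvitz-Thompson argument:

```python
import random

def sampled_not_exists_sum(outer, inner_keys, p, seed=0):
    """Estimate SELECT SUM(val) FROM R WHERE NOT EXISTS
    (SELECT 1 FROM S WHERE S.k = R.k) by Bernoulli-sampling the outer
    table at rate p and scaling by 1/p. Each tuple contributes val/p with
    probability p, so the expectation equals the true sum (unbiased)."""
    rng = random.Random(seed)
    total = 0.0
    for key, val in outer:
        # evaluate the correlated NOT EXISTS predicate only on sampled tuples
        if rng.random() < p and key not in inner_keys:
            total += val
    return total / p
```

The paper's second, biased estimator instead trades a little bias for lower mean squared error via a superpopulation model; that refinement is not sketched here.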
3.
Sudipto Guha, Hyoungmin Park, Kyuseok Shim. The VLDB Journal: The International Journal on Very Large Data Bases, 2008, 17(5): 1079-1099
Synopsis structures and approximate query answering have become increasingly important in DSS/OLAP applications with stringent response-time requirements. Range queries are an important class of problems in this domain; they have a wide variety of applications and have been studied in the context of histograms. However, wavelets have been shown to be quite useful in several scenarios, and their multi-resolution structure makes them especially appealing for hierarchical domains. Furthermore, the fact that the Haar wavelet basis admits a linear-time algorithm for the computation of coefficients has made the Haar basis one of the most important and widely used synopsis structures. Very recently, optimal algorithms were proposed for the wavelet synopsis
construction problem for equality/point queries. In this paper we investigate the problem of optimum Haar wavelet synopsis
construction for range queries with workloads. We provide optimum algorithms as well as approximation heuristics and demonstrate
the effectiveness of these algorithms with our extensive experimental evaluation using synthetic and real-life data sets.
Research was supported in part by the Alfred P. Sloan Research Fellowship and NSF awards CCF-0430376, CCF-0644119.
Research was supported by the Ministry of Information and Communication, Korea, under the College Information Technology Research
Center Support Program, grant number IITA-2006-C1090-0603-0031.
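The linear-time Haar machinery the abstract refers to can be sketched as follows. This is the textbook transform paired with the classic largest-coefficient retention heuristic, not the optimal workload-aware algorithms of the paper; all names are illustrative:

```python
def haar_decompose(data):
    """Unnormalized Haar wavelet transform of a power-of-two-length list.
    Each pass replaces a prefix with pairwise averages and differences."""
    coeffs = list(data)
    n = len(coeffs)
    buf = [0.0] * n
    while n > 1:
        half = n // 2
        for i in range(half):
            a, b = coeffs[2 * i], coeffs[2 * i + 1]
            buf[i] = (a + b) / 2.0          # average (coarser level)
            buf[half + i] = (a - b) / 2.0   # detail coefficient
        coeffs[:n] = buf[:n]
        n = half
    return coeffs

def haar_reconstruct(coeffs):
    """Inverse transform: expand averages + details back to the data."""
    data = list(coeffs)
    n = 1
    while n < len(coeffs):
        buf = [0.0] * (2 * n)
        for i in range(n):
            a, d = data[i], data[n + i]
            buf[2 * i], buf[2 * i + 1] = a + d, a - d
        data[:2 * n] = buf
        n *= 2
    return data

def top_b_synopsis(data, B):
    """Keep only the B largest-magnitude coefficients (greedy heuristic)."""
    c = haar_decompose(data)
    kept = set(sorted(range(len(c)), key=lambda i: abs(c[i]), reverse=True)[:B])
    return [c[i] if i in kept else 0.0 for i in range(len(c))]

def approx_range_sum(sparse, lo, hi):
    """Answer a range-sum query from the (sparse) synopsis."""
    vals = haar_reconstruct(sparse)
    return sum(vals[lo:hi + 1])
```

With B equal to the data size the synopsis is lossless; shrinking B trades space for range-query error, and the paper's contribution is choosing coefficients optimally for a given range-query workload.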
4.
Because it operates under a strict time constraint, query processing for data streams should be continuous and rapid. To meet this constraint, most previous research optimizes the evaluation order of multiple join operations in a set of continuous queries using a greedy optimization strategy, re-optimizing the order dynamically at run time to follow the time-varying characteristics of data streams. However, this method often results in a sub-optimal plan because the greedy strategy traces only the first promising plan. This paper proposes a new multiple-query optimization approach, the Adaptive Sharing-based Extended Greedy Optimization Approach (A-SEGO), that traces multiple promising partial plans simultaneously. A-SEGO presents a novel method for cost-effectively sharing the results of common sub-expressions in a set of queries. The number of partial plans can be flexibly controlled according to the query processing workload. In addition, to avoid invoking the optimization process too frequently, optimization is performed only when the current execution plan is no longer sufficiently efficient. A series of experiments is analyzed comparatively to evaluate the performance of the proposed method in various stream environments.
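The difference between tracing one promising plan and tracing several can be illustrated on the classic pipelined-operator ordering problem. The sketch below (hypothetical names; a drastic simplification of A-SEGO with no sub-expression sharing or run-time adaptivity) contrasts a rank-based greedy ordering with a beam search that keeps several partial plans alive:

```python
def plan_cost(order, sel, cost):
    """Cost of applying operators in 'order' to a unit-rate stream: each
    operator processes only the tuples surviving all earlier operators."""
    total, surviving = 0.0, 1.0
    for op in order:
        total += surviving * cost[op]
        surviving *= sel[op]
    return total

def greedy_plan(ops, sel, cost):
    """Greedy: repeatedly pick the operator with the best cost/(1-sel) rank."""
    remaining, order = set(ops), []
    while remaining:
        op = min(remaining, key=lambda o: cost[o] / max(1.0 - sel[o], 1e-12))
        order.append(op)
        remaining.remove(op)
    return order

def beam_plan(ops, sel, cost, width=3):
    """Keep 'width' promising partial plans instead of a single greedy one."""
    beams = [((), 1.0, 0.0)]          # (order, surviving fraction, cost so far)
    for _ in ops:
        nxt = []
        for order, surv, c in beams:
            for op in ops:
                if op in order:
                    continue
                nxt.append((order + (op,), surv * sel[op], c + surv * cost[op]))
        nxt.sort(key=lambda t: t[2])  # keep the cheapest partial plans
        beams = nxt[:width]
    return list(min(beams, key=lambda t: t[2])[0])
```

For independent filters the greedy rank order is already optimal; the beam's advantage appears once costs interact (e.g., through shared sub-expressions), which is the setting A-SEGO targets.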
5.
6.
Dunren Che, Karl Aberer, M. Tamer Özsu. The VLDB Journal: The International Journal on Very Large Data Bases, 2006, 15(3): 263-289
While the volume of information published in the form of XML-compliant documents keeps growing rapidly, efficient and effective query processing and optimization for XML have become more important than ever. This article reports our recent advances in XML structured-document query optimization, elaborating on a novel approach and the techniques developed for XML query optimization. Our approach performs heuristic-based algebraic transformations on XPath queries, represented as PAT algebraic expressions, to achieve query optimization. The article first presents a comprehensive set of general equivalences with regard to XML documents and XML queries. Based on these equivalences, we developed a large set of deterministic algebraic transformation rules for XML query optimization. Our approach is unique in that it performs exclusively deterministic transformations on queries for fast optimization. The deterministic nature of the proposed approach directly yields high optimization efficiency and simplicity of implementation. Our approach operates at the logical level and is independent of any particular storage
model. Therefore, the optimizers developed based on our approach can be easily adapted to a broad range of XML data/information
servers to achieve fast query optimization. An experimental study confirms the validity and effectiveness of the proposed approach.
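A deterministic algebraic transformation of this kind can be sketched as a bottom-up rewrite over expression trees. The single rule below (distributing a selection over a union) is purely illustrative and is not claimed to be one of the paper's PAT rules:

```python
def rewrite(expr):
    """Bottom-up, deterministic application of one illustrative equivalence:
        ('select', p, ('union', A, B))
          -> ('union', ('select', p, A), ('select', p, B))
    Expressions are nested tuples; leaves are strings. Because each rule
    fires unconditionally wherever it matches, no search over alternative
    plans is needed -- the source of the approach's speed."""
    if not isinstance(expr, tuple):
        return expr                       # leaf: nothing to rewrite
    expr = tuple(rewrite(e) for e in expr)  # rewrite children first
    if expr[0] == 'select' and isinstance(expr[2], tuple) and expr[2][0] == 'union':
        _, p, (_, a, b) = expr
        # distribute the selection, then keep rewriting the new subtrees
        return ('union', rewrite(('select', p, a)), rewrite(('select', p, b)))
    return expr
```

A real optimizer in this style would carry a table of many such rules and apply them to a fixed point; determinism guarantees the fixed point is reached without backtracking.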
7.
There is growing interest in algorithms for processing and querying continuous data streams (i.e., data seen only once in a fixed order) with limited memory resources. In its most general form, a data stream is actually an update stream, i.e., comprising data-item deletions as well as insertions. Such massive update streams arise naturally in several application domains (e.g., monitoring of large IP network installations or processing of retail-chain transactions). Estimating the cardinality of set expressions defined over several (possibly distributed) update streams is perhaps one of the most fundamental query classes of interest; as an example, such a query may ask "what is the number of distinct IP source addresses seen in passing packets from both routers R1 and R2 but not router R3?" Earlier work only addressed very restricted forms of this problem, focusing solely on the special case of insert-only streams and specific operators (e.g., union). In this paper, we propose the first space-efficient algorithmic solution for estimating the cardinality of full-fledged set expressions over general update streams. Our estimation algorithms are probabilistic in nature and rely on a novel, hash-based synopsis data structure, termed 2-level hash sketch. We demonstrate how our 2-level hash sketch synopses can be used to provide low-error, high-confidence estimates for the cardinality of set expressions (including operators such as set union, intersection, and difference) over continuous update streams, using only space that is significantly sublinear in the sizes of the streaming input (multi-)sets. Furthermore, our estimators never require rescanning or resampling of past stream items, regardless of the number of deletions in the stream. We also present lower bounds for the problem, demonstrating that the space usage of our estimation algorithms is within small factors of the optimal. Finally, we propose an optimized, time-efficient stream synopsis (based on 2-level hash sketches) that provides similar, strong accuracy-space guarantees while requiring only guaranteed logarithmic maintenance time per update, thus making our methods applicable for truly rapid-rate data streams. Our results from an empirical study of our synopsis and estimation techniques verify the effectiveness of our approach.
Received: 20 October 2003. Accepted: 16 April 2004. Published online: 14 September 2004. Edited by: J. Gehrke and J. Hellerstein.
Sumit Ganguly (sganguly@cse.iitk.ac.in); current affiliation: Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur, India.
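A heavily simplified, deletion-resilient cousin of these synopses can be sketched as follows: each item hashes to a level, and a net counter per level survives deletions, so a Flajolet-Martin-style distinct-count estimate stays available without ever rescanning the stream. The real 2-level hash sketch keeps additional per-bucket information to support full set expressions; everything here, including all names, is illustrative:

```python
import hashlib

def _h(x):
    """Deterministic 64-bit hash; the top bit is forced on so it is nonzero."""
    d = hashlib.blake2b(str(x).encode(), digest_size=8).digest()
    return int.from_bytes(d, 'big') | (1 << 63)

def _lsb(v):
    """Position of the least-significant set bit (the item's level)."""
    return (v & -v).bit_length() - 1

class DeletionSketch:
    """Toy deletion-resilient distinct-count synopsis. An element lands at
    level i with probability about 2**-(i+1); since multiplicities are
    non-negative, a level's net counter is zero iff no live element maps
    there, so deletions are handled exactly without resampling."""
    LEVELS = 64

    def __init__(self):
        self.count = [0] * self.LEVELS   # net multiplicity per level

    def update(self, item, delta):       # delta = +1 insert, -1 delete
        self.count[_lsb(_h(item))] += delta

    def estimate(self):
        """FM-style estimate: about 2**(r+1) for the highest live level r."""
        live = [i for i, c in enumerate(self.count) if c > 0]
        return 0 if not live else 2 ** (max(live) + 1)
```

The estimate is only order-of-magnitude accurate with one hash function; the paper's sketches average over independent copies and add second-level information to distinguish singleton buckets, which is what enables set difference and intersection estimates.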
8.
9.
Building on adaptive query processing techniques, this paper proposes a Web service query optimization model, WSPRC (Web Service Profiler-Reoptimizer-Cache). An adaptive greedy algorithm is employed in the Reoptimizer, one of its core components; the paper analyzes how it carries out Web service query optimization and compares it with the traditional greedy algorithm for Web services with precedence constraints, without precedence constraints, and under changing information. Experimental results show that the WSPRC model and the A-Greedy algorithm improve the efficiency of Web service query access and reduce query cost.
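The adaptive-greedy idea can be sketched as a pipeline of predicate-like service calls that is re-sorted by profiled selectivity and cost (hypothetical names; A-Greedy's actual profiler and its matrix-based violation test are more involved than this single-statistic version):

```python
from collections import deque

class AdaptivePipeline:
    """Order service calls by observed cost/(1 - selectivity) rank over a
    sliding profile window, re-sorting whenever the profiled statistics
    disagree with the current order (a crude stand-in for A-Greedy)."""

    def __init__(self, services, cost, window=100):
        self.order = list(services)
        self.cost = cost
        self.stats = {s: deque(maxlen=window) for s in services}  # 1=pass, 0=drop

    def rank(self, s):
        passed = self.stats[s]
        sel = (sum(passed) / len(passed)) if passed else 0.5  # prior guess
        return self.cost[s] / max(1.0 - sel, 1e-9)

    def process(self, item, predicates):
        for s in self.order:
            ok = predicates[s](item)          # invoke the (stubbed) service
            self.stats[s].append(1 if ok else 0)
            if not ok:
                break                         # item filtered out: stop early
        ranked = sorted(self.order, key=self.rank)
        if ranked != self.order:              # reoptimize only on violation
            self.order = ranked
```

A cache component, as in WSPRC, would sit in front of `process` to skip repeated service invocations; it is omitted here.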
10.
By analyzing online aggregation and online dynamic reordering techniques and combining them with approximate query processing and congressional sampling, this paper proposes an online group-by aggregation scheme with broad application prospects.
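A minimal sketch of the online-aggregation side (a uniformly shuffled input is assumed; congressional sampling, which balances the sample across groups, is not implemented here, and all names are illustrative) maintains per-group running means with CLT confidence half-widths via Welford's algorithm:

```python
import math

def online_groupby_avg(stream, z=1.96):
    """Consume (group, value) pairs from a randomly ordered stream and
    return {group: (running mean, z * standard error)} -- the kind of
    early, steadily tightening answer online aggregation provides."""
    n, mean, m2 = {}, {}, {}   # per-group count / mean / sum of squared devs
    for g, v in stream:
        n[g] = n.get(g, 0) + 1
        d = v - mean.get(g, 0.0)
        mean[g] = mean.get(g, 0.0) + d / n[g]          # Welford update
        m2[g] = m2.get(g, 0.0) + d * (v - mean[g])
    out = {}
    for g in n:
        var = m2[g] / (n[g] - 1) if n[g] > 1 else 0.0  # sample variance
        out[g] = (mean[g], z * math.sqrt(var / n[g]))  # mean +/- half-width
    return out
```

Calling this on successive prefixes of the stream yields the refining estimates an interactive interface would display; group-balanced sampling matters precisely because small groups otherwise converge much more slowly than large ones.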