1.
Karl Schnaitter, Joshua Spiegel, Neoklis Polyzotis. The VLDB Journal: The International Journal on Very Large Data Bases, 2009, 18(2): 521-542
A relational ranking query uses a scoring function to limit the results of a conventional query to a small number of the most
relevant answers. The increasing popularity of this query paradigm has led to the introduction of specialized rank join operators
that integrate the selection of top tuples with join processing. These operators access just “enough” of the input in order
to generate just “enough” output and can offer significant speed-ups for query evaluation. The number of input tuples that
an operator accesses is called the input depth of the operator, and this is the driving cost factor in rank join processing. This introduces the important problem of depth estimation, which is crucial for the costing of rank join operators during query compilation and thus for their integration in optimized
physical plans. We introduce an estimation methodology, termed deep, for approximating the input depths of rank join operators in a physical execution plan. At the core of deep lies a general, principled framework that formalizes depth computation in terms of the joint distribution of scores in the
base tables. This framework results in a systematic estimation methodology that takes the characteristics of the data directly
into account and thus enables more accurate estimates. We develop novel estimation algorithms that provide an efficient realization
of the formal deep framework, and describe their integration on top of the statistics module of an existing query optimizer. We validate the
performance of deep with an extensive experimental study on data sets of varying characteristics. The results verify the effectiveness of deep as an estimation method and demonstrate its advantages over previously proposed techniques.
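To make the notion of input depth concrete, the following minimal sketch (all names hypothetical; scoring by sum of scores, round-robin pulls; the paper's operators and the DEEP estimator itself are considerably more sophisticated) runs an HRJN-style rank join over two score-sorted inputs and reports how many tuples each input had to supply before the top-k results were guaranteed:

```python
def rank_join_depth(left, right, k):
    """HRJN-style rank join sketch. left/right: non-empty lists of
    (join_key, score) sorted by descending score; scoring = sum of scores.
    Returns (top-k results, tuples read from left, tuples read from right);
    the last two are the "input depths" that DEEP estimates at compile time."""
    seen_l, seen_r = {}, {}          # join_key -> scores seen so far
    candidates, output = [], []
    dl = dr = 0                      # input depths
    while len(output) < k and (dl < len(left) or dr < len(right)):
        if dl < len(left):           # round-robin pull from the left input
            key, s = left[dl]; dl += 1
            seen_l.setdefault(key, []).append(s)
            candidates += [(s + s2, key) for s2 in seen_r.get(key, [])]
        if dr < len(right):          # round-robin pull from the right input
            key, s = right[dr]; dr += 1
            seen_r.setdefault(key, []).append(s)
            candidates += [(s + s2, key) for s2 in seen_l.get(key, [])]
        # upper bound on the score of any join result not yet formed
        threshold = max(left[0][1] + right[dr - 1][1],
                        left[dl - 1][1] + right[0][1])
        candidates.sort(reverse=True)
        while candidates and len(output) < k and candidates[0][0] >= threshold:
            output.append(candidates.pop(0))
    candidates.sort(reverse=True)    # inputs exhausted: flush the rest
    output += candidates[:k - len(output)]
    return output, dl, dr
```

On skewed inputs the operator stops well before exhausting either side, which is exactly why predicting the stopping depth drives the cost model.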
2.
Shantanu Joshi, Christopher Jermaine. The VLDB Journal: The International Journal on Very Large Data Bases, 2009, 18(1): 181-202
We consider the problem of using sampling to estimate the result of an aggregation operation over a subset-based SQL query,
where a subquery is correlated to an outer query by a NOT EXISTS, NOT IN, EXISTS, or IN clause. We design an estimator for such a query and prove that it is unbiased. We then provide a second, biased
estimator that makes use of the superpopulation concept from statistics to minimize the mean squared error of the resulting
estimate. The two estimators are tested over an extensive set of experiments.
Material in this paper is based upon work supported by the National Science Foundation via grants 0347408 and 0612170.
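As a rough illustration of the flavor of such an estimator (not the paper's construction; all names are made up), the sketch below samples the outer relation of a NOT EXISTS aggregate at rate p and scales the sum by 1/p, which is unbiased by the usual Horvitz-Thompson argument:

```python
import random

def sampled_not_exists_sum(outer, inner_keys, p, seed=0):
    """Estimate SELECT SUM(val) FROM R WHERE NOT EXISTS
    (SELECT 1 FROM S WHERE S.k = R.k) by Bernoulli-sampling the outer
    table at rate p and scaling by 1/p. Each tuple contributes val/p with
    probability p, so the expectation equals the true sum (unbiased)."""
    rng = random.Random(seed)
    total = 0.0
    for key, val in outer:
        # evaluate the correlated NOT EXISTS predicate only on sampled tuples
        if rng.random() < p and key not in inner_keys:
            total += val
    return total / p
```

The paper's second, biased estimator instead trades a little bias for lower mean squared error via a superpopulation model; that refinement is not sketched here.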
3.
Sudipto Guha, Hyoungmin Park, Kyuseok Shim. The VLDB Journal: The International Journal on Very Large Data Bases, 2008, 17(5): 1079-1099
Synopsis structures and approximate query answering have become increasingly important in DSS/OLAP applications with stringent response-time requirements. Range queries are an important class of problems in this domain; they have a wide variety of applications and have been studied in the context of histograms. However, wavelets have been shown to be quite useful in several scenarios, and their multi-resolution structure makes them especially appealing for hierarchical domains. Furthermore, the fact that the Haar wavelet basis admits a linear-time algorithm for the computation of coefficients has made the Haar basis one of the most important and widely used synopsis structures. Very recently, optimal algorithms were proposed for the wavelet synopsis
construction problem for equality/point queries. In this paper we investigate the problem of optimum Haar wavelet synopsis
construction for range queries with workloads. We provide optimum algorithms as well as approximation heuristics and demonstrate
the effectiveness of these algorithms with our extensive experimental evaluation using synthetic and real-life data sets.
Research was supported in part by the Alfred P. Sloan Research Fellowship and NSF awards CCF-0430376, CCF-0644119.
Research was supported by the Ministry of Information and Communication, Korea, under the College Information Technology Research
Center Support Program, grant number IITA-2006-C1090-0603-0031.
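The linear-time Haar machinery the abstract refers to can be sketched as follows. This is the textbook transform paired with the classic largest-coefficient retention heuristic, not the optimal workload-aware algorithms of the paper; all names are illustrative:

```python
def haar_decompose(data):
    """Unnormalized Haar wavelet transform of a power-of-two-length list.
    Each pass replaces a prefix with pairwise averages and differences."""
    coeffs = list(data)
    n = len(coeffs)
    buf = [0.0] * n
    while n > 1:
        half = n // 2
        for i in range(half):
            a, b = coeffs[2 * i], coeffs[2 * i + 1]
            buf[i] = (a + b) / 2.0          # average (coarser level)
            buf[half + i] = (a - b) / 2.0   # detail coefficient
        coeffs[:n] = buf[:n]
        n = half
    return coeffs

def haar_reconstruct(coeffs):
    """Inverse transform: expand averages + details back to the data."""
    data = list(coeffs)
    n = 1
    while n < len(coeffs):
        buf = [0.0] * (2 * n)
        for i in range(n):
            a, d = data[i], data[n + i]
            buf[2 * i], buf[2 * i + 1] = a + d, a - d
        data[:2 * n] = buf
        n *= 2
    return data

def top_b_synopsis(data, B):
    """Keep only the B largest-magnitude coefficients (greedy heuristic)."""
    c = haar_decompose(data)
    kept = set(sorted(range(len(c)), key=lambda i: abs(c[i]), reverse=True)[:B])
    return [c[i] if i in kept else 0.0 for i in range(len(c))]

def approx_range_sum(sparse, lo, hi):
    """Answer a range-sum query from the (sparse) synopsis."""
    vals = haar_reconstruct(sparse)
    return sum(vals[lo:hi + 1])
```

With B equal to the data size the synopsis is lossless; shrinking B trades space for range-query error, and the paper's contribution is choosing coefficients optimally for a given range-query workload.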
4.
Because it operates under a strict time constraint, query processing for data streams should be continuous and rapid. To meet this constraint, most previous research optimizes the evaluation order of multiple join operations in a set of continuous queries using a greedy optimization strategy, re-optimizing the order dynamically at run time to follow the time-varying characteristics of data streams. However, this method often results in a sub-optimal plan because the greedy strategy traces only the first promising plan. This paper proposes a new multiple-query optimization approach, the Adaptive Sharing-based Extended Greedy Optimization Approach (A-SEGO), that traces multiple promising partial plans simultaneously. A-SEGO presents a novel method for cost-effectively sharing the results of common sub-expressions in a set of queries. The number of partial plans can be flexibly controlled according to the query processing workload. In addition, to avoid invoking the optimization process too frequently, optimization is performed only when the current execution plan is no longer sufficiently efficient. A series of experiments is analyzed comparatively to evaluate the performance of the proposed method in various stream environments.
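The difference between tracing one promising plan and tracing several can be illustrated on the classic pipelined-operator ordering problem. The sketch below (hypothetical names; a drastic simplification of A-SEGO with no sub-expression sharing or run-time adaptivity) contrasts a rank-based greedy ordering with a beam search that keeps several partial plans alive:

```python
def plan_cost(order, sel, cost):
    """Cost of applying operators in 'order' to a unit-rate stream: each
    operator processes only the tuples surviving all earlier operators."""
    total, surviving = 0.0, 1.0
    for op in order:
        total += surviving * cost[op]
        surviving *= sel[op]
    return total

def greedy_plan(ops, sel, cost):
    """Greedy: repeatedly pick the operator with the best cost/(1-sel) rank."""
    remaining, order = set(ops), []
    while remaining:
        op = min(remaining, key=lambda o: cost[o] / max(1.0 - sel[o], 1e-12))
        order.append(op)
        remaining.remove(op)
    return order

def beam_plan(ops, sel, cost, width=3):
    """Keep 'width' promising partial plans instead of a single greedy one."""
    beams = [((), 1.0, 0.0)]          # (order, surviving fraction, cost so far)
    for _ in ops:
        nxt = []
        for order, surv, c in beams:
            for op in ops:
                if op in order:
                    continue
                nxt.append((order + (op,), surv * sel[op], c + surv * cost[op]))
        nxt.sort(key=lambda t: t[2])  # keep the cheapest partial plans
        beams = nxt[:width]
    return list(min(beams, key=lambda t: t[2])[0])
```

For independent filters the greedy rank order is already optimal; the beam's advantage appears once costs interact (e.g., through shared sub-expressions), which is the setting A-SEGO targets.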
5.
6.
Dunren Che, Karl Aberer, M. Tamer Özsu. The VLDB Journal: The International Journal on Very Large Data Bases, 2006, 15(3): 263-289
While the volume of information published in the form of XML-compliant documents keeps growing rapidly, efficient and effective query processing and optimization for XML have become more important than ever. This article reports our recent advances in XML structured-document query optimization, elaborating on a novel approach and the techniques developed for XML query optimization. Our approach performs heuristic-based algebraic transformations on XPath queries, represented as PAT algebraic expressions, to achieve query optimization. The article first presents a comprehensive set of general equivalences with regard to XML documents and XML queries. Based on these equivalences, we developed a large set of deterministic algebraic transformation rules for XML query optimization. Our approach is unique in that it performs exclusively deterministic transformations on queries for fast optimization. The deterministic nature of the proposed approach directly yields high optimization efficiency and simplicity of implementation. Our approach operates at the logical level and is independent of any particular storage
model. Therefore, the optimizers developed based on our approach can be easily adapted to a broad range of XML data/information
servers to achieve fast query optimization. An experimental study confirms the validity and effectiveness of the proposed approach.
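A deterministic algebraic transformation of this kind can be sketched as a bottom-up rewrite over expression trees. The single rule below (distributing a selection over a union) is purely illustrative and is not claimed to be one of the paper's PAT rules:

```python
def rewrite(expr):
    """Bottom-up, deterministic application of one illustrative equivalence:
        ('select', p, ('union', A, B))
          -> ('union', ('select', p, A), ('select', p, B))
    Expressions are nested tuples; leaves are strings. Because each rule
    fires unconditionally wherever it matches, no search over alternative
    plans is needed -- the source of the approach's speed."""
    if not isinstance(expr, tuple):
        return expr                       # leaf: nothing to rewrite
    expr = tuple(rewrite(e) for e in expr)  # rewrite children first
    if expr[0] == 'select' and isinstance(expr[2], tuple) and expr[2][0] == 'union':
        _, p, (_, a, b) = expr
        # distribute the selection, then keep rewriting the new subtrees
        return ('union', rewrite(('select', p, a)), rewrite(('select', p, b)))
    return expr
```

A real optimizer in this style would carry a table of many such rules and apply them to a fixed point; determinism guarantees the fixed point is reached without backtracking.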
7.
There is growing interest in algorithms for processing and querying continuous data streams (i.e., data seen only once in a fixed order) with limited memory resources. In its most general form, a data stream is actually an update stream, i.e., comprising data-item deletions as well as insertions. Such massive update streams arise naturally in several application domains (e.g., monitoring of large IP network installations or processing of retail-chain transactions). Estimating the cardinality of set expressions defined over several (possibly distributed) update streams is perhaps one of the most fundamental query classes of interest; as an example, such a query may ask "what is the number of distinct IP source addresses seen in passing packets from both routers R1 and R2 but not router R3?" Earlier work only addressed very restricted forms of this problem, focusing solely on the special case of insert-only streams and specific operators (e.g., union). In this paper, we propose the first space-efficient algorithmic solution for estimating the cardinality of full-fledged set expressions over general update streams. Our estimation algorithms are probabilistic in nature and rely on a novel, hash-based synopsis data structure, termed 2-level hash sketch. We demonstrate how our 2-level hash sketch synopses can be used to provide low-error, high-confidence estimates for the cardinality of set expressions (including operators such as set union, intersection, and difference) over continuous update streams, using only space that is significantly sublinear in the sizes of the streaming input (multi-)sets. Furthermore, our estimators never require rescanning or resampling of past stream items, regardless of the number of deletions in the stream. We also present lower bounds for the problem, demonstrating that the space usage of our estimation algorithms is within small factors of the optimal. Finally, we propose an optimized, time-efficient stream synopsis (based on 2-level hash sketches) that provides similar, strong accuracy-space guarantees while requiring only guaranteed logarithmic maintenance time per update, thus making our methods applicable for truly rapid-rate data streams. Our results from an empirical study of our synopsis and estimation techniques verify the effectiveness of our approach.
Received: 20 October 2003. Accepted: 16 April 2004. Published online: 14 September 2004. Edited by: J. Gehrke and J. Hellerstein.
Sumit Ganguly (sganguly@cse.iitk.ac.in); current affiliation: Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur, India.
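A heavily simplified, deletion-resilient cousin of these synopses can be sketched as follows: each item hashes to a level, and a net counter per level survives deletions, so a Flajolet-Martin-style distinct-count estimate stays available without ever rescanning the stream. The real 2-level hash sketch keeps additional per-bucket information to support full set expressions; everything here, including all names, is illustrative:

```python
import hashlib

def _h(x):
    """Deterministic 64-bit hash; the top bit is forced on so it is nonzero."""
    d = hashlib.blake2b(str(x).encode(), digest_size=8).digest()
    return int.from_bytes(d, 'big') | (1 << 63)

def _lsb(v):
    """Position of the least-significant set bit (the item's level)."""
    return (v & -v).bit_length() - 1

class DeletionSketch:
    """Toy deletion-resilient distinct-count synopsis. An element lands at
    level i with probability about 2**-(i+1); since multiplicities are
    non-negative, a level's net counter is zero iff no live element maps
    there, so deletions are handled exactly without resampling."""
    LEVELS = 64

    def __init__(self):
        self.count = [0] * self.LEVELS   # net multiplicity per level

    def update(self, item, delta):       # delta = +1 insert, -1 delete
        self.count[_lsb(_h(item))] += delta

    def estimate(self):
        """FM-style estimate: about 2**(r+1) for the highest live level r."""
        live = [i for i, c in enumerate(self.count) if c > 0]
        return 0 if not live else 2 ** (max(live) + 1)
```

The estimate is only order-of-magnitude accurate with one hash function; the paper's sketches average over independent copies and add second-level information to distinguish singleton buckets, which is what enables set difference and intersection estimates.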
8.
9.
Building on adaptive query processing techniques, this paper proposes a Web service query optimization model, WSPRC (Web Service Profiler-Reoptimizer-Cache). An adaptive greedy algorithm is employed in the Reoptimizer, one of its core components; the paper analyzes how it carries out Web service query optimization and compares it with the traditional greedy algorithm for Web services with precedence constraints, without precedence constraints, and under changing information. Experimental results show that the WSPRC model and the A-Greedy algorithm improve the efficiency of Web service query access and reduce query cost.
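The adaptive-greedy idea can be sketched as a pipeline of predicate-like service calls that is re-sorted by profiled selectivity and cost (hypothetical names; A-Greedy's actual profiler and its matrix-based violation test are more involved than this single-statistic version):

```python
from collections import deque

class AdaptivePipeline:
    """Order service calls by observed cost/(1 - selectivity) rank over a
    sliding profile window, re-sorting whenever the profiled statistics
    disagree with the current order (a crude stand-in for A-Greedy)."""

    def __init__(self, services, cost, window=100):
        self.order = list(services)
        self.cost = cost
        self.stats = {s: deque(maxlen=window) for s in services}  # 1=pass, 0=drop

    def rank(self, s):
        passed = self.stats[s]
        sel = (sum(passed) / len(passed)) if passed else 0.5  # prior guess
        return self.cost[s] / max(1.0 - sel, 1e-9)

    def process(self, item, predicates):
        for s in self.order:
            ok = predicates[s](item)          # invoke the (stubbed) service
            self.stats[s].append(1 if ok else 0)
            if not ok:
                break                         # item filtered out: stop early
        ranked = sorted(self.order, key=self.rank)
        if ranked != self.order:              # reoptimize only on violation
            self.order = ranked
```

A cache component, as in WSPRC, would sit in front of `process` to skip repeated service invocations; it is omitted here.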
10.
By analyzing online aggregation and online dynamic reordering techniques and combining them with approximate query processing and congressional sampling, this paper proposes an online group-by aggregation scheme with broad application prospects.
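A minimal sketch of the online-aggregation side (a uniformly shuffled input is assumed; congressional sampling, which balances the sample across groups, is not implemented here, and all names are illustrative) maintains per-group running means with CLT confidence half-widths via Welford's algorithm:

```python
import math

def online_groupby_avg(stream, z=1.96):
    """Consume (group, value) pairs from a randomly ordered stream and
    return {group: (running mean, z * standard error)} -- the kind of
    early, steadily tightening answer online aggregation provides."""
    n, mean, m2 = {}, {}, {}   # per-group count / mean / sum of squared devs
    for g, v in stream:
        n[g] = n.get(g, 0) + 1
        d = v - mean.get(g, 0.0)
        mean[g] = mean.get(g, 0.0) + d / n[g]          # Welford update
        m2[g] = m2.get(g, 0.0) + d * (v - mean[g])
    out = {}
    for g in n:
        var = m2[g] / (n[g] - 1) if n[g] > 1 else 0.0  # sample variance
        out[g] = (mean[g], z * math.sqrt(var / n[g]))  # mean +/- half-width
    return out
```

Calling this on successive prefixes of the stream yields the refining estimates an interactive interface would display; group-balanced sampling matters precisely because small groups otherwise converge much more slowly than large ones.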