20 similar documents found (search time: 31 ms)
1.
Kian-Lee Tan Cheng Hian Goh Beng Chin Ooi 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(3):261-278
In many decision-making scenarios, decision makers require rapid feedback to their queries, which typically involve aggregates.
The traditional blocking execution model can no longer meet the demands of these users. One promising approach in the literature, called online aggregation, evaluates an aggregation query progressively as follows: as soon as certain data have been evaluated, approximate answers
are produced with their respective running confidence intervals; as more data are examined, the answers and their corresponding
running confidence intervals are refined. In this paper, we extend this approach to handle nested queries with aggregates
(i.e., at least one inner query block is an aggregate query) by providing users with (approximate) answers progressively as
the inner aggregation query blocks are evaluated. We address the new issues posed by nested queries. In particular, the answer
space begins with a superset of the final answers and is refined as the aggregates from the inner query blocks are refined.
For the intermediary answers to be meaningful, they have to be interpreted with the aggregates from the inner queries. We
also propose a multi-threaded model for evaluating such queries: each query block is assigned to a thread, and the threads can be evaluated concurrently and independently.
The time slice across the threads is nondeterministic in the sense that the user controls the relative rate at which these subqueries are being evaluated. For enumerative nested queries, we propose a priority-based evaluation strategy to present answers that are certainly in the final answer
space first, before presenting those whose validity may be affected as the inner query aggregates are refined. We implemented
a prototype system using Java and evaluated our system. Results for nested queries with one level and with multiple levels of nesting
are reported. Our results show the effectiveness of the proposed mechanisms in providing progressive feedback that reduces
the initial waiting time of users significantly without sacrificing the quality of the answers.
Received April 25, 2000 / Accepted June 27, 2000
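The progressive refinement described in this abstract can be illustrated for a single, non-nested AVG query. The following is only a minimal sketch, not the authors' system (their prototype was written in Java and handles nested queries): it assumes rows arrive in random order and uses a normal-approximation confidence interval. The function name `online_average` and its parameters `z` and `report_every` are hypothetical.

```python
import math
import random

def online_average(values, z=1.96, report_every=1000):
    """Yield (rows_seen, running_mean, half_width) while scanning values.

    Assumes the values arrive in random order, so each prefix is a random
    sample. z=1.96 gives an approximate 95% interval (normal approximation).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            std_err = math.sqrt(m2 / (n - 1) / n)
            yield n, mean, z * std_err

# Hypothetical data: 10,000 measurements scanned in random order.
random.seed(0)
data = [random.gauss(50.0, 10.0) for _ in range(10_000)]
random.shuffle(data)
estimates = list(online_average(data))
for rows, mean, half in estimates:
    print(f"{rows:>6} rows: AVG ~ {mean:.2f} +/- {half:.2f}")
```

Each reported interval is narrower than the earlier ones on average, which is the behaviour the abstract calls a running confidence interval; a user can stop the scan once the interval is tight enough.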
2.
Query processing over object views of relational data (total citations: 2; self-citations: 0; citations by others: 2)
Gustav Fahl Tore Risch 《The VLDB Journal The International Journal on Very Large Data Bases》1997,6(4):261-281
This paper presents an approach to object view management for relational databases. Such a view mechanism makes it possible for users to transparently work with data in
a relational database as if it were stored in an object-oriented (OO) database. A query against the object view is translated
to one or several queries against the relational database. The results of these queries are then processed to form an answer
to the initial query. The approach is not restricted to a ‘pure’ object view mechanism for the relational data, since the
object view can also store its own data and methods. Therefore it must be possible to process queries that combine local data
residing in the object view with data retrieved from the relational database. We discuss the key issues when object views
of relational databases are developed, namely: how to map relational structures to sub-type/supertype hierarchies in the view,
how to represent relational database access in OO query plans, how to provide the concept of object identity in the view,
how to handle the fact that the extension of types in the view depends on the state of the relational database, and how to
process and optimize queries against the object view. The results are based on experiences from a running prototype implementation.
Edited by: M.T. Özsu. Received April 12, 1995 / Accepted April 22, 1996
3.
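This paper describes a forecasting framework based on synopses of time-series data streams and proposes a method for constructing wavelet synopses with effective noise reduction, in which the denoising (retention) threshold is set adaptively at each level according to the background noise. On top of these wavelet synopses, a multi-scale synopsis analysis and forecasting method is implemented that can analyze changes in the trend, turning points, periodicity, and variance of dynamically changing high-frequency data streams, providing real-time annotations for time-series data streams. Simulation experiments on real electric load data show that the method delivers fast and accurate approximate forecasts.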
4.
Boris Chidlovskii Uwe M. Borghoff 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(1):2-17
Abstract. In meta-searchers accessing distributed Web-based information repositories, performance is a major issue. Efficient query
processing requires an appropriate caching mechanism. Unfortunately, standard page-based as well as tuple-based caching mechanisms
designed for conventional databases are not efficient on the Web, where keyword-based querying is often the only way to retrieve
data. In this work, we study the problem of semantic caching of Web queries and develop a caching mechanism for conjunctive
Web queries based on signature files. Our algorithms cope with both relations of semantic containment and intersection between a query and the corresponding cache
items. We also develop a cache replacement strategy for situations in which cached items differ in size and contribution
when providing partial query answers. We report results of experiments and show how the caching mechanism is realized in the
Knowledge Broker system.
Received June 15, 1999 / Accepted December 24, 1999
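The abstract does not spell out how signatures are used, so the following is only an illustrative sketch of one common signature-file technique (superimposed coding), not the Knowledge Broker implementation. It assumes that a cached conjunctive query whose keywords are a subset of the new query's keywords holds a superset of the new answer. The constants `SIG_BITS` and `BITS_PER_TERM` and all function names are hypothetical.

```python
import hashlib

SIG_BITS = 64       # signature width (assumed)
BITS_PER_TERM = 3   # bits set per keyword (assumed)

def term_signature(term):
    """Map one keyword to a bit pattern with at most BITS_PER_TERM bits set."""
    digest = hashlib.sha256(term.lower().encode("utf-8")).digest()
    sig = 0
    for k in range(BITS_PER_TERM):
        pos = int.from_bytes(digest[2 * k:2 * k + 2], "big") % SIG_BITS
        sig |= 1 << pos
    return sig

def query_signature(terms):
    """Superimpose (bitwise OR) the signatures of all keywords in a conjunction."""
    sig = 0
    for term in terms:
        sig |= term_signature(term)
    return sig

def may_contain(cached_sig, query_sig):
    """Cheap filter: every bit of the cached signature is set in the query's.

    Never misses a true containment, but can report false positives.
    """
    return (cached_sig & query_sig) == cached_sig

def find_containing(cache, terms):
    """Return cached keyword sets whose answers contain the new query's answer."""
    query_sig = query_signature(terms)
    wanted = {t.lower() for t in terms}
    hits = []
    for cached_terms, cached_sig in cache:
        # Signature test first, exact subset test to discard false positives.
        if may_contain(cached_sig, query_sig) and cached_terms <= wanted:
            hits.append(cached_terms)
    return hits

cache = [(frozenset(t), query_signature(t))
         for t in (["database"], ["xml"], ["database", "caching"])]
print(find_containing(cache, ["database", "caching", "web"]))
```

The bitwise test is a constant-time filter over the whole cache; only the surviving candidates need the exact keyword comparison.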
5.
Approximate range aggregate queries are one of the most frequent and useful kinds of queries for Decision Support Systems
(DSS), as they are widely used in many data analysis tasks. Traditionally, sampling-based techniques have been proposed to
tackle this problem. However, their effectiveness degrades when the underlying data distribution is skewed. Another approach
based on outlier management can limit the effect of data skew but fails to address other requirements of approximate
range aggregate queries, such as error guarantees and query processing efficiency. In this paper, we present a technique that
provides approximate answers to range aggregate queries on OLAP data cubes efficiently, with theoretical guarantees on the
errors. Our basic idea is to build different data structures to manage outliers and the rest of the data. Carefully chosen
outliers are organized in a quad-tree based indexing data structure to provide efficient access for query processing. A query-workload adaptive, tree-like synopsis data structure, called Tunable Partition-Tree (TP-Tree), is proposed to organize samples extracted from non-outlier data. Our experiments clearly demonstrate the merits of our
technique in comparison with previous well-known techniques.
6.
UnQL: a query language and algebra for semistructured data based on structural recursion (total citations: 5; self-citations: 0; citations by others: 5)
Peter Buneman Mary Fernandez Dan Suciu 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(1):76-110
Abstract. This paper presents structural recursion as the basis of the syntax and semantics of query languages for semistructured data
and XML. We describe a simple and powerful query language based on pattern matching and show that it can be expressed using
structural recursion, which is introduced as a top-down, recursive function, similar to the way XSL is defined on XML trees.
On cyclic data, structural recursion can be defined in two equivalent ways: as a recursive function which evaluates the data
top-down and remembers all its calls to avoid infinite loops, or as a bulk evaluation which processes the entire data in parallel
using only traditional relational algebra operators. The latter makes it possible for optimization techniques in relational
queries to be applied to structural recursion. We show that the composition of two structural recursion queries can be expressed
as a single such query, and this is used as the basis of an optimization method for mediator systems. Several other formal
properties are established: structural recursion can be expressed in first-order logic extended with transitive closure; its
data complexity is PTIME; and over relational data it is a conservative extension of the relational calculus. The underlying
data model is based on value equality, formally defined with bisimulation. Structural recursion is shown to be invariant with
respect to value equality.
Received: July 9, 1999 / Accepted: December 24, 1999
7.
The GMAP: a versatile tool for physical data independence (total citations: 1; self-citations: 0; citations by others: 1)
Odysseas G. Tsatalos Marvin H. Solomon Yannis E. Ioannidis 《The VLDB Journal The International Journal on Very Large Data Bases》1996,5(2):101-118
Physical data independence is touted as a central feature of modern
database systems. It allows users to frame queries in terms of the logical
structure of the data, letting a query processor automatically translate
them into optimal plans that access physical storage structures. Both
relational and object-oriented systems, however, force users to frame their
queries in terms of a logical schema that is directly tied to physical
structures. We present an approach that eliminates this dependence. All
storage structures are defined in a declarative language based on
relational algebra as functions of a logical schema. We present an
algorithm, integrated with a conventional query optimizer, that translates
queries over this logical schema into plans that access the storage
structures. We also show how to compile update requests into plans that
update all relevant storage structures consistently and optimally.
Finally, we report on experiments with a prototype implementation of our
approach that demonstrate how it allows storage structures to be tuned to
the expected or observed workload to achieve significantly better
performance than is possible with conventional techniques.
Edited by Matthias Jarke, Jorge Bocca, Carlo Zaniolo. Received September 15, 1994 / Accepted September 1, 1995
8.
D. Laurent J. Lechtenbörger N. Spyratos G. Vossen 《The VLDB Journal The International Journal on Very Large Data Bases》2001,10(4):295-315
Views over databases have regained attention in the context of data warehouses, which are seen as materialized views. In this setting, efficient view maintenance is an important issue, for which the notion of self-maintainability has been identified as desirable. In this paper, we extend the concept of self-maintainability to (query and update) independence within a formal framework, where independence with respect to arbitrary given sets of queries and updates over the sources
can be guaranteed. To this end we establish an intuitively appealing connection between warehouse independence and view complements. Moreover, we study special kinds of complements, namely monotonic complements, and show how to compute minimal ones in the presence of keys and foreign keys in the underlying databases. Taking advantage
of these complements, an algorithmic approach is proposed for the specification of independent warehouses with respect to
given sets of queries and updates.
Received: 21 November 2000 / Accepted: 1 May 2001 Published online: 6 September 2001
9.
The in-network aggregation paradigm in sensor networks provides a versatile approach for evaluating aggregate queries. Traditional
approaches need a separate aggregate to be computed and communicated for each query and hence do not scale well with the number
of queries. Since approximate query results are sufficient for many applications, we use an alternative approach based on summary
data structures. We consider two kinds of aggregate queries: location range queries that compute the sum of values reported by sensors in a given location range, and value range queries that compute the number of sensors that report values in a given range. We construct summary data structures, called linear sketches, over the sensor data using in-network aggregation and use them to answer aggregate queries in an approximate manner at the
base station. There is a trade-off between accuracy of the query results and lifetime of the sensor network that can be exploited
to achieve increased lifetimes for a small loss in accuracy. Most commonly occurring sets of range queries are highly correlated
and display rich algebraic structure. Our approach takes full advantage of this by constructing linear sketches that depend
on queries. Experimental results show that linear sketching achieves significant improvements in lifetime of sensor networks
for only a small loss in accuracy of the queries. Further, our approach achieves more accurate query results than the other
classical techniques using Discrete Fourier Transform and Discrete Wavelet Transform.
This work was supported in part by NASA under Cooperative Agreement NCC5–315.
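Why linear sketches suit in-network aggregation can be shown with a small example. This is a generic random-sign sketch under stated assumptions, not the query-dependent sketches built in the paper: SHA-256 stands in for the limited-independence hash family normally used, and `sign`, `sketch`, `merge`, and `estimate_dot` are hypothetical names.

```python
import hashlib

def sign(row, key):
    """Pseudo-random +1 or -1 for a (counter row, sensor id) pair."""
    digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
    return 1 if digest[0] & 1 else -1

def sketch(readings, rows=64):
    """Summarize a dict {sensor_id: value} as `rows` signed counters."""
    return [sum(sign(r, sensor) * value for sensor, value in readings.items())
            for r in range(rows)]

def merge(a, b):
    """Sketches are linear: partial sketches combine by element-wise addition."""
    return [x + y for x, y in zip(a, b)]

def estimate_dot(a, b):
    """Average of counter products, an estimate of the inner product."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

# Two subtrees of the network sketch their own sensors; a parent node merges.
left = {"s1": 3, "s2": 7}
right = {"s3": 5}
network = merge(sketch(left), sketch(right))

# A location range query is the inner product with a 0/1 indicator vector.
in_range = {"s1": 1, "s2": 1}
print(estimate_dot(network, sketch(in_range)))  # approximates 3 + 7
```

Because merging is plain addition, each node forwards a fixed-size summary regardless of how many sensors lie below it, and the base station can answer many range queries from the same sketch. Accuracy improves as `rows` grows, which is the accuracy versus lifetime trade-off the abstract mentions.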
10.
We optimize relational queries using connection hypergraphs (CHGs). All operations including value-passing between SQL blocks
can be set-oriented. By introducing partial evaluations, reordering operations can be achieved for nested queries. For a query
using views, we merge CHGs for the views and the query into one CHG and then apply query optimization. Furthermore, we may
simulate magic sets methods elegantly in a CHG. Sideways information-passing strategies (SIPS) in a CHG amount to partial
evaluations of SIPS paths. We introduce the maximum SIPS strategy, which performs SIPS for all bindings and all SIPS paths
for a query. The new method has several advantages. First, the maximum SIPS strategy can be more efficient than the previous
SIPS based on simple heuristics. Second, it is conceptually simple and easy to implement. Third, the processing strategies
may be incorporated with the search space for query execution plans, which is a proven optimization strategy introduced by
System R. Fourth, it provides a general framework of query optimization and may potentially be used to optimize next-generation
database systems.
Received September 1, 1993 / Accepted January 8, 1996
11.
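Answering queries over continuous data streams is a critical requirement in many application settings. This paper explores how to evaluate aggregate SQL queries over data streams using limited memory in order to obtain approximate results. Using randomized sketching techniques, very small sketches of the data streams are computed to obtain approximate answers to aggregate queries, with the error guaranteed to stay within given bounds. The paper also discusses how existing histogram statistics can be exploited within the sketching approach to improve answer quality. The key idea is to partition the attribute domain intelligently and decompose the sketching problem so that the query results achieve suitable approximation accuracy. Both theoretical analysis and experiments show that sketches provide more effective and more accurate aggregate query answers than traditional histograms.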
12.
An increasing number of emerging web database applications deal with large georeferenced data sets. However, exploring these
large data sets through spatial queries can be very time and resource intensive. The need for interactive spatial queries
has arisen in many applications such as Geographic Information Systems (GIS) for efficient decision-support. In this paper,
we propose a new interactive spatial query processing technique for GIS. We present a family of Incremental Refining Spatial Join (IRSJ) algorithms that can be used to report incrementally refined running estimates for aggregate queries while simultaneously
displaying the actual query result tuples of the data sets sampled so far. Our goal is to minimize the time until an acceptably
accurate estimate of the query result is available (to users) measured by a confidence interval. Our approach enables more
interactive data exploration and analysis. While similar work has been done in relational databases, to the best of our knowledge,
this is the first work using this approach in GIS. We investigate and evaluate different sampling methodologies through extensive
experimental performance comparisons. Experiments on both real and synthetic data show an order of magnitude response time
improvement relative to the final answer obtained when using a full R-tree join. We also show the impact of different index
structures on the performance of our algorithms using three known sampling methods.
13.
R. Braumandl M. Keidl A. Kemper D. Kossmann A. Kreutz S. Seltzsam K. Stocker 《The VLDB Journal The International Journal on Very Large Data Bases》2001,10(1):48-71
We present the design of ObjectGlobe, a distributed and open query processor for Internet data sources. Today, data is published
on the Internet via Web servers which have, if at all, very localized query processing capabilities. The goal of the ObjectGlobe
project is to establish an open marketplace in which data and query processing capabilities can be distributed and used by any kind of Internet application. Furthermore, ObjectGlobe integrates cycle providers (i.e., machines) which carry out query processing operators. The overall picture is to make it possible to execute a query
with – in principle – unrelated query operators, cycle providers, and data sources. Such an infrastructure can serve as enabling
technology for scalable e-commerce applications, e.g., B2B and B2C market places, to be able to integrate data and data processing
operations of a large number of participants. One of the main challenges in the design of such an open system is to ensure
privacy and security. We discuss the ObjectGlobe security requirements, show how basic components such as the optimizer and
runtime system need to be extended, and present the results of performance experiments that assess the additional cost for
secure distributed query processing. Another challenge is quality of service management so that users can constrain the costs
and running times of their queries.
Received: 30 October 2000 / Accepted: 14 March 2001 Published online: 7 June 2001
14.
R. Braumandl J. Claussen A. Kemper D. Kossmann 《The VLDB Journal The International Journal on Very Large Data Bases》2000,8(3-4):156-177
Inter-object references are one of the key concepts of object-relational and object-oriented database systems. In this work,
we investigate alternative techniques to implement inter-object references and make the best use of them in query processing,
i.e., in evaluating functional joins. We will give a comprehensive overview and performance evaluation of all known techniques
for simple (single-valued) as well as multi-valued functional joins. Furthermore, we will describe special order-preserving functional-join techniques that are particularly attractive for decision support queries that require ordered results. While
most of the presentation of this paper is focused on object-relational and object-oriented database systems, some of the results
can also be applied to plain relational databases because index nested-loop joins along key/foreign-key relationships, as they are frequently found in relational databases, are just one particular way to
execute a functional join.
Received February 28, 1999 / Accepted September 27, 1999
15.
Approximate query mapping: Accounting for translation closeness (total citations: 2; self-citations: 0; citations by others: 2)
Kevin Chen-Chuan Chang Héctor García-Molina 《The VLDB Journal The International Journal on Very Large Data Bases》2001,10(2-3):155-181
In this paper we present a mechanism for approximately translating Boolean query constraints across heterogeneous information
sources. Achieving the best translation is challenging because sources support different constraints for formulating queries,
and often these constraints cannot be precisely translated. For instance, a query [score>8] might be “perfectly” translated
as [rating>0.8] at some site, but can only be approximated as [grade=A] at another. Unlike other work, our general framework
adopts a customizable “closeness” metric for the translation that combines both precision and recall. Our results show that
for query translation we need to handle interdependencies among both query conjuncts as well as disjuncts. As the basis, we
identify the essential requirements of a rule system for users to encode the mappings for atomic semantic units. Our algorithm
then translates complex queries by rewriting them in terms of the semantic units. We show that, under practical assumptions,
our algorithm generates the best approximate translations with respect to the closeness metric of choice. We also present
a case study to show how our technique may be applied in practice.
Received: 15 October 2000 / Accepted: 15 April 2001 Published online: 28 June 2001
16.
17.
Sudipto Guha Hyoungmin Park Kyuseok Shim 《The VLDB Journal The International Journal on Very Large Data Bases》2008,17(5):1079-1099
Synopsis structures and approximate query answering have become increasingly important in DSS/OLAP applications with stringent
response time requirements. Range queries are an important class of problems in this domain, and have a wide variety of applications
and have been studied in the context of histograms. However, wavelets have been shown to be quite useful in several scenarios
and in fact their multi-resolution structure makes them especially appealing for hierarchical domains. Furthermore the fact
that the Haar wavelet basis has a linear time algorithm for the computation of coefficients has made the Haar basis one of
the important and widely used synopsis structures. Very recently optimal algorithms were proposed for the wavelet synopsis
construction problem for equality/point queries. In this paper we investigate the problem of optimum Haar wavelet synopsis
construction for range queries with workloads. We provide optimum algorithms as well as approximation heuristics and demonstrate
the effectiveness of these algorithms with our extensive experimental evaluation using synthetic and real-life data sets.
Research was supported in part by the Alfred P. Sloan Research Fellowship and NSF awards CCF-0430376, CCF-0644119.
Research was supported by the Ministry of Information and Communication, Korea, under the College Information Technology Research
Center Support Program, grant number IITA-2006-C1090-0603-0031.
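The linear-time computation of Haar coefficients mentioned in this abstract can be sketched as follows. This shows only the basic unnormalized decomposition (pairwise averages and half-differences), not the paper's optimal synopsis construction for range-query workloads; the function names are hypothetical and the input length is assumed to be a power of two.

```python
def haar_decompose(data):
    """Unnormalized Haar transform in O(n) time.

    Returns [overall average, coarsest detail, ..., finest details].
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [0.0] * n
    current = list(data)
    while len(current) > 1:
        half = len(current) // 2
        averages = [0.0] * half
        for i in range(half):
            a, b = current[2 * i], current[2 * i + 1]
            averages[i] = (a + b) / 2
            out[half + i] = (a - b) / 2  # detail coefficient for this pair
        current = averages               # next level works on n/2 values
    out[0] = current[0]
    return out

def haar_reconstruct(coeffs):
    """Invert haar_decompose by expanding one level at a time."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        pos += len(current)
        current = [v for avg, d in zip(current, details)
                   for v in (avg + d, avg - d)]
    return current

coeffs = haar_decompose([2, 2, 0, 2, 3, 5, 4, 4])
print(coeffs)
print(haar_reconstruct(coeffs))
```

The levels process n, n/2, n/4, ... values, so the total work is bounded by 2n. A synopsis keeps only a subset of these coefficients; which subset is best for a given workload is the question the paper addresses.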
18.
Approximate query processing has emerged as an approach to dealing with the huge data volumes and complex queries found in data warehouse environments. In this paper, we present a novel method that provides approximate answers to OLAP queries. Our method builds a compressed (approximate) data cube using a clustering technique and answers queries directly from this compressed data cube, which improves query performance. We also provide the algorithm for answering OLAP queries and the confidence intervals of the query results. An extensive experimental study with the OLAP Council benchmark shows the effectiveness and scalability of our cluster-based approach compared to sampling.
19.
In recent years wavelets were shown to be effective data synopses. We are concerned with the problem of efficiently finding wavelet synopses for massive data sets in situations where information about the query workload is available. We present linear time, I/O optimal algorithms for building optimal workload-based wavelet synopses for point queries. The synopses are based on a novel construction of weighted inner products and use weighted wavelets that are adapted to those products. The synopses are optimal in the sense that the subset of retained coefficients is the best possible for the bases in use with respect to either the mean-squared absolute or relative errors. For the latter, this is the first optimal wavelet synopsis even for the regular, non-workload-based case. Experimental results demonstrate the advantage obtained by the new optimal wavelet synopses.
20.
Query by video clip (total citations: 15; self-citations: 0; citations by others: 15)
Typical digital video search is based on queries involving a single shot. We generalize this problem by allowing queries
that involve a video clip (say, a 10-s video segment). We propose two schemes: (i) retrieval based on key frames follows the traditional approach of identifying shots, computing key frames from a video, and then extracting image features
around the key frames. For each key frame in the query, a similarity value (using color, texture, and motion) is obtained
with respect to the key frames in the database video. Consecutive key frames in the database video that are highly similar
to the query key frames are then used to generate the set of retrieved video clips. (ii) In retrieval using sub-sampled frames, we uniformly sub-sample the query clip as well as the database video. Retrieval is based on matching color and texture features
of the sub-sampled frames. Initial experiments on two video databases (basketball video with approximately 16,000 frames and
a CNN news video with approximately 20,000 frames) show promising results. Additional experiments using segments from one
basketball video as query and a different basketball video as the database show the effectiveness of feature representation
and matching schemes.
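Scheme (ii), retrieval using sub-sampled frames, can be sketched in a much simplified form. The example below is an assumption-laden illustration rather than the authors' method: frames are flat lists of grayscale pixel values, only an intensity histogram is compared (no texture or motion features), similarity is histogram intersection, and the clip is matched by sliding it over the sub-sampled database video. All names and parameters are hypothetical.

```python
def subsample(frames, step):
    """Uniformly keep every `step`-th frame."""
    return frames[::step]

def histogram(frame, bins=4, levels=256):
    """Normalized intensity histogram of a frame (flat list of pixel values)."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // levels] += 1
    return [c / len(frame) for c in counts]

def similarity(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def best_match(query_frames, video_frames, step=2):
    """Return (frame offset, mean similarity) of the best-matching segment."""
    q = [histogram(f) for f in subsample(query_frames, step)]
    v = [histogram(f) for f in subsample(video_frames, step)]
    best_pos, best_score = -1, -1.0
    for start in range(len(v) - len(q) + 1):
        window = v[start:start + len(q)]
        score = sum(similarity(a, b) for a, b in zip(q, window)) / len(q)
        if score > best_score:
            best_pos, best_score = start * step, score
    return best_pos, best_score

# Toy video: 4 dark frames, 4 bright frames, 4 mid-gray frames (4 pixels each).
dark, bright, mid = [10] * 4, [200] * 4, [100] * 4
video = [dark] * 4 + [bright] * 4 + [mid] * 4
print(best_match(video[4:8], video, step=2))
```

Sub-sampling reduces the number of histograms to compare by a factor of `step`, at the cost of only finding matches aligned to the sampling grid in this simplified version.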