
1.

Progressive evaluation of nested aggregate queries

Kian-Lee Tan Cheng Hian Goh Beng Chin Ooi 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(3):261-278

In many decision-making scenarios, decision makers require rapid feedback to their queries, which typically involve aggregates. The traditional blocking execution model can no longer meet the demands of these users. One promising approach in the literature, called online aggregation, evaluates an aggregation query progressively as follows: as soon as certain data have been evaluated, approximate answers are produced with their respective running confidence intervals; as more data are examined, the answers and their corresponding running confidence intervals are refined. In this paper, we extend this approach to handle nested queries with aggregates (i.e., at least one inner query block is an aggregate query) by providing users with (approximate) answers progressively as the inner aggregation query blocks are evaluated. We address the new issues posed by nested queries. In particular, the answer space begins with a superset of the final answers and is refined as the aggregates from the inner query blocks are refined. For the intermediary answers to be meaningful, they have to be interpreted with the aggregates from the inner queries. We also propose a multi-threaded model for evaluating such queries: each query block is assigned to a thread, and the threads can be evaluated concurrently and independently. The time slice across the threads is nondeterministic in the sense that the user controls the relative rate at which these subqueries are being evaluated. For enumerative nested queries, we propose a priority-based evaluation strategy to present answers that are certainly in the final answer space first, before presenting those whose validity may be affected as the inner query aggregates are refined. We implemented a prototype system using Java and evaluated our system. Results for nested queries with one level and with multiple levels of nesting are reported. Our results show the effectiveness of the proposed mechanisms in providing progressive feedback that reduces the initial waiting time of users significantly without sacrificing the quality of the answers. Received April 25, 2000 / Accepted June 27, 2000
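The online aggregation idea summarized in this abstract (a running answer with a running confidence interval) can be illustrated with a minimal Python sketch. It is not the authors' Java prototype and does not cover nested queries: the function name `online_average`, the parameter `report_every`, and the normal-approximation interval z * s / sqrt(n) are all assumptions made for illustration, and the interval is only meaningful if rows are scanned in random order.

```python
import math
import random


def online_average(values, z=1.96, report_every=1000):
    """Yield (rows_seen, running_mean, half_width) while scanning values.

    half_width is a normal-approximation interval z * s / sqrt(n); this
    assumes rows arrive in random order (an assumption of this sketch).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            std = math.sqrt(m2 / (n - 1))  # sample standard deviation
            yield n, mean, z * std / math.sqrt(n)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(50.0, 10.0) for _ in range(5000)]
    # Each report refines the previous estimate as more rows are examined.
    for rows, mean, half in online_average(data):
        print(f"{rows} rows: {mean:.2f} +/- {half:.2f}")
```

A caller can stop consuming the generator as soon as the interval is narrow enough, which is the practical appeal of progressive evaluation over a blocking scan.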

2.

Evaluating refined queries in top-k retrieval systems (cited by: 2; self-citations: 0; other citations: 2)

Kaushik Chakrabarti Ortega-Binderberger M. Mehrotra S. Porkaew K. 《Knowledge and Data Engineering, IEEE Transactions on》2004,16(2):256-270

In many applications, users specify target values for certain attributes/features without requiring exact matches to these values in return. Instead, the result is typically a ranked list of "top k" objects that best match the specified feature values. User subjectivity is an important aspect of such queries, i.e., which objects are relevant to the user and which are not depends on the perception of the user. Due to the subjective nature of top-k queries, the answers returned by the system to a user query often do not satisfy the user's need right away, either because the weights and the distance functions associated with the features do not accurately capture the user's perception or because the specified target values do not fully capture her information need or both. In such cases, the user would like to refine the query and resubmit it in order to get back a better set of answers. While there has been a lot of research on query refinement models, there is no work that we are aware of on supporting refinement of top-k queries efficiently in a database system. Done naively, each "refined" query can be treated as a "starting" query and evaluated from scratch. We explore alternative approaches that significantly reduce the cost of evaluating refined queries by exploiting the observation that the refined queries are not modified drastically from one iteration to another. Our experiments over a real-life multimedia data set show that the proposed techniques save more than 80 percent of the execution cost of refined queries over the naive approach and are more than an order of magnitude faster than a simple sequential scan.

3.

FLEX: a tolerant and cooperative user interface to databases (cited by: 2; self-citations: 0; other citations: 2)

Motro A. 《Knowledge and Data Engineering, IEEE Transactions on》1990,2(2):231-246

FLEX, a user interface to relational databases, can be used satisfactorily by users with different levels of expertise. FLEX is based on a formal query language, but is tolerant of incorrect input. It never rejects queries; instead, it adapts flexibly and transparently to their level of correctness and well-formedness, providing interpretations of corresponding accuracy and specificity. The most prominent design feature of FLEX is the smooth concatenation of several independent mechanisms, each capable of handling input of decreasing level of correctness and well-formedness. Each input is cascaded through this series of mechanisms until an interpretation is found. FLEX is also cooperative. It never delivers empty answers without explanation or assistance. By following up each failed query with a set of more general queries, FLEX determines whether an empty answer is genuine, in which case it suggests related queries that have nonempty answers, or whether it reflects erroneous presuppositions on the part of the user, in which case it then explains them.

4.

A framework for corroborating answers from multiple web sources

Minji Wu Amélie Marian 《Information Systems》2011

Search engines are increasingly efficient at identifying the best sources for any given keyword query, and are often able to identify the answer within the sources. Unfortunately, many web sources are not trustworthy, because of erroneous, misleading, biased, or outdated information. In many cases, users are not satisfied with the results from any single source. In this paper, we propose a framework to aggregate query results from different sources in order to save users the hassle of individually checking query-related web sites to corroborate answers. To return the best answers to the users, we assign a score to each individual answer by taking into account the number, relevance and originality of the sources reporting the answer, as well as the prominence of the answer within the sources, and aggregate the scores of similar answers. We conducted extensive qualitative and quantitative experiments of our corroboration techniques on queries extracted from the TREC Question Answering track and from a log of real web search engine queries. Our results show that taking into account the quality of web pages and answers extracted from the pages in a corroborative way results in the identification of a correct answer for a majority of queries.

5.

Continuous aggregate nearest neighbor queries (cited by: 1; self-citations: 0; other citations: 1)

Hicham G. Elmongui Mohamed F. Mokbel Walid G. Aref 《GeoInformatica》2013,17(1):63-95

This paper addresses the problem of continuous aggregate nearest-neighbor (CANN) queries for moving objects in spatio-temporal data stream management systems. A CANN query specifies a set of landmarks, an integer k, and an aggregate distance function f (e.g., min, max, or sum), where f computes the aggregate distance between a moving object and each of the landmarks. The answer to this continuous query is the set of k moving objects that have the smallest aggregate distance f. A CANN query may also be viewed as a combined set of nearest neighbor queries. We introduce several algorithms to continuously and incrementally answer CANN queries. Extensive experimentation shows that the proposed operators outperform the state-of-the-art algorithms by up to a factor of 3 and incur low memory overhead.
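The query definition in this abstract (landmarks, k, and an aggregate f over distances) can be made concrete with a small Python sketch. It computes a single snapshot answer by brute force and is not one of the paper's incremental algorithms; the function name `aggregate_knn`, the 2-D Euclidean distance, and the dictionary layout are assumptions chosen for illustration.

```python
import heapq
import math


def aggregate_knn(objects, landmarks, k, f=sum):
    """Return the ids of the k objects with the smallest aggregate distance.

    objects:   dict mapping object id -> (x, y) position (hypothetical layout)
    landmarks: list of (x, y) positions
    f:         aggregate applied to the object's distances to all landmarks,
               e.g. min, max, or sum
    """
    def aggregate_distance(position):
        return f(math.dist(position, landmark) for landmark in landmarks)

    # Brute-force snapshot: score every object, keep the k smallest.
    return heapq.nsmallest(
        k, objects, key=lambda oid: aggregate_distance(objects[oid])
    )


if __name__ == "__main__":
    objects = {"a": (0, 0), "b": (5, 5), "c": (1, 1), "d": (3, 0)}
    landmarks = [(0, 0), (2, 0)]
    print(aggregate_knn(objects, landmarks, k=2, f=sum))
    print(aggregate_knn(objects, landmarks, k=1, f=max))
```

A continuous version would re-run or incrementally update this answer as objects move; recomputing from scratch on every update is the naive baseline that incremental operators aim to beat.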

6.

A theory of translation from relational queries to hierarchical queries

Weiyi Meng Yu C. Won Kim 《Knowledge and Data Engineering, IEEE Transactions on》1995,7(2):228-245

In a heterogeneous database system, a query for one type of database system (i.e., a source query) may have to be translated to an equivalent query (or queries) for execution in a different type of database system (i.e., a target query). Usually, for a given source query, there is more than one possible target query translation. Some of them can be executed more efficiently than others by the receiving database system. Developing a translation procedure for each type of database system is time-consuming and expensive. We abstract a generic hierarchical database system (GHDBS) which has properties common to database systems whose schema contains hierarchical structures (e.g., System 2000, IMS, and some object-oriented database systems). We develop principles of query translation with GHDBS as the receiving database system. Translation into any specific system can be accomplished by a translation into the general system with refinements to reflect the characteristics of the specific system. We develop rules that guarantee correctness of the target queries, where correctness means that the target query is equivalent to the source query. We also provide rules that can guarantee a minimum number of target queries in cases when one source query needs to be translated to multiple target queries. Since the minimum number of target queries implies the minimum number of times the underlying system is invoked, efficiency is taken into consideration.

7.

Evaluation of probabilistic queries over imprecise data in constantly-evolving environments

Reynold Cheng Dmitri V. Kalashnikov Sunil Prabhakar 《Information Systems》2007

Sensors are often employed to monitor continuously changing entities like locations of moving objects and temperature. The sensor readings are reported to a database system, and are subsequently used to answer queries. Due to continuous changes in these values and limited resources (e.g., network bandwidth and battery power), the database may not be able to keep track of the actual values of the entities. Queries that use these old values may produce incorrect answers. However, if the degree of uncertainty between the actual data value and the database value is limited, one can place more confidence in the answers to the queries. More generally, query answers can be augmented with probabilistic guarantees of the validity of the answers. In this paper, we study probabilistic query evaluation based on uncertain data. A classification of queries is made based upon the nature of the result set. For each class, we develop algorithms for computing probabilistic answers, and provide efficient indexing and numeric solutions. We address the important issue of measuring the quality of the answers to these queries, and provide algorithms for efficiently pulling data from relevant sensors or moving objects in order to improve the quality of the executing queries. Extensive experiments are performed to examine the effectiveness of several data update policies.

8.

Correcting queries for XML

Sara Cohen Tali Brodianskiy 《Information Systems》2009,34(8):690-710

It has been observed that queries over XML data sources are often unsatisfiable. Unsatisfiability may stem from several different sources, e.g., the user may be insufficiently familiar with the labels appearing in the documents, or may not be intimately aware of the hierarchical structure of the documents. To deal with query and document mismatches, previous research has considered returning answers that maximally satisfy (in some sense) the query, instead of only returning strictly satisfying answers. However, this breaks the golden database rule that only strictly satisfying answers are returned when querying. Indeed, the relationship between the query and answers is no longer clear when unsatisfying answers are returned. To reinstate the golden database rule, this article proposes a framework for automatically correcting queries over XML. This framework generates similar satisfiable queries when the user query is unsatisfiable. The user can then choose a satisfiable query of interest, and receive exactly satisfying answers to this query.

9.

Approximation trade-offs in a Markovian stream warehouse: An empirical study

《Information Systems》2014

A large amount of the world's data is both sequential and low-level. Many applications need to query higher-level information (e.g., words and sentences) that is inferred from these low-level sequences (e.g., raw audio signals) using a model (e.g., a hidden Markov model). This inference process is typically statistical, resulting in high-level sequences that are imprecise. Once archived, these imprecise streams are difficult to query efficiently because of their rich semantics and large volumes, forcing applications to sacrifice either performance or accuracy. There exists little work, however, that characterizes this trade-off space and helps applications make an appropriate choice. In this paper, we study the effects – on both efficiency and accuracy – of various stream approximations such as ignoring correlations, ignoring low-probability states, or retaining only the single most likely sequence of events. Through experiments on a real-world RFID data set, we identify conditions under which various approximations can improve performance by several orders of magnitude, with only minimal effects on query results. We also identify cases when the full rich semantics are necessary. This study is the first to evaluate the cost vs. quality trade-off of imprecise stream models. We perform this study using Lahar, a prototype Markovian stream warehouse. A secondary contribution of this paper is the development of query semantics and algorithms for processing aggregation queries on the output of pattern queries; we develop these queries in order to more fully understand the effects of approximation on a wider set of imprecise stream queries.

10.

An efficient processing algorithm for PSTP queries


周军锋李义国郭景峰《计算机科学与探索》2010,4(11):1039-1048

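When using queries with partially specified structural constraints (PSTP queries) to retrieve information from XML documents, users can flexibly embed structural constraints in the query expression according to how familiar they are with the structure of the XML documents. This meets the needs of users who know nothing, everything, or only part of the structural information. This paper proposes a query processing algorithm based on extended Dewey encoding that can process PSTP queries of any form while scanning the elements only once. Experimental results on different data sets show that the EDPS algorithm clearly outperforms existing methods in overall performance when processing twig queries, PSTP queries without "*" nodes, and PSTP queries with "*" nodes.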

11.

The price of validity in dynamic networks

《Journal of Computer and System Sciences》2007,73(3):245-264

Massive-scale self-administered networks like Peer-to-Peer and Sensor Networks have data distributed across thousands of participant hosts. These networks are highly dynamic with short-lived hosts being the norm rather than an exception. In recent years, researchers have investigated best-effort algorithms to efficiently process aggregate queries (e.g., sum, count, average, minimum and maximum) on these networks. Unfortunately, query semantics for best-effort algorithms are ill-defined, making it hard to reason about guarantees associated with the result returned. In this paper, we specify a correctness condition, Single-Site Validity, with respect to which the above algorithms are best-effort. We present a class of algorithms that guarantee validity in dynamic networks. Experiments on real-life and synthetic network topologies validate performance of our algorithms, revealing the hitherto unknown price of validity.

12.

A density-based tree index structure for approximate aggregate queries

许俭吴天轶王晨汪卫施伯乐《计算机科学》2005,32(11):99-103

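How to give approximate answers to aggregate queries over data cubes quickly and effectively is one of the core problems in data mining and data warehouse research. Most existing aggregate query algorithms support only one specific type of aggregate query on a given data cube, rather than multiple types. This paper presents a new framework, AdenTS, a density-based adaptive tree structure, which can answer various kinds of aggregate queries on the same data cube. It also proposes several approximation and heuristic techniques that improve query results and accuracy. Experimental results show that this method is better in both the range of supported query types and performance.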

13.

CoBase: A scalable and extensible cooperative information system (cited by: 3; self-citations: 0; other citations: 3)

Wesley W. Chu Hua Yang Kuorong Chiang Michael Minock Gladys Chow Chris Larson 《Journal of Intelligent Information Systems》1996,6(2-3):223-259

A new generation of information systems that integrates knowledge base technology with database systems is presented for providing cooperative (approximate, conceptual, and associative) query answering. Based on the database schema and application characteristics, data are organized into Type Abstraction Hierarchies (TAHs). The higher levels of the hierarchy provide a more abstract data representation than the lower levels. Generalization (moving up in the hierarchy), specialization (moving down the hierarchy), and association (moving between hierarchies) are the three key operations in deriving cooperative query answers for the user. Based on the context, the TAHs can be constructed automatically from databases. An intelligent dictionary/directory in the system lists the location and characteristics (e.g., context and user type) of the TAHs. CoBase also has a relaxation manager to provide control for query relaxations. In addition, an explanation system is included to describe the relaxation and association processes and to provide the quality of the relaxed answers. CoBase uses a mediator architecture to provide scalability and extensibility. Each cooperative module, such as relaxation, association, explanation, and TAH management, is implemented as a mediator. Further, an intelligent directory mediator is provided to direct mediator requests to the appropriate service mediators. Mediators communicate with each other via KQML. The GUI includes a map server which allows users to specify queries graphically and incrementally on the map, greatly improving querying capabilities. CoBase has been demonstrated to answer imprecise queries for transportation and logistic planning applications. Currently, we are applying the CoBase methodology to match medical image (X-ray, MRI) features and approximate matching of emitter signals in electronic warfare applications. This work was supported by ARPA contract F30602-94-C-0207.

14.

Scalable processing of snapshot and continuous nearest-neighbor queries over one-dimensional uncertain data

Jinchuan Chen Reynold Cheng Mohamed Mokbel Chi-Yin Chow 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(5):1219-1240

In several emerging and important applications, such as location-based services, sensor monitoring and biological databases, the values of the data items are inherently imprecise. A useful query class for these data is the Probabilistic Nearest-Neighbor Query (PNN), which yields the IDs of the objects that may be the closest neighbor of a query point, together with the objects’ probability values. Previous studies showed that this query takes a long time to evaluate. To address this problem, we propose the Constrained Nearest-Neighbor Query (C-PNN), which returns the IDs of objects whose probabilities are higher than some threshold, with a given error bound in the answers. We show that the C-PNN can be answered efficiently with verifiers. These are methods that derive the lower and upper bounds of answer probabilities, so that it can be quickly decided whether an object should be included in the answer. We design five verifiers, which can be used on uncertain data with arbitrary probability density functions. We further develop a partial evaluation technique, so that a user can obtain some answers quickly, without waiting for the whole query evaluation process to be completed (which may incur a high response time). In addition, we examine the maintenance of a long-standing, or continuous C-PNN query. This query requires any update to be applied to the result immediately, in order to reflect the changes to the database values (e.g., due to the change of the location of a moving object). We design an incremental update method based on previous query answers, in order to reduce the amount of I/O and CPU cost in maintaining the correctness of the answers to such a query. Performance evaluation on realistic datasets shows that our methods are capable of yielding timely and accurate results.

15.

MQSS: multimodal query suggestion and searching for video search

Lusong Li Jing Li 《Multimedia Tools and Applications》2011,54(1):55-68

In this paper, we propose a multimodal query suggestion method for video search which can leverage multimodal processing to improve the quality of search results. When users type general or ambiguous textual queries, our system MQSS provides keyword suggestions and representative image examples in an easy-to-use dropdown manner which can help users specify their search intent more precisely and effortlessly. It is a powerful complement to initial queries. After the queries are formulated as multimodal queries (i.e., text, image), the new queries are input to individual search models, such as text-based, concept-based and visual example-based search models. Then we apply a multimodal fusion method to aggregate the results of these search models. The effectiveness of MQSS is demonstrated by evaluations over a web video data set.

16.

Approximating query answering on RDF databases (cited by: 1; self-citations: 0; other citations: 1)

Hai Huang Chengfei Liu Xiaofang Zhou 《World Wide Web》2012,15(1):89-114

Database users may be frustrated when a query they pose on the database returns no answers. In this paper, we study the problem of relaxing queries on RDF databases in order to acquire approximate answers. We address two problems in efficient query relaxation. First, to ensure the quality of answers, we compute the similarities between relaxed queries with regard to the user query and use them to score the potential relevant answers. Second, for obtaining top-k answers, we develop two algorithms. One is based on the best-first strategy, in which relaxed queries are executed in the ranking order. The other, a batch-based algorithm, executes the relaxed queries as a batch and avoids unnecessary execution cost. Finally, we implement and experimentally evaluate our approaches.

17.

Aviv Nisgav Boaz Patt-Shamir 《Theory of Computing Systems》2011,49(4):720-737

We consider a system where users wish to find similar users. To model similarity, we assume the existence of a set of queries, and two users are deemed similar if their answers to these queries are (mostly) identical. Technically, each user has a vector of preferences (answers to queries), and two users are similar if their preference vectors differ in only a few coordinates. The preferences are unknown to the system initially, and the goal of the algorithm is to classify the users into classes of roughly the same preferences by asking each user to answer the least possible number of queries. We prove nearly matching lower and upper bounds on the maximal number of queries required to solve the problem. Specifically, we present an “anytime” algorithm that asks each user at most one query in each round, while maintaining a partition of the users. The quality of the partition improves over time: for n users and time T, groups of $\tilde{O}(n/T)$ users with the same preferences will be separated (with high probability) if they differ in sufficiently many queries. We present a lower bound that matches the upper bound, up to a constant factor, for nearly all possible distances between user groups.

18.

Consistent query answers from virtually integrated XML data

Zijing Tan Chengfei Liu 《Journal of Systems and Software》2010,83(12):2566-2578

When data sources are virtually integrated, there is no common and centralized method to maintain global consistency, so inconsistencies with regard to global integrity constraints are very likely to occur. In this paper, we consider the problem of defining and computing consistent query answers when queries are posed to virtual XML data integration systems, which are specified following the local-as-view approach. We propose a powerful XML constraint model to define global constraints, which can express keys and functional dependencies, and which also extends the newly introduced conditional functional dependencies to XML. We provide an approach to defining XML views, which supports not only edge-path mappings but also data-value bindings to express the join operator. We give formal definitions of repair and consistent query answers in the XML data integration setting. Given a query on the global system, we present a two-step method to compute consistent query answers. First, the given query is transformed using the global constraints, so that running the transformed query on the original global system generates exactly the consistent query answers. Because the global instance is not materialized, the query on the global instance is then rewritten in the form of queries on the underlying data sources by reversing rules in view definitions. We illustrate that the XPath query transformations can be implemented in XQuery. Finally, we implement prototypes of our method and evaluate our algorithms in the experiments.

19.

Controlled query evaluation with open queries for a decidable relational submodel

Joachim Biskup Piero Bonatti 《Annals of Mathematics and Artificial Intelligence》2007,50(1-2):39-77

Controlled query evaluation for logic-oriented information systems provides a model for the dynamic enforcement of confidentiality policies in scenarios where users are able to reason about a priori knowledge and the answers to previous queries. Previous foundational work assumes that the control mechanism can solve the arising implication problems and deals only with closed queries. In this paper, we overcome these limitations by refining the abstract model for appropriately represented relational databases. We identify a relational submodel where all instances share a fixed infinite Herbrand domain but have finite base relations, and we require finite and domain-independent query results. Then, via suitable syntactic restrictions on the policy and query languages, each occurring implication problem can be equivalently expressed as a universal validity problem within the Bernays-Schönfinkel class, whose (known) decidability in the classical setting is extended to our framework. For refusal and lying, we design and verify evaluation methods for open queries, exploiting controlled query evaluation of appropriate sequences of closed queries, which include answer completeness tests. Additionally, we present alternative evaluation methods that work for lying and the combined approach but at the price of potentially reduced cooperativeness.

20.

Selection problems via M-ary queries

Katia S. Guimarães William I. Gasarch Jim Purtilo 《Computational Complexity》1992,2(3):256-276

It is well known that, for fixed k, to find the k-th largest of n elements, n + (k−1) log₂ n + Θ(1) comparisons are necessary and sufficient. But do the same bounds apply if we use a different type of query? We show that the arity of the queries is relevant. In particular, we present upper and lower bounds for finding the maximum using 3-ary or 4-ary Boolean (YES/NO answers) queries. We also study general (e.g., max, sort) 3-ary queries, and show bounds for finding the maximum and the second largest. For sort queries we show matching upper and lower bounds.
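The comparison bound quoted at the start of this abstract can be checked for the case k = 2 with the classical knockout-tournament method, sketched below in Python. This uses ordinary binary comparisons and is not the paper's 3-ary or 4-ary query model; the function name `second_largest` and the comparison counter are assumptions added for illustration. When n is a power of two, the sketch uses n + log₂ n − 2 comparisons, which fits the n + (k−1) log₂ n + Θ(1) form.

```python
def second_largest(items):
    """Return (largest, second_largest, comparisons) via a knockout tournament.

    Only binary comparisons are counted. Requires at least two items.
    """
    if len(items) < 2:
        raise ValueError("need at least two items")
    comparisons = 0
    # Each player is (value, list of values it has beaten directly).
    players = [(x, []) for x in items]
    while len(players) > 1:
        next_round = []
        for i in range(0, len(players) - 1, 2):
            a, b = players[i], players[i + 1]
            comparisons += 1
            winner, loser = (a, b) if a[0] >= b[0] else (b, a)
            winner[1].append(loser[0])
            next_round.append(winner)
        if len(players) % 2:  # an unpaired player advances without playing
            next_round.append(players[-1])
        players = next_round
    champion, beaten = players[0]
    # The runner-up must have lost directly to the champion.
    second = beaten[0]
    for x in beaten[1:]:
        comparisons += 1
        if x > second:
            second = x
    return champion, second, comparisons


if __name__ == "__main__":
    print(second_largest([3, 1, 4, 1, 5, 9, 2, 6]))
```

The key observation is that the second-largest element can only have lost to the overall winner, so only the roughly log₂ n values the champion defeated need a second pass.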