首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 968 毫秒
1.
When a query is posed on a centralized database, if it refers to attributes that are not defined in the database, the user is warranted to get either an error or an empty set. In contrast, when a query is posed on a peer in a P2P system and refers to attributes not found in the local database, the query should not be simply rejected if the relevant information is available at other peers. This paper proposes a query model for unstructured P2P systems to answer such queries. (a) We introduce a class of polymorphic queries, a revision of conjunctive queries by incorporating type variables to accommodate attributes not defined in the local database. (b) We define the semantics of polymorphic queries in terms of horizontal and vertical object expansions, to find attributes and tuples, respectively, missing from the local database. We show that both expansions can be conducted in a uniform framework. (c) We develop a top-K algorithm to approximately answer polymorphic queries. (d) We also provide a method to merge tuples collected from various peers, based on matching keys specified in polymorphic queries. Our experimental study verifies that polymorphic queries are able to find more sensible information than traditional queries supported by P2P systems, and that these queries can be evaluated efficiently.  相似文献   

2.
Traditional database search uses pattern match in the comparison process. For a query with some search words, tuples are selected only if the words of the tuples exactly match the query words. In this paper, we propose a new method for evaluating relational ranking queries (or top-N queries) with text attributes. This method defines semantic distance functions and utilizes semantic match between words in database search. The attempt is that tuples, not only exactly matching, but also close to the query according to semantic distances, can both be fetched. The basic idea of the method is to create an index based on WordNet to expand the tuple words semantically. The candidate results for a query are retrieved by the index and a simple SQL selection statement, and then top-N answers are obtained. Extensive experiments are carried out to measure the performance of this new strategy for the evaluation of ranking queries over relational databases.  相似文献   

3.
An increasing number of emerging web database applications deal with large georeferenced data sets. However, exploring these large data sets through spatial queries can be very time and resource intensive. The need for interactive spatial queries has arisen in many applications such as Geographic Information Systems (GIS) for efficient decision-support. In this paper, we propose a new interactive spatial query processing technique for GIS. We present a family of the Incremental Refining Spatial Join (IRSJ) algorithms that can be used to report incrementally refined running estimates for aggregate queries while simultaneously displaying the actual query result tuples of the data sets sampled so far. Our goal is to minimize the time until an acceptably accurate estimate of the query result is available (to users) measured by a confidence interval. Our approach enables more interactive data exploration and analysis. While similar work has been done in relational databases, to the best of our knowledge, this is the first work using this approach in GIS. We investigate and evaluate different sampling methodologies through extensive experimental performance comparisons. Experiments on both real and synthetic data show an order of magnitude response time improvement relative to the final answer obtained when using a full R-tree join. We also show the impact of different index structures on the performance of our algorithms using three known sampling methods.  相似文献   

4.
Consider a database consisting of a set of tuples, each of which contains an interval, a type and a weight. These tuples are called typed intervals and used to support applications involving diverse intervals. In this paper, we study top-k queries on typed intervals. The query reports k intervals intersecting the query time, containing a particular type and having the largest weight. The query time can be a point or an interval. Further, we define top-k continuous queries that return qualified intervals at each time point during the query interval. To efficiently answer such queries, a key challenge is to build an index structure to manage typed intervals. Employing the standard interval tree, we build the structure in a compact way to reduce the I/O cost, and provide analytically derived partitioning methods to manage the data. Query algorithms are proposed to support point, interval and continuous queries. An auxiliary main-memory structure is developed to report continuous results. Using large real and synthetic datasets, extensive experiments are performed in a prototype database system to demonstrate the effectiveness, efficiency and scalability. The results show that our method significantly outperforms alternative methods in most settings.  相似文献   

5.
Controlled query evaluation for logic-oriented information systems provides a model for the dynamic enforcement of confidentiality policies in scenarios where users are able to reason about a priori knowledge and the answers to previous queries. Previous foundational work assumes that the control mechanism can solve the arising implication problems and deals only with closed queries. In this paper, we overcome these limitations by refining the abstract model for appropriately represented relational databases. We identify a relational submodel where all instances share a fixed infinite Herbrand domain but have finite base relations, and we require finite and domain-independent query results. Then, via suitable syntactic restrictions on the policy and query languages, each occurring implication problem can be equivalently expressed as a universal validity problem within the Bernays-Schönfinkel class, whose (known) decidability in the classical setting is extended to our framework. For refusal and lying, we design and verify evaluation methods for open queries, exploiting controlled query evaluation of appropriate sequences of closed queries, which include answer completeness tests. Additionally, we present alternative evaluation methods that work for lying and the combined approach but at the price of potentially reduced cooperativeness.  相似文献   

6.
Why-not and why questions can be posed by database users to seek clarifications on unexpected query results. Specifically, why-not questions aim to explain why certain expected tuples are absent from the query results, while why questions try to clarify why certain unexpected tuples are present in the query results. This paper systematically explores the why-not and why questions on reverse top-k queries, owing to its importance in multi-criteria decision making. We first formalize why-not questions on reverse top-k queries, which try to include the missing objects in the reverse top-k query results, and then, we propose a unified framework called WQRTQ to answer why-not questions on reverse top-k queries. Our framework offers three solutions to cater for different application scenarios. Furthermore, we study why questions on reverse top-k queries, which aim to exclude the undesirable objects from the reverse top-k query results, and extend the framework WQRTQ to efficiently answer why questions on reverse top-k queries, which demonstrates the flexibility of our proposed algorithms. Extensive experimental evaluation with both real and synthetic data sets verifies the effectiveness and efficiency of the presented algorithms under various experimental settings.  相似文献   

7.
Two research efforts have been conducted to realize sliding-window queries in data stream management systems, namely, query revaluation and incremental evaluation. In the query reevaluation method, two consecutive windows are processed independently of each other. On the other hand, in the incremental evaluation method, the query answer for a window is obtained incrementally from the answer of the preceding window. In this paper, we focus on the incremental evaluation method. Two approaches have been adopted for the incremental evaluation of sliding-window queries, namely, the input-triggered approach and the negative tuples approach. In the input-triggered approach, only the newly inserted tuples flow in the query pipeline and tuple expiration is based on the timestamps of the newly inserted tuples. On the other hand, in the negative tuples approach, tuple expiration is separated from tuple insertion where a tuple flows in the pipeline for every inserted or expired tuple. The negative tuples approach avoids the unpredictable output delays that result from the input-triggered approach. However, negative tuples double the number of tuples through the query pipeline, thus reducing the pipeline bandwidth. Based on a detailed study of the incremental evaluation pipeline, we classify the incremental query operators into two classes according to whether an operator can avoid the processing of negative tuples or not. Based on this classification, we present several optimization techniques over the negative tuples approach that aim to reduce the overhead of processing negative tuples while avoiding the output delay of the query answer. A detailed experimental study, based on a prototype system implementation, shows the performance gains over the input-triggered approach of the negative tuples approach when accompanied with the proposed optimizations  相似文献   

8.
目前大多数P2P系统只提供文件的共享,缺乏数据管理能力.基于关系数据库上的关键搜索,本文提出了一种在P2P环境下共享数据库的新框架,其中每个节点上的数据库被看成是一个文档集,用户不用考虑数据库的模式结构信念,简化了不同节点数据库模式间的映射过程,能更好地适应P2P的分散和动态特性.将基于直方图的分层Top-k查询算法扩展到P2P环境下的数据库管理系统上,文档集和数据库的查询被统一起来,一致对待.在查询处理期间,直方图可以自动更新,同时根据查询结果,邻居节点可以自调整,具有自适应性.实验结果表明,基于关键词的数据库共享突破了传统的数据库共享模式,简化了数据访问方式,而基于直方图的Top-k查询算法提高了查询效率.  相似文献   

9.
Skyline and top-k queries are two popular operations for preference retrieval. In practice, applications that require these operations usually provide numerous candidate attributes, whereas, depending on their interests, users may issue queries regarding different subsets of the dimensions. The existing algorithms are inadequate for subspace skyline/top-k search because they have at least one of the following defects: 1) they require scanning the entire database at least once, 2) they are optimized for one subspace but incur significant overhead for other subspaces, or 3) they demand expensive maintenance cost or space consumption. In this paper, we propose a technique SUBSKY, which settles both types of queries by using purely relational technologies. The core of SUBSKY is a transformation that converts multidimensional data to one-dimensional (1D) values. These values are indexed by a simple B-tree, which allows us to answer subspace queries by accessing a fraction of the database. SUBSKY entails low maintenance overhead, which equals the cost of updating a traditional B-tree. Extensive experiments with real data confirm that our technique outperforms alternative solutions significantly in both efficiency and scalability.  相似文献   

10.
Semijoin is a relational operator used in many relational query processing algorithms. Semijoins can be used to “reduce” the database by delimitting portions of the database that contain data relevant to a given query. For some queries, there exist sequences of semijoins that delimit the exact portions of the database needed to answer the query. Such sequences are called full reducers.

This paper considers a class of queries called natural inequality queries (NI queries), and characterizes a subclass for which full reducers exist. We also present an efficient algorithm that decides whether an NI query lies within this subclass, and constructs a full reducer for the query. The NI queries are a subset of the aggregate-free, conjunctive queries of QUEL, and permit join clauses to include <, , =, , >.  相似文献   


11.
This paper presents a framework for querying inconsistent databases in the presence of functional dependencies. Most of the works dealing with the problem of extracting reliable information from inconsistent databases are based on the notion of repair, a minimal set of tuple insertions and deletions which leads the database to a consistent state (called repaired database), and the notion of consistent query answer, a query answer that can be obtained from every repaired database. In this work, both the notion of repair and query answer differ from the original ones. In the presence of functional dependencies, tuple deletions are the only operations that are performed in order to restore the consistency of an inconsistent database. However, deleting a tuple to remove an integrity violation potentially eliminates useful information in that tuple. In order to cope with this problem, we adopt a notion of repair, based on tuple updates, which allows us to better preserve information in the source database. A drawback of the notion of consistent query answer is that it does not allow us to discriminate among non-consistent answers, namely answers which can be obtained from a non-empty proper subset of the repaired databases. To obtain more informative query answers, we propose the notion of probabilistic query answer, that is query answers are tuples associated with probabilities. This new semantics of query answering over inconsistent databases allows us to give a measure of uncertainty to query answers. We show that the problem of computing probabilistic query answers is FP #P -complete. We also propose a technique for computing probabilistic answers to arbitrary relational algebra queries.  相似文献   

12.
In recent years, the availability of complex data repositories (e.g., multimedia, genomic, semistructured databases) has paved the way to new potentials as to data querying. In this scenario, similarity and fuzzy techniques have proven to be successful principles for effective data retrieval. However, most proposals are domain specific and lack of a general and integrated approach to deal with generalized complex queries, i.e., queries where multiple conditions are expressed, possibly on complex as well as on traditional data. To overcome such limitations, much work has been devoted to the development of middleware systems to support query processing on multiple repositories. On a similar line, We present a formal framework to permeate complex similarity and fuzzy queries within a relational database system. As an example, we focus on multimedia data, which is represented in an integrated view with common database data. We have designed an application layer that relies on an algebraic query language, extended with MM-tailored operators, and that maps complex similarity and fuzzy queries to standard SQL statements that can be processed by a relational database system, exploiting standard facilities of modern extensible RDBMS. To show the applicability of our proposal, we implemented a prototype that provides the user with rich query capabilities, ranging from traditional database queries to complex queries gathering a mixture of Boolean, similarity, and fuzzy predicates on the data.  相似文献   

13.
Evaluating refined queries in top-k retrieval systems   总被引:2,自引:0,他引:2  
In many applications, users specify target values for certain attributes/features without requiring exact matches to these values in return. Instead, the result is typically a ranked list of "top k" objects that best match the specified feature values. User subjectivity is an important aspect of such queries, i.e., which objects are relevant to the user and which are not depends on the perception of the user. Due to the subjective nature of top-k queries, the answers returned by the system to an user query often do not satisfy the users need right away, either because the weights and the distance functions associated with the features do not accurately capture the users perception or because the specified target values do not fully capture her information need or both. In such cases, the user would like to refine the query and resubmit it in order to get back a better set of answers. While there has been a lot of research on query refinement models, there is no work that we are aware of on supporting refinement of top-k queries efficiently in a database system. Done naively, each "refined" query can be treated as a "starting" query and evaluated from scratch. We explore alternative approaches that significantly improve the cost of evaluating refined queries by exploiting the observation that the refined queries are not modified drastically from one iteration to another. Our experiments over a real-life multimedia data set show that the proposed techniques save more than 80 percent of the execution cost of refined queries over the naive approach and is more than an order of magnitude faster than a simple sequential scan.  相似文献   

14.
Due to the resource limitation in the data stream environments, it has been reported that answering user queries according to the wavelet synopsis of a stream is an essential ability of a Data Stream Management System (DSMS). In the literature, recent research has been elaborated upon minimizing the local error metric of an individual stream. However, many emergent applications, such as stock marketing and sensor detection, also call for the need of recording multiple streams in a commercial DSMS. As shown in our thorough analysis and experimental studies, minimizing global error in multiple-stream environments leads to good reliability for DSMS to answer the queries; in contrast, only minimizing local error may lead to significant loss of query accuracy. As such, we first study in this paper the problem of maintaining the wavelet coefficients of multiple streams within collective memory so that the predetermined global error metric is minimized. Moreover, we also examine a promising application in the multistream environment, i.e., the queries for top-k range sum. We resolve the problem of efficient top-k query processing with minimized global error by developing a general framework. For the purposes of maintaining the wavelet coefficients and processing top-k queries, several well-designed algorithms are utilized to optimize the performance of each primary component of this general framework. We also evaluate the proposed algorithms empirically on real and simulated data streams and show that our framework can process top-k queries accurately and efficiently.  相似文献   

15.
Due to the resource limitation in the data stream environments, it has been reported that answering user queries according to the wavelet synopsis of a stream is an essential ability of a data stream management system (DSMS). In the literature, recent research has been elaborated upon minimizing the local error metric of an individual stream. However, many emergent applications such as stock marketing and sensor detection also call for the need of recording multiple streams in a commercial DSMS. As shown in our thorough analysis and experimental studies, minimizing global error in multiple-stream environments leads to good reliability for DSMS to answer the queries. In contrast, only minimizing local error may lead to a significant loss of query accuracy. As such, we first study in this paper the problem of maintaining the wavelet coefficients of multiple streams within collective memory so that the predetermined global error metric is minimized. Moreover, we also examine a promising application in the multistream environment, that is, the queries for top-k range sum. We resolve the problem of efficient top-k query processing with minimized global error by developing a general framework. For the purposes of maintaining the wavelet coefficients and processing top-k queries, several well- designed algorithms are utilized to optimize the performance of each primary component of this general framework. We also evaluate the proposed algorithms empirically on real and simulated data streams and show that our framework can process top-k queries accurately and efficiently.  相似文献   

16.
Much work has been accomplished in the past on the subject of parallel query processing and optimization in parallel relational database systems; however, little work on the same subject has been done in parallel object-oriented database systems. Since the object-oriented view of a database and its processing are quite different from those of a relational system, it can be expected that techniques of parallel query processing and optimization for the latter can be different from the former. In this paper, we present a general framework for parallel object-oriented database systems and several implemented query processing and optimization strategies together with some performance evaluation results. In this work, multiwavefront algorithms are used in query processing to allow a higher degree of parallelism than the traditional tree-based query processing. Four optimization strategies, which are designed specifically for the multiwavefront algorithms and for the optimization of single as well as multiple queries, are introduced. The query processing algorithms and optimization strategies have been implemented on a parallel computer, nCUBE2; and the results of a performance evaluation are presented in this paper. The main emphases and the intended contributions of this paper are (1) data partitioning, query processing and optimization strategies suitable for parallel OODBMSs, (2) the implementation of the multiwavefront algorithms and optimization strategies, and (3) the performance evaluation results.  相似文献   

17.
Conjunctive queries (CQs) are at the core of query languages encountered in many logic-based research fields such as AI, or database systems. The majority of existing work assumes set semantics but often in real applications the manipulation of duplicate tuples is required. One of the major problems that arises as part of advanced features of query optimization, data integration, query reformulation and many other research topics is testing for containment of such queries. In this work, we investigate the complexity of query containment problem for CQs under bag semantics (i.e. duplicate tuples are allowed in both the database and the results of queries) and under bag-set semantics (i.e. duplicates are allowed in the result of the queries but not in the database). We derive complexity results for these problems for five major subclasses of CQs; and we also find necessary conditions for CQ query containment. The general case of these problems remains open.  相似文献   

18.
top-k查询在分布式环境中引起越来越多的关注,但是现存的一些top-k算法大都只适用于集中式网络.提出了一个解决分布式网络中top-k查询的新方法—Histogram-Container算法(简称为HC算法),它不仅网络延迟小,网络带宽花费少,而且能够运行在任何结构的分布式网络中.本文将基于一个树型拓扑网络来说明如何使用本地的直方图和bloom filter信息来优化查询,以及如何在中间节点进行部分结果的合并.实验评估和性能分析表明HC算法在网络带宽消耗和查询响应时间方面要优于其他同类方法.  相似文献   

19.
提出了一种新的算法,来解决在分布式的环境中top-k求解问题(求出全局数值最大的前k名)。之前的研究,例如TA、TPUT、HT算法,都会消耗大量的带宽。KLEE算法虽然能够大大地减少带宽的消耗,却不能给出精确解。而提出的算法FT由于添加了一个预处理阶段并且使用了histogram bloom技术,即能有效地减少带宽的消耗,又能给出精确解。实现了FT和相关的算法,并进行了全面的比较。比较是建立在真实的数据集和根据不同情况合成的数据集的基础上的。实验结果显示FT在带宽消耗上面,相对于其他算法有很大的改进和优势。  相似文献   

20.
基于不确定数据的查询处理综述   总被引:5,自引:0,他引:5  
崔斌  卢阳 《计算机应用》2008,28(11):2729-2731
不确定数据在一些重要应用领域中是固有存在的,如传感器网络和移动物体追踪。在不确定数据上使用传统的查询方法会使查询结果出现偏差,不能满足用户的需求。因此,基于不确定数据的查询处理受到了越来越多的关注。与在确定数据上查询不同,不确定数据上的研究工作将概率引入到数据模型中来衡量不确定对象成为结果集中元素的可能性。由于问题定义和数据模型的不同,不确定数据上的查询类型也多种多样。从问题定义、数据模型、剪枝策略和算法等角度,对基于不确定数据的范围查询、top-k查询以及skyline查询进行了介绍。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号