首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 562 毫秒
1.
The streaming evaluation is a popular way of evaluating queries on XML documents. Besides its many advantages, it is also the only option for a number of important XML applications. Unfortunately, existing algorithms focus almost exclusively on tree-pattern queries (TPQs). Requirements for flexible querying of XML data have motivated recently the introduction of query languages that are more general and flexible than TPQs. These languages are not supported by existing algorithms. In this paper, we consider a partial tree-pattern query (PTPQ) language which generalizes and strictly contains TPQs. PTPQs can express a fragment of XPath which comprises reverse axes and the node identity equality (is) operator, in addition to forward axes, wildcards and predicates. They constitute an important subclass of XPath, which is very useful in practice. Unfortunately, previous streaming algorithms for TPQs cannot be applied to PTPQs. PTPQs can be represented as dags enhanced with constraints. We explore this representation to design an original polynomial time streaming algorithm for PTPQs. Our algorithm aggressively filters incoming data that is irrelevant to the query and wisely avoids processing redundant query matches (i.e., matches of the query dag that do not contribute to new solutions). Our algorithm is the first one to support the streaming evaluation of such a broad fragment of XPath. We provide an analysis of it, and conduct an extensive experimental evaluation of its performance and scalability. Compared to the only known streaming algorithm that supports TPQs extended with reverse axes, our algorithm performs better by orders of magnitude while consuming a much smaller fraction of memory space. Current streaming applications have stringent requirements on query response time and memory consumption because of the large (possibly unbounded) size of data they handle. In order to keep memory usage and CPU consumption low for the PTPQ streaming evaluation, we design another streaming algorithm called Eager PSX for PTPQs. Its key feature is that it applies an eager evaluation strategy to quickly determine when node matches should be returned as solutions to the user and also to proactively detect redundant matches. We theoretically analyze Eager PSX, and experimentally test its time and space performance and scalability. We compare it with PSX. Our results show that Eager PSX not only achieves better space performance without compromising time performance, but it also greatly improves query response time for both simple and complex queries, in many cases, by orders of magnitude.  相似文献   

2.
谷峪  于晓楠  于戈 《软件学报》2014,25(8):1806-1816
随着智能移动设备和无线定位技术的飞速发展,使用基于位置服务应用的用户越来越多.特别地,不同于传统的针对固定位置的快照查询,移动的用户往往基于移动轨迹发出连续的查询.在真实和虚拟的空间环境中,障碍物的影响都是广泛存在的,障碍空间内的查询处理技术得到了越来越多的关注,其中,障碍空间内的连续反k近邻查询处理有着重要的应用.对障碍空间中的连续反k近邻查询问题进行了定义和系统的研究,通过定义控制点和分割点,提出了针对该问题的处理框架.进一步地,提出了一系列的过滤和求精算法,包括剪枝数据集、获取障碍物、剪枝和计算控制点和更新结果集等处理策略.基于多种数据集对所提出的算法进行了实验评估.与针对每个数据点进行k 近邻计算的基本方法相比,这些方法可以大幅度提高查询处理的CPU 和I/O 效率.  相似文献   

3.
SPARQL graph pattern rewriting for OWL-DL inference queries   总被引:1,自引:1,他引:0  
This paper focuses on the issue of OWL-DL ontology queries implemented in SPARQL. Currently, ontology repositories construct inference ontology models, and match SPARQL queries to the models, to derive inference results. Because an inference model uses much more storage space than the original model, and cannot be reused as inference requirements vary, this method is not suitable for large-scale deployment. To solve this problem, this paper proposes a novel method that passes rewritten SPARQL queries to the original ontology model, to retrieve inference results. We define OWL-DL inference rules and apply them to rewriting Graph Patterns in queries. The paper classifies the inference rules and discusses how these rules affect query rewriting. To illustrate the advantages of our proposal, we present a prototype system based on Jena, and address query optimization, to eliminate the disadvantages of augmented query sentences. We perform a set of query tests and compare the results with related works. The results show that the proposed method results in significantly improved query efficiency, without compromising completeness or soundness.
Doo-Kwon BaikEmail:
  相似文献   

4.
Existing algorithms of mining frequent XML query patterns (XQPs) employ a candidate generate-and-test strategy. They involve expensive candidate enumeration and costly tree-containment checking. Further, most of existing methods compute the frequencies of candidate query patterns from scratch periodically by checking the entire transaction database, which consists of XQPs transferred from user query logs. However, it is not straightforward to maintain such discovered frequent patterns in real XML databases as there may be frequent updates that may not only invalidate some existing frequent query patterns but also generate some new frequent query patterns. Therefore, a drawback of existing methods is that they are rather inefficient for the evolution of transaction databases. To address above-mentioned problems, this paper proposes an efficient algorithm ESPRIT to mine frequent XQPs without costly tree-containment checking. ESPRIT transforms XML queries into sequences using a one-to-one mapping technique and mines the frequent sequences to generate frequent XQPs. We propose two efficient incremental algorithms, ESPRIT-i and ESPRIT-i +, to incrementally mine frequent XQPs. We devise several novel optimization techniques of query rewriting, cache lookup, and cache replacement to improve the answerability and the hit rate of caching. We have implemented our algorithms and conducted a set of experimental studies on various datasets. The experimental results demonstrate that our algorithms achieve high efficiency and scalability and outperform state-of-the-art methods significantly.  相似文献   

5.
XMin: Minimizing Tree Pattern Queries with Minimality Guarantee   总被引:1,自引:0,他引:1  
Due to wide use of XPath, the problem of efficiently processing XPath queries has recently received a lot of attention. In particular, a considerable effort has been devoted to minimizing XPath queries since the efficiency of query processing greatly depends on the size of the query. Research work in this area can be classified into two categories: constraint-independent minimization and constraint-dependent minimization. The former minimizes queries in the absence of integrity constraints while the latter in the presence of them. For a linear path query, which is an XPath query without branching predicates, existing constraint-independent minimization methods are generally known to be unable to minimize the query without processing the query itself. Most recently, however, by using the DataGuide, a representative structural summary of XML data, a constraint-independent method that minimizes linear path queries in a top-down fashion has been proposed. Nevertheless, this method can fail to find a minimal query since it minimizes a query by merely erasing labels from the original query whereas a minimal query could include labels that are not present in the original query. In this paper, we propose a bottom-up approach called XMin that guarantees finding a minimal query for a given tree pattern query by using the DataGuide without processing the query itself. For the linear path query, we first show that the sequence of labels occurring in the minimal query is a subsequence of every schema label sequence that matches the original query. Here, the schema label sequence for a node is the sequence of labels from the root of XML data to the node. We then propose iterative subsequence generation that iteratively generates subsequences from the shortest schema label sequence matching the original query in a bottom-up fashion and tests query equivalence. Using iterative subsequence generation, we can always find a minimal query and we formally prove this guarantee. We also propose an extended algorithm that guarantees the minimality for the tree pattern query, which is a linear path query with branching predicates. These methods have been prototyped in a full-fledged object-relational DBMS. The experimental results using real and synthetic data sets show the practicality of our method.  相似文献   

6.
In this paper, we study the problem of continuous monitoring of reverse k nearest neighbors queries in Euclidean space as well as in spatial networks. Existing techniques are sensitive toward objects and queries movement. For example, the results of a query are to be recomputed whenever the query changes its location. We present a framework for continuous reverse k nearest neighbor (RkNN) queries by assigning each object and query with a safe region such that the expensive recomputation is not required as long as the query and objects remain in their respective safe regions. This significantly improves the computation cost. As a byproduct, our framework also reduces the communication cost in client–server architectures because an object does not report its location to the server unless it leaves its safe region or the server sends a location update request. We also conduct a rigid cost analysis for our Euclidean space RkNN algorithm. We show that our techniques can also be applied to answer bichromatic RkNN queries in Euclidean space as well as in spatial networks. Furthermore, we show that our techniques can be extended for the spatial networks that are represented by directed graphs. The extensive experiments demonstrate that our techniques outperform the existing techniques by an order of magnitude in terms of computation cost and communication cost.  相似文献   

7.
Existence of semantic conflicts between component databases severely impacts query processing in a multidatabase system. In this paper, we describe two types of semantic conflicts that have to be dealt with in the integration of databases modeling information about related sets of real-world entities. These are the entityidentification problem and theattribute value conflict problem. While thetwo-way outerjoin operation has been commonly used for resolving entity identification problem between two component relations, outerjoins using regular equality comparisons between component relation keys is shown to produce counter-intuitive entity identification result. We remedy this by defining a newkey-equality comparator in place of regular equality comparator, for outerjoins. For the attribute value conflict problem, we define aGeneralized Attribute Derivation (GAD) operation which allows user-defined attribute derivation functions to be used to compute new attributes from the component relations' attributes. By adding two-way outerjoin andGAD to the set of relational operations, the traditional algebraic transformation framework for relational queries is no longer adequate for multidatabase query processing and optimization. As a result, we introduceconstrained query tree as the multidatabase query representation. We show that some knowledge about query predicates and attribute derivation functions can be used to simplify queries. Such knowledge is modeled as an outerjoin graph attached to every outerjoin operation in the query tree. Based on this, we further extend the traditional algebraic transformation framework to include two-way outerjoins andGAD operations. Our framework demonstrates that properties of selection/join predicates and attribute derivation functions can be used to provide interesting transformation alternatives. This framework also serves as a formal ground for developing optimization strategies for multidatabase queries. Recommended by: Clement Yu  相似文献   

8.
可伸缩的增量连续k近邻查询处理   总被引:7,自引:0,他引:7  
廖巍  熊伟  王钧  景宁  钟志农 《软件学报》2007,18(2):268-278
针对基于TPR树(time-parameterized R-tree)索引的大量并发CKNN(continuous k-nearest neighbor)查询处理,提出了一种可伸缩的增量连续k近邻查询处理(scalable processing of incremental continuous k-nearest neighbor queries,简称SI-CNN)框架,通过引入搜索区域进行预裁剪以减少查询更新所需要的TPR树节点访问代价,并引入了增量结果表以保存候选对象,批量地更新查询结果集,具有良好的可伸缩性.基于SI-CNN框架提出了一种增量更新的SI-CNN查询处理算法,能够基于上次查询结果增量的更新查询,支持查询集合中加入或删除查询和移动对象数据集的插入、删除等动态更新操作.实验结果与分析表明,基于SI-CNN框架的SI-CNN算法可以很好地支持大量并发的CKNN查询处理,具有良好的实用价值.  相似文献   

9.
LBS 中连续查询攻击算法及匿名性度量   总被引:6,自引:1,他引:5  
林欣  李善平  杨朝晖 《软件学报》2009,20(4):1058-1068
k-匿名机制是LBS(location based service)中保证查询隐私性的重要手段.已有文献指出,现有的k-匿名机 制不能有效保护连续性查询的隐私性.提出一种连续查询发送模型,该模型融合了查询发送时间的间隔模型和连续性模型,针对此模型下的两种k-匿名算法Clique Cloaking 和Non-clique Cloaking,分别提出了一种连续查询攻击算 法.在此攻击算法下,匿名集的势不再适合作为查询匿名性的度量,因此提出一种基于熵理论的度量方式AD(anonymity degree).实验结果表明,对连续性很强的查询,攻击算法重识别用户身份的成功率极高;AD 比匿名集的势更能反映查询的匿名性.  相似文献   

10.
Nearest and reverse nearest neighbor queries for moving objects   总被引:4,自引:0,他引:4  
With the continued proliferation of wireless communications and advances in positioning technologies, algorithms for efficiently answering queries about large populations of moving objects are gaining interest. This paper proposes algorithms for k nearest and reverse k nearest neighbor queries on the current and anticipated future positions of points moving continuously in the plane. The former type of query returns k objects nearest to a query object for each time point during a time interval, while the latter returns the objects that have a specified query object as one of their k closest neighbors, again for each time point during a time interval. In addition, algorithms for so-called persistent and continuous variants of these queries are provided. The algorithms are based on the indexing of object positions represented as linear functions of time. The results of empirical performance experiments are reported.  相似文献   

11.
A top-N selection query against a relation is to find the N tuples that satisfy the query condition the best but not necessarily completely. In this paper, we propose a new method for evaluating top-N queries against a relation. This method employs a learning-based strategy. Initially, this method finds and saves the optimal search spaces for a small number of random top-N queries. The learned knowledge is then used to evaluate new queries. Extensive experiments are carried out to measure the performance of this strategy and the results indicate that it is highly competitive with existing techniques for both low-dimensional and high-dimensional data. Furthermore, the knowledge base can be updated based on new user queries to reflect new query patterns so that frequently submitted queries can be processed most efficiently. The maintenance and stability of the knowledge base are also addressed in the paper.  相似文献   

12.
This paper presents the scalable on-line execution (SOLE) algorithm for continuous and on-line evaluation of concurrent continuous spatio-temporal queries over data streams. Incoming spatio-temporal data streams are processed in-memory against a set of outstanding continuous queries. The SOLE algorithm utilizes the scarce memory resource efficiently by keeping track of only the significant objects. In-memory stored objects are expired (i.e., dropped) from memory once they become insignificant. SOLE is a scalable algorithm where all the continuous outstanding queries share the same buffer pool. In addition, SOLE is presented as a spatio-temporal join between two input streams, a stream of spatio-temporal objects and a stream of spatio-temporal queries. To cope with intervals of high arrival rates of objects and/or queries, SOLE utilizes a load-shedding approach where some of the stored objects are dropped from memory. SOLE is implemented as a pipelined query operator that can be combined with traditional query operators in a query execution plan to support a wide variety of continuous queries. Performance experiments based on a real implementation of SOLE inside a prototype of a data stream management system show the scalability and efficiency of SOLE in highly dynamic environments. This work was supported in part by the National Science Foundation under Grants IIS-0093116, IIS-0209120, and 0010044-CCR.  相似文献   

13.
Continuous reverse k nearest neighbor (CRkNN) monitoring in road networks has recently received increasing attentions. However, there is still a lack of efficient CRkNN algorithms in road networks up to now. In road networks, moving query objects and data objects are restricted by the connectivity of the road network and both the object–query distance and object–object distance updates affect the result of CRkNN queries. In this paper, we present a novel algorithm for continuous and incremental evaluation of CRkNN queries in road networks. Our method is based on a novel data structure called dual layer multiway tree (DLM tree) we proposed to represent the whole monitoring region of a CRkNN query q. We propose several lemmas to reduce the monitoring region of q and the number of candidate objects as much as possible. Moreover, by associating a variable NN_count with each candidate object, we can simplify the monitoring of candidate objects. There are a large number of objects roaming in a road network and many of them are irrelevant to a specific CRkNN query of a query object q. To minimize the processing extension, for a road in the network, we give an IQL list and an IQCL list to specify the set of query objects and data objects whose location updates should be maintained for CRkNN processing of query objects. Our CRkNN method consists of two phase: the initial result generating phase and incremental maintenance phase. In each phase, algorithms with high performance are proposed to make our CRkNN method more efficient. Extensive simulation experiments are conducted and the result shows that our proposed approach is efficient and scalable in processing CRkNN queries in road networks.  相似文献   

14.
Why-not and why questions can be posed by database users to seek clarifications on unexpected query results. Specifically, why-not questions aim to explain why certain expected tuples are absent from the query results, while why questions try to clarify why certain unexpected tuples are present in the query results. This paper systematically explores the why-not and why questions on reverse top-k queries, owing to its importance in multi-criteria decision making. We first formalize why-not questions on reverse top-k queries, which try to include the missing objects in the reverse top-k query results, and then, we propose a unified framework called WQRTQ to answer why-not questions on reverse top-k queries. Our framework offers three solutions to cater for different application scenarios. Furthermore, we study why questions on reverse top-k queries, which aim to exclude the undesirable objects from the reverse top-k query results, and extend the framework WQRTQ to efficiently answer why questions on reverse top-k queries, which demonstrates the flexibility of our proposed algorithms. Extensive experimental evaluation with both real and synthetic data sets verifies the effectiveness and efficiency of the presented algorithms under various experimental settings.  相似文献   

15.
This article presents a novel type of queries in spatial databases, called the direction-aware bichromatic reverse k nearest neighbor(DBRkNN) queries, which extend the bichromatic reverse nearest neighbor queries. Given two disjoint sets, P and S, of spatial objects, and a query object q in S, the DBRkNN query returns a subset P′ of P such that k nearest neighbors of each object in P′ include q and each object in P′ has a direction toward q within a pre-defined distance. We formally define the DBRkNN query, and then propose an efficient algorithm, called DART, for processing the DBRkNN query. Our method utilizes a grid-based index to cluster the spatial objects, and the B+-tree to index the direction angle. We adopt a filter-refinement framework that is widely used in many algorithms for reverse nearest neighbor queries. In the filtering step, DART eliminates all the objects that are away from the query object more than a pre-defined distance, or have an invalid direction angle. In the refinement step, remaining objects are verified whether the query object is actually one of the k nearest neighbors of them. As a major extension of DART, we also present an improved algorithm, called DART+, for DBRkNN queries. From extensive experiments with several datasets, we show that DART outperforms an R-tree-based naive algorithm in both indexing time and query processing time. In addition, our extension algorithm, DART+, also shows significantly better performance than DART.  相似文献   

16.
In many decision-making scenarios, decision makers require rapid feedback to their queries, which typically involve aggregates. The traditional blocking execution model can no longer meet the demands of these users. One promising approach in the literature, called online aggregation, evaluates an aggregation query progressively as follows: as soon as certain data have been evaluated, approximate answers are produced with their respective running confidence intervals; as more data are examined, the answers and their corresponding running confidence intervals are refined. In this paper, we extend this approach to handle nested queries with aggregates (i.e., at least one inner query block is an aggregate query) by providing users with (approximate) answers progressively as the inner aggregation query blocks are evaluated. We address the new issues pose by nested queries. In particular, the answer space begins with a superset of the final answers and is refined as the aggregates from the inner query blocks are refined. For the intermediary answers to be meaningful, they have to be interpreted with the aggregates from the inner queries. We also propose a multi-threaded model in evaluating such queries: each query block is assigned to a thread, and the threads can be evaluated concurrently and independently. The time slice across the threads is nondeterministic in the sense that the user controls the relative rate at which these subqueries are being evaluated. For enumerative nested queries, we propose a priority-based evaluation strategy to present answers that are certainly in the final answer space first, before presenting those whose validity may be affected as the inner query aggregates are refined. We implemented a prototype system using Java and evaluated our system. Results for nested queries with a level and multiple levels of nesting are reported. Our results show the effectiveness of the proposed mechanisms in providing progressive feedback that reduces the initial waiting time of users significantly without sacrificing the quality of the answers. Received April 25, 2000 / Accepted June 27, 2000  相似文献   

17.
It has been observed that queries over XML data sources are often unsatisfiable. Unsatisfiability may stem from several different sources, e.g., the user may be insufficiently familiar with the labels appearing the documents, or may not be intimately aware of the hierarchical structure of the documents. To deal with query and document mismatches, previous research has considered returning answers that maximally satisfy (in some sense) the query, instead of only returning strictly satisfying answers. However, this breaks the golden database rule that only strictly satisfying answers are returned when querying. Indeed, the relationship between the query and answers is no longer clear, when unsatisfying answers are returned. To reinstate the golden database rule, this article proposes a framework for automatically correcting queries over XML. This framework generates similar satisfiable queries, when the user query is unsatisfiable. The user can then choose a satisfiable query of interest, and receive exactly satisfying answers to this query.  相似文献   

18.
移动对象连续k近邻(CKNN)查询是指给定一个连续移动的对象集合,对于任意一个k近邻查询q,实时计算查询qk近邻并在查询有效时间内对查询结果进行实时更新.现实生活中,交通出行、社交网络、电子商务等领域许多基于位置的应用服务都涉及移动对象连续k近邻查询这一基础问题.已有研究工作解决连续k近邻查询问题时,大多需要通过多次迭代确定一个包含k近邻的查询范围,而每次迭代需要根据移动对象的位置计算当前查询范围内移动对象的数量,整个迭代过程的计算代价占查询代价的很大部分.为此,提出了一种基于网络索引和混合高斯函数移动对象分布密度的双重索引结构(grid GMM index,GGI),并设计了移动对象连续k近邻增量查询算法(incremental search for continuous k nearest neighbors,IS-CKNN).GGI索引结构的底层采用网格索引对海量移动对象进行维护,上层构建混合高斯模型模拟移动对象在二维空间中的分布.对于给定的k近邻查询q,IS-CKNN算法能够基于混合高斯模型直接确定一个包含qk近邻的查询区域,减少了已有算法求解该区域的多次迭代过程;当移动对象和查询q位置发生变化时,进一步提出一种高效的增量查询策略,能够最大限度地利用已有查询结果减少当前查询的计算量.最后,在滴滴成都网约车数据集以及两个模拟数据集上进行大量实验,充分验证了算法的性能.  相似文献   

19.
Many database applications and environments, such as mediation over heterogeneous database sources and data warehousing for decision support, lead to complex queries. Queries are often nested, defined over previously defined views, and may involve unions. There are good reasons why one might want to remove pieces (sub-queries or sub-views) from such queries: some sub-views of a query may be effectively cached from previous queries, or may be materialized views; some may be known to evaluate empty, by reasoning over the integrity constraints; and some may match protected queries, which for security cannot be evaluated for all users.In this paper, we present a new evaluation strategy with respect to queries defined over views, which we call tuple-tagging, that allows for an efficient removal of sub-views from the query. Other approaches to this are to rewrite the query so the sub-views to be removed are effectively gone, then to evaluate the rewritten query. With the tuple tagging evaluation, no rewrite of the original query is necessary.We describe formally a discounted query (a query with sub-views marked that are to be considered as removed), present the tuple tagging algorithm for evaluating discounted queries, provide an analysis of the algorithm's performance, and present some experimental results. These results strongly support the tuple-tagging algorithm both as an efficient means to effectively remove sub-views from a view query during evaluation, and as a viable optimization strategy for certain applications. The experiments also suggest that rewrite techniques for this may perform worse than the evaluation of the original query, and much worse than the tuple tagging approach.  相似文献   

20.
Various techniques have been developed for different query types in content-based image retrieval systems such as sampling queries, constrained sampling queries, multiple constrained sampling queries, k-NN queries, constrained k-NN queries, and multiple localized k-NN queries. In this paper, we propose a generalized query model suitable for expressing queries of different types, and investigate efficient processing techniques for this new framework. We exploit sequential access and data sharing by developing new storage and query processing techniques to leverage inter-query concurrency. Our experimental results, based on the Corel dataset, indicate that the proposed optimization can significantly reduce average response time in a multiuser environment, and achieve better retrieval precision and recall compared to two recent techniques.
Ning YuEmail:
  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号