Similar literature
1.
Reducing network traffic in unstructured P2P systems using Top-k queries   Cited by: 1 (self-citations: 0, other citations: 1)
A major problem of unstructured P2P systems is their heavy network traffic. This is caused mainly by high numbers of query answers, many of which are irrelevant for users. One solution to this problem is to use Top-k queries, whereby the user can specify a limited number (k) of the most relevant answers. In this paper, we present FD, a Fully Distributed framework for executing Top-k queries in unstructured P2P systems, with the objective of reducing network traffic. FD consists of a family of algorithms that are simple but effective. FD is completely distributed, does not depend on the existence of certain peers, and addresses the volatility of peers during query execution. We validated FD through implementation over a 64-node cluster and simulation using the BRITE topology generator and SimJava. Our performance evaluation shows that FD can achieve major performance gains in terms of communication and response time. Recommended by: Sunil Prabhakar. Work partially funded by the ARA Massive Data of the Agence Nationale de la Recherche.
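The traffic argument behind Top-k queries can be illustrated with a minimal sketch. It is not the FD algorithm family itself, whose details the abstract does not give; it only assumes that each peer scores its answers locally and forwards at most k of them. The names `local_top_k`, `merge_top_k`, and the sample peers are hypothetical.

```python
import heapq
from typing import Iterable

# Hypothetical answer representation: (relevance score, answer id).
Answer = tuple[float, str]


def local_top_k(answers: Iterable[Answer], k: int) -> list[Answer]:
    """Each peer forwards only its k highest-scoring answers."""
    return heapq.nlargest(k, answers)


def merge_top_k(partial_lists: Iterable[list[Answer]], k: int) -> list[Answer]:
    """Merge the peers' partial lists into a single global top-k list."""
    merged = [answer for partial in partial_lists for answer in partial]
    return heapq.nlargest(k, merged)


# Made-up data for three peers (an assumption, not taken from the paper).
peers = {
    "p1": [(0.9, "a"), (0.2, "b"), (0.5, "c")],
    "p2": [(0.8, "d"), (0.7, "e")],
    "p3": [(0.1, "f")],
}
k = 2
partials = [local_top_k(answers, k) for answers in peers.values()]
result = merge_top_k(partials, k)
sent = sum(len(partial) for partial in partials)   # answers put on the network
total = sum(len(answers) for answers in peers.values())
print(result, f"{sent} of {total} answers transmitted")
```

Because no peer ever sends more than k answers, the number of transmitted answers is bounded by k times the number of peers, regardless of how many matches each peer holds.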

2.
As mobile services become widely used, there is a strong demand for mobile support in P2P search techniques. In this paper, we introduce a new cost model for searching multi-dimensional data in mobile P2P environments and propose a novel multi-dimensional mobile P2P search framework called MIME. MIME models the physical node layout in a two-dimensional plane and keeps records of the locations of the nodes to construct a proximity-aware P2P overlay. MIME can employ two different split schemes for the construction of the overlay. We propose query processing techniques for such a P2P overlay. In addition, we employ a novel expanding method for tuning the performance of KNN queries in MIME. We also discuss two adaptive features incorporated into MIME to support mobility: an update algorithm that makes dynamic updates to the overlay, and a cache mechanism that reduces the load of data migration during the updates. The experimental results show that the proposed techniques are effective, and that MIME achieves significant performance improvements in Point, Range, and KNN queries compared to the conventional system.

3.
Enabling flexible queries with guarantees in P2P systems   Cited by: 2 (self-citations: 0, other citations: 2)
The Squid peer-to-peer information discovery system supports flexible queries using partial keywords, wildcards, and ranges. It is built on a structured overlay and uses data lookup protocols to guarantee that all existing data elements that match a query are found efficiently. Its main innovation is a dimension-reducing indexing scheme that effectively maps multidimensional information space to physical peers.

4.
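With the widespread adoption of geographic information systems (GIS), GML (Geography Markup Language) has become a de facto international standard for encoding, transmitting, sharing, and publishing spatial data. In recent years, large volumes of GML data have appeared in document form, which poses new challenges for storing and managing GML data efficiently. Based on the characteristics of GML documents, a storage method based on schema mapping is proposed. First, the GML schema is converted into an object-relational database schema according to mapping rules; second, the GML document is parsed and the corresponding SQL statements are constructed from the mapping information to store the data in the database. Experiments show that the proposed storage method is feasible and efficient.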

5.
We consider the problem of using sampling to estimate the result of an aggregation operation over a subset-based SQL query, where a subquery is correlated to an outer query by a NOT EXISTS, NOT IN, EXISTS or IN clause. We design an unbiased estimator for our query and prove that it is indeed unbiased. We then provide a second, biased estimator that makes use of the superpopulation concept from statistics to minimize the mean squared error of the resulting estimate. The two estimators are tested over an extensive set of experiments. Material in this paper is based upon work supported by the National Science Foundation via grants 0347408 and 0612170.

6.
In inverted file systems, queries can be written as Boolean expressions of inverted attributes. In response to a query, the system accesses address lists associated with the attributes in the query, merges them, and selects those records that satisfy the search logic. In this paper we consider the minimization of the CPU time needed for the merging operation. The time can possibly be reduced by taking address lists that occur in several product terms as a common factor of these products. This means that the union operation must be performed before the intersection operation. We present formulas which can be used to decide whether the above method is advantageous. The time can also be reduced by choosing the order of intersection operations so that it takes into consideration the occurrences of the address lists in the products and the lengths of the address lists. For choosing the order of intersection operations we give a heuristic algorithm that minimizes the total time needed for intersections.
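The two ideas in this abstract, factoring out a shared address list and ordering intersections by list length, can be sketched as follows. This is only an illustration under simple assumptions: it uses a plain shortest-list-first ordering rather than the paper's cost formulas or heuristic, and the function names and sample address lists are hypothetical.

```python
def intersect_shortest_first(address_lists):
    """Intersect address lists, starting with the shortest one.

    Starting small keeps every intermediate result small, so later
    intersections touch fewer addresses. Expects at least one list.
    """
    ordered = sorted(address_lists, key=len)
    result = set(ordered[0])
    for addresses in ordered[1:]:
        if not result:          # nothing left to match, stop early
            break
        result &= set(addresses)
    return sorted(result)


def eval_factored(common, others):
    """Evaluate (common AND o1) OR (common AND o2) OR ... in factored form.

    The union of the other lists is computed first and the shared list
    is then intersected once: common AND (o1 OR o2 OR ...).
    """
    union = set().union(*others)
    return sorted(set(common) & union)


# Made-up address lists for three inverted attributes.
A = [1, 2, 3, 4, 5, 6]
B = [2, 4, 6, 8]
C = [4, 6]

print(intersect_shortest_first([A, B, C]))   # A AND B AND C
print(eval_factored(A, [B, [1, 9]]))         # (A AND B) OR (A AND [1, 9])
```

The factored form reads the shared list once instead of once per product term; whether that saves time in practice depends on the list lengths, which is what the paper's formulas are meant to decide.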

7.
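To meet the requirements of information integration in a CIMS environment, this paper designs I-VIEW, an object-oriented view model with integration capabilities, for an information integration platform. I-VIEW extends the OO model by defining the concepts of virtual attributes and virtual objects. It introduces import and hide mechanisms and a class derivation mechanism, which allow the state and behavior of objects to be refined, and it can handle various integration problems such as schema mapping, semantic conflicts, and schema merging and restructuring.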

8.
RFID middleware collects and filters RFID streaming data to process applications' requests, which are called continuous queries because they are executed continuously during tag movement. Several approaches to building an index on queries rather than data records, called a query index, have been proposed to evaluate continuous queries over streaming data. EPCglobal proposed an Event Cycle Specification (ECSpec) model, which is a de facto standard query interface for RFID applications. Continuous queries based on ECSpec consist of a large number of segments that represent the query conditions. The problem when using any of the existing query indexes on these continuous queries is that it takes a long time to build the index, because it is necessary to insert a large number of segments into the index. To solve this problem, we propose a transform method that converts a group of segments into compressed data. We also propose an efficient query index scheme for the transformed space. Compared with existing query indexes, the proposed index performs better on various datasets.

9.
The existence of semantic conflicts between component databases severely impacts query processing in a multidatabase system. In this paper, we describe two types of semantic conflicts that have to be dealt with in the integration of databases modeling information about related sets of real-world entities. These are the entity identification problem and the attribute value conflict problem. While the two-way outerjoin operation has been commonly used for resolving the entity identification problem between two component relations, outerjoins using regular equality comparisons between component relation keys are shown to produce counter-intuitive entity identification results. We remedy this by defining a new key-equality comparator in place of the regular equality comparator for outerjoins. For the attribute value conflict problem, we define a Generalized Attribute Derivation (GAD) operation which allows user-defined attribute derivation functions to be used to compute new attributes from the component relations' attributes. By adding two-way outerjoin and GAD to the set of relational operations, the traditional algebraic transformation framework for relational queries is no longer adequate for multidatabase query processing and optimization. As a result, we introduce the constrained query tree as the multidatabase query representation. We show that some knowledge about query predicates and attribute derivation functions can be used to simplify queries. Such knowledge is modeled as an outerjoin graph attached to every outerjoin operation in the query tree. Based on this, we further extend the traditional algebraic transformation framework to include two-way outerjoins and GAD operations. Our framework demonstrates that properties of selection/join predicates and attribute derivation functions can be used to provide interesting transformation alternatives. This framework also serves as a formal ground for developing optimization strategies for multidatabase queries.
Recommended by: Clement Yu

10.
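Because sufficient semantic information is lacking, XML data with different schemas is difficult to interoperate. To address the need for XML data integration in oil and gas well engineering, a schema-independent XML semantic integration method is proposed with the help of a global domain ontology. The method first establishes semantic mappings between XML Path paths and the domain ontology, hiding schema differences; it then stores the XML as relational data following a model-mapping approach; finally, it converts SPARQL into SQL statements through query rewriting to implement semantic queries. The method semantically annotates XML schemas and uses a relational database to store and query XML data, and can effectively handle the semantic integration of domain XML data.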

11.
RRSi: indexing XML data for proximity twig queries   Cited by: 2 (self-citations: 2, other citations: 0)
Twig query pattern matching is a core operation in XML query processing. Indexing XML documents for twig query processing is of fundamental importance to supporting effective information retrieval. In practice, many XML documents on the web are heterogeneous and have their own formats; documents describing relevant information can possess different structures. Therefore some “user-interesting” documents having similar but non-exact structures against a user query are often missed out. In this paper, we propose the RRSi, a novel structural index designed for structure-based query lookup on heterogeneous sources of XML documents supporting proximate query answers. The index avoids the unnecessary processing of structurally irrelevant candidates that might show good content relevance. An optimized version of the index, oRRSi, is also developed to further reduce both space requirements and computational complexity. To our knowledge, these structural indexes are the first to support proximity twig queries on XML documents. The results of our preliminary experiments show that RRSi and oRRSi based query processing significantly outperform previously proposed techniques in XML repositories with structural heterogeneity.
Vincent T. Y. Ng

12.
There are many advanced techniques that can efficiently mine frequent itemsets using a minimum-support. However, the question that remains unanswered is whether the minimum-support can really help decision makers to make decisions. In this paper, we study four summary queries for frequent itemset mining, namely, (1) finding a support-average of itemsets, (2) finding a support-quantile of itemsets, (3) finding the number of itemsets that are greater/less than the support-average, i.e., an approximated distribution of itemsets, and (4) finding the relative frequency of an itemset (comparing its frequency with that of other itemsets in the same dataset). With these queries, a decision maker will know whether an itemset in question is greater/less than the support-quantile; the distribution of itemsets; and the frequentness of an itemset. Processing these summary queries is challenging, because the minimum-support constraint cannot be used to prune infrequent itemsets. In this paper, we propose several simple yet effective approximation solutions. We conduct extensive experiments to evaluate our strategy, and illustrate that the proposed approaches can model and capture the statistical parameters (summary queries) of itemsets in a database well.

13.
Because it operates under a strict time constraint, query processing for data streams should be continuous and rapid. To guarantee this constraint, most previous research optimizes the evaluation order of multiple join operations in a set of continuous queries using a greedy optimization strategy, so that the order is re-optimized dynamically at run time in response to the time-varying characteristics of data streams. However, this method often results in a sub-optimal plan because the greedy strategy traces only the first promising plan. This paper proposes a new multiple query optimization approach, Adaptive Sharing-based Extended Greedy Optimization Approach (A-SEGO), that traces multiple promising partial plans simultaneously. A-SEGO presents a novel method for sharing the results of common sub-expressions in a set of queries cost-effectively. The number of partial plans can be flexibly controlled according to the query processing workload. In addition, to avoid invoking the optimization process too frequently, optimization is performed only when the current execution plan is no longer relatively efficient. A series of experiments is comparatively analyzed to evaluate the performance of the proposed method in various stream environments.

14.
A query routing strategy based on query logs in a P2P environment   Cited by: 1 (self-citations: 0, other citations: 1)
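In P2P file-sharing systems, file descriptions can be used to describe a collection of files and to decide how queries are routed. Under this model, however, false matches can occur, wasting network bandwidth and computing resources. This article proposes a strategy that uses query-hit logs to partition the file descriptions of network nodes more precisely, reducing false co-occurrences and improving the efficiency of query routing.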

15.
This paper studies the problem of answering aggregation queries, satisfying the interval validity semantics, in a distributed system prone to continuous arrival and departure of participants. The interval validity semantics states that the query answer must be calculated considering contributions of at least all processes that remained in the distributed system for the whole query duration. Satisfying this semantics in systems experiencing unbounded churn is impossible due to the lack of connectivity and path stability between processes. This paper presents a novel architecture, namely Virtual Tree, for building and maintaining a structured overlay network with guaranteed connectivity and path stability in settings characterized by bounded churn rate. The architecture includes a simple query answering algorithm that provides interval valid answers. The overlay network generated by the Virtual Tree architecture is a tree-shaped topology with virtual nodes constituted by clusters of processes and virtual links constituted by multiple communication links connecting processes located in adjacent virtual nodes. We formally prove a bound on the churn rate for interval valid queries in a distributed system where communication latencies are bounded by a constant unknown by processes. Finally, we carry out an extensive experimental evaluation that shows the degree of robustness of the overlay network generated by the Virtual Tree architecture under different churn rates.

16.
The objective of this study is to develop a knowledge-base framework for generating cooperative answers to indirect queries. An indirect query can be considered as a nonstandard database query in which a user did not explicitly specify the information request. In a cooperative query answering system, a user's indirect query should be answered with an informative response, either an affirmative response or a negative response, which is generated on the basis of the inference of the user's information request and the reformulation of the user's indirect query. This paper presents methods for inferring users' intended actions, determining users' information requirements, and automatically reformulating indirect queries into direct queries. The inference process is carried out on the basis of a user model, called the user action model, as well as the query context. Two kinds of informative responses, i.e., affirmative responses and negative responses, can be generated by a rule-based approach.

17.
In location-based services, a density query returns the regions with high concentrations of moving objects (MOs). The use of density queries can help users identify crowded regions so as to avoid congestion. Most of the existing methods try very hard to improve the accuracy of query results, but ignore query efficiency. However, response time is also an important concern in query processing and may have an impact on user experience. In order to address this issue, we present a new definition of continuous density queries. Our approach for processing continuous density queries is based on the new notion of a safe interval, using which the states of both dense and sparse regions are dynamically maintained. Two indexing structures are also used to index candidate regions for accelerating query processing and improving the quality of results. The efficiency and accuracy of our approach are shown through an experimental comparison with snapshot density queries.

18.
With the increasing demand for advanced use of streaming data, efficient execution of continuous queries is an important research issue. This paper focuses on event-driven continuous queries that are activated by foreign events such as data arrival and the progression of time. Existing approaches to multiple continuous query optimization decide the optimal query plan by extracting common subexpressions from the given queries. Event-driven queries containing the common subexpressions may produce many common intermediate results when they are activated within a small interval, but may produce only disjoint data when activated at completely different times. This paper proposes an efficient data stream processing scheme for multiple event-driven continuous queries. In the proposed approach, we introduce query result caching to achieve a flexible way to share common operators among queries activated by unpredictable events. When a query is activated, an intermediate result generated for the query is stored in the cache area if it is expected to be reused by other queries. When other queries including the same operator are activated, they reuse the cached result if the cache includes reusable data. The efficiency of the proposed scheme is validated by intensive experimental evaluations.

19.
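Traditional join query optimization algorithms for distributed databases are analyzed and studied, and a query optimization algorithm based on partitioning by multiple join attributes is proposed, using data partitioning and parallel execution strategies. Experiments show that the algorithm improves query responsiveness and reduces query response time, and that it has practical value for handling massive-information queries and complex queries in distributed databases.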

20.
John H. M. De Vet, Software, 1989, 19(5): 491-504
This paper describes an algorithm for evaluating database queries represented as expressions in a logical language. Such a database query expression can be evaluated efficiently by focusing on the variable dependencies. The algorithm recursively computes the values of subexpressions to evaluate the input expression, but it avoids re-evaluation of those subexpressions whose values are not affected by new variable assignments. The input expression is internally structured as a directed acyclic graph. Two additional techniques to improve efficiency of the evaluation are discussed: transformations of the input expression and special primitive database operations. Finally, its implementation in the natural language question-answering system SPICOS is described.
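The dependency-driven evaluation described here can be sketched in a few lines. This is a hedged illustration, not the paper's algorithm or the SPICOS implementation: it assumes each node of the expression DAG records the set of variables it depends on and caches its last value, and the `Node` class with its `evals` counter is hypothetical.

```python
import operator


class Node:
    """A node of an expression DAG: either a variable or an operator."""

    def __init__(self, op=None, children=(), var=None):
        self.op = op
        self.children = tuple(children)
        self.var = var
        # Variables this subexpression depends on.
        if var is not None:
            self.deps = {var}
        else:
            self.deps = set().union(*(c.deps for c in self.children))
        self.cache = None
        self.valid = False
        self.evals = 0          # how often this node was actually computed

    def evaluate(self, env):
        """Return the node's value, recomputing only if the cache is stale."""
        if self.valid:
            return self.cache
        self.evals += 1
        if self.var is not None:
            value = env[self.var]
        else:
            value = self.op(*(c.evaluate(env) for c in self.children))
        self.cache, self.valid = value, True
        return value

    def invalidate(self, changed):
        """Mark stale only the nodes that depend on a changed variable."""
        if self.deps & changed:
            self.valid = False
            for child in self.children:
                child.invalidate(changed)


x, y, z = Node(var="x"), Node(var="y"), Node(var="z")
a = Node(operator.and_, [x, y])      # x AND y
b = Node(operator.or_, [y, z])       # y OR z, shares y with a
root = Node(operator.or_, [a, b])    # (x AND y) OR (y OR z)

env = {"x": True, "y": False, "z": True}
first = root.evaluate(env)

env["z"] = False                     # new assignment for z only
root.invalidate({"z"})
second = root.evaluate(env)
print(first, second, a.evals, b.evals)
```

After `z` is reassigned, only `z`, `b`, and `root` are recomputed; `a` keeps its cached value because its dependency set `{x, y}` does not contain the changed variable.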
