共查询到20条相似文献,搜索用时 15 毫秒
1.
Reza Akbarinia Esther Pacitti Patrick Valduriez 《Distributed and Parallel Databases》2006,19(2-3):67-86
A major problem of unstructured P2P systems is their heavy network traffic. This is caused mainly by high numbers of query
answers, many of which are irrelevant for users. One solution to this problem is to use Top-k queries whereby the user can specify a limited number (k) of the most relevant answers. In this paper, we present FD, a (Fully Distributed) framework for executing Top-k queries in unstructured P2P systems, with the objective of reducing network traffic. FD consists of a family of algorithms
that are simple but effective. FD is completely distributed, does not depend on the existence of certain peers, and addresses
the volatility of peers during query execution. We validated FD through implementation over a 64-node cluster and simulation
using the BRITE topology generator and SimJava. Our performance evaluation shows that FD can achieve major performance gains
in terms of communication and response time.
Recommended by: Sunil Prabhakar
Work partially funded by the ARA Massive Data of the Agence Nationale de la Recherche. 相似文献
2.
Lidan Shou Xiaolong Zhang Ping Wang Gang Chen Jinxiang Dong 《Information Sciences》2011,181(13):2841-2857
Nowadays, as the mobile services become widely used, there is a strong demand for mobile support in P2P search techniques. In this paper, we introduce a new cost model for searching multi-dimensional data in mobile P2P environment and propose a novel multi-dimensional mobile P2P search framework called MIME. MIME models the physical node layout in a two-dimensional plane and keeps records of the locations of the nodes to construct a proximity-aware P2P overlay. MIME is able to employ two different split schemes for the construction of the overlay. We propose query processing techniques for such P2P overlay. In addition, we employ a novel expanding method for tuning the performance of KNN queries in MIME. We also discuss two adaptive features incorporated into MIME to support mobility: an update algorithm that makes dynamic updates to the overlay, and a cache mechanism that reduces the load of data migration during the updates. The experimental results show that the proposed techniques are effective, and that MIME achieves significant performance improvements in Point, Range, and KNN queries compared to the conventional system. 相似文献
3.
Enabling flexible queries with guarantees in P2P systems 总被引:2,自引:0,他引:2
The Squid peer-to-peer information discovery system supports flexible queries using partial keywords, wildcards, and ranges. It is built on a structured overlay and uses data lookup protocols to guarantee that all existing data elements that match a query are found efficiently. Its main innovation is a dimension-reducing indexing scheme that effectively maps multidimensional information space to physical peers. 相似文献
4.
随着地理信息系统GIS(Geography Markup Language)的广泛应用,GML己经成为事实上的空间数据编码、传输、共享和发布的一种国际标准。近年来,大量GML数据以文档形式出现,对如何高效地存储和管理GML数据提出了新的挑战。根据GML文档的特点,提出了一种基于模式映射的存储方法。首先,根据映射规则将GML模式生成对象-关系数据库模式;其次,解析GML文档并根据映射信息构造相应的SQL语句,将数据存储于数据库。实验表明,提出的存储方法是可行、高效的。 相似文献
5.
Shantanu Joshi Christopher Jermaine 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(1):181-202
We consider the problem of using sampling to estimate the result of an aggregation operation over a subset-based SQL query,
where a subquery is correlated to an outer query by a NOT EXISTS, NOT IN, EXISTS or IN clause. We design an unbiased estimator for our query and prove that it is indeed unbiased. We then provide a second, biased
estimator that makes use of the superpopulation concept from statistics to minimize the mean squared error of the resulting
estimate. The two estimators are tested over an extensive set of experiments.
Material in this paper is based upon work supported by the National Science Foundation via grants 0347408 and 0612170. 相似文献
6.
Anne Putkonen 《International journal of parallel programming》1980,9(5):351-369
In inverted file systems, queries can be written as Boolean expressions of inverted attributes. In response to a query, the system accesses address lists associated with the attributes in the query, merges them, and selects those records that satisfy the search logic. In this paper we consider the minimization of the CPU time needed for the merging operation. The time can possibly be reduced by taking address lists that occur in several product terms as a common factor of these products. This means that the union operation must be performed before the intersection operation. We present formulas which can be used to decide whether the above method is advantageous. The time can also be reduced by choosing the order of intersection operations so that it takes into consideration the occurrences of the address lists in the products and the lengths of the address lists. For choosing the order of intersection operations we give a heuristic algorithm that minimizes the total time needed for intersections. 相似文献
7.
为了满足CIMS环境中信息集成的要求,本文为信息集成平台设计了一种具有集成功能的面向对象视图模型I-VIEW.I-VIEW对OO模型进行了扩充,定义了虚属性、虚对象的概念;引入了输入与隐藏机制和类派生机制,允许对对象的状态和行为进行提炼,能够很好地解决各类集成问题,如模式映射、评义冲突和模式合并与重构等。 相似文献
8.
RFID middleware collects and filters RFID streaming data to process applications' requests called continuous queries, because they are executed continuously during tag movement. Several approaches to building an index on queries rather than data records, called a query index, have been proposed to evaluate continuous queries over streaming data. EPCglobal proposed an Event Cycle Specification (ECSpec) model, which is a de facto standard query interface for RFID applications. Continuous queries based on ECSpec consist of a large number of segments that represent the query conditions. The problem when using any of the existing query indexes on these continuous queries is that it takes a long time to build the index, because it is necessary to insert a large number of segments into the index. To solve this problem, we propose a transform method that converts a group of segments into compressed data. We also propose an efficient query index scheme for the transformed space. Comparing with existing query indexes, the performance of proposed index outperforms the others on various datasets. 相似文献
9.
Existence of semantic conflicts between component databases severely impacts query processing in a multidatabase system. In this paper, we describe two types of semantic conflicts that have to be dealt with in the integration of databases modeling information about related sets of real-world entities. These are the entityidentification problem and theattribute value conflict problem. While thetwo-way outerjoin operation has been commonly used for resolving entity identification problem between two component relations, outerjoins using regular equality comparisons between component relation keys is shown to produce counter-intuitive entity identification result. We remedy this by defining a newkey-equality comparator in place of regular equality comparator, for outerjoins. For the attribute value conflict problem, we define aGeneralized Attribute Derivation (GAD) operation which allows user-defined attribute derivation functions to be used to compute new attributes from the component relations' attributes. By adding two-way outerjoin andGAD to the set of relational operations, the traditional algebraic transformation framework for relational queries is no longer adequate for multidatabase query processing and optimization. As a result, we introduceconstrained query tree as the multidatabase query representation. We show that some knowledge about query predicates and attribute derivation functions can be used to simplify queries. Such knowledge is modeled as an outerjoin graph attached to every outerjoin operation in the query tree. Based on this, we further extend the traditional algebraic transformation framework to include two-way outerjoins andGAD operations. Our framework demonstrates that properties of selection/join predicates and attribute derivation functions can be used to provide interesting transformation alternatives. This framework also serves as a formal ground for developing optimization strategies for multidatabase queries.
Recommended by: Clement Yu 相似文献
10.
11.
RRSi: indexing XML data for proximity twig queries 总被引:2,自引:2,他引:0
Twig query pattern matching is a core operation in XML query processing. Indexing XML documents for twig query processing
is of fundamental importance to supporting effective information retrieval. In practice, many XML documents on the web are
heterogeneous and have their own formats; documents describing relevant information can possess different structures. Therefore
some “user-interesting” documents having similar but non-exact structures against a user query are often missed out. In this
paper, we propose the RRSi, a novel structural index designed for structure-based query lookup on heterogeneous sources of XML documents supporting
proximate query answers. The index avoids the unnecessary processing of structurally irrelevant candidates that might show
good content relevance. An optimized version of the index, oRRSi, is also developed to further reduce both space requirements and computational complexity. To our knowledge, these structural
indexes are the first to support proximity twig queries on XML documents. The results of our preliminary experiments show
that RRSi and oRRSi based query processing significantly outperform previously proposed techniques in XML repositories with structural heterogeneity.
相似文献
Vincent T. Y. NgEmail: |
12.
There are many advanced techniques that can efficiently mine frequent itemsets using a minimum-support. However, the question that remains unanswered is whether the minimum-support can really help decision makers to make decisions. In this paper, we study four summary queries for frequent itemsets mining, namely, (1) finding a support-average of itemsets, (2) finding a support-quantile of itemsets, (3) finding the number of itemsets that greater/less than the support-average, i.e., an approximated distribution of itemsets, and (4) finding the relative frequency of an itemset (compared its frequency with that of other itemsets in the same dataset). With these queries, a decision maker will know whether an itemset in question is greater/less than the support-quantile; the distribution of itemsets; and the frequentness of an itemset. Processing these summary queries is challenging, because the minimum-support constraint cannot be used to prune infrequent itemsets. In this paper, we propose several simple yet effective approximation solutions. We conduct extensive experiments for evaluating our strategy, and illustrate that the proposed approaches can well model and capture the statistical parameters (summary queries) of itemsets in a database. 相似文献
13.
Because it operates under a strict time constraint, query processing for data streams should be continuous and rapid. To guarantee this constraint, most previous researches optimize the evaluation order of multiple join operations in a set of continuous queries using a greedy optimization strategy so that the order is re-optimized dynamically in run-time due to the time-varying characteristics of data streams. However, this method often results in a sub-optimal plan because the greedy strategy traces only the first promising plan. This paper proposes a new multiple query optimization approach, Adaptive Sharing-based Extended Greedy Optimization Approach (A-SEGO), that traces multiple promising partial plans simultaneously. A-SEGO presents a novel method for sharing the results of common sub-expressions in a set of queries cost-effectively. The number of partial plans can be flexibly controlled according to the query processing workload. In addition, to avoid invoking the optimization process too frequently, optimization is performed only when the current execution plan is relatively no longer efficient. A series of experiments are comparatively analyzed to evaluate the performance of the proposed method in various stream environments. 相似文献
14.
一种P2P环境下基于查询日志的查询路由策略 总被引:1,自引:0,他引:1
在P2P文件共享系统里,通过文件描述可以描述文件的集合并且可以用它来决定查询的路由。但在这种模型下,会出现虚假匹配的现象,从而导致网络带宽和计算资源的浪费。文章提出了一种基于查询命中日志方法来较精确地划分网络结点文件描述的策略来减小虚假同现的发生,来提高查询路由的效率。 相似文献
15.
Roberto Baldoni Silvia Bonomi Adriano Cerocchi Leonardo Querzoni 《Journal of Parallel and Distributed Computing》2013
This paper studies the problem of answering aggregation queries, satisfying the interval validity semantics, in a distributed system prone to continuous arrival and departure of participants. The interval validity semantics states that the query answer must be calculated considering contributions of at least all processes that remained in the distributed system for the whole query duration. Satisfying this semantics in systems experiencing unbounded churn is impossible due to the lack of connectivity and path stability between processes. This paper presents a novel architecture, namely Virtual Tree, for building and maintaining a structured overlay network with guaranteed connectivity and path stability in settings characterized by bounded churn rate. The architecture includes a simple query answering algorithm that provides interval valid answers. The overlay network generated by the Virtual Tree architecture is a tree-shaped topology with virtual nodes constituted by clusters of processes and virtual links constituted by multiple communication links connecting processes located in adjacent virtual nodes. We formally prove a bound on the churn rate for interval valid queries in a distributed system where communication latencies are bounded by a constant unknown by processes. Finally, we carry out an extensive experimental evaluation that shows the degree of robustness of the overlay network generated by the virtual tree architecture under different churn rates. 相似文献
16.
The objective of this study is to develop a knowledge-base framework for generatingcooperative answers to indirect queries. Anindirect query can be considered as a nonstandard database query in which a user did not specify explicitly the information request. In a cooperative query answering system, a user's indirect query should be answered with an informative response, either anaffirmative response or anegative response, which is generated on the basis of the inference of the user's information request and the reformulation of the users' indirect query.This paper presents methods for inferring users' intended actions, determining users' information requirements, and for automatically reformulating indirect queries into direct queries. The inference process is carried out on the basis of a user model, calluser action model, as well as the query context. Two kinds ofinformative responses, i.e.affirmative responses andnegative responses can be generated by arule-based approach. 相似文献
17.
In location-based services, a density query returns the regions with high concentrations of moving objects (MOs). The use of density queries can help users identify crowded regions so as to avoid congestion. Most of the existing methods try very hard to improve the accuracy of query results, but ignore query efficiency. However, response time is also an important concern in query processing and may have an impact on user experience. In order to address this issue, we present a new definition of continuous density queries. Our approach for processing continuous density queries is based on the new notion of a safe interval, using which the states of both dense and sparse regions are dynamically maintained. Two indexing structures are also used to index candidate regions for accelerating query processing and improving the quality of results. The efficiency and accuracy of our approach are shown through an experimental comparison with snapshot density queries. 相似文献
18.
With the increasing demands for advanced use of streaming data, efficient execution of continuous queries is an important research issue. This paper focuses on event-driven continuous queries that are activated by foreign events such as data arrival and the progression of time. Existing approaches to multiple continuous query optimization decide the optimal query plan by extracting common subexpressions from the given queries. Event-driven queries containing the common subexpressions may produce many common intermediate results when they are activated within a small interval, but may produce only disjoint data when activated at completely different timings.This paper proposes an efficient data stream processing scheme for multiple event-driven continuous queries. In the proposed approach, we introduce query result caching to achieve a flexible way to share common operators among queries activated by unpredictable events. When a query is activated, an intermediate result generated for the query is stored into the cache area if it is expected to be reused by other queries. When other queries including the same operator are activated, they reuse the cached result if the cache includes reusable data. Efficiency of the proposed scheme is validated by intensive experimental evaluations. 相似文献
19.
20.
John H. M. De Vet 《Software》1989,19(5):491-504
This paper describes an algorithm for evaluating database queries represented as expressions in a logical language. Such a database query expression can be evaluated efficiently by focusing on the variable dependencies. The algorithm recursively computes the values of subexpressions to evaluate the input expression, but it avoids re-evaluation of those subexpressions whose values are not affected by new variable assignments. The input expression is internally structured as a directed acyclic graph. Two additional techniques to improve efficiency of the evaluation are discussed: transformations of the input expression and special primitive database operations. Finally, its implementation in the natural language question-answering system SPICOS is described. 相似文献