期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Continually Answering Constraint <Emphasis Type="Italic">k</Emphasis>-<Emphasis Type="Italic">NN</Emphasis> Queries in Unstructured P2P Systems

下载免费PDF全文

Bin Wang Xiao-Chun Yang Guo-Ren Wang Ge Yu Lei Chen X. Sean Wang and Xue-Min Lin 《计算机科学技术学报》2008,23(4):538-556

We consider the problem of efficiently computing distributed geographical k-NN queries in an unstructured peer-to-peer （P2P） system, in which each peer is managed by an individual organization and can only communicate with its logical neighboring peers. Such queries are based on local filter query statistics, and require as less communication cost as possible which makes it more difficult than the existing distributed k-NN queries. Especially, we hope to reduce candidate peers and degrade communication cost. In this paper, we propose an efficient pruning technique to minimize the number of candidate peers to be processed to answer the k-NN queries. Our approach is especially suitable for continuous k-NN queries when updating peers, including changing ranges of peers, dynamically leaving or joining peers, and updating data in a peer. In addition, simulation results show that the proposed approach outperforms the existing Minimum Bounding Rectangle （MBR）-based query approaches, especially for continuous queries. 相似文献

2.

Continually Answering Constraint k-NN Queries in Unstructured P2P Systems

下载免费PDF全文

Bin Wang Xiao-Chun Yang Guo-Ren Wang Ge Yu Lei Chen X. Sean Wang Xue-Min Lin 《计算机科学技术学报》2008,23(4)

We consider the problem of efficiently computing distributed geographical k-NN queries in an unstructured peer-to-peer (P2P) system,in which each peer is managed by an individual organization and can only communicate with its logical neighboring peers.Such queries are based on local filter query statistics,and require as less communication cost as possible,which makes it more difficult than the existing distributed k-NN queries.Especially,we hope to reduce candidate peers and degrade communication cost.In this paper,we propose an efficient pruning technique to minimize the number of candidate peers to be processed to answer the k-NN queries.Our approach is especially suitable for continuous k-NN queries when updating peers,including changing ranges of peers,dynamically leaving or joining peers,and updating data in a peer. In addition,simulation results show that the proposed approach outperforms the existing Minimum Bounding Rectangle (MBR.)-based query approaches,especially for continuous queries. 相似文献

3.

Data Management in Peer-to-Peer Environment: A Perspective of BestPeer 总被引：1，自引：0，他引：1

下载免费PDF全文

周傲英钱卫宁周水庚凌波徐林昊黄维雄黄铭钧陈建利《计算机科学技术学报》2003,18(4):0-0

Peer-to-Peer (P2P) systems have attracted much attention in academic commu-nity and industry circles due to their promising applications in various domains. This paper presents the authors‘ research efforts on introducing complex query capabilities in a P2P environ-ment consisting of numerous peers with large volume of data. An underlying hybrid P2P computing platform, named BestPeer is described first. The connection among peers within BestPeer is self-configurable through maintaining the nearest neighbor of peers, and the agent techniques employed in the system ensure its capability of providing sophisticated services. The designs of three P2P data management systems which are all based on BestPeer are described in detail. They provide support for information retrieval, query processing and Web services respectively. Advantages and limitations are discussed, while ongoing work is presented. Current systems can provide basic functions for keyword-based search, SQL-like query processing, and Web services querying and discovery. Some further topics on providing fully-fledged data management functionalities for P2P distributed computing systems with security guarantee are also discussed. 相似文献

4.

Compressed Data Cube for Approximate OLAP Query Processing 总被引：4，自引：0，他引：4

下载免费PDF全文

冯玉王珊《计算机科学技术学报》2002,17(5):0-0

Approximate query processing has emerged as an approach to dealing with the huge data volume and complex queries in the environment of data warehouse.In this paper,we present a novel method that provides approximate answers to OLAP queries.Our method is based on building a compressed (approximate) data cube by a clustering technique and using this compressed data cube to provide answers to queries directly,so it improves the performance of the queries.We also provide the algorithm of the OLAP queries and the confidence intervals of query results.An extensive experimental study with the OLAP council benchmark shows the effectiveness and scalability of our cluster-based approach compared to sampling. 相似文献

5.

Fast filtering false active subspaces for efficient high dimensional similarity processing

GuoRen Wang Ge Yu JunChang Xin YuHai Zhao EnDe Zhang 《中国科学F辑(英文版)》2009,52(2):286-294

The query space of a similarity query is usually narrowed down by pruning inactive query subspaces which contain no query results and keeping active query subspaces which may contain objects corre-sponding to the request. However,some active query subspaces may contain no query results at all,those are called false active query subspaces. It is obvious that the performance of query processing degrades in the presence of false active query subspaces. Our experiments show that this problem becomes seriously when the data are high dimensional and the number of accesses to false active sub-spaces increases as the dimensionality increases. In order to solve this problem,this paper proposes a space mapping approach to reducing such unnecessary accesses. A given query space can be re-fined by filtering within its mapped space. To do so,a mapping strategy called maxgap is proposed to improve the efficiency of the refinement processing. Based on the mapping strategy,an index structure called MS-tree and algorithms of query processing are presented in this paper. Finally,the performance of MS-tree is compared with that of other competitors in terms of range queries on a real data set. 相似文献

6.

Continuous Probabilistic Subspace Skyline Query Processing Using Grid Projections

下载免费PDF全文

赵雷杨艳艳周晓方《计算机科学技术学报》2014,29(2):332-344

As an important type of multidimensional preference query, the skyline query can find a superset of optimal results when there is no given linear function to combine values for all attributes of interest. Its processing has been extensively investigated in the past. While most skyline query processing algorithms are designed based on the assumption that query processing is done for all attributes in a static dataset with deterministic attribute values, some advanced work has been done recently to remove part of such a strong assumption in order to process skyline queries for real-life applications, namely, to deal with data with multi-valued attributes （known as data uncertainty）, to support skyline queries in a subspace which is a subset of attributes selected by the user, and to support continuous queries on streaming data. Naturally, there are many application scenarios where these three complex issues must be considered together. In this paper, we tackle the problem of probabilistic subspace skyline query processing over sliding windows on uncertain data streams. That is, to retrieve all objects from the most recent window of streaming data in a user-selected subspace with a skyline probability no smaller than a given threshold. Based on the subtle relationship between the full space and an arbitrary subspace, a novel approach using a regular grid indexing structure is developed for this problem. An extensive empirical study under various settings is conducted to show the effectiveness and efficiency of our PSS algorithm. 相似文献

7.

A Framework for Supporting Tree-Like Indexes on the Chord Overlay

下载免费PDF全文

朱命冬申德荣寇月聂铁铮于戈《计算机科学技术学报》2013,28(6):962-972

With the explosive growth of data, to support efficient data management including queries and updates, the database system is expected to provide tree-like indexes, such as R-tree, M-tree, B＋-tree, according to different types of data. In the distributed environment, the indexes have to be scattered across the compute nodes to improve reliability and scalability. Indexes can speed up queries, but they incur maintenance cost when updates occur. In the distributed environment, each compute node maintains a subset of an index tree, so keeping the communication cost small is more crucial, or else it occupies lots of network bandwidth and the scalability and availability of the database system are affected. Further, to achieve the reliability and scalability for queries, several replicas of the index are needed, but keeping the replicas consistent is not straightforward. In this paper, we propose a framework supporting tree-like indexes, based on Chord overlay, which is a popular P2P structure. The framework dynamically tunes the number of replicas of index to balance the query cost and the update cost. Several techniques are designed to improve the efficiency of updates without the cost of performance of the queries. We implement M-tree and R-tree in our framework, and extensive experiments on real- life and synthetic datasets verify the efficiency and scalability of our framework. 相似文献

8.

下载免费PDF全文

周军锋林卓旺鲍芝峰孟小峰《计算机科学技术学报》2012,27(1):195-212

We investigate the limitations of existing XML search methods and propose a new semantics, related relationship, to effectively capture meaningful relationships of data elements from XML data in the absence of structural constraints. Then we make an extension to XPath by introducing a new axis, related axis, to specify the related relationship between query nodes so as to enhance the flexibility of XPath. We propose to reduce the cost of computing the related relationship by a new schema summary that summarizes the related relationship from the original schema without any loss. Based on this schema summary, we introduce two indices to improve the performance of query processing. Our algorithm shows that the evaluation of most queries can be equivalently transformed into just a few selection and value join operations, thus avoids the costly structural join operations. The experimental results show that our method is effective and efficient in terms of comparing the effectiveness of the related relationship with existing keyword search semantics and comparing the efficiency of our evaluation methods with existing query engines. 相似文献

9.

HEDC＋＋： An Extended Histogram Estimator for Data in the Cloud

下载免费PDF全文

史英杰孟小峰 Pusheng Wang 干艳桃《计算机科学技术学报》2013,28(6):973-988

With increasing popularity of cloud-based data management, improving the performance of queries in the cloud is an urgent issue to solve. Summary of data distribution and statistical information has been commonly used in traditional databases to support query optimization, and histograms are of particular interest. Naturally, histograms could be used to support query optimization and efficient utilization of computing resources in the cloud. Histograms could provide helpful reference information for generating optimal query plans, and generate basic statistics useful for guaranteeing the load balance of query processing in the cloud. Since it is too expensive to construct an exact histogram on massive data, building an approximate histogram is a more feasible solution. This problem, however, is challenging to solve in the cloud environment because of the special data organization and processing mode in the cloud. In this paper, we present HEDC＋＋, an extended histogram estimator for data in the cloud, which provides efficient approximation approaches for both equi-width and equi-depth histograms. We design the histogram estimate workflow based on an extended MapReduce framework, and propose novel sampling mechanisms to leverage the sampling efficiency and estimate accuracy. We experimentally validate our techniques on Hadoop and the results demonstrate that HEDC＋＋ can provide promising histogram estimate for massive data in the cloud. 相似文献

10.

PEJA: Progressive Energy-Efficient Join Processing for Sensor Networks

下载免费PDF全文

Yong-Xuan Lai Yi-Long Chen and Hong Chen 《计算机科学技术学报》2008,23(6):957-972

Sensor networks are widely used in many applications to collaboratively collect information from the physical environment. In these applications,the exploration of the relationship and linkage of sensing data within multiple regions can be naturally expressed by joining tuples in these regions. However,the highly distributed and resource-constraint nature of the network makes join a challenging query. In this paper,we address the problem of processing join query among different regions progressively and energy-efficiently in sensor networks. The proposed algorithm PEJA(Progressive Energy-efficient Join Algorithm) adopts an event-driven strategy to output the joining results as soon as possible,and alleviates the storage shortage problem in the in-network nodes. It also installs filters in the joining regions to prune unmatchable tuples in the early processing phase,saving lots of unnecessary transmissions. Extensive experiments on both synthetic and real world data sets indicate that the PEJA scheme outperforms other join algorithms,and it is effective in reducing the number of transmissions and the delay of query results during the join processing. 相似文献

11.

Locating XML Documents in a Peer-to-Peer Network Using Distributed Hash Tables

Rao Praveen R. Moon Bongki 《Knowledge and Data Engineering, IEEE Transactions on》2009,21(12):1737-1752

One of the key challenges in a peer-to-peer (P2P) network is to efficiently locate relevant data sources across a large number of participating peers. With the increasing popularity of the extensible markup language (XML) as a standard for information interchange on the Internet, XML is commonly used as an underlying data model for P2P applications to deal with the heterogeneity of data and enhance the expressiveness of queries. In this paper, we address the problem of efficiently locating relevant XML documents in a P2P network, where a user poses queries in a language such as XPath. We have developed a new system called psiX that runs on top of an existing distributed hashing framework. Under the psiX system, each XML document is mapped into an algebraic signature that captures the structural summary of the document. An XML query pattern is also mapped into a signature. The query's signature is used to locate relevant document signatures. Our signature scheme supports holistic processing of query patterns without breaking them into multiple path queries and processing them individually. The participating peers in the network collectively maintain a collection of distributed hierarchical indexes for the document signatures. Value indexes are built to handle numeric and textual values in XML documents. These indexes are used to process queries with value predicates. Our experimental study on PlanetLab demonstrates that psiX provides an efficient location service in a P2P network for a wide variety of XML documents. 相似文献

12.

非结构化对等计算系统中多维范围搜索

下载免费PDF全文

徐林昊钱卫宁周傲英《软件学报》2007,18(6):1443-1455

对等计算数据管理中的一个重要问题是如何有效地支持多维数据空间上的相似性搜索.现有的非结构化对等计算数据共享系统仅支持简单的查询处理方法,即匹配查询处理.将近似技术和路由索引结合在一起,设计了一种简单、有效的索引结构EVARI(扩展近似向量路由索引).利用EVARI,每个节点不仅可以在本地共享的数据集上处理范围查询,而且还可以将查询转发给最有希望获得查询结果的邻居节点.为了建立EVARI,每个节点使用空间划分技术概括本地的共享内容,并与邻居节点交换概要信息.而且,每个节点都可以重新配置自己的邻居节点,使得相关节点位置相互邻近,优化了系统资源配置,提升了系统性能.仿真实验证明了该方法的良好性能. 相似文献

13.

A pure peer-to-peer approach for kNN query processing in mobile ad hoc networks

Thao P. Nghiem Agustinus Borgy Waluyo David Taniar 《Personal and Ubiquitous Computing》2013,17(5):973-985

Together with advanced positioning and mobile technologies, P2P query processing has attracted a growing interest number of location-aware applications such as answering kNN queries in mobile ad hoc networks. It not only overcomes drawbacks of centralized systems, for example single point of failure and bottleneck issues, but more importantly harnesses power of peers’ collaboration. In this research, we propose a pure mobile P2P query processing scheme which primarily focuses on the search and validation algorithm for kNN queries. The proposed scheme is designed for pure mobile P2P environments with the absence of the base station support. Compared with centralized and hybrid systems, our system can reduce energy consumption more than six times by making use of data sharing from peers in a reasonable mean latency of processing time for networks with high density of moving objects as can be seen in the simulation results. 相似文献

14.

Peer-to-peer bichromatic reverse nearest neighbours in mobile ad-hoc networks

Thao P. Nghiem Kiki Maulana Kinh Nguyen David Green Agustinus Borgy Waluyo David Taniar 《Journal of Parallel and Distributed Computing》2014

The increasing use of mobile communications has raised many issues of decision support and resource allocation. A crucial problem is how to solve queries of Reverse Nearest Neighbour (RNN). An RNN query returns all objects that consider the query object as their nearest neighbour. Existing methods mostly rely on a centralised base station. However, mobile P2P systems offer many benefits, including self-organisation, fault-tolerance and load-balancing. In this study, we propose and evaluate 3 distinct P2P algorithms focusing on bichromatic RNN queries, in which mobile query peers and static objects of interest are of two different categories, based on a time-out mechanism and a boundary polygon around the mobile query peers. The Brute-Force Search Algorithm provides a naive approach to exploit shared information among peers whereas two other Boundary Search Algorithms filter a number of peers involved in query processing. The algorithms are evaluated in the MiXiM simulation framework with both real and synthetic datasets. The results show the practical feasibility of the P2P approach for solving bichromatic RNN queries for mobile networks. 相似文献

15.

Efficient Range Query Processing in Peer-to-Peer Systems

Li Dongsheng Cao Jiannong Lu Xicheng Chen Keith C. C. 《Knowledge and Data Engineering, IEEE Transactions on》2009,21(1):78-91

With the increasing popularity of the peer-to-peer (P2P) computing paradigm, many general range query schemes for distributed hash table (DHT)-based P2P systems have been proposed in recent years. Although those schemes can provide range query capability without modifying the underlying DHTs, they have the query delay depending on both the scale of the system and the size of the query space or the specific query, and thus cannot guarantee to return the query results in a bounded delay. In this paper, we propose Armada, an efficient range query processing scheme to support delay-bounded single-attribute and multiple-attribute range queries. It is the first delay-bounded general range query scheme on constant-degree DHTs, and can return the results for any range query within 2logN hops in a P2P system with N peers. Results of analysis and simulations show that the average delay in Armada is less than logN, and the average message cost of single-attribute range queries is about logN+2n 2 (n is the number of peers that intersect with the query). These results are very close to the lower bounds on delay and message cost of range queries over constant-degree DHTs. 相似文献

16.

Skyframe: a framework for skyline query processing in peer-to-peer systems

Shiyuan Wang Quang Hieu Vu Beng Chin Ooi Anthony K. H. Tung Lizhen Xu 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(1):345-362

This paper looks at the processing of skyline queries on peer-to-peer (P2P) networks. We propose Skyframe, a framework for efficient skyline query processing in P2P systems, which addresses the challenges of quick response time, low network communication cost and query load balancing among peers. Skyframe consists of two querying methods: one is optimized for network communication while the other focuses on query response time. These methods are different in the way in which the query search space is defined. In particular, the first method uses a high dominating point that has a large dominating region to prune the search space to achieve a low cost in network communication. On the other hand, the second method relaxes the search space in order to allow parallel query processing to speed up query response. Skyframe achieves query load balancing by both query load conscious data space splitting/merging during the join/departure of nodes and dynamic load migration. We further show how to apply Skyframe to both the P2P systems supporting multi-dimensional indexing and the P2P systems supporting single-dimensional indexing. Finally, we have conducted extensive experiments on both real and synthetic data sets over two existing P2P systems: CAN (Ratnasamy in A scalable content-addressable network. In: Proceedings of SIGCOMM Conference, pp. 161–172, 2001) and BATON (Jagadish et al. in A balanced tree structure for peer-to-peer networks. In: Proceedings of VLDB Conference, pp. 661–672, 2005) to evaluate the effectiveness and scalability of Skyframe. 相似文献

17.

Top-<Emphasis Type="Italic">k</Emphasis> query processing in mobile-P2P networks using economic incentive schemes

Nilesh Padhariya Anirban Mondal Sanjay Kumar Madria 《Peer-to-Peer Networking and Applications》2016,9(4):731-751

This work proposes the E-Top system for the efficient processing of top-k queries in mobile ad hoc peer to peer (M-P2P) networks using economic incentive schemes. In E-Top, brokers facilitate top-k query processing in lieu of a commission. E-Top issues economic rewards to the mobile peers, which send relevant data items (i.e., those that contribute to the top-k query result), and penalizes peers otherwise, thereby optimizing the communication traffic. Peers use the payoffs (rewards/penalties) as a means of feedback to re-evaluate the scores of their items for re-ranking purposes. The main contributions of E-Top are three-fold. First, it proposes two economic incentive schemes, namely ETK and ETK+, in which peers act individually towards top-k query processing. Second, it extends ETK and ETK+ to propose a peer group-based economic incentive scheme ETG. Third, our performance evaluation shows that our schemes are indeed effective in improving the performance of top-k queries in terms of query response times and accuracy at reasonable communication traffic cost. 相似文献

18.

Polymorphic queries for P2P systems

Jie Liu Wenfei Fan 《Information Systems》2011

When a query is posed on a centralized database, if it refers to attributes that are not defined in the database, the user is warranted to get either an error or an empty set. In contrast, when a query is posed on a peer in a P2P system and refers to attributes not found in the local database, the query should not be simply rejected if the relevant information is available at other peers. This paper proposes a query model for unstructured P2P systems to answer such queries. (a) We introduce a class of polymorphic queries, a revision of conjunctive queries by incorporating type variables to accommodate attributes not defined in the local database. (b) We define the semantics of polymorphic queries in terms of horizontal and vertical object expansions, to find attributes and tuples, respectively, missing from the local database. We show that both expansions can be conducted in a uniform framework. (c) We develop a top-K algorithm to approximately answer polymorphic queries. (d) We also provide a method to merge tuples collected from various peers, based on matching keys specified in polymorphic queries. Our experimental study verifies that polymorphic queries are able to find more sensible information than traditional queries supported by P2P systems, and that these queries can be evaluated efficiently. 相似文献

19.

The Threshold Algorithm: From Middleware Systems to the Relational Engine

Bruno N. Hui Wang 《Knowledge and Data Engineering, IEEE Transactions on》2007,19(4):523-537

The answer to a top-k query is an ordered set of tuples, where the ordering is based on how closely each tuple matches the query. In the context of middleware systems, new algorithms to answer top-k queries have been recently proposed. Among these, the threshold algorithm (TA) is the most well-known instance due to its simplicity and memory requirements. TA is based on an early-termination condition and can evaluate top-k queries without examining all the tuples. This top-k query model is prevalent not only over middleware systems, but also over plain relational data. In this work, we analyze the challenges that must be addressed to adapt TA to a relational database system. We show that, depending on the available indices, many alternative TA strategies can be used to answer a given query. Choosing the best alternative requires a cost model that can be seamlessly integrated with that of current optimizers. In this work, we address these challenges and conduct an extensive experimental evaluation of the resulting techniques by characterizing which scenarios can take advantage of TA-like algorithms to answer top-k queries in relational database systems 相似文献