Similar Literature
20 similar documents found.
1.
Our study introduces a novel distributed query plan refinement phase in an enhanced architecture of a distributed query processing engine (DQPE). Query plan refinement generates potentially efficient distributed query plans via the reusable aggregate query shipping (RAQS) approach. The approach improves response time at the cost of pre-processing time; if this overhead cannot be compensated by the reuse of query results, RAQS is no longer favorable. Therefore, a global cost estimation model is employed to select the proper operator: RR Agg, R Agg, or R Scan. To reuse the results of queries with aggregate functions in distributed query processing, a multi-level hybrid view caching (HVC) scheme is introduced. The scheme retains the advantages of both partial-match caching and aggregate query result caching. Evaluations with distributed TPC-H queries show that our solution significantly improves average response time.

2.
Recent developments in wireless communication technologies and the popularity of smart phones are making location-based services (LBS) popular. However, sending queries to LBS servers with users' exact locations may threaten user privacy, so there has been much research on generating a cloaked query region for privacy protection. Consequently, an efficient query processing algorithm for a query region is required. In this paper, we propose k-nearest neighbor (k-NN) query processing algorithms for a query region in road networks. To efficiently retrieve the k-NN points of interest (POIs), we make use of the Island index. We also propose a method that generates an adaptive Island index to improve query processing performance and storage usage. Finally, our performance analysis shows that our k-NN query processing algorithms outperform the existing k-Range Nearest Neighbor (kRNN) algorithm in terms of network expansion cost and query processing time.
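As a rough illustration of k-NN retrieval over a road network, the sketch below runs a plain Dijkstra-style network expansion from a single query node and reports the first k POIs reached. It is only a baseline illustration, not the Island index or the cloaked-region handling of the paper, and the graph and POI placement are invented.

```python
# Hedged sketch of k-NN over a road network by incremental network expansion
# (plain Dijkstra), not the paper's Island index: expand outward from the
# query node and report the first k POI nodes reached.
import heapq

def knn_road_network(graph, pois, source, k):
    """graph: {node: [(neighbor, edge_length), ...]}, pois: set of POI nodes."""
    dist, heap, found = {source: 0.0}, [(0.0, source)], []
    while heap and len(found) < k:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue                      # stale queue entry
        if u in pois:
            found.append((u, d))
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return found

g = {'a': [('b', 2), ('c', 5)], 'b': [('a', 2), ('c', 1)], 'c': [('a', 5), ('b', 1)]}
print(knn_road_network(g, pois={'b', 'c'}, source='a', k=2))   # [('b', 2.0), ('c', 3.0)]
```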

3.
Query suggestions help users refine their queries after they input an initial query. Previous work on query suggestion has mainly concentrated on similarity-based or context-based approaches, developing models that either focus on adapting to a specific user (personalization) or on diversifying query aspects in order to maximize the probability of the user being satisfied (diversification). We consider the task of generating query suggestions that are both personalized and diversified. We propose a personalized query suggestion diversification (PQSD) model, where a user's long-term search behavior is injected into a basic greedy query suggestion diversification model that considers the user's search context in the current session. Query aspects are identified through clicked documents based on the Open Directory Project (ODP) with a latent Dirichlet allocation (LDA) topic model. We quantify the improvement of our proposed PQSD model against a state-of-the-art baseline using the public America Online (AOL) query log and show that it beats the baseline in terms of metrics used in query suggestion ranking and diversification. The experimental results show that PQSD achieves its best performance when only queries with clicked documents are taken as search context rather than all queries, especially when more query suggestions are returned in the list.
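The greedy diversification step can be illustrated with a minimal sketch: pick suggestions one by one, trading off a (personalized) relevance score against coverage of query aspects not yet covered. The linear trade-off, the scores, and the aspect sets below are illustrative assumptions, not the actual PQSD scoring function.

```python
# Minimal sketch of greedy query-suggestion diversification (not the full
# PQSD model): repeatedly pick the candidate that best balances relevance
# against the number of aspects not yet covered by earlier picks.

def greedy_diversify(candidates, relevance, aspects, k, lam=0.5):
    """candidates: list of suggestion strings
    relevance:  dict suggestion -> personalized relevance score (assumed given)
    aspects:    dict suggestion -> set of aspect ids (e.g. ODP/LDA topics)
    """
    selected, covered = [], set()
    while candidates and len(selected) < k:
        def gain(s):
            novel = len(aspects[s] - covered)          # newly covered aspects
            return lam * relevance[s] + (1 - lam) * novel
        best = max(candidates, key=gain)
        selected.append(best)
        covered |= aspects[best]
        candidates = [c for c in candidates if c != best]
    return selected

cands = ["jaguar car", "jaguar animal", "jaguar os"]
rel = {"jaguar car": 0.9, "jaguar animal": 0.7, "jaguar os": 0.4}
asp = {"jaguar car": {1}, "jaguar animal": {2}, "jaguar os": {3}}
print(greedy_diversify(cands, rel, asp, k=2))
```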

4.
To address the inability of existing methods to effectively process obstacle k aggregate nearest neighbor queries over uncertain data, we propose a probabilistic obstacle k aggregate nearest neighbor query (POkANN) method based on the uncertain Voronoi diagram. The method consists of three phases: query point set processing, filtering, and refinement. In the processing phase, the center q of the minimum covering circle of the query point set is computed to prepare for pruning. In the filtering phase, different filtering algorithms are designed for three aggregate functions to discard data points that cannot appear in the result and thereby obtain a candidate set. In the refinement phase, the k data points in the candidate set whose probability exceeds a given threshold are placed in the result set and returned to the user. Theoretical analysis and experiments show that the proposed method has clear advantages for probabilistic obstacle k aggregate nearest neighbor queries.

5.
Semantic Web service matchmaking, as one of the most challenging problems in Semantic Web services (SWS), aims to filter and rank a set of services with respect to a service query by using a certain matching strategy. In this paper, we propose a logistic regression based method to aggregate several matching strategies instead of a fixed integration (e.g., the weighted sum) for SWS matchmaking. The logistic regression model is trained on training data derived from binary relevance assessments of existing test collections, and then used to predict the probability of relevance between a new pair of query and service according to their matching values obtained from various matching strategies. Services are then ranked according to the probabilities of relevance with respect to each query. Our method is evaluated on two main test collections, SAWSDL-TC2 and the Jena Geography Dataset (JGD). Experimental results show that the logistic regression model can effectively predict the relevance between a query and a service, and hence can improve the effectiveness of service matchmaking.
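A minimal sketch of the aggregation idea, assuming each matching strategy yields a numeric score per (query, service) pair: the scores become features of a logistic regression classifier trained on binary relevance labels, and services are ranked by the predicted probability of relevance. The feature layout and data are invented for illustration.

```python
# Hedged sketch: aggregate several matching-strategy scores with logistic
# regression instead of a fixed weighted sum. The three "strategies" and the
# tiny training set below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [syntactic_score, structural_score, semantic_score] for one (query, service) pair.
X_train = np.array([[0.9, 0.8, 0.7],
                    [0.2, 0.1, 0.3],
                    [0.7, 0.6, 0.9],
                    [0.1, 0.3, 0.2]])
y_train = np.array([1, 0, 1, 0])        # binary relevance judgments

model = LogisticRegression().fit(X_train, y_train)

# Score a new (query, service) pair; services would be ranked by this probability.
X_new = np.array([[0.8, 0.5, 0.6]])
print(model.predict_proba(X_new)[:, 1])  # predicted probability of relevance
```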

6.
RFID middleware collects and filters RFID streaming data to process applications' requests, called continuous queries because they are executed continuously during tag movement. Several approaches that build an index on queries rather than on data records, called a query index, have been proposed for evaluating continuous queries over streaming data. EPCglobal proposed the Event Cycle Specification (ECSpec) model, which is a de facto standard query interface for RFID applications. Continuous queries based on ECSpec consist of a large number of segments that represent the query conditions. The problem with using any of the existing query indexes on these continuous queries is that building the index takes a long time, because a large number of segments must be inserted into it. To solve this problem, we propose a transform method that converts a group of segments into compressed data. We also propose an efficient query index scheme for the transformed space. Compared with existing query indexes, the proposed index outperforms the others on various datasets.

7.
Secure XML query answering to protect data privacy and semantic caching to speed up XML query answering are two hot spots in current research on XML database systems. While both issues have been explored in depth individually, they have not been studied together; that is, the problem of semantic caching for secure XML query answering has not been addressed yet. In this paper, we present an interesting combination of these two aspects and propose an efficient framework of semantic caching for secure XML query answering, which can improve the performance of XML database systems under secure circumstances. Our framework combines access control, user privilege management over XML data, and state-of-the-art semantic XML query cache techniques, to ensure that data are presented only to authorized users in an efficient way. To the best of our knowledge, the approach we propose here is among the first efforts to combine caching and security for XML databases to improve system performance. The efficiency of our framework is verified by comprehensive experiments.

8.
The majority of existing information systems deal with crisp data through crisp database systems. Traditional Database Management Systems (DBMSs) have not taken imprecision into account, so they suffer from a certain lack of flexibility: queries retrieve only elements that precisely match the given Boolean query. That is, an element belongs to the result if the query is true for that element; otherwise, no answers are returned to the user. The aim of this paper is to present a cooper...

9.
Accomplishing Deterministic XML Query Optimization
As the popularity of XML (Extensible Markup Language) keeps growing rapidly, the management of XML-compliant structured-document databases has become a very interesting and compelling research area. Query optimization for XML structured documents stands out as one of the most challenging research issues in this area because of the much enlarged optimization (search) space, which is a consequence of the intrinsic complexity of the underlying data model of XML data. We therefore propose to apply deterministic transformations to query expressions to aggressively prune the search space and quickly reach a sufficiently improved alternative (if not the optimal one) for each incoming query expression. This idea is not just exciting but practically attainable. This paper first provides an overview of our optimization strategy, and then focuses on the key implementation issues of our rule-based transformation system for XML query optimization in a database environment. The performance results we obtained from experimentation show that our approach is valid and effective.

10.
The query space of a similarity query is usually narrowed down by pruning inactive query subspaces, which contain no query results, and keeping active query subspaces, which may contain objects corresponding to the request. However, some active query subspaces may contain no query results at all; these are called false active query subspaces. Clearly, the performance of query processing degrades in the presence of false active query subspaces. Our experiments show that this problem becomes serious when the data are high dimensional, and that the number of accesses to false active subspaces increases with dimensionality. To solve this problem, this paper proposes a space mapping approach to reducing such unnecessary accesses. A given query space can be refined by filtering within its mapped space. To do so, a mapping strategy called maxgap is proposed to improve the efficiency of the refinement processing. Based on this mapping strategy, an index structure called the MS-tree and the corresponding query processing algorithms are presented in this paper. Finally, the performance of the MS-tree is compared with that of other competitors in terms of range queries on a real data set.

11.
A view in object-oriented databases corresponds to a virtual schema with restructured generalization and decomposition hierarchies. A number of view creation methodologies have been proposed. A major drawback of existing methodologies is that they do not maintain the closure property; that is, the result of a query does not have the same semantics as embodied in the object-oriented data model. Therefore, this paper presents a view creation methodology that derives a class in response to a user's query, integrates the derived class into the global schema (i.e., considers the problem of classes moving in the class hierarchy), and selects the required classes from the global schema to create the view for the user's query. The novel ideas of this view creation approach include: (a) an object algebra for class derivation and customization (where derived classes are studied in terms of object instances and procedures/methods), (b) maintenance of the closure property, and (c) a classification algorithm that provides a mechanism to deal with the problem of a class moving in a class hierarchy.

12.
With the increasing popularity of cloud-based data management, improving the performance of queries in the cloud is an urgent issue. Summaries of data distribution and statistical information have commonly been used in traditional databases to support query optimization, and histograms are of particular interest. Naturally, histograms could be used to support query optimization and the efficient utilization of computing resources in the cloud. Histograms could provide helpful reference information for generating optimal query plans, and generate basic statistics useful for guaranteeing the load balance of query processing in the cloud. Since it is too expensive to construct an exact histogram on massive data, building an approximate histogram is a more feasible solution. This problem, however, is challenging to solve in the cloud environment because of the special data organization and processing mode in the cloud. In this paper, we present HEDC++, an extended histogram estimator for data in the cloud, which provides efficient approximation approaches for both equi-width and equi-depth histograms. We design the histogram estimation workflow based on an extended MapReduce framework, and propose novel sampling mechanisms to balance sampling efficiency and estimation accuracy. We experimentally validate our techniques on Hadoop, and the results demonstrate that HEDC++ provides promising histogram estimates for massive data in the cloud.
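The core sampling idea behind an approximate equi-depth histogram can be sketched as follows: draw a uniform sample of the column and use its quantiles as bucket boundaries, so each bucket covers roughly the same number of rows. This ignores HEDC++'s MapReduce workflow and error analysis; the sample size and data are illustrative.

```python
# Minimal sketch of an approximate equi-depth histogram built from a uniform
# sample: the sample's quantiles serve as bucket boundaries.
import random

def approx_equi_depth_boundaries(values, num_buckets, sample_size=1000):
    sample = sorted(random.sample(values, min(sample_size, len(values))))
    # pick the (i/num_buckets)-quantiles of the sample as bucket boundaries
    return [sample[int(i * len(sample) / num_buckets)]
            for i in range(1, num_buckets)]

data = [random.gauss(0, 1) for _ in range(100_000)]
print(approx_equi_depth_boundaries(data, num_buckets=4))  # roughly [-0.67, 0.0, 0.67]
```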

13.
We consider the problem of efficiently computing distributed geographical k-NN queries in an unstructured peer-to-peer (P2P) system, in which each peer is managed by an individual organization and can only communicate with its logical neighboring peers. Such queries are based on local filter query statistics and should require as little communication cost as possible, which makes them more difficult than existing distributed k-NN queries. In particular, we aim to reduce the number of candidate peers and the communication cost. In this paper, we propose an efficient pruning technique to minimize the number of candidate peers that must be processed to answer a k-NN query. Our approach is especially suitable for continuous k-NN queries under peer updates, including changes to peer ranges, peers dynamically leaving or joining, and data updates within a peer. In addition, simulation results show that the proposed approach outperforms the existing Minimum Bounding Rectangle (MBR)-based query approaches, especially for continuous queries.
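As background for the MBR-based baseline mentioned above (not the paper's own pruning technique), the following sketch computes the minimum possible distance (MINDIST) from the query point to each peer's bounding rectangle and skips peers that cannot improve on the current k-th best distance. Peer MBRs and coordinates are invented.

```python
# Hedged sketch of MBR-based candidate-peer pruning for distributed k-NN:
# a peer whose bounding rectangle cannot contain anything closer than the
# current k-th best distance need not be contacted.
import math

def mindist(q, mbr):
    """Minimum possible distance from point q to axis-aligned MBR ((xlo, ylo), (xhi, yhi))."""
    (xlo, ylo), (xhi, yhi) = mbr
    dx = max(xlo - q[0], 0, q[0] - xhi)
    dy = max(ylo - q[1], 0, q[1] - yhi)
    return math.hypot(dx, dy)

def candidate_peers(q, peer_mbrs, kth_best_dist):
    """Peers ordered by MINDIST, dropping those that cannot improve the answer."""
    ranked = sorted((mindist(q, mbr), peer) for peer, mbr in peer_mbrs.items())
    return [(peer, d) for d, peer in ranked if d <= kth_best_dist]

peers = {'p1': ((0, 0), (1, 1)), 'p2': ((5, 5), (6, 6))}
print(candidate_peers((0.5, 0.5), peers, kth_best_dist=2.0))   # only p1 survives
```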

14.
This paper studies the most similar maximal clique query (MSMCQ). Given a graph G and a set of nodes Q, MSMCQ finds the maximal clique of G having the largest similarity with Q. MSMCQ has many real applications, including the advertising industry, public security, task crowdsourcing, and social networks. MSMCQ can be studied as a special case of the general set similarity query (SSQ); however, the maximal cliques (MCs) of G have several specialties compared with general sets. Based on these specialties, we propose a novel index, namely MCIndex. MCIndex outperforms the state-of-the-art SSQ method significantly in terms of the number of candidates and the query time. Specifically, we first construct an inverted index I for all the MCs of G. Since the MCs in a posting list often overlap heavily, MCIndex selects some pivots to cluster the MCs within a small radius. Given a query Q, we compute the distance from the pivots to Q; clusters whose pivots guarantee that they contain no answer can be pruned by our distance-based pruning rule. Since it is NP-hard to construct a minimum MCIndex, we propose to construct a minimal MCIndex on I(v) with an approximation ratio of 1 + ln |I(v)|. Since MCs have properties inherent to the graph structure, we further propose an S Index within each cluster of an MCIndex together with a structure-based pruning rule. The S Index can significantly reduce the number of candidates. Since the sizes of the intersections between Q and many MCs need to be computed during query evaluation, we also propose a binary representation of MCs to improve the efficiency of the intersection size computation. Our extensive experiments confirm the effectiveness and efficiency of the proposed techniques on several real-world datasets.
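A hedged sketch of the pivot-based pruning idea: because Jaccard distance is a metric, any clique in a cluster of radius r around a pivot p is at least dist(Q, p) - r away from the query set Q, so whole clusters that cannot beat the current best candidate can be skipped. The cluster layout and sets below are illustrative, and the structure-based S Index pruning is not shown.

```python
# Hedged sketch of pivot/cluster pruning for set similarity search: maximal
# cliques (sets of node ids) are grouped around a pivot within radius r under
# Jaccard distance; by the triangle inequality, every member of a cluster is
# at least dist(Q, pivot) - r away from the query set Q.

def jaccard_dist(a, b):
    return 1.0 - len(a & b) / len(a | b)

def query_with_pivots(Q, clusters):
    """clusters: list of (pivot_set, radius, [member_sets]); return the most similar member."""
    best, best_d = None, float('inf')
    for pivot, radius, members in clusters:
        if jaccard_dist(Q, pivot) - radius > best_d:
            continue                      # whole cluster pruned: no member can win
        for m in members:                 # refine: check surviving members only
            d = jaccard_dist(Q, m)
            if d < best_d:
                best, best_d = m, d
    return best, best_d

clusters = [({1, 2, 3}, 0.2, [{1, 2, 3}, {1, 2, 3, 4}]),
            ({8, 9}, 0.1, [{8, 9}, {8, 9, 10}])]
print(query_with_pivots({1, 2, 4}, clusters))   # ({1, 2, 3, 4}, 0.25)
```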

15.
Compressed Data Cube for Approximate OLAP Query Processing
Approximate query processing has emerged as an approach to dealing with the huge data volumes and complex queries in data warehouse environments. In this paper, we present a novel method that provides approximate answers to OLAP queries. Our method builds a compressed (approximate) data cube using a clustering technique and uses this compressed data cube to answer queries directly, thereby improving query performance. We also provide the OLAP query algorithm and the confidence intervals of query results. An extensive experimental study with the OLAP Council benchmark shows the effectiveness and scalability of our cluster-based approach compared to sampling.
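The general idea of answering aggregates from a compressed, cluster-based summary rather than the raw fact table can be sketched as follows; the naive one-dimensional binning used as "clustering" here and the range-SUM query are illustrative only and do not reproduce the paper's cube construction or confidence intervals.

```python
# Hedged sketch: keep only (count, sum, centroid) per cluster and answer a
# range-SUM query approximately from clusters whose centroid falls in range.

def compress(rows, width):
    """rows: (dim_value, measure) pairs -> {cluster_id: [count, measure_sum, dim_sum]}"""
    clusters = {}
    for d, m in rows:
        c = clusters.setdefault(int(d // width), [0, 0.0, 0.0])
        c[0] += 1
        c[1] += m
        c[2] += d
    return clusters

def approx_range_sum(clusters, lo, hi):
    total = 0.0
    for count, msum, dsum in clusters.values():
        centroid = dsum / count
        if lo <= centroid <= hi:          # include the whole cluster if its centroid qualifies
            total += msum
    return total

rows = [(i, i * 2.0) for i in range(100)]         # dim value i, measure 2i
clusters = compress(rows, width=10)
print(approx_range_sum(clusters, 20, 59))         # approximate SUM over dim in [20, 59]
```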

16.
With the popularity of uncertain data, queries over uncertain graphs have become a hot topic in the database community. As one of the important queries, the shortest path query over an uncertain graph ...

17.
This paper presents a new approach for deriving a power system aggregate load area model (ALAM). In this approach, an equivalent area load model is derived to represent the load characteristics of a particular load area of a power system network. The Particle Swarm Optimization (PSO) method is employed to identify the unknown parameters of the generalised system, ALAM, directly from system measurements using a one-step scheme. Simulation studies are carried out for an IEEE 14-Bus power system and an IEEE 57-Bus power system. Simulation results show that the ALAM can represent the area load characteristics accurately under different operational conditions and at different power system states.
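A minimal PSO sketch for parameter identification, assuming a toy quadratic load model P(V) = a + bV + cV^2 in place of the actual ALAM formulation: particles search the parameter space to minimize the squared error between measured samples and the model output. All constants and sample data are illustrative.

```python
# Hedged sketch of Particle Swarm Optimization for parameter identification;
# the "load model" below is a made-up quadratic, not the paper's ALAM.
import random

def pso(fitness, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, lo=-5, hi=5):
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    gbest = min(zip(pbest_f, pbest))[1][:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < fitness(gbest):
                    gbest = pos[i][:]
    return gbest

# Measured (voltage, power) samples generated from true parameters (1, 2, 3).
samples = [(v, 1 + 2 * v + 3 * v * v) for v in [0.5, 0.75, 1.0, 1.25, 1.5]]
err = lambda p: sum((p[0] + p[1] * v + p[2] * v * v - y) ** 2 for v, y in samples)
print(pso(err, dim=3))   # fitted parameters; err(result) should be near zero
```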

18.
Big data introduces challenges to query answering, from theory to practice. A number of questions arise. What queries are "tractable" on big data? How can we make big data "small" so that it is feasible to find exact query answers? When exact answers are beyond reach in practice, what approximation theory can help us strike a balance between the quality of approximate query answers and the cost of computing such answers? To get sensible query answers over big data, what else must we do in addition to coping with the size of the data? This position paper aims to provide an overview of recent advances in the study of querying big data. We propose approaches to tackling these challenging issues, and identify open problems for future research.

19.
At present, most k-dominant Skyline query algorithms are oriented to static datasets; this paper proposes a k-dominant Skyline query algorithm for dynamic datasets. The algorithm proceeds recursively in a loop. First, we compute the dominance ability of each object and sort the objects in descending order of dominance ability. Then, we maintain an inverted index of the dominance index with a k-dominant Skyline point calculation algorithm. When the data change, the algorithm judges whether the updated point affects the k-dominant Skyline point set, so the k-dominant Skyline points of the new dataset are obtained by the insertion and deletion algorithms. The proposed algorithm resolves the maintenance issue of a frequently updated database by dynamically updating the datasets. The experimental results show that the query algorithm can effectively improve query efficiency.
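For reference, a minimal sketch of the k-dominance test and a naive, static (non-incremental) k-dominant Skyline scan, assuming smaller values are better: p k-dominates q if p is no worse than q in at least k dimensions and strictly better in at least one of them. The data are illustrative; the paper's inverted index and update handling are not shown.

```python
# Minimal sketch of k-dominance and a brute-force k-dominant Skyline
# (not the paper's incremental maintenance scheme). Smaller values are better.
from itertools import combinations

def k_dominates(p, q, k):
    for dims in combinations(range(len(p)), k):
        if all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims):
            return True
    return False

def k_dominant_skyline(points, k):
    return [p for p in points
            if not any(k_dominates(q, p, k) for q in points if q != p)]

pts = [(1, 2, 3), (2, 3, 4), (3, 1, 2), (4, 4, 4)]
print(k_dominant_skyline(pts, k=2))   # -> [(3, 1, 2)]
```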

20.
Data cube pre-computation is an important concept for supporting OLAP (Online Analytical Processing) and has been studied extensively. It is often not feasible to compute a complete data cube due to the huge storage requirement. The recently proposed quotient cube addressed this issue through a partitioning method that groups cube cells into equivalence partitions. Such an approach is not only useful for distributive aggregate functions such as SUM but can also be applied to the maintenance of holistic aggregate functions like MEDIAN, which require the storage of a set of tuples for each equivalence class. Unfortunately, as changes are made to the data sources, maintaining the quotient cube is non-trivial since the partitioning of the cube cells must also be updated. In this paper, the authors design incremental algorithms to update a quotient cube efficiently for both the SUM and MEDIAN aggregate functions. For the aggregate function SUM, concepts are borrowed from the principle of the Galois lattice to develop CPU-efficient algorithms for updating a quotient cube. For the aggregate function MEDIAN, the concept of a pseudo class is introduced to further reduce the size of the quotient cube. Coupled with a novel sliding window technique, an efficient algorithm is developed for maintaining a MEDIAN quotient cube that takes up reasonably small storage space. The performance study shows that the proposed algorithms are efficient and scalable over large databases.
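As a small illustration of maintaining a MEDIAN under a sliding window (only the window-maintenance idea, not the quotient-cube or pseudo-class machinery of the paper), the sketch below keeps the window's values in a sorted list, inserting arrivals and removing expired values by binary search, and reads the median from the middle.

```python
# Hedged sketch of sliding-window MEDIAN maintenance with a sorted list.
import bisect
from collections import deque

class SlidingMedian:
    def __init__(self, window_size):
        self.size = window_size
        self.window = deque()            # arrival order, for expiration
        self.sorted_vals = []            # same values, kept sorted

    def add(self, x):
        self.window.append(x)
        bisect.insort(self.sorted_vals, x)
        if len(self.window) > self.size:                 # expire the oldest value
            old = self.window.popleft()
            self.sorted_vals.pop(bisect.bisect_left(self.sorted_vals, old))

    def median(self):
        n = len(self.sorted_vals)
        mid = self.sorted_vals[n // 2]
        return mid if n % 2 else (mid + self.sorted_vals[n // 2 - 1]) / 2

sm = SlidingMedian(window_size=3)
for v in [5, 1, 9, 7, 3]:
    sm.add(v)
    print(sm.median())    # 5, 3.0, 5, 7, 7
```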
