
1.

Approximating query answering on RDF databases (total citations: 1; self-citations: 0; citations by others: 1)

Hai Huang Chengfei Liu Xiaofang Zhou 《World Wide Web》2012,15(1):89-114

Database users may be frustrated when a query on the database returns no answers. In this paper, we study the problem of relaxing queries on RDF databases in order to acquire approximate answers. We address two problems in efficient query relaxation. First, to ensure the quality of answers, we compute the similarities between the relaxed queries and the user query and use them to score the potentially relevant answers. Second, to obtain top-k answers, we develop two algorithms. One is based on the best-first strategy and executes relaxed queries in ranking order. The other, a batch-based algorithm, executes the relaxed queries as a batch and avoids unnecessary execution cost. Finally, we implement and experimentally evaluate our approaches.

2.

Efficient processing of top-k dominating queries in distributed environments

Daichi Amagata Yuya Sasaki Takahiro Hara Shojiro Nishio 《World Wide Web》2016,19(4):545-577

Due to the recent massive generation of data, preference queries are becoming increasingly important for users because such queries retrieve only a small number of preferable data objects from a huge multi-dimensional dataset. A top-k dominating query, which retrieves the k data objects dominating the highest number of data objects in a given dataset, is particularly important in supporting multi-criteria decision making because it can find interesting data objects in an intuitive way, exploiting the advantages of top-k and skyline queries. Although efficient algorithms for top-k dominating queries have been studied over centralized databases, no studies deal with top-k dominating queries in distributed environments. Data management is becoming increasingly distributed, so it is necessary to support processing of top-k dominating queries in distributed environments. In this paper, we address, for the first time, the challenging problem of processing top-k dominating queries in distributed networks and propose a method for efficient top-k dominating data retrieval, which avoids redundant communication cost and latency. Furthermore, we propose an approximate version of our method, which further reduces communication cost. Extensive experiments on both synthetic and real data demonstrate the efficiency and effectiveness of our proposed methods.
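
To make the query definition above concrete, here is a minimal centralized sketch in Python. It is a brute-force illustration only, not the distributed method of the paper; the function names and the convention that smaller values are better in every dimension are assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def top_k_dominating(points: List[Point], k: int) -> List[Tuple[int, Point]]:
    """Return the k points with the highest domination counts.

    Brute force: O(n^2) pairwise comparisons over a single dataset.
    """
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    # Stable sort by descending score; ties keep their input order.
    scored.sort(key=lambda pair: -pair[0])
    return scored[:k]


points = [(1, 1), (2, 2), (3, 1), (1, 3), (3, 3)]
best = top_k_dominating(points, 1)
```

In this toy dataset the point (1, 1) dominates the four other points, so it is returned first. A distributed setting would additionally have to decide which partial counts to exchange between sites, which this sketch does not attempt.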

3.

Efficient schemes for similarity-aware refinement of aggregation queries

Abdullah M. Albarrak Mohamed A. Sharaf 《World Wide Web》2017,20(6):1237-1267

Interactive data exploration platforms in Web, business and scientific domains are becoming increasingly popular. Typically, users without prior knowledge of the data interact with these platforms in an exploratory manner, hoping to retrieve the results they are looking for. One way to explore large-volume data is by posing aggregate queries, which group values of multiple rows by an aggregate operator to form a single value: an aggregated value. However, when a query fails, i.e., returns an undesired aggregated value, users have to undertake a frustrating trial-and-error process to refine their queries until a desired result is attained. This data exploration process is growing rather difficult because the underlying data is typically of large volume and high dimensionality. While heuristic-based techniques are fairly successful in generating refined queries that meet specified requirements on the aggregated values, they are rather oblivious to the (dis)similarity between the input query and its corresponding refined version. Meanwhile, enforcing similarity-aware query refinement is a non-trivial challenge, as it requires a careful examination of the query space while maintaining a low processing cost. To address this challenge, we propose an innovative scheme for efficient Similarity-Aware Refinement of Aggregation Queries, called EAGER, which aims to balance the tradeoff between satisfying the aggregate and similarity constraints imposed on the refined query to maximize its overall benefit to the user. To achieve that goal, EAGER implements efficient strategies to minimize the costs incurred in exploring the available search space by utilizing similarity-based and monotonic-based pruning techniques to bound the search space and quickly find a refined query that meets users’ expectations. Our extensive experiments show the scalability exhibited by EAGER under various workload settings, and the significant benefits it provides.

4.

Progressive evaluation of nested aggregate queries

Kian-Lee Tan Cheng Hian Goh Beng Chin Ooi 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(3):261-278

In many decision-making scenarios, decision makers require rapid feedback to their queries, which typically involve aggregates. The traditional blocking execution model can no longer meet the demands of these users. One promising approach in the literature, called online aggregation, evaluates an aggregation query progressively as follows: as soon as certain data have been evaluated, approximate answers are produced with their respective running confidence intervals; as more data are examined, the answers and their corresponding running confidence intervals are refined. In this paper, we extend this approach to handle nested queries with aggregates (i.e., at least one inner query block is an aggregate query) by providing users with (approximate) answers progressively as the inner aggregation query blocks are evaluated. We address the new issues posed by nested queries. In particular, the answer space begins with a superset of the final answers and is refined as the aggregates from the inner query blocks are refined. For the intermediary answers to be meaningful, they have to be interpreted with the aggregates from the inner queries. We also propose a multi-threaded model for evaluating such queries: each query block is assigned to a thread, and the threads can be evaluated concurrently and independently. The time slice across the threads is nondeterministic in the sense that the user controls the relative rate at which these subqueries are being evaluated. For enumerative nested queries, we propose a priority-based evaluation strategy to present answers that are certainly in the final answer space first, before presenting those whose validity may be affected as the inner query aggregates are refined. We implemented a prototype system using Java and evaluated our system. Results for nested queries with a single level and with multiple levels of nesting are reported.
Our results show the effectiveness of the proposed mechanisms in providing progressive feedback that reduces the initial waiting time of users significantly without sacrificing the quality of the answers. Received April 25, 2000 / Accepted June 27, 2000
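
The basic online-aggregation idea that this abstract builds on (an estimate plus a running confidence interval that tightens as more rows are seen) can be sketched as follows. This is a simplified illustration for a single, non-nested AVG query, not the paper's nested-query mechanism; the class name `RunningMean`, the normal-approximation interval, and the assumption that rows arrive in random order are all choices made for this example.

```python
import math


class RunningMean:
    """Progressive AVG estimate with a running confidence interval.

    Uses Welford's one-pass update for mean and variance, and a
    normal-approximation interval (assumes rows arrive in random order).
    """

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def add(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def half_width(self, z: float = 1.96) -> float:
        """Half-width of the interval; infinite until two rows are seen."""
        if self.n < 2:
            return math.inf
        sample_variance = self.m2 / (self.n - 1)
        return z * math.sqrt(sample_variance / self.n)


estimate = RunningMean()
for value in [1.0, 2.0, 3.0, 4.0, 5.0]:
    estimate.add(value)
    # A progressive system would report (estimate.mean, estimate.half_width())
    # to the user after each batch instead of blocking until the scan ends.
```

For nested queries, the inner aggregate's interval would also have to be propagated to the outer block, which is the part this sketch leaves out.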

5.

Optimizing top-k selection queries over multimedia repositories (total citations: 2; self-citations: 0; citations by others: 2)

Chaudhuri S. Gravano L. Marian A. 《Knowledge and Data Engineering, IEEE Transactions on》2004,16(8):992-1009

Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A query on these attributes will typically request not just a set of objects, as in the traditional relational query model (filtering), but also a grade of match associated with each object, which indicates how well the object matches the selection condition (ranking). Furthermore, unlike in the relational model, users may just want the k top-ranked objects for their selection queries for a relatively small k. In addition to the differences in the query model, another peculiarity of multimedia repositories is that they may allow access to the attributes of each object only through indexes. We investigate how to optimize the processing of top-k selection queries over multimedia repositories. The access characteristics of the repositories and the above query model lead to novel issues in query optimization. In particular, the choice of the indexes used to search the repository strongly influences the cost of processing the filtering condition. We define an execution space that is search-minimal, i.e., the set of indexes searched is minimal. Although the general problem of picking an optimal plan in the search-minimal execution space is NP-hard, we present an efficient algorithm that solves the problem optimally with respect to our cost model and execution space when the predicates in the query are independent. We also show that the problem of optimizing top-k selection queries can be viewed, in many cases, as that of evaluating more traditional selection conditions. Thus, both problems can be viewed together as an extended filtering problem to which techniques of query processing and optimization may be adapted.

6.

Filtering Data Streams for Entity-Based Continuous Queries

Cheng Reynold Kao Ben Kwan Alan Prabhakar Sunil Tu Yicheng 《Knowledge and Data Engineering, IEEE Transactions on》2010,22(2):234-248

The idea of allowing query users to relax their correctness requirements in order to improve the performance of a data stream management system (e.g., location-based services and sensor networks) has been recently studied. By exploiting the maximum error (or tolerance) allowed in query answers, algorithms for reducing the use of system resources have been developed. In most of these works, however, query tolerance is expressed as a numerical value, which may be difficult to specify. We observe that in many situations, users may not be concerned with the actual value of an answer, but rather with which object satisfies a query (e.g., “who is my nearest neighbor?”). In particular, an entity-based query returns only the names of objects that satisfy the query. For these queries, it is possible to specify a tolerance that is “nonvalue-based.” In this paper, we study fraction-based tolerance, a type of nonvalue-based tolerance, where a user specifies the maximum fractions of a query answer that can be false positives and false negatives. We develop fraction-based tolerance for two major classes of entity-based queries: 1) nonrank-based queries (e.g., range queries) and 2) rank-based queries (e.g., k-nearest-neighbor queries). These definitions provide users with an alternative way to specify the maximum tolerance allowed in their answers. We further investigate how these definitions can be exploited in a distributed stream environment. We design adaptive filter algorithms that allow updates to be dropped conditionally at the data stream sources without affecting the overall query correctness. Extensive experimental results show that our protocols reduce the use of network and energy resources significantly.

7.

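Multi-dimensional keyword search over uncertain regions (不确定性区域上的多维关键词搜索)

刘勇 (Liu Yong) 杨艳 (Yang Yan) 李巍 (Li Wei) 丁鑫哲 (Ding Xinzhe) 《计算机工程与应用》 (Computer Engineering and Applications) 2017,53(4):84-89

With the spread of online map applications, map-based retrieval of spatial objects has become an important and widely used tool, and the technology is relatively mature. Users often issue deterministic target-point queries on a map: for example, when a user submits the keyword “coffee shop”, the map application marks all coffee shops on the map, and the user can obtain detailed information about a coffee shop through further interaction. In practice, however, there is another kind of need: for example, a user may want to find a region that contains three kinds of objects, “coffee shop”, “school” and “hotel”. Such a query is called an uncertain region retrieval query. Existing research on map applications cannot solve the uncertain region retrieval problem. Using rectangle pruning and top-k recommendation, several candidate regions can be returned to the user based on the submitted keywords.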

8.

Answering why-not questions on KNN queries

Zhefan ZHONG Xin LIN Liang HE Jing YANG 《Frontiers of Computer Science》2019,13(5):1062

After decades of study, the usability of database systems has received more attention in recent years. In particular, the ability to explain why expected objects are missing from a query result, known as answering “why-not” questions, has become a focus of concern. This paper studies the problem of answering why-not questions on KNN queries. In real life, many users would like to use KNN queries to investigate their surroundings. Nevertheless, they are often disappointed to find that the result does not include the objects they expected. In this paper, we use the query refinement approach to resolve the problem. Given the original KNN query and a set of missing objects as input, our algorithm offers the user a refined KNN query that includes the missing objects. The experimental results demonstrate the efficiency of our proposed optimizations and algorithms.
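
As a rough illustration of query refinement for a why-not question, the sketch below handles only the simplest possible refinement: keeping the query point fixed and enlarging k until every missing object appears in the result. The abstract does not say which parts of the query the authors' algorithm modifies, so this is an assumption made for the example, and `refine_k` and the toy data are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]


def refine_k(query: Point, objects: Dict[str, Point],
             missing: Iterable[str], k: int) -> int:
    """Smallest k' >= k such that the k'-NN result contains every
    object named in `missing` (query point left unchanged)."""
    # Rank all objects by Euclidean distance to the query point.
    ranked = sorted(objects, key=lambda name: math.dist(query, objects[name]))
    # 1-based rank of the farthest missing object.
    worst_rank = max(ranked.index(name) for name in missing) + 1
    return max(k, worst_rank)


shops = {"a": (1.0, 0.0), "b": (2.0, 0.0), "c": (3.0, 0.0), "d": (4.0, 0.0)}
new_k = refine_k((0.0, 0.0), shops, missing=["c"], k=2)
```

Here the original 2-NN query misses object "c", and raising k to 3 is the smallest change of k that includes it. A fuller treatment would also weigh moving the query point against enlarging k.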

9.

Boundary query solution using convex hull algorithm

Hossein Shirgahi Mehran Mohsenzadeh S. Hamid H.S. Javadi 《Journal of Experimental & Theoretical Artificial Intelligence》2013,25(1):139-146

An important issue in database (DB) systems is responding to different users’ queries in an acceptable time. To do this, we should define different queries based on users’ real needs and consider suitable solutions. In this article, we introduce a new query called a ‘boundary query’, which is used to obtain an overall view of a subject in the DB. This query does not return all query answers; instead, it returns boundary values that cover all answers for the related query. We map a DB environment to a vector space based on the necessary attributes. We then implement the proposed method, and based on the results, we observe that its run time is acceptable for huge DBs.
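
The convex hull step named in the title can be sketched with Andrew's monotone chain algorithm. The abstract does not describe how the authors compute the hull or how many attributes they use, so the two-attribute setting and the sample rows below are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise,
    starting from the lexicographically smallest point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical query answers projected onto two numeric attributes.
rows = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
boundary = convex_hull(rows)
```

The interior row (1, 1) is dropped and the four corner rows remain, which matches the idea of returning boundary values that cover all answers rather than the answers themselves.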

10.

Relaxing RDF queries based on user and domain preferences

Peter Dolog Heiner Stuckenschmidt Holger Wache Jörg Diederich 《Journal of Intelligent Information Systems》2009,33(3):239-260


11.

Braunmuller B. Ester M. Kriegel H.-P. Sander J. 《Knowledge and Data Engineering, IEEE Transactions on》2001,13(1):79-95

Metric databases are databases where a metric distance function is defined for pairs of database objects. In such databases, similarity queries in the form of range queries or k-nearest-neighbor queries are the most important query types. In traditional query processing, single queries are issued independently by different users. In many data mining applications, however, the database is typically explored by iteratively asking similarity queries for answers of previous similarity queries. We introduce a generic scheme for such data mining algorithms and investigate two orthogonal approaches, reducing I/O cost as well as CPU cost, to speed up the processing of multiple similarity queries. The proposed techniques apply to any type of similarity query and to an implementation based on an index or using a sequential scan. Parallelization yields an additional impressive speed-up. An extensive performance evaluation confirms the efficiency of our approach.

12.

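An efficient processing method for reverse spatial preference top-k queries (一种针对反向空间偏好top-k查询的高效处理方法)

李淼 (Li Miao) 谷峪 (Gu Yu) 陈默 (Chen Mo) 于戈 (Yu Ge) 《软件学报》 (Journal of Software) 2017,28(2):310-325

With the rapid development of geo-positioning technology, applications built on online location-based services are becoming increasingly common. This paper proposes a query type called the reverse spatial preference top-k query. Like the traditional reverse spatial top-k query, for a given spatial query object it returns the users for whom that object's attribute score ranks in the top-k. The difference is that the object's attributes are not intrinsic properties; they are determined by computing the spatial relationships (such as distance) between the object and other preference objects. This query is needed in many important areas such as market analysis, for example to analyze from the query results how popular a facility is in a given area. However, the large number of spatial objects makes computing the spatial relationships between objects very expensive, so computing the objects' spatial attribute scores in real time poses a major challenge for query processing. To address this problem, optimized query processing algorithms are proposed, including dataset pruning, batch processing of datasets, and weight-based user grouping. Theoretical analysis and extensive experiments demonstrate the effectiveness of the proposed methods. Compared with ordinary methods, these methods substantially improve query execution time and I/O efficiency.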

13.

Adaptive query relaxation and top-k result ranking over autonomous web databases

Xiangfu Meng Xiaoyan Zhang Yanhuan Tang Chongchun Bi 《Knowledge and Information Systems》2017,51(2):395-433

Internet users may encounter the empty-answer or too-few-answers problem when they pose a strict query to a Web database. To address this problem, we develop a general framework for automatic query relaxation and top-k result ranking. Our framework consists of two processing steps. The first step is query relaxation. Based on the user's original query, we estimate how much the user cares about each specified attribute by measuring the distribution of its specified value in the database. If the specified value of an attribute is rare, the attribute is likely to be important to the user. According to the attribute importance, the original query is then rewritten as a relaxed query by expanding the range of each query criterion. The degree of relaxation on each specified attribute varies adaptively with the attribute weight. The most important attribute is relaxed by the minimum degree so that the answers returned by the relaxed query are most relevant to the user's original intention. The second step is top-k result ranking. In this step, we first generate user contextual preferences from the query history and then use them to create a priori orders of tuples during off-line pre-processing. Only a few representative orders are saved, each corresponding to a set of contexts. Then, these orders and associated contexts are used at query time to quickly provide top-k relevant answers using the top-k evaluation algorithm. Results of a preliminary user study demonstrate that our query relaxation and top-k result ranking methods can capture users' preferences effectively. The efficiency and effectiveness of our approach are also demonstrated.

14.

The Threshold Algorithm: From Middleware Systems to the Relational Engine

Bruno N. Hui Wang 《Knowledge and Data Engineering, IEEE Transactions on》2007,19(4):523-537

The answer to a top-k query is an ordered set of tuples, where the ordering is based on how closely each tuple matches the query. In the context of middleware systems, new algorithms to answer top-k queries have been recently proposed. Among these, the threshold algorithm (TA) is the best-known instance due to its simplicity and memory requirements. TA is based on an early-termination condition and can evaluate top-k queries without examining all the tuples. This top-k query model is prevalent not only over middleware systems, but also over plain relational data. In this work, we analyze the challenges that must be addressed to adapt TA to a relational database system. We show that, depending on the available indices, many alternative TA strategies can be used to answer a given query. Choosing the best alternative requires a cost model that can be seamlessly integrated with that of current optimizers. We address these challenges and conduct an extensive experimental evaluation of the resulting techniques by characterizing which scenarios can take advantage of TA-like algorithms to answer top-k queries in relational database systems.
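
The early-termination condition mentioned in this abstract can be illustrated with a small in-memory version of the classic threshold algorithm. This is a sketch of the middleware-style TA only, not the relational adaptation the paper studies; it assumes a sum aggregate, that every object appears in every list, and hypothetical names (`threshold_algorithm`, `by_price`, `by_rating`).

```python
import heapq
from typing import Dict, List, Tuple


def threshold_algorithm(lists: List[Dict[str, int]], k: int) -> List[Tuple[str, int]]:
    """Top-k objects by summed score across per-attribute score lists.

    Assumes every object has a score in every list and that the
    aggregate (sum) is monotone in each attribute.
    """
    # Sorted access: each list in descending score order.
    sorted_lists = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    seen: Dict[str, int] = {}
    depth = max(len(s) for s in sorted_lists)
    for i in range(depth):
        last_scores = []
        for s in sorted_lists:
            obj, score = s[i]
            last_scores.append(score)
            if obj not in seen:
                # Random access: fetch the object's score in every list.
                seen[obj] = sum(l[obj] for l in lists)
        # No unseen object can score higher than this threshold.
        threshold = sum(last_scores)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top  # early termination
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


by_price = {"x": 9, "y": 8, "z": 1}
by_rating = {"x": 2, "y": 7, "z": 9}
result = threshold_algorithm([by_price, by_rating], k=1)
```

With these toy lists, the threshold after the second round of sorted accesses is 8 + 7 = 15, which the object "y" already reaches, so the scan stops before the third round.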

15.

Exemplar queries: a new way of searching

Davide Mottin Matteo Lissandrini Yannis Velegrakis Themis Palpanas 《The VLDB Journal The International Journal on Very Large Data Bases》2016,25(6):741-765

Modern search engines employ advanced techniques that go beyond the structures that strictly satisfy the query conditions in an effort to better capture the user intentions. In this work, we introduce a novel query paradigm that considers a user query as an example of the data in which the user is interested. We call these queries exemplar queries. We provide a formal specification of their semantics and show that they are fundamentally different from notions like queries by example, approximate queries and related queries. We provide an implementation of these semantics for knowledge graphs and present an exact solution with a number of optimizations that improve performance without compromising the result quality. We study two different congruence relations, isomorphism and strong simulation, for identifying the answers to an exemplar query. We also provide an approximate solution that prunes the search space and achieves considerably better time performance with minimal or no impact on effectiveness. The effectiveness and efficiency of these solutions with synthetic and real datasets are experimentally evaluated, and the importance of exemplar queries in practice is illustrated.

16.

No-but-semantic-match: computing semantically matched xml keyword search results

Mehdi Naseriparsa Md. Saiful Islam Chengfei Liu Irene Moser 《World Wide Web》2018,21(5):1223-1257

Users are rarely familiar with the content of a data source they are querying, and therefore cannot avoid using keywords that do not exist in the data source. Traditional systems may respond with an empty result, causing dissatisfaction, while the data source in effect holds semantically related content. In this paper we study this no-but-semantic-match problem on XML keyword search and propose a solution which enables us to present the top-k semantically related results to the user. Our solution involves two steps: (a) extracting semantically related candidate queries from the original query and (b) processing candidate queries and retrieving the top-k semantically related results. Candidate queries are generated by replacement of non-mapped keywords with candidate keywords obtained from an ontological knowledge base. Candidate results are scored using their cohesiveness and their similarity to the original query. Since the number of queries to process can be large, with each result having to be analyzed, we propose pruning techniques to retrieve the top-k results efficiently. We develop two query processing algorithms based on our pruning techniques. Further, we exploit a property of the candidate queries to propose a technique for processing multiple queries in batch, which improves the performance substantially. Extensive experiments on two real datasets verify the effectiveness and efficiency of the proposed approaches.

17.

QUBiC: An adaptive approach to query-based recommendation

Lin Li Luo Zhong Zhenglu Yang Masaru Kitsuregawa 《Journal of Intelligent Information Systems》2013,40(3):555-587

Search engine users often have difficulty phrasing the precise query that would lead to satisfactory search results. Query recommendation is considered an effective aid for enhancing keyword-based queries in search engines and Web search software. In this paper, we present a Query-URL Bipartite based query reCommendation approach, called QUBiC. It utilizes the connectivity of a query-URL bipartite graph to recommend related queries and can significantly improve the accuracy and effectiveness of personalized query recommendation systems compared with the conventional pairwise similarity based approach. The main contribution of the QUBiC approach is its three-phase framework for personalized query recommendations. The first phase is the preparation of queries and their search results returned by a search engine, which generates a historical query-URL bipartite collection. The second phase is the discovery of similar queries by extracting a query affinity graph from the bipartite graph, instead of operating on the original bipartite graph directly using a biclique-based approach or graph clustering. The query affinity graph consists of only queries as its vertices, and its edges are weighted according to a query-URL vector based similarity (dissimilarity) measure. The third phase is the ranking of similar queries. We devise a novel ranking mechanism for ordering the related queries based on the merging distances of a hierarchical agglomerative clustering (HAC). By utilizing the query affinity graph and the HAC-based ranking, we are able to capture the propagation of similarity from query to query by inducing an implicit topical relatedness between queries.
Furthermore, the flexibility of the HAC strategy makes it possible for users to interactively participate in the query recommendation process, and helps to bridge the gap between the determinacy of actual similarity values and the indeterminacy of users’ information needs, allowing the lists of related queries to change from user to user and query to query, thus adaptively recommending related queries on demand. Our experimental evaluation results show that the QUBiC approach is highly efficient and more effective than conventional query recommendation systems, yielding an improvement of up to about 13.3% in precision.

18.

Correcting queries for XML

Sara Cohen Tali Brodianskiy 《Information Systems》2009,34(8):690-710

It has been observed that queries over XML data sources are often unsatisfiable. Unsatisfiability may stem from several different sources, e.g., the user may be insufficiently familiar with the labels appearing in the documents, or may not be intimately aware of the hierarchical structure of the documents. To deal with query and document mismatches, previous research has considered returning answers that maximally satisfy (in some sense) the query, instead of only returning strictly satisfying answers. However, this breaks the golden database rule that only strictly satisfying answers are returned when querying. Indeed, the relationship between the query and answers is no longer clear when unsatisfying answers are returned. To reinstate the golden database rule, this article proposes a framework for automatically correcting queries over XML. This framework generates similar satisfiable queries when the user query is unsatisfiable. The user can then choose a satisfiable query of interest and receive exactly satisfying answers to this query.

19.

Probabilistic nearest neighbor query processing on distributed uncertain data

Daichi Amagata Yuya Sasaki Takahiro Hara Shojiro Nishio 《Distributed and Parallel Databases》2016,34(2):259-287

A nearest neighbor (NN) query, which returns the most similar object to a user-specified query object, plays an important role in a wide range of applications and hence has received considerable attention. In many such applications, e.g., sensor data collection and location-based services, objects are inherently uncertain. Furthermore, due to the ever increasing generation of massive datasets, the importance of distributed databases, which deal with such data objects, has been growing. One emerging challenge is to efficiently process probabilistic NN queries over distributed uncertain databases. The straightforward approach, that each local site forwards its own database to the central server, is communication-expensive, so we have to minimize communication cost for the NN object retrieval. In this paper, we focus on two important queries, namely top-k probable NN queries and probabilistic star queries, and propose efficient algorithms to process them over distributed uncertain databases. Extensive experiments on both real and synthetic data have demonstrated that our algorithms significantly reduce communication cost.

20.

Extending the UML concepts to transform natural language queries with fuzzy semantics into SQL

《Information and Software Technology》2006,48(9):901-914

Database applications are becoming more versatile and broader in scope to keep pace with the expansion of various organizations. However, naïve users often struggle to access data freely using formal query languages. Therefore, we believe that accessing databases through natural language constructs will become a popular interface in the future. Object-oriented modeling allows the real world to be represented in a logical form. Since the class diagram in UML is used to model the static relationships of databases, in this paper we study how to extend UML class diagram representations to capture natural language queries with fuzzy semantics. By referring to the conceptual schema through the class diagram representation, we propose a methodology to map natural language constructs into the corresponding class diagram and employ the Structured Object Model (SOM) methodology to transform the natural language queries into SQL statements for query execution. Moreover, our approach can handle queries containing vague terms specified by fuzzy modifiers, like ‘good’ or ‘bad’. With our approach, users obtain not only the query answers but also the corresponding degree of vagueness, which resembles the way people reason.