Similar documents
1.
Semistructured data occur in situations where information lacks a homogeneous structure and is incomplete. Yet, up to now, the incompleteness of information has not been reflected by special features of query languages. Our goal is to investigate the principles of queries that allow for incomplete answers; we do not, however, present a concrete query language. Queries over classical structured data models contain a number of variables and constraints on these variables. An answer is a binding of the variables to elements of the database such that the constraints are satisfied. In the present paper, we loosen this concept insofar as we also allow answers that are partial; that is, not all variables in the query are bound by such an answer. Partial answers make it necessary to refine the model of query evaluation. The first modification relates to the satisfaction of constraints: in some circumstances, we consider constraints involving unbound variables as satisfied. Second, to prevent a proliferation of answers, we only accept answers that are maximal in the sense that there is no assignment that binds more variables and satisfies the constraints of the query. Our model of query evaluation consists of two phases, a search phase and a filter phase. Semistructured databases are essentially labeled directed graphs. In the search phase, we use a query graph containing variables to match a maximal portion of the database graph. We investigate three different semantics for query graphs, which give rise to three variants of matching. For each variant, we provide algorithms and complexity results. In the filter phase, the maximal matchings resulting from the search phase are subjected to constraints, which may be weak or strong. Strong constraints require all their variables to be bound, while weak constraints do not.
We describe a polynomial-time algorithm for evaluating a special type of query with filter constraints, and assess the complexity of evaluating other queries for several kinds of constraints. In the final part, we investigate the containment problem for queries consisting only of search constraints under the different semantics.
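The search/filter split can be made concrete with a small Python sketch. It assumes candidate partial answers are already available as dictionaries from variable names to node identifiers (the graph-matching search phase itself is not shown), keeps only those not strictly extended by another candidate, and then checks weak and strong constraints as described above. The helper names `extends`, `maximal`, and `satisfies` are hypothetical, and maximality here is relative to the candidate list rather than to all possible assignments.

```python
from typing import Callable, Dict, List, Tuple

# A (partial) answer maps query variables to database nodes; unbound variables are simply absent.
Answer = Dict[str, str]

def extends(a: Answer, b: Answer) -> bool:
    """True if b agrees with every binding in a and binds at least one more variable."""
    return len(b) > len(a) and all(b.get(v) == x for v, x in a.items())

def maximal(answers: List[Answer]) -> List[Answer]:
    """Search-phase pruning (hypothetical helper): drop candidates strictly extended by another."""
    return [a for a in answers if not any(extends(a, b) for b in answers)]

def satisfies(answer: Answer, variables: Tuple[str, ...],
              pred: Callable[..., bool], strong: bool) -> bool:
    """Filter phase: a weak constraint holds vacuously if one of its variables is unbound;
    a strong constraint requires all of its variables to be bound."""
    if any(v not in answer for v in variables):
        return not strong
    return pred(*(answer[v] for v in variables))
```

For example, from the candidates `{"x": "a"}`, `{"x": "a", "y": "b"}`, and `{"x": "c"}`, only the last two are maximal; a weak constraint `x != y` accepts `{"x": "c"}`, while the same constraint declared strong rejects it.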

2.
Incompleteness due to missing attribute values (aka “null values”) is very common in autonomous web databases, on which user accesses are usually supported through mediators. Traditional query processing techniques that focus on the strict soundness of answer tuples often ignore tuples with critical missing attributes, even if they wind up being relevant to a user query. Ideally, we would like the mediator to retrieve such possible answers and gauge their relevance by assessing their likelihood of being pertinent answers to the query. The autonomous nature of web databases poses several challenges in realizing this objective. Such challenges include the restricted access privileges imposed on the data, the limited support for query patterns, and the bounded pool of database and network resources in the web environment. We introduce a novel query rewriting and optimization framework, QPIAD, that tackles these challenges. Our technique involves reformulating the user query based on mined correlations among the database attributes. The reformulated queries are aimed at retrieving the relevant possible answers in addition to the certain answers. QPIAD is able to gauge the relevance of such queries, allowing trade-offs that reduce the costs of database query processing and answer transmission. To support this framework, we develop methods for mining attribute correlations (in terms of Approximate Functional Dependencies), value distributions (in the form of Naïve Bayes Classifiers), and selectivity estimates. We present empirical studies demonstrating that our approach effectively retrieves relevant possible answers with high precision, high recall, and manageable cost.
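As a rough illustration of retrieving possible answers, the sketch below assumes a single mined dependency `determinant ~> target` (for example, a hypothetical `model ~> body` AFD on a used-car table) and ranks tuples whose target value is null by an estimated P(target = value | determinant). It uses plain conditional frequencies from tuples with known values instead of the Naïve Bayes classifiers the abstract mentions, and it filters a local list instead of sending rewritten queries to an autonomous source; all names are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]

def rewrite_and_rank(rows: List[Row], target: str, value: str,
                     determinant: str) -> List[Tuple[Row, float]]:
    """Certain answers (probability 1.0) first, then possible answers ranked by estimated likelihood."""
    # Estimate P(target = value | determinant = d) from tuples whose target is known.
    counts: Dict[str, Counter] = defaultdict(Counter)
    for r in rows:
        if r[target] is not None and r[determinant] is not None:
            counts[r[determinant]][r[target]] += 1

    def likelihood(d: str) -> float:
        total = sum(counts[d].values())
        return counts[d][value] / total if total else 0.0

    certain = [(r, 1.0) for r in rows if r[target] == value]
    # Each determinant value with nonzero likelihood plays the role of one rewritten
    # query "determinant = d" that fetches tuples whose target attribute is null.
    possible = [(r, likelihood(r[determinant])) for r in rows
                if r[target] is None and r[determinant] is not None
                and likelihood(r[determinant]) > 0]
    possible.sort(key=lambda pair: pair[1], reverse=True)
    return certain + possible
```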

3.
In this paper, we report on our experience using WebSQL, a high-level declarative query language for extracting information from the Web. WebSQL takes advantage of multiple index servers without requiring users to know about them, and integrates full-text with topology-based queries. The WebSQL query engine is a library of Java classes, and WebSQL queries can be embedded into Java programs much in the same way as SQL queries are embedded in C programs. This allows us to access the Web from Java at a much higher level of abstraction than bare HTTP requests. We illustrate the use of WebSQL for application development by describing two applications we are experimenting with: Web site maintenance and specialized index construction. We also sketch several other possible applications. Using the library, we have also implemented a client-server architecture that allows us to perform interactive intelligent searches on the Web from an applet running in a browser.

4.
Path queries have been extensively used to query semistructured data, such as the Web and XML documents. In this paper, we introduce weighted path queries, an extension of path queries that makes several classes of optimization problems (such as the computation of shortest paths) easy to express. Weighted path queries are based on the notion of a weighted regular expression, i.e., a regular expression whose symbols are associated with weights. We characterize the problem of answering weighted path queries and provide an algorithm for computing their answers. We also show how weighted path queries can be effectively embedded into query languages for XML data to express several meaningful research problems in a simple and compact form.
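One standard way to evaluate a query of this kind is sketched below: take the product of the labeled graph with a weighted automaton for the regular expression and run Dijkstra's algorithm on it. This is an illustrative technique, not necessarily the algorithm of the paper; it assumes the weighted regular expression has already been compiled by hand into a weighted NFA (`(state, label) -> [(next_state, weight)]`) and that all weights are non-negative.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Graph: node -> list of (edge_label, target_node)
Graph = Dict[str, List[Tuple[str, str]]]
# Weighted automaton: (state, label) -> list of (next_state, weight)
WNFA = Dict[Tuple[int, str], List[Tuple[int, float]]]

def weighted_path_query(graph: Graph, nfa: WNFA, start_state: int,
                        finals: Set[int], source: str) -> Dict[str, float]:
    """Minimum total weight of an accepted path from `source` to every reachable node."""
    dist: Dict[Tuple[str, int], float] = {(source, start_state): 0.0}
    heap: List[Tuple[float, str, int]] = [(0.0, source, start_state)]
    best: Dict[str, float] = {}
    while heap:
        d, node, state = heapq.heappop(heap)
        if d > dist.get((node, state), float("inf")):
            continue  # stale heap entry
        if state in finals and node not in best:
            # Pops come in nondecreasing order, so the first accepting pop is optimal.
            best[node] = d
        for label, nxt in graph.get(node, []):
            for nstate, w in nfa.get((state, label), []):
                nd = d + w
                if nd < dist.get((nxt, nstate), float("inf")):
                    dist[(nxt, nstate)] = nd
                    heapq.heappush(heap, (nd, nxt, nstate))
    return best
```

For example, the expression `(a:1 | b:5)* c:2` corresponds to the automaton `{(0, "a"): [(0, 1.0)], (0, "b"): [(0, 5.0)], (0, "c"): [(1, 2.0)]}` with start state 0 and final state 1.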

5.
Search engines are increasingly efficient at identifying the best sources for any given keyword query, and are often able to identify the answer within the sources. Unfortunately, many web sources are not trustworthy because of erroneous, misleading, biased, or outdated information. In many cases, users are not satisfied with the results from any single source. In this paper, we propose a framework to aggregate query results from different sources in order to save users the hassle of individually checking query-related web sites to corroborate answers. To return the best answers to the users, we assign a score to each individual answer by taking into account the number, relevance, and originality of the sources reporting the answer, as well as the prominence of the answer within the sources, and we aggregate the scores of similar answers. We conducted extensive qualitative and quantitative experiments with our corroboration techniques on queries extracted from the TREC Question Answering track and from a log of real web search engine queries. Our results show that taking into account the quality of web pages, and of the answers extracted from them, in a corroborative way leads to the identification of a correct answer for a majority of queries.
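The scoring idea can be sketched as follows. The concrete formula (a sum, over reporting sources, of source relevance times originality times answer prominence) and the string normalization used to group similar answers are assumptions made for illustration; the paper's actual weighting may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def normalize(answer: str) -> str:
    """Crude grouping of similar answers: case- and whitespace-insensitive (an assumption)."""
    return " ".join(answer.lower().split())

def corroborate(extractions: List[Tuple[str, str, float]],
                relevance: Dict[str, float],
                originality: Dict[str, float]) -> List[Tuple[str, float]]:
    """extractions: (source_id, answer, prominence in [0, 1]). Returns answers by descending score."""
    scores: Dict[str, float] = defaultdict(float)
    for source, answer, prominence in extractions:
        # Each report adds its source's relevance, discounted by page originality
        # (e.g. copied pages count less) and by how prominent the answer is on the page.
        scores[normalize(answer)] += relevance[source] * originality[source] * prominence
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```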

6.
This paper presents a framework for querying inconsistent databases in the presence of functional dependencies. Most works dealing with the problem of extracting reliable information from inconsistent databases are based on the notion of repair, a minimal set of tuple insertions and deletions that leads the database to a consistent state (called a repaired database), and the notion of consistent query answer, a query answer that can be obtained from every repaired database. In this work, both the notion of repair and the notion of query answer differ from the original ones. Under the original notion, in the presence of functional dependencies, tuple deletions are the only operations performed to restore the consistency of an inconsistent database. However, deleting a tuple to remove an integrity violation potentially eliminates useful information in that tuple. To cope with this problem, we adopt a notion of repair based on tuple updates, which allows us to better preserve the information in the source database. A drawback of the notion of consistent query answer is that it does not allow us to discriminate among non-consistent answers, namely answers that can be obtained from a non-empty proper subset of the repaired databases. To obtain more informative query answers, we propose the notion of probabilistic query answer; that is, query answers are tuples associated with probabilities. This new semantics of query answering over inconsistent databases allows us to give a measure of uncertainty to query answers. We show that the problem of computing probabilistic query answers is FP^#P-complete. We also propose a technique for computing probabilistic answers to arbitrary relational algebra queries.
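To make the probabilistic semantics concrete, the sketch below computes the probability of each answer as the fraction of repairs that produce it. Note that it swaps in the simpler deletion-based repairs for a key dependency (keep exactly one tuple per key value), with all repairs equally likely, rather than the update-based repairs the paper adopts; it also enumerates repairs explicitly, which is exponential and only suitable for tiny examples. Set semantics (no duplicate tuples) is assumed.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

def key_repairs(relation: List[Tuple], key_index: int) -> List[List[Tuple]]:
    """All deletion-based repairs under the FD key -> rest: keep exactly one tuple per key value."""
    groups: Dict[object, List[Tuple]] = defaultdict(list)
    for t in relation:
        groups[t[key_index]].append(t)
    return [list(choice) for choice in product(*groups.values())]

def probabilistic_answers(relation: List[Tuple], key_index: int,
                          query: Callable[[List[Tuple]], Set[Tuple]]) -> Dict[Tuple, Fraction]:
    """Probability of each answer = fraction of (equally likely) repairs whose result contains it."""
    repairs = key_repairs(relation, key_index)
    hits: Counter = Counter()
    for r in repairs:
        for ans in query(r):
            hits[ans] += 1
    return {ans: Fraction(n, len(repairs)) for ans, n in hits.items()}
```

For a relation `emp(name, dept, salary)` in which `ann` appears once in `sales` and once in `hr`, the query "names of employees in hr" returns `ann` with probability 1/2 instead of discarding her as a non-consistent answer.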

7.
This paper discusses the issues involved in designing a query language for the Semantic Web and presents the OWL Query Language (OWL-QL) as a candidate standard language and protocol for query-answering dialogues among Semantic Web computational agents using knowledge represented in the W3C's Web Ontology Language (OWL). OWL-QL is a formal language that precisely specifies the semantic relationships among a query, a query answer, and the knowledge base(s) used to produce the answer. Unlike standard database and Web query languages, OWL-QL supports query-answering dialogues in which the answering agent may use automated reasoning methods to derive answers to queries, as well as dialogues in which the knowledge to be used in answering a query may be in multiple knowledge bases on the Semantic Web and/or where those knowledge bases are not specified by the querying agent. In this setting, the set of answers to a query may be of unpredictable size and may require an unpredictable amount of time to compute.

8.
An important feature of a database management system (DBMS) is its client/server architecture, where managing shared memory among the clients and the server is always a tough issue. Similarity queries are especially sensitive to this kind of architecture, since their answer sizes vary widely. Usually, the answer to a similarity query is fully computed and sent in full to the user, who is often interested in just part of it, e.g., just a few elements closest to or farthest from the query reference. Forcing the DBMS to retrieve the full answer and then ignoring most of it wastes server processing power. Paging is a technique that splits the answer into several pages, which are delivered as the client requests them. Despite the success of paging for traditional queries, little work has been done to support it in similarity queries. In this work, we present a technique that not only provides paging in similarity range and k-nearest-neighbor queries, but also supports them in two variations: the forward similarity query and the backward similarity query. They return elements either increasingly farther from or increasingly closer to the query reference. The reported experiments show that, depending on the proportion of the interesting part relative to the full answer, both techniques answer queries much faster than non-paged evaluation.
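A minimal paging sketch for similarity queries over an in-memory list of points is shown below, reading the forward variant as nearest-first and the backward variant as farthest-first. It uses a heap over a sequential scan instead of the metric index a DBMS would normally use, so it only illustrates how pages can be produced on demand without ordering the whole answer; the function and parameter names are hypothetical.

```python
import heapq
import math
from typing import Iterator, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]

def paged_similarity(data: Sequence[Point], ref: Point, page_size: int,
                     backward: bool = False,
                     radius: Optional[float] = None) -> Iterator[List[Point]]:
    """Yield pages ordered by distance to `ref`: nearest first (forward) or farthest first (backward).
    An optional `radius` turns the query into a similarity range query."""
    sign = -1.0 if backward else 1.0
    # heapify is O(n); each requested page then costs O(page_size * log n),
    # so pages the client never asks for are never fully ordered.
    heap = [(sign * math.dist(p, ref), i) for i, p in enumerate(data)
            if radius is None or math.dist(p, ref) <= radius]
    heapq.heapify(heap)
    while heap:
        yield [data[heapq.heappop(heap)[1]] for _ in range(min(page_size, len(heap)))]
```

Because the function is a generator, a client that reads only the first page with `next(...)` never pays for sorting the rest of the answer.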

9.
Finding typical instances is an effective approach to understanding and analyzing large data sets. In this paper, we apply the idea of typicality analysis from psychology and cognitive science to database query answering, and study the novel problem of answering top-k typicality queries. We model typicality in large data sets systematically and formulate three types of top-k typicality queries. To answer questions like “Who are the top-k most typical NBA players?”, we develop the measure of simple typicality. To answer questions like “Who are the top-k most typical guards, distinguishing guards from other players?”, we propose the notion of discriminative typicality. Moreover, to answer questions like “Who are the k typical guards that, as a whole, best represent the different types of guards?”, we use the notion of representative typicality. Computing the exact answer to a top-k typicality query requires quadratic time, which is often too costly for online query answering on large databases. We develop a series of approximation methods for various situations: (1) the randomized tournament algorithm has linear complexity, though it does not provide a theoretical guarantee on the quality of the answers; (2) the direct local typicality approximation using VP-trees provides an approximation quality guarantee; (3) a local typicality tree data structure can be exploited to index a large set of objects, so that typicality queries can be answered efficiently with quality guarantees by a tournament method based on a Local Typicality Tree. An extensive performance study using two real data sets and a series of synthetic data sets clearly shows that top-k typicality queries are meaningful and our methods are practical.
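Simple typicality can be illustrated by the exact quadratic computation that the approximation methods aim to avoid. This sketch assumes that typicality is estimated with a Gaussian kernel density over the other objects, with an arbitrary user-chosen `bandwidth`; the kernel's normalizing constant is dropped because it does not change the ranking.

```python
import math
from typing import List, Sequence

def simple_typicality(objects: Sequence[Sequence[float]], bandwidth: float) -> List[float]:
    """Unnormalized Gaussian-kernel density of each object w.r.t. all others (O(n^2) distance pairs)."""
    n = len(objects)
    scores = []
    for i, o in enumerate(objects):
        total = sum(math.exp(-math.dist(o, q) ** 2 / (2 * bandwidth ** 2))
                    for j, q in enumerate(objects) if j != i)
        scores.append(total / (n - 1) if n > 1 else 0.0)
    return scores

def top_k_typical(objects: Sequence[Sequence[float]], k: int, bandwidth: float = 1.0) -> List[int]:
    """Indices of the k objects with the highest estimated typicality."""
    scores = simple_typicality(objects, bandwidth)
    return sorted(range(len(objects)), key=lambda i: scores[i], reverse=True)[:k]
```

With the points `0.0, 0.1, 0.2, 5.0`, the middle point of the cluster ranks as most typical and the outlier `5.0` ranks last.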

10.
Efficient fuzzy ranking queries in uncertain databases
Recently, uncertain data have received considerable attention along with technical advances in geographic tracking, sensor networks, RFID, etc. Ranking queries over uncertain data have also become a research focus of uncertain data management. With rapidly growing applications of fuzzy set theory, many queries involving fuzzy conditions now arise, and these fuzzy conditions are widely applied when querying uncertain data. For instance, in a weather monitoring system, weather data are inherently uncertain due to measurement errors. Users may want data depicting heavy rain, where “heavy” is ambiguous in the fuzzy query. However, fuzzy queries are not guaranteed to return the expected results from uncertain databases. In this paper, we study a novel kind of ranking query, Fuzzy Ranking queries (FRanking queries), which extend the traditional notion of ranking queries. FRanking queries handle fuzzy queries submitted by users and return the k results that are most likely to satisfy the fuzzy queries in uncertain databases. Due to the fuzzy query conditions, the ranks of tuples cannot be evaluated by existing ranking functions. We propose a Fuzzy Ranking Function to calculate tuples’ ranks in uncertain databases for both attribute-level and tuple-level uncertainty models. Our ranking function takes both the uncertainty and the fuzzy semantics into account. FRanking queries are formally defined based on the Fuzzy Ranking Function. For answering FRanking queries, we present a pruning method that safely prunes unnecessary tuples to reduce the search space. To further improve efficiency, we design the Incremental Membership Algorithm (IMA), which efficiently answers FRanking queries by evaluating the ranks of incremental tuples under each threshold for the fuzzy set. We demonstrate the effectiveness and efficiency of our methods through theoretical analysis and experiments with synthetic and real datasets.
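For the attribute-level uncertainty model, one plausible scoring rule is the expected membership degree of a tuple in the fuzzy set. The sketch below uses that rule with a trapezoidal membership function for a term such as “heavy”; both the rule and the trapezoid parameters are illustrative assumptions, not the paper's Fuzzy Ranking Function, and the pruning and IMA optimizations are not shown.

```python
import heapq
from typing import Callable, Dict, List, Tuple

def trapezoid(a: float, b: float, c: float, d: float) -> Callable[[float], float]:
    """Membership function that rises on [a, b], equals 1 on [b, c], and falls on [c, d]."""
    def mu(x: float) -> float:
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)
    return mu

def fuzzy_top_k(tuples: Dict[str, List[Tuple[float, float]]],
                mu: Callable[[float], float], k: int) -> List[Tuple[str, float]]:
    """Each tuple is a list of (value, probability) alternatives (attribute-level uncertainty).
    Score = expected membership degree, sum(p * mu(v)); return the k best tuples."""
    scored = ((tid, sum(p * mu(v) for v, p in alts)) for tid, alts in tuples.items())
    return heapq.nlargest(k, scored, key=lambda s: s[1])
```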

11.
In many decision-making scenarios, decision makers require rapid feedback to their queries, which typically involve aggregates. The traditional blocking execution model can no longer meet the demands of these users. One promising approach in the literature, called online aggregation, evaluates an aggregation query progressively as follows: as soon as certain data have been evaluated, approximate answers are produced with their respective running confidence intervals; as more data are examined, the answers and their corresponding running confidence intervals are refined. In this paper, we extend this approach to handle nested queries with aggregates (i.e., queries in which at least one inner query block is an aggregate query) by providing users with (approximate) answers progressively as the inner aggregation query blocks are evaluated. We address the new issues posed by nested queries. In particular, the answer space begins as a superset of the final answers and is refined as the aggregates from the inner query blocks are refined. For the intermediate answers to be meaningful, they have to be interpreted together with the aggregates from the inner queries. We also propose a multi-threaded model for evaluating such queries: each query block is assigned to a thread, and the threads can be evaluated concurrently and independently. The time slice across the threads is nondeterministic in the sense that the user controls the relative rate at which these subqueries are evaluated. For enumerative nested queries, we propose a priority-based evaluation strategy that presents answers that are certainly in the final answer space first, before presenting those whose validity may be affected as the inner query aggregates are refined. We implemented a prototype system in Java and evaluated it. Results are reported for nested queries with one level and with multiple levels of nesting.
Our results show the effectiveness of the proposed mechanisms in providing progressive feedback that significantly reduces users' initial waiting time without sacrificing the quality of the answers. Received April 25, 2000 / Accepted June 27, 2000
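The single-block case of online aggregation can be sketched as a running average with a CLT-based confidence interval, assuming tuples are read in random order. The nested-query extension described above, where inner-block aggregates feed an outer block's answer space, is not shown; `z = 1.96` corresponds to an approximate 95% interval.

```python
import math
from typing import Iterable, Iterator, Tuple

def online_avg(values: Iterable[float], z: float = 1.96,
               report_every: int = 1) -> Iterator[Tuple[int, float, float]]:
    """Yield (n, running mean, CI half-width) while scanning; assumes a random scan order."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        # Welford's update keeps the mean and the sum of squared deviations numerically stable.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= 2 and n % report_every == 0:
            var = m2 / (n - 1)  # sample variance
            yield n, mean, z * math.sqrt(var / n)
```

Each yielded triple is an approximate answer the user can inspect immediately; the interval shrinks roughly as 1/sqrt(n) as more tuples are read.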

12.
The visual object query language (VOQL), recently proposed for object databases, has been successful in visualizing path expressions and set-related conditions, and in providing formal semantics. However, VOQL has several problems. Due to unrealistic assumptions, only set-related conditions can be represented in VOQL. Due to the lack of an explicit language construct for the notion of variables, queries are often awkward and less intuitive. In this paper, we propose VOQL*, which extends VOQL to remove these drawbacks. We introduce the notion of visual variables and refine the syntax and semantics of VOQL based on visual variables. We carefully design the language constructs of VOQL* to reflect the syntax of OOPC, so that constructs such as visual variables, visual elements, simple terms, structured terms, basic formulas, formulas, and query expressions in VOQL* are hierarchically and inductively constructed like those of OOPC. Most importantly, we formally define the semantics of each language construct of VOQL* by induction using OOPC. Because of the well-defined syntax and semantics, queries in VOQL* are clear, concise, and intuitive. We also provide an effective procedure to translate queries in VOQL* into those in OOPC. We believe that VOQL* is the first visual query language with a well-defined syntax reflecting the syntactic structure of logic and with semantics formally defined by induction.

13.
Although there has been much work in recent years on answering queries using views, there has been less work on deriving answers from partial databases. That is, given a partial database state D_V materialized via the view V, which queries over D_V can be answered with certainty using only the instance of the partial database and standard query evaluation mechanisms? We define these as the derivable answers and show several special cases in which we can compute and intensionally describe them.

14.
In this paper, we develop a new method to measure the quality of each tuple as an answer to Select-Project-Join (SPJ) queries, so that we can determine which answers are better answers to a given query in a fuzzy relational database. The quality of an answer is viewed in terms of how much sure information it provides and how much extra information is needed for it to become a sure answer to the query. The less extra information an answer requires and the more sure information it provides, the higher its quality and, consequently, the more reliable it is. © 2001 John Wiley & Sons, Inc.

15.
Users of information systems would like to express flexible queries over the data, possibly retrieving imperfect items when perfect ones, which exactly match the selection conditions, are not available. Most commercial DBMSs are still based on SQL for querying. Therefore, providing some flexibility in SQL can help users improve their interaction with the systems without requiring them to learn a completely new language. Based on fuzzy set theory and the α-cut operation on fuzzy numbers, this paper presents generic fuzzy queries against classical relational databases and develops the translation of these fuzzy queries. In a generic fuzzy query, the query condition uses complex fuzzy terms as operands and complex fuzzy relations as operators. With different thresholds chosen by the user, the user's fuzzy queries can be translated into precise queries for classical relational databases.
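The α-cut translation can be sketched for a single trapezoidal fuzzy term. Assuming a term such as “young” is modeled by a trapezoid `(a, b, c, d)`, its α-cut is the interval `[a + α(b - a), d - α(d - c)]`, which becomes a crisp `BETWEEN` predicate. The table and column names are hypothetical, and the `?` placeholders follow the DB-API qmark style used by Python's `sqlite3`; table and column names are interpolated directly, so they must come from trusted code, not user input.

```python
from typing import Tuple

def alpha_cut(a: float, b: float, c: float, d: float, alpha: float) -> Tuple[float, float]:
    """Values whose membership in the trapezoidal fuzzy number (a, b, c, d) is at least alpha."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return a + alpha * (b - a), d - alpha * (d - c)

def translate(table: str, column: str, fuzzy: Tuple[float, float, float, float],
              alpha: float) -> Tuple[str, Tuple[float, float]]:
    """Turn "column IS <fuzzy term>" at threshold alpha into a crisp, parameterized SQL query."""
    lo, hi = alpha_cut(*fuzzy, alpha)
    sql = f"SELECT * FROM {table} WHERE {column} BETWEEN ? AND ?"
    return sql, (lo, hi)
```

For example, with "young" as `(20, 25, 30, 40)` and α = 0.5, the query becomes `age BETWEEN 22.5 AND 35.0`; raising α narrows the interval toward the core `[25, 30]`.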

16.
Answering queries using views is the problem of deriving the answers to a query when we only have the answers to a set of views. Constructing rewritings is a widely studied technique for deriving those answers. In this paper, we consider the problem of the existence of rewritings in the case where the answers to the views uniquely determine the answers to the query. Specifically, we say that a view set V determines a query Q if, for any two databases D1, D2, V(D1) = V(D2) implies Q(D1) = Q(D2). We consider the case where the query and views are defined by conjunctive queries and investigate the question: if a view set V determines a query Q, is there an equivalent rewriting of Q using V? We present interesting cases where such rewritings exist in the language of conjunctive queries. Interestingly, we identify a class of conjunctive queries, CQpath, for which a view set can produce equivalent rewritings for “almost all” queries that are determined by this view set. We introduce a problem that relates determinacy to query equivalence. We show that there are cases where restricted results carry over to broader classes of queries.
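The determinacy definition can be checked by brute force on small instances: group databases by their view output and look for two with equal view answers but different query answers. This sketch assumes a single binary relation `E` over a tiny domain; finding no counterexample in a small domain does not prove determinacy in general, so it is only an illustrative test of the definition, not a decision procedure.

```python
from itertools import combinations, product
from typing import Callable, Dict, FrozenSet, Iterator, Optional, Tuple

Edge = Tuple[int, int]
DB = FrozenSet[Edge]

def all_databases(domain_size: int) -> Iterator[DB]:
    """Every instance of one binary relation E over the domain {0, ..., domain_size - 1}."""
    edges = list(product(range(domain_size), repeat=2))
    for r in range(len(edges) + 1):
        for subset in combinations(edges, r):
            yield frozenset(subset)

def path2(db: DB) -> FrozenSet[Edge]:
    """Conjunctive query P2(x, z) :- E(x, y), E(y, z)."""
    return frozenset((x, z) for (x, y) in db for (y2, z) in db if y == y2)

def identity(db: DB) -> FrozenSet[Edge]:
    """Conjunctive query Q(x, y) :- E(x, y)."""
    return db

def determinacy_counterexample(view: Callable[[DB], FrozenSet[Edge]],
                               query: Callable[[DB], FrozenSet[Edge]],
                               domain_size: int = 2) -> Optional[Tuple[DB, DB]]:
    """Return (D1, D2) with V(D1) = V(D2) but Q(D1) != Q(D2), or None if none exists in this domain."""
    seen: Dict[FrozenSet[Edge], DB] = {}
    for db in all_databases(domain_size):
        v = view(db)
        if v in seen and query(seen[v]) != query(db):
            return seen[v], db
        seen.setdefault(v, db)
    return None
```

Here the view `E` trivially determines the path-of-length-2 query, while the path-of-length-2 view does not determine `E`: the empty instance and `{(0, 1)}` have the same view output.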

17.
This article addresses the problem of performing Nearest Neighbor (NN) queries on uncertain trajectories. The answer to an NN query over certain trajectories is time-parameterized due to the continuous nature of the motion. As a consequence of uncertainty, several objects may have a non-zero probability of being a nearest neighbor of a given querying object, and the continuous nature of the motion further complicates the semantics of the answer. We capture the impact that the uncertainty of the trajectories has on the semantics of the answer to continuous NN queries, and we propose a tree structure for representing the answers, along with efficient algorithms to compute them. We also address the issue of performing NN queries when the motion of the objects is restricted to road networks. Finally, we formally define and show how to efficiently execute several variants of continuous NN queries. Our experiments demonstrate that the proposed algorithms yield significant performance improvements compared with the corresponding naïve approaches.

18.
Database applications are becoming more versatile and broader in scope to keep pace with the expansion of various organizations. However, naïve users often have difficulty accessing data freely through formal query languages. Therefore, we believe that accessing databases using natural language constructs will become a popular interface in the future. Object-oriented modeling allows the real world to be well represented or expressed in some logical form. Since the class diagram in UML is used to model the static relationships of databases, in this paper we study how to extend UML class diagram representations to capture natural language queries with fuzzy semantics. Referring to the conceptual schema through the class diagram representation, we propose a methodology to map natural language constructs onto the corresponding class diagram, and we employ the Structured Object Model (SOM) methodology to transform natural language queries into SQL statements for execution. Moreover, our approach can handle queries containing vague terms specified by fuzzy modifiers, like ‘good’ or ‘bad’. With our approach, users obtain not only the query answers but also the corresponding degree of vagueness, which can be regarded as mirroring the way humans think.

19.
A common approach to improving the reliability of query results based on error-prone sensors is to introduce redundant sensors. However, using multiple sensors to generate the value of a data item can be expensive, especially in wireless environments where continuous queries are executed. Moreover, some sensors may not be working properly, and their readings need to be discarded. In this paper, we propose a statistical approach to deciding which sensor nodes should be used to answer a query. In particular, we propose to solve the problem with the aid of the continuous probabilistic query (CPQ), which was originally used to manage uncertain data and is associated with a probabilistic guarantee on the query result. Based on the historical data values from the sensor nodes, the query type, and the requirement on the query, we present methods to select an appropriate set of sensors and provide reliable answers for several common aggregate queries. Our statistics-based sensor node selection algorithm is demonstrated in a number of simulation experiments, which show that a small number of sensor nodes can provide accurate and robust query results.
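A much simplified selection rule is sketched below to illustrate the trade-off between the number of sensors and answer reliability for an AVG query. It is not the CPQ-based method of the paper: it assumes independent, unbiased sensor errors with known historical variances, treats sensors above a variance cutoff as faulty, and picks the fewest lowest-variance sensors whose averaged reading meets a variance target.

```python
from typing import Dict, List

def select_sensors(variances: Dict[str, float], target_var: float,
                   faulty_cutoff: float) -> List[str]:
    """Fewest sensors whose averaged reading has variance <= target_var.

    Assumes independent, unbiased errors, so Var(mean of k readings) = sum(var_i) / k**2;
    for each k this is minimized by the k lowest-variance sensors.
    Sensors with historical variance above `faulty_cutoff` are treated as malfunctioning.
    """
    healthy = sorted((v, s) for s, v in variances.items() if v <= faulty_cutoff)
    chosen: List[str] = []
    total = 0.0
    for v, s in healthy:
        chosen.append(s)
        total += v
        if total / len(chosen) ** 2 <= target_var:
            return chosen
    return chosen  # target not reachable: fall back to all healthy sensors
```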

20.
Querying imprecise data in moving object environments
In moving object environments, it is infeasible for the database tracking the movement of objects to store the exact locations of objects at all times. Typically, the location of an object is known with certainty only at the time of an update, and the uncertainty in its location increases until the next update. In this environment, queries may produce incorrect results based on stale data. However, if the degree of uncertainty is controlled, the error of the answers to queries can be reduced. More generally, query answers can be augmented with probabilistic estimates of their validity. We study the execution of probabilistic range and nearest-neighbor queries. The imprecision in answers to queries is an inherent property of these applications due to uncertainty in the data, unlike the techniques for approximate nearest-neighbor processing that trade accuracy for performance. Algorithms for computing these queries are presented for a generic object movement model, and detailed solutions are discussed for two common models of uncertainty in moving object databases. We study the performance of these queries through extensive simulations.
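For a probabilistic range query, a simple uncertainty model suffices to show the idea. The sketch assumes one-dimensional locations and an object whose position is uniformly distributed over an uncertainty interval that widens at a hypothetical maximum speed `vmax` since its last update; the probability of lying in the query range is then the overlap length divided by the interval length. The paper's own uncertainty models may differ.

```python
from typing import Dict, List, Tuple

def uncertainty_interval(x0: float, vmax: float, elapsed: float) -> Tuple[float, float]:
    """Possible locations `elapsed` time units after an update at x0, assuming speed <= vmax."""
    return x0 - vmax * elapsed, x0 + vmax * elapsed

def range_probability(lo: float, hi: float, qa: float, qb: float) -> float:
    """P(object in [qa, qb]) when its location is uniform on [lo, hi]."""
    if hi == lo:  # location known exactly, e.g. right at an update
        return 1.0 if qa <= lo <= qb else 0.0
    overlap = max(0.0, min(hi, qb) - max(lo, qa))
    return overlap / (hi - lo)

def probabilistic_range_query(objects: Dict[str, Tuple[float, float]], qa: float, qb: float,
                              threshold: float = 0.0) -> List[Tuple[str, float]]:
    """(object id, probability) pairs with probability above `threshold`, most likely first."""
    result = [(oid, range_probability(lo, hi, qa, qb)) for oid, (lo, hi) in objects.items()]
    return sorted([r for r in result if r[1] > threshold], key=lambda r: r[1], reverse=True)
```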


Copyright © Beijing Qinyun Technology Development Co., Ltd.    Beijing ICP Filing No. 09084417-23
