期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Structural consistency: enabling XML keyword search to eliminate spurious results consistently

Ki-Hoon Lee Kyu-Young Whang Wook-Shin Han Min-Soo Kim 《The VLDB Journal The International Journal on Very Large Data Bases》2010,19(4):503-529

XML keyword search is a user-friendly way to query XML data using only keywords. In XML keyword search, to achieve high precision without sacrificing recall, it is important to remove spurious results not intended by the user. Efforts to eliminate spurious results have enjoyed some success using the concepts of LCA or its variants, SLCA and MLCA. However, existing methods still could find many spurious results. The fundamental cause for the occurrence of spurious results is that the existing methods try to eliminate spurious results locally without global examination of all the query results and, accordingly, some spurious results are not consistently eliminated. In this paper, we propose a novel keyword search method that removes spurious results consistently by exploiting the new concept of structural consistency. We define structural consistency as a property that is preserved if there is no query result having an ancestor-descendant relationship at the schema level with any other query results. A naive solution to obtain structural consistency would be to compute all the LCAs (or variants) and then to remove spurious results according to structural consistency. Obviously, this approach would always be slower than existing LCA-based ones. To speed up structural consistency checking, we must be able to examine the query results at the schema level without generating all the LCAs. However, this is a challenging problem since the schema-level query results do not homomorphically map to the instance-level query results, causing serious false dismissal. We present a comprehensive and practical solution to this problem and formally prove that this solution preserves structural consistency at the schema level without incurring false dismissal. We also propose a relevance-feedback-based solution for the problem where our method has low recall, which occurs when it is not the user’s intention to find more specific results. This solution has been prototyped in a full-fledged object-relational DBMS Odysseus developed at KAIST. Experimental results using real and synthetic data sets show that, compared with the state-of-the-art methods, our solution significantly (1) improves precision while providing comparable recall for most queries and (2) enhances the query performance by removing spurious results early. 相似文献

2.

Reconcilable Differences

Todd J. Green Zachary G. Ives Val Tannen 《Theory of Computing Systems》2011,49(2):460-488

In this paper we study a problem motivated by the management of changes in databases. It turns out that several such change scenarios, e.g., the separately studied problems of view maintenance (propagation of data changes) and view adaptation (propagation of view definition changes) can be unified as instances of query reformulation using views provided that support for the relational difference operator exists in the context of query reformulation. Exact query reformulation using views in positive relational languages is well understood, and has a variety of applications in query optimization and data sharing. Unfortunately, most questions about queries become undecidable in the presence of difference (or negation), whether we use the foundational set semantics or the more practical bag semantics. 相似文献

3.

The Quantum Setting with Randomized Queries for Continuous Problems

H. Woźniakowski 《Quantum Information Processing》2006,5(2):83-130

The standard setting of quantum computation for continuous problems uses deterministic queries and the only source of randomness for quantum algorithms is through measurement. Without loss of generality we may consider quantum algorithms which use only one measurement. This setting is related to the worst case setting on a classical computer in the sense that the number of qubits needed to solve a continuous problem must be at least equal to the logarithm of the worst case information complexity of this problem. Since the number of qubits must be finite, we cannot solve continuous problems on a quantum computer with infinite worst case information complexity. This can even happen for continuous problems with small randomized complexity on a classical computer. A simple example is integration of bounded continuous functions. To overcome this bad property that limits the power of quantum computation for continuous problems, we study the quantum setting in which randomized queries are allowed. This type of query is used in Shor’s algorithm. The quantum setting with randomized queries is related to the randomized classical setting in the sense that the number of qubits needed to solve a continuous problem must be at least equal to the logarithm of the randomized information complexity of this problem. Hence, there is also a limit to the power of the quantum setting with randomized queries since we cannot solve continuous problems with infinite randomized information complexity. An example is approximation of bounded continuous functions. We study the quantum setting with randomized queries for a number of problems in terms of the query and qubit complexities defined as the minimal number of queries/qubits needed to solve the problem to within ɛ by a quantum algorithm. We prove that for path integration we have an exponential improvement for the qubit complexity over the quantum setting with deterministic queries. 相似文献

4.

How expressive is stratified aggregation?

Inderpal Singh Mumick Oded Shmueli 《Annals of Mathematics and Artificial Intelligence》1995,15(3-4):407-435

We present results on the expressive power of various deductive database languages extended with stratified aggregation. We show that (1) Datalog extended with stratified aggregation cannot express a query to count the number of paths between every pair of nodes in an acyclic graph, (2) Datalog extended with stratified aggregation and arithmetic on integers (the + operator) can express allcomputable queries on ordered domains, and (3) Datalog extended with stratified aggregation and generic function symbols can express allcomputable queries (on ordered or unordered domains). Note that without stratified aggregation, the above extensions of Datalog cannot express all computable queries. We show that replacing stratified aggregation by stratified negation preserves expressiveness. We identify subclasses of the above languages that are complete (can express all, and only the, computable queries). 相似文献

5.

{\cal SMDP}: minimizing buffer requirements for continuous media servers

Youjip Won Jaideep Srivastava 《Multimedia Systems》2000,8(2):105-117

Excessive buffer requirement to handle continuous-media playbacks is an impediment to cost- effective provisioning for on-line video retrieval. Given the skewed distribution of video popularity, it is expected that often there are concurrent playbacks of the same video file within a short time interval. This creates an opportunity to batch multiple requests and to service them with a single stream from the disk without violating the on-demand constraint. However, there is a need to keep data in memory between successive uses to do this. This leads to a buffer space trade-off between servicing a request in memory mode vs. servicing it in disk-mode. In this work, we develop a novel algorithm to minimize the buffer requirement to support a set of concurrent playbacks. One of the beauties of the proposed scheme is that it enables the server to dynamically adapt to the changing workload while minimizing the total buffer space requirement. Our algorithm makes a significant contribution in decreasing the total buffer requirement, especially when the user access pattern is biased in favor of a small set of files. The idea of the proposed scheme is modeled in detail using an analytical formulation, and optimality of the algorithm is proved. An analytical framework is developed so that the proposed scheme can be used in combination with various existing disk-scheduling strategies. Our simulation results confirm that under certain circumstances, it is much more resource efficient to support some of the playbacks in memory mode and subsequently the proposed scheme enables the server to minimize the overall buffer space requirement. 相似文献

6.

Bounded Queries, Approximations, and the Boolean Hierarchy

Richard Chang 《Information and Computation》2001,169(2):129

This paper investigates nondeterministic bounded query classes in relation to the complexity of NP-hard approximation problems and the Boolean Hierarchy. Nondeterministic bounded query classes turn out be rather suitable for describing the complexity of NP-hard approximation problems. The results in this paper take advantage of this machine-based model to prove that in many cases, NP-approximation problems have the upward collapse property. That is, a reduction between NP-approximation problems of apparently different complexity at a lower level results in a similar reduction at a higher level. For example, if M C reduces to (log n)-approximating M C using many–one reductions, then the Traveling Salesman Problem (TSP) is equivalent to M C under many–one reductions. Several upward collapse theorems are presented in this paper. The proofs of these theorems rely heavily on the machinery provided by the nondeterministic bounded query classes. In fact, these results depend on a surprising connection between the Boolean hierarchy and nondeterministic bounded query classes. 相似文献

7.

On querying simple conceptual graphs with negation

Marie-Laure Michel 《Data & Knowledge Engineering》2007,60(3):468-493

We consider basic conceptual graphs, namely simple conceptual graphs (SGs), which are equivalent to the existential conjunctive positive fragment of first-order logic. The fundamental problem, deduction, is performed by a graph homomorphism called projection. The existence of a projection from a SG Q to a SG G means that the knowledge represented by Q is deducible from the knowledge represented by G. In this framework, a knowledge base is composed of SGs representing facts and a query is itself a SG. We focus on the issue of querying SGs, which highlights another fundamental problem, namely query answering. Each projection from a query to a fact defines an answer to the query, with an answer being itself a SG. The query answering problem asks for all answers to a query.

This paper introduces atomic negation into this framework. Several understandings of negation are explored, which are all of interest in real world applications. In particular, we focus on situations where, in the context of incomplete knowledge, classical negation is not satisfactory because deduction can be proven but there is no answer to the query. We show that intuitionistic deduction captures the notion of an answer and can be solved by projection checking. Algorithms are provided for all studied problems. They are all based on projection. They can thus be combined to deal with several kinds of negation simultaneously. Relationships with problems on conjunctive queries in databases are recalled and extended. Finally, we point out that this discussion can be put in the context of semantic web databases. 相似文献

8.

The trichotomy of HAVING queries on a probabilistic database

Christopher Ré Dan Suciu 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(5):1091-1116

We study the evaluation of positive conjunctive queries with Boolean aggregate tests (similar to HAVING in SQL) on probabilistic databases. More precisely, we study conjunctive queries with predicate aggregates on probabilistic databases where the aggregation function is one of MIN, MAX, EXISTS, COUNT, SUM, AVG, or COUNT(DISTINCT) and the comparison function is one of =, ≠,≥,>,≤, or <. The complexity of evaluating a HAVING query depends on the aggregation function, α, and the comparison function, θ. In this paper, we establish a set of trichotomy results for conjunctive queries with HAVING predicates parametrized by (α, θ). For such queries (without self-joins), one of the following three statements is true: (1) the exact evaluation problem has P{\mathcal P} -time data complexity. In this case, we call the query safe. (2) The exact evaluation problem is \sharpP{{\sharp{\mathcal P}}} -hard, but the approximate evaluation problem has (randomized) P{{\mathcal P}} -time data complexity. More precisely, there exists an FPTRAS for the query. In this case, we call the query apx-safe. (3) The exact evaluation problem is \sharpP{{\sharp{\mathcal P}}} -hard, and the approximate evaluation problem is also hard. We call these queries hazardous. The precise definition of each class depends on the aggregate considered and the comparison function. Thus, we have queries that are (MAX,≥ )-safe, (COUNT,≤ )-apx-safe, (SUM,=)-hazardous, etc. Our trichotomy result is a significant extension of a previous dichotomy result for Boolean conjunctive queries into safe and not safe. For each of the three classes we present novel techniques. For safe queries, we describe an evaluation algorithm that uses random variables over semirings. For apx-safe queries, we describe an FPTRAS that relies on a novel algorithm for generating a random possible world satisfying a given condition. Finally, for hazardous queries we give novel proofs of hardness of approximation. The results for safe queries were previously announced (in Ré, C., Suciu, D. Efficient evaluation of. In: DBPL, pp. 186–200, 2007), but all other results are new. 相似文献

9.

Optimizing the execution of XSLT stylesheets for querying transformed XML data

Sven Groppe Jinghua Groppe Stefan Böttcher Thomas Wycisk Le Gruenwald 《Knowledge and Information Systems》2009,18(3):331-391

We have to deal with different data formats whenever data formats evolve or data must be integrated from heterogeneous systems. These data when implemented in XML for data exchange cannot be shared freely among applications without data transformation. A common approach to solve this problem is to convert the entire XML data from their source format to the applications’ target formats using the transformations rules specified in XSLT stylesheets. However, in many cases, not all XML data are required to be transformed except for a smaller part described by a user’s query (application). In this paper, we present an approach that optimizes the execution time of an XSLT stylesheet for answering a given XPath query by modifying the XSLT stylesheet in such a way that it would (a) capture only the parts in the XML data that are relevant to the query and (b) process only those XSLT instructions that are relevant to the query. We prove the correctness of our optimization approach, analyze its complexity and present experimental results. The experimental results show that our approach performs the best in terms of execution time, especially when many cost-intensive XSLT instructions can be excluded in the XSLT stylesheet. 相似文献

10.

A new fast-path mechanism for mutual exclusion

James H. Anderson Yong-Jik Kim 《Distributed Computing》2001,14(1):17-29

Summary. Several years ago, Yang and Anderson presented an N-process algorithm for mutual exclusion under read/write atomicity that has time complexity, where “time” is measured by counting remote memory references. In this algorithm, instances of a two-process mutual exclusion algorithm are embedded within a binary arbitration tree. In the two-process algorithm that was used, all busy-waiting is done by “local spinning.” Performance studies presented by Yang and Anderson showed that their N-process algorithm exhibits scalable performance under heavy contention. One drawback of using an arbitration tree, however, is that each process is required to perform remote memory operations even when there is no contention. To remedy this problem, Yang and Anderson presented a variant of their algorithm that includes a “fast-path” mechanism that allows the arbitration tree to be bypassed in the absence of contention. This algorithm has the desirable property that contention-free time complexity is O(1). Unfortunately, the fast-path mechanism that was used caused time complexity under contention to rise to in the worst case. To this day, the problem of designing a read/write mutual exclusion algorithm with O(1) time complexity in the absence of contention and O(logN) time complexity under contention has remained open. In this paper, we close this problem by presenting a fast-path mechanism that achieves these time complexity bounds when used in conjunction with Yang and Anderson's arbitration-tree algorithm. Received: July 1999 / Accepted: July 2000 相似文献

11.

Incorporating statistical information into expert classification systems to reduce classification costs

Louis Anthony Cox Jr. 《Annals of Mathematics and Artificial Intelligence》1990,2(1-4):93-107

Interactive expert systems seek relevant information from a user in order to answer a query or to solve a problem that the user has posed. A fundamental design issue for such a system is therefore itsinformation-seeking strategy, which determines the order in which it asks questions or performs experiments to gain the information that it needs to respond to the user. This paper examines the problem of optimal knowledge acquisition through questioning in contexts where it is expensive or time-consuming to obtain the answers to questions. An abstract model of an expert classification system — considered as a set of logical classification rules supplemented by some statistical knowledge about attribute frequencies — is developed and applied to analyze the complexity and to present constructive algorithms for doing probabilistic question-based classification. New heuristics are presented that generalize previous results for optimal identification keys and questionnaires. For an important class of discrete discriminant analysis problems, these heuristics find optimal or near-optimal questioning strategies in a small fraction of the time required by an exact solution algorithm. 相似文献

12.

Functional queries in Datalog

Stefano Basta Sergio Flesca Sergio Greco 《New Generation Computing》2002,20(4):339-371

A ‘functional’ query is a query whose answer is always defined and unique i.e. it is either true or false in all models. It has been shown that the expressive powers of the various types of stable models, when restricted to the class of DATALOG^¬ functional queries, do not in practice go beyond those of well-founded semantics, except for the least undefined stable models which, instead, capture the whole boolean hierarchyBH. In this paper we present a ‘functional’ language which, by means of a disciplined use of negation, achieves the desired level of expressiveness up toBH. Although the semantics of the new language is partial, all atoms in the source program are defined and possibly undefined atoms are introduced in a rewriting phase to increase the expressive power. We show that the language satisfies ‘desirable’ properties better than classical languages with (unstratified) negation and stable model semantics. We present an algorithm for the evaluation of functional queries and we show that exponential time resolution is required for hard problems only. Finally we present the architecture of a prototype of the language which has been developed. 相似文献

13.

On Similarity Measures for Multimedia Database Applications 总被引：1，自引：1，他引：0

K. Selçuk Candan Wen-Syan Li 《Knowledge and Information Systems》2001,3(1):30-51

A multimedia database query consists of a set of fuzzy and boolean (or crisp) predicates, constants, variables, and conjunction, disjunction, and negation operators. The fuzzy predicates are evaluated based on different media criteria, such as color, shape, layout, keyword. Since media-based evaluation yields similarity values, results to such a query is defined as an ordered set. Since many multimedia applications require partial matches, query results also include tuples which do not satisfy all predicates. Hence, any fuzzy semantics which extends the boolean semantics of conjunction in a straight forward manner may not be desirable for multimedia databases. In this paper, we focus on the problem of ‘given a multimedia query which consists of multiple fuzzy and crisp predicates, how to provide the user with a meaningful overall ranking.’ More specifically, we study the problem of merging similarity values in queries with multiple fuzzy predicates. We describe the essential multimedia retrieval semantics, compare these with the known approaches, and propose a semantics which captures the retrieval requirements in multimedia databases. Received 13 August 1999 / Revised 13 May 2000 / Accepted in revised form 26 July 2000 相似文献

14.

A graph theoretical approach to determine a join reducer sequencein distributed query processing

Ming-Syan Chen Yu P.S. 《Knowledge and Data Engineering, IEEE Transactions on》1994,6(1):152-165

Semijoin has traditionally been relied upon to reduce the cost of data transmission for distributed query processing. However, judiciously applying join operations as reducers can lead to further reduction in the amount of data transmission required. In view of this fact, we explore the approach of using join operations as reducers in distributed query processing. We first show that the problem of determining a sequence of join operations for a query can be transformed to that of finding a specific type of set of cuts to the corresponding query graph, where a cut to a graph is a partition of nodes in that graph. Then, in light of this concept, we prove that the problem of determining the optimal sequence of join operations for a given query graph is of exponential complexity, thus justifying the necessity of applying heuristic approaches to solve this problem. By mapping the problem of determining a sequence of join reducers into the one of finding a set of cuts, we develop (for tree and general query graphs, respectively) efficient heuristic algorithms to determine a join reducer sequence for distributed query processing. The algorithms developed are based on the concept of divide and conquer and are of polynomial time complexity. Simulation is performed to evaluate these algorithms 相似文献

15.

Query containment under bag and bag-set semantics

Foto N. Afrati Manolis Gergatsoulis 《Information Processing Letters》2010,110(10):360-369

Conjunctive queries (CQs) are at the core of query languages encountered in many logic-based research fields such as AI, or database systems. The majority of existing work assumes set semantics but often in real applications the manipulation of duplicate tuples is required. One of the major problems that arises as part of advanced features of query optimization, data integration, query reformulation and many other research topics is testing for containment of such queries. In this work, we investigate the complexity of query containment problem for CQs under bag semantics (i.e. duplicate tuples are allowed in both the database and the results of queries) and under bag-set semantics (i.e. duplicates are allowed in the result of the queries but not in the database). We derive complexity results for these problems for five major subclasses of CQs; and we also find necessary conditions for CQ query containment. The general case of these problems remains open. 相似文献

16.

Weight-based consistent query answering over inconsistent {\mathcal {SHIQ}} knowledge bases

Jianfeng Du Guilin Qi Yi-Dong Shen 《Knowledge and Information Systems》2013,34(2):335-371

相似文献

17.

Complexity theoretic hardness results for query learning

H. Aizenstein T. Hegedüs L. Hellerstein L. Pitt 《Computational Complexity》1998,7(1):19-53

We investigate the complexity of learning for the well-studied model in which the learning algorithm may ask membership and equivalence queries. While complexity theoretic techniques have previously been used to prove hardness results in various learning models, these techniques typically are not strong enough to use when a learning algorithm may make membership queries. We develop a general technique for proving hardness results for learning with membership and equivalence queries (and for more general query models). We apply the technique to show that, assuming , no polynomial-time membership and (proper) equivalence query algorithms exist for exactly learning read-thrice DNF formulas, unions of halfspaces over the Boolean domain, or some other related classes. Our hardness results are representation dependent, and do not preclude the existence of representation independent algorithms.?The general technique introduces the representation problem for a class F of representations (e.g., formulas), which is naturally associated with the learning problem for F. This problem is related to the structural question of how to characterize functions representable by formulas in F, and is a generalization of standard complexity problems such as Satisfiability. While in general the representation problem is in , we present a theorem demonstrating that for "reasonable" classes F, the existence of a polynomial-time membership and equivalence query algorithm for exactly learning F implies that the representation problem for F is in fact in co-NP. The theorem is applied to prove hardness results such as the ones mentioned above, by showing that the representation problem for specific classes of formulas is NP-hard. Received: December 6, 1994 相似文献

18.

On the space–time of optimal, approximate and streaming algorithms for synopsis construction problems

Sudipto Guha 《The VLDB Journal The International Journal on Very Large Data Bases》2008,17(6):1509-1535

Synopses construction algorithms have been found to be of interest in query optimization, approximate query answering and mining, and over the last few years several good synopsis construction algorithms have been proposed. These algorithms have mostly focused on the running time of the synopsis construction vis-a-vis the synopsis quality. However the space complexity of synopsis construction algorithms has not been investigated as thoroughly. Many of the optimum synopsis construction algorithms are expensive in space. For some of these algorithms the space required to construct the synopsis is significantly larger than the space required to store the input. These algorithms rely on the fact that they require a smaller “working space” and most of the data can be resident on disc. The large space complexity of synopsis construction algorithms is a handicap in several scenarios. In the case of streaming algorithms, space is a fundamental constraint. In case of offline optimal or approximate algorithms, a better space complexity often makes these algorithms much more attractive by allowing them to run in main memory and not use disc, or alternately allows us to scale to significantly larger problems without running out of space. In this paper, we propose a simple and general technique that reduces space complexity of synopsis construction algorithms. As a consequence we show that the notion of “working space” proposed in these contexts is redundant. This technique can be easily applied to many existing algorithms for synopsis construction problems. We demonstrate the performance benefits of our proposal through experiments on real-life and synthetic data. We believe that our algorithm also generalizes to a broader range of dynamic programs beyond synopsis construction. Sudipto Guha’s research supported in part by an Alfred P. Sloan Research Fellowship and by NSF Awards CCF-0430376, CCF-0644119.A preliminary version of the paper appeared as “Space efficiency in synopsis construction algorithms”, VLDB Conference 2005, Trondheim, [19]. 相似文献

19.

On the complexity of distributed query optimization

Chihping Wang Ming-Syan Chen 《Knowledge and Data Engineering, IEEE Transactions on》1996,8(4):650-662

While a significant amount of research efforts has been reported on developing algorithms, based on joins and semijoins, to tackle distributed query processing, there is relatively little progress made toward exploring the complexity of the problems studied. As a result, proving NP-hardness of or devising polynomial-time algorithms for certain distributed query optimization problems has been elaborated upon by many researchers. However, due to its inherent difficulty, the complexity of the majority of problems on distributed query optimization remains unknown. In this paper we generally characterize the distributed query optimization problems and provide a frame work to explore their complexity. As it will be shown, most distributed query optimization problems can be transformed into an optimization problem comprising a set of binary decisions, termed Sum Product Optimization (SPO) problem. We first prove SPO is NP-hard in light of the NP-completeness of a well-known problem, Knapsack (KNAP). Then, using this result as a basis, we prove that five classes of distributed query optimization problems, which cover the majority of distributed query optimization problems previously studied in the literature, are NP-hard by polynomially reducing SPO to each of them. The detail for each problem transformation is derived. We not only prove the conjecture that many prior studies relied upon, but also provide a frame work for future related studies 相似文献

20.

A random generator of resource-constrained multi-project network problems

Tyson R. Browning Ali A. Yassine 《Journal of Scheduling》2010,13(2):143-161

Many scheduling problems in project management, manufacturing, and elsewhere require the generation of activity networks to test proposed solution methods. Single-network generators have been used for the resource-constrained project scheduling problem (RCPSP). Since the first single-network generator was proposed in 1993, several advances have been reported in the literature. However, these generators create only one network or project at a time; they cannot generate multi-project problems to desired specifications. This paper presents the first multi-network problem generator. It is especially useful for investigating the resource-constrained multi-project scheduling problem (RCMPSP), where a controlled set of multi-project test problems is crucial for analyzing the performance of solution methods. In addition to the single-project characteristics handled by existing network generators—such as activity duration, resource types and usage, and network size, shape, and complexity—the proposed generator produces multi-project portfolios with controlled resource distributions and amounts of resource contention. To enable the generation of projects with desired levels of network complexity, we also develop several theoretical insights on the effects of network topology on the probability of successful network generation. Finally, we generate 12,320 test problems for a full-factorial experiment and use analysis of means to conclude that the generator produces “near-strongly random” problems. Fully “strongly random” problems require much greater computational expense. 相似文献