Similar documents
1.
Graphs are widely used for modeling complicated data such as social networks, bibliographical networks and knowledge bases. The growing sizes of graph databases motivate the crucial need for developing powerful and scalable graph-based query engines. We propose a SPARQL-like language, G-SPARQL, for querying attributed graphs. The language enables the expression of different types of graph queries that are of great interest in databases modeled as large graphs, such as pattern matching, reachability and shortest path queries. Each query can combine both structural predicates and value-based predicates (on the attributes of the graph nodes/edges). We describe an algebraic compilation mechanism for our proposed query language, which extends the relational algebra and is based on the basic construct for building SPARQL queries, the Triple Pattern. We describe an efficient hybrid Memory/Disk representation of large attributed graphs where only the topology of the graph is maintained in memory while the data of the graph are stored in a relational database. The execution engine of our proposed query language splits the query plan so that parts of it are pushed inside the relational database (using SQL) while other parts are processed using memory-based algorithms, as necessary. Experimental results on real and synthetic datasets demonstrate the efficiency and the scalability of our approach and show that our approach outperforms native graph databases by several factors.

2.
Adaptive query processing generally involves a feedback loop comprising monitoring, assessment and response. So far, individual proposals have tended to group together an approach to monitoring, a means of assessment, and a form of response. However, there are many benefits in decoupling these three phases, and in constructing generic frameworks for each of them. To this end, this paper discusses monitoring of query plan execution as a topic in its own right, and advocates an approach based on self-monitoring algebraic operators. This approach is shown to be generic and independent of any specific adaptation mechanism, easily implementable and portable, sufficiently comprehensive, appropriate for heterogeneous distributed environments, and more importantly, capable of driving on-the-fly adaptations of query plan execution. An experimental evaluation of the overheads and of the quality of the results obtained by monitoring is also presented.

3.
This research investigates an approach to query processing in a multidatabase system that uses an object-oriented model to capture the semantics of other data models. The object-oriented model is used to construct a global schema, defining an integrated view of the different schemas in the environment. The model is also used as a self-describing model to build a meta-database for storing information about the global schema. A unique aspect of this work is that the object-oriented model is used to describe the different data models of the multidatabase environment, thereby extending the meta-database with semantic information about the local schemas. With the global and local schemas all represented in an object-oriented form, structural mappings between the global schema and each local schema are then easily supported. An object algebra then provides a query language for expressing global queries, using the structural mappings to translate object algebra queries into SQL queries over local relational schemas. The advantage of using an object algebra is that the object-oriented database can be viewed as a blackboard for temporary storage of local data and for establishing relationships between different databases. The object algebra can be used to directly retrieve temporarily stored data from the object-oriented database or to transparently retrieve data from local sources using the translation process described in this paper.

4.
Continuous queries applied over nonterminating data streams usually specify windows in order to obtain an evolving, yet restricted, set of tuples and thus provide timely and incremental results. Although sliding windows are frequently employed in many user requests, additional types like partitioned or landmark windows are also available in stream processing engines. In this paper, we set out to study the existence of monotonic-related semantics for a rich set of windowing constructs in order to facilitate a more efficient maintenance of their changing contents. After laying out a formal foundation for expressing windowed queries, we investigate update patterns observed in the most common window variants as well as their impact on adaptations of typical operators (like windowed join, union or aggregation), thus offering more insight into the design and implementation of stream processing mechanisms. Furthermore, we identify syntactic equivalences in algebraic expressions involving windows, to the potential benefit of query optimizations. Finally, this framework is validated for several windowed operations against streaming datasets with simulations at diverse arrival rates and window specifications, providing concrete evidence of its significance.
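The difference in update patterns between window types mentioned in entry 4 can be illustrated with a small sketch. It is not taken from the paper: the function names `sliding_window` and `landmark_window` are hypothetical, and the windows here are count-based for simplicity. It only shows that a landmark window grows monotonically, while a sliding window also expires old tuples.

```python
from collections import deque


def sliding_window(stream, size):
    """Yield the last `size` items after each arrival (hypothetical helper).

    Old items expire as new ones arrive, so the contents are not monotonic.
    """
    window = deque(maxlen=size)  # deque drops the oldest item automatically
    for item in stream:
        window.append(item)
        yield list(window)


def landmark_window(stream):
    """Yield everything seen since the start of the stream (the landmark).

    Items are only ever added, so each state is a superset of the previous one.
    """
    window = []
    for item in stream:
        window.append(item)
        yield list(window)


if __name__ == "__main__":
    for state in sliding_window([1, 2, 3, 4], size=2):
        print("sliding:", state)
    for state in landmark_window([1, 2, 3]):
        print("landmark:", state)
```

An operator over the landmark window can be maintained with insertions only, whereas one over the sliding window must also handle expirations.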

5.
The continuous broadcast of data together with an index structure is an effective way of disseminating data in a wireless mobile environment. The index allows a mobile client to tune in only when relevant data is available on the channel and leads to reduced power consumption for the clients. This paper investigates the execution of queries on broadcast index trees when query execution corresponds to a partial traversal of the tree. Queries exhibiting this behavior include range queries and nearest neighbor queries. We present two broadcast schedules for index trees and two query algorithms executed by mobile clients. Our solutions simultaneously minimize tuning time and latency and adapt to the client’s available memory. Experimental results using real and synthetic data compare a broadcast with node repetition to one without node repetition, and show how priority-based data management can help reduce tuning time and latency.

6.
We propose a new similar sequence matching method that efficiently supports variable-length and variable-tolerance continuous query sequences on time-series data streams. Earlier methods do not support variable lengths or variable tolerances adequately for continuous query sequences if there are too many query sequences registered to handle in main memory. To support variable-length query sequences, we use the window construction mechanism that divides long sequences into smaller windows for indexing and searching the sequences. To support variable-tolerance query sequences, we present a new notion of intervaled sequences whose individual entries are an interval of real numbers rather than a real number itself. We also propose a new similar sequence matching method based on these notions, and then formally prove the correctness of the method. In addition, we show that our method has the prematching characteristic, which finds future candidates of similar sequences in advance. Experimental results show that our method outperforms the naive one by 2.6-102.1 times and the existing methods in the literature by 1.4-9.8 times over the entire ranges of parameters tested when the query selectivities are low (<32%), which are practically useful in large database applications.

7.
Many continual range queries can be issued against data streams. To efficiently evaluate continual queries against a stream, a main memory-based query index with a small storage cost and a fast search time is needed, especially if the stream is rapid. In this paper, we study a CEI-based query index that meets both criteria for efficient processing of continual interval queries. This new query index is an indirect indexing approach. It centres around a set of predefined virtual containment-encoded intervals, or CEIs. The CEIs are used to first decompose query intervals and then perform efficient search operations. The CEIs are defined and labeled such that containment relationships among them are encoded in their IDs. The containment encoding makes decomposition and search operations efficient; from the encoding of the smallest CEI containing a data point, the encodings of other containing CEIs can be easily derived. Closed-form formulae for the bounds of the average index storage cost are derived. Simulations are conducted to evaluate the effectiveness of the CEI-based query index and to compare it with alternative approaches. The results show that the CEI-based query index significantly outperforms existing approaches in terms of both storage cost and search time.

Kun-Lung Wu received the B.S. degree in electrical engineering from the National Taiwan University, Taipei, Taiwan, the M.S. and Ph.D. degrees in computer science from the University of Illinois at Urbana–Champaign. He is with the IBM Thomas J. Watson Research Center, currently a member of the Software Tools and Techniques Group. His current research interests include data streams, continual queries, mobile computing, Internet technologies and applications, database systems and distributed and parallel computing. He has published extensively and holds various patents in these areas. Dr. Wu is a Senior Member of the IEEE Computer Society and a member of the ACM. He was an Associate Editor for the IEEE Transactions on Knowledge and Data Engineering, 2000–2004. He was the general chair for the 3rd International Workshop on e-Commerce and Web-Based Information Systems (WECWIS 2001). He has served as an organising and program committee member on various conferences. He has received various IBM awards, including IBM Corporate Environmental Affair Excellence Award, Research Division Award and Invention Achievement Awards. He received a best paper award from IEEE EEE 2004. He is an IBM Master Inventor.

Shyh-Kwei Chen received the B.S. degree in computer science and information engineering from National Taiwan University, Taipei, Taiwan, in 1983, the M.S. degree in computer science from the University of Minnesota, Minneapolis, in 1987, and the Ph.D. degree in computer science from University of Illinois at Urbana–Champaign, in 1994. Dr. Chen has been with the IBM Thomas J. Watson Research Center, Yorktown Heights, New York since October 1994, where he is currently a research staff member. His current research interests include XML, electronic commerce, business performance management, data engineering and compilers. He is a member of the ACM, the IEEE and the IEEE Computer Society.

Philip S. Yu received the B.S. degree in electrical engineering from National Taiwan University, the M.S. and Ph.D. degrees in electrical engineering from Stanford University, and the M.B.A. degree from New York University. He is with the IBM Thomas J. Watson Research Center and is currently manager of the Software Tools and Techniques group. His research interests include data mining, Internet applications and technologies, database systems, multimedia systems, parallel and distributed processing and performance modelling. Dr. Yu has published more than 400 papers in refereed journals and conferences. He holds or has applied for more than 250 US patents. Dr. Yu is a Fellow of the ACM and a Fellow of the IEEE. He is an associate editor of ACM Transactions on Internet Technology. He is a member of the IEEE Data Engineering steering committee and is also on the steering committee of IEEE Conference on Data Mining. He was the Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering (2001–2004), an editor and advisory board member of IEEE Transactions on Knowledge and Data Engineering and also a guest coeditor of the special issue on mining of databases. He had also served as an associate editor of Knowledge and Information Systems. In addition to serving as program committee member on various conferences, he was the program cochair of the 11th International Conference on Data Engineering, the 6th Pacific Area Conference on Knowledge Discovery and Data Mining, and the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, and the program chair of the 2nd International Workshop on Research Issues on Data Engineering: Transaction and Query Processing, the PAKDD Workshop on Knowledge Discovery from Advanced Databases and the 2nd International Workshop on Advanced Issues of E-Commerce and Web-based Information Systems. He served as the general chair of the 14th International Conference on Data Engineering and the general cochair of the 2nd IEEE International Conference on Data Mining. He has received several IBM honours, including two IBM Outstanding Innovation Awards, an Outstanding Technical Achievement Award, two Research Division Awards and the 81st Plateau of Invention Achievement Awards. He received an Outstanding Contributions Award from IEEE International Conference on Data Mining in 2003 and also an IEEE Region 1 Award for “promoting and perpetuating numerous new electrical engineering concepts” in 1999. Dr. Yu is an IBM Master Inventor and was recognised as one of the IBM's 10 top leading inventors in 1999.
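The idea in entry 7 that containment can be read off interval IDs may be easier to picture with a small sketch. The labeling below is an assumption made for illustration, not necessarily the scheme used in the paper: intervals inside one segment of length `SEGMENT` are numbered like the nodes of a perfect binary tree (root 1, children `2*i` and `2*i + 1`), so every interval containing a given unit interval is found by repeatedly halving its ID. `SEGMENT`, `decompose`, `containing_ceis` and `CEIIndex` are all hypothetical names.

```python
from collections import defaultdict

SEGMENT = 8  # assumed segment length; must be a power of two


def decompose(lo, hi, node=1, node_lo=0, node_hi=SEGMENT):
    """Return IDs of the largest intervals that exactly cover [lo, hi)."""
    if hi <= node_lo or node_hi <= lo:
        return []  # no overlap with this interval
    if lo <= node_lo and node_hi <= hi:
        return [node]  # this interval lies fully inside the query
    mid = (node_lo + node_hi) // 2
    return (decompose(lo, hi, 2 * node, node_lo, mid)
            + decompose(lo, hi, 2 * node + 1, mid, node_hi))


def containing_ceis(point):
    """IDs of all intervals containing `point`, smallest interval first."""
    cei_id = SEGMENT + point  # ID of the unit-length interval at `point`
    ids = []
    while cei_id >= 1:
        ids.append(cei_id)
        cei_id >>= 1  # the parent ID is the ID of the enclosing interval
    return ids


class CEIIndex:
    """Maps interval IDs to the queries whose decomposition uses them."""

    def __init__(self):
        self.buckets = defaultdict(set)

    def insert(self, query_id, lo, hi):
        for cei_id in decompose(lo, hi):
            self.buckets[cei_id].add(query_id)

    def search(self, point):
        hits = set()
        for cei_id in containing_ceis(point):
            hits |= self.buckets.get(cei_id, set())
        return hits


if __name__ == "__main__":
    index = CEIIndex()
    index.insert("q1", 2, 7)  # query interval [2, 7)
    index.insert("q2", 0, 4)  # query interval [0, 4)
    print(index.search(3))
    print(index.search(7))
```

Under these assumptions a search touches only one bucket per tree level, and no interval comparisons are needed at search time.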

8.
This paper describes the design and implementation of a high-level query language called Generalized Query-By-Rule (GQBR) which supports retrieval, insertion, deletion and update operations. This language, based on the formalism of database logic, enables the users to access each database in a distributed heterogeneous environment, without having to learn all the different data manipulation languages. The compiler has been implemented on a DEC 1090 system in Pascal.

9.
Dataflow query execution in a parallel main-memory environment   Total citations: 2 (self-citations: 0, citations by others: 2)
In this paper, the performance and characteristics of the execution of various join-trees on a parallel DBMS are studied. The results of this study are a step in the direction of the design of a query optimization strategy that is fit for parallel execution of complex queries. Among others, synchronization issues are identified to limit the performance gain from parallelism. A new hash-join algorithm is introduced that has fewer synchronization constraints than the known hash-join algorithms. Also, the behavior of individual join operations in a join-tree is studied in a simulation experiment. The results show that the introduced Pipelining hash-join algorithm yields a better performance for multi-join queries. The format of the optimal join-tree appears to depend on the size of the operands of the join: a multi-join between small operands performs best with a bushy schedule; larger operands are better off with a linear schedule. The results from the simulation study are confirmed with an analytic model for dataflow query execution.
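The abstract in entry 9 does not spell out how the pipelining hash-join avoids synchronization. A minimal sketch follows, assuming it behaves like what is commonly called a symmetric hash join: both inputs are hashed as they arrive, and each new tuple immediately probes the other side's table, so neither operand has to be complete before results flow. The function name and the interleaving via `zip_longest` are illustrative choices, not details from the paper.

```python
from collections import defaultdict
from itertools import zip_longest

_END = object()  # sentinel marking an exhausted input


def symmetric_hash_join(left, right, left_key, right_key):
    """Join two streams, emitting each match as soon as both tuples are seen."""
    left_table = defaultdict(list)
    right_table = defaultdict(list)
    # Alternate between the inputs to mimic tuples arriving on both sides.
    for l_row, r_row in zip_longest(left, right, fillvalue=_END):
        if l_row is not _END:
            key = left_key(l_row)
            left_table[key].append(l_row)  # build on the left side
            for match in right_table.get(key, ()):  # probe the right side
                yield (l_row, match)
        if r_row is not _END:
            key = right_key(r_row)
            right_table[key].append(r_row)  # build on the right side
            for match in left_table.get(key, ()):  # probe the left side
                yield (match, r_row)


if __name__ == "__main__":
    orders = [(1, "a"), (2, "b"), (3, "c")]
    items = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
    for pair in symmetric_hash_join(orders, items,
                                    lambda r: r[0], lambda r: r[0]):
        print(pair)
```

Each matching pair is produced exactly once, when the later of its two tuples arrives. The price is that both hash tables must be held in memory, which fits the main-memory setting named in the entry's title.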

10.
As a result of the extensive research in view-based query processing, three notions have been identified as fundamental, namely rewriting, answering, and losslessness. Answering amounts to computing the tuples satisfying the query in all databases consistent with the views. Rewriting consists in first reformulating the query in terms of the views and then evaluating the rewriting over the view extensions. Losslessness holds if we can answer the query by solely relying on the content of the views. While the mutual relationship between these three notions is easy to identify in the case of conjunctive queries, the terrain of notions gets considerably more complicated going beyond such a query class. In this paper, we revisit the notions of answering, rewriting, and losslessness and clarify their relationship in the setting of semistructured databases, and in particular for the basic query class in this setting, i.e., two-way regular path queries. Our first result is a clean explanation of the relationship between answering and rewriting, in which we characterize rewriting as a “linear approximation” of query answering. We show that applying this linear approximation to the constraint-satisfaction framework yields an elegant automata-theoretic approach to query rewriting. As for losslessness, we show that there are indeed two distinct interpretations for this notion, namely with respect to answering, and with respect to rewriting. We also show that the constraint-theoretic approach and the automata-theoretic approach can be combined to give an algorithmic characterization of the various facets of losslessness. Finally, we deal with the problem of coping with loss, by considering mechanisms aimed at explaining lossiness to the user.

11.
The paper presents specifications and implementation details of a query language designed for retrieving information from a soil data bank. The commands of the language are based on operations of relational algebra, and can be employed without previous programming experience. The language is part of ARSIS (A Relational Soil Information System), which is being developed in Greece.

12.
Answering queries using views is the problem of deriving the answers to a query when we only have the answers to a set of views. Constructing rewritings is a widely studied technique to derive those answers. In this paper we consider the problem of the existence of rewritings in the case where the answers to the views uniquely determine the answers to the query. Specifically, we say that a view set V determines a query Q if for any two databases D1, D2 it holds that V(D1)=V(D2) implies Q(D1)=Q(D2). We consider the case where query and views are defined by conjunctive queries and investigate the question: If a view set V determines a query Q, is there an equivalent rewriting of Q using V? We present here interesting cases where there are such rewritings in the language of conjunctive queries. Interestingly, we identify a class of conjunctive queries, CQpath, for which a view set can produce equivalent rewritings for “almost all” queries which are determined by this view set. We introduce a problem which relates determinacy to query equivalence. We show that there are cases where restricted results can carry over to broader classes of queries.
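The determinacy condition in entry 12, V(D1)=V(D2) implies Q(D1)=Q(D2), can be tried out on toy data. The sketch below is an illustration only: it enumerates every database over a tiny domain with a single binary relation, so a `True` result is evidence within that bound and not a proof. The helper names (`compose`, `all_databases`, `determines`) and the example view and queries are hypothetical.

```python
from itertools import combinations, product


def compose(r, s):
    """Relational composition: pairs (x, z) with (x, y) in r and (y, z) in s."""
    return frozenset((x, z) for (x, y1) in r for (y2, z) in s if y1 == y2)


def all_databases(domain):
    """Every database consisting of one binary relation over `domain`."""
    edges = list(product(domain, repeat=2))
    for size in range(len(edges) + 1):
        for subset in combinations(edges, size):
            yield frozenset(subset)


def determines(view, query, domain):
    """Check V(D1) = V(D2) => Q(D1) = Q(D2) over all databases on `domain`."""
    answer_for_view = {}
    for db in all_databases(domain):
        v, q = view(db), query(db)
        # Two databases with the same view answer must give the same query answer.
        if answer_for_view.setdefault(v, q) != q:
            return False
    return True


def path2(db):
    """View: pairs of nodes connected by a path of length 2."""
    return compose(db, db)


def path4(db):
    """Query: pairs connected by a path of length 4."""
    return compose(path2(db), path2(db))


def edge(db):
    """Query: the edge relation itself."""
    return db


if __name__ == "__main__":
    print(determines(path2, path4, range(3)))
    print(determines(path2, edge, range(3)))
```

Here `path4` is determined by `path2` because it can be rewritten as the view composed with itself, while `edge` is not: the empty database and a database with a single edge have the same (empty) view answer but different edge sets.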

13.
Redundant processing is a key problem in the translation of initial queries posed over an ontology into SQL queries, through mappings, as it is performed by ontology-based data access systems. Examples of such processing are duplicate answers obtained during query evaluation, which must finally be discarded, or common expressions evaluated multiple times from different parts of the same complex query. Many optimizations that aim to minimize this problem have been proposed and implemented, mostly based on semantic query optimization techniques, by exploiting ontological axioms and constraints defined in the database schema. However, data operations that introduce redundant processing are still generated in many practical settings, and this is a factor that impacts query execution. In this work we propose a cost-based method for query translation, which starts from an initial result and uses information about redundant processing in order to come up with an equivalent, more efficient translation. The method operates in a number of steps, by relying on certain heuristics indicating that we obtain a more efficient query in each step. Through experimental evaluation using the Ontop system for ontology-based data access, we exhibit the benefits of our method.

14.
15.
Because it operates under a strict time constraint, query processing for data streams should be continuous and rapid. To guarantee this constraint, most previous research optimizes the evaluation order of multiple join operations in a set of continuous queries using a greedy optimization strategy, so that the order is re-optimized dynamically at run-time in response to the time-varying characteristics of data streams. However, this method often results in a sub-optimal plan because the greedy strategy traces only the first promising plan. This paper proposes a new multiple query optimization approach, Adaptive Sharing-based Extended Greedy Optimization Approach (A-SEGO), that traces multiple promising partial plans simultaneously. A-SEGO presents a novel method for sharing the results of common sub-expressions in a set of queries cost-effectively. The number of partial plans can be flexibly controlled according to the query processing workload. In addition, to avoid invoking the optimization process too frequently, optimization is performed only when the current execution plan is no longer relatively efficient. A series of experiments is comparatively analyzed to evaluate the performance of the proposed method in various stream environments.

16.
FP is the programming language defined by J. Backus to demonstrate the virtues of functional programming as opposed to conventional programming in Von Neumann-like languages. In this paper we investigate the use of FP in the framework of relational databases. In particular, we show how the language can be used to define base relations, to derive views from a collection of relations, and to express complex database queries. The language provides all capabilities of pure algebraic relational languages, but is considerably more powerful. As such, it can be used as a formal specification language to describe the semantics of queries expressed in relational languages, such as Query-By-Example. In addition, the algebra of FP programs allows one to formally prove properties of such queries.

17.
In most object-oriented databases, as opposed to value-oriented databases, data are accessed by a programming language instead of a declarative query language. End users do not have a simple tool to perform their queries. This paper is concerned with the definition of an SQL-like query language on top of the O2 object-oriented database system. We study the influence of the object-oriented paradigm on a query language, describe our language through examples, define its semantics by means of an algebra and sketch the compilation of RELOOP into one of the languages supported by the O2 system.

18.
The need to make the contents of the Semantic Web accessible to end-users becomes increasingly pressing as the amount of information stored in ontology-based knowledge bases steadily increases. Natural language interfaces (NLIs) provide a familiar and convenient means of query access to Semantic Web data for casual end-users. While several studies have shown that NLIs can achieve high retrieval performance as well as domain independence, this paper focuses on usability and investigates whether NLIs and natural language query languages are useful from an end-user's point of view. To that end, we introduce four interfaces, each allowing a different query language, and present a usability study benchmarking these interfaces. The results of the study reveal a clear preference for full natural language query sentences with a limited set of sentence beginnings over keywords or formal query languages. NLIs to ontology-based knowledge bases can, therefore, be considered to be useful for casual or occasional end-users. As such, the overarching contribution is one step towards the theoretical vision of the Semantic Web becoming reality.

19.
Efficient processing of distance-based queries (DBQs) is of great importance in spatial databases due to the wide range of applications that may issue such queries. The most representative and best-known DBQs are the K Nearest Neighbors Query (KNNQ), ρ Distance Range Query (ρDRQ), K Closest Pairs Query (KCPQ) and ρ Distance Join Query (ρDJQ). In this paper, we propose new pruning mechanisms and apply them in the design of new Recursive Best-First Search (RBFS) algorithms for DBQs between spatial objects indexed in R-trees. RBFS is a general search algorithm that runs in linear space and expands nodes in best-first order, but it can suffer from node re-expansion overhead (i.e. to expand nodes in best-first order, some nodes can be considered more than once). The R-tree and its variations are commonly cited spatial access methods that can be used for answering such spatial queries. Moreover, an exhaustive experimental study using R-trees is included, which leads to several conclusions about the efficiency of the proposed RBFS algorithm and its comparison with other search algorithms (Best-First Search (BFS) and Depth-First Branch-and-Bound (DFBnB)) in terms of disk accesses, response time and main memory requirements, taking into account several important parameters such as the maximum branching factor (Cmax), the cardinality of the final query result (K), the distance threshold (ρ) and the size of a global LRU buffer (B). In general, RBFS is competitive for KNNQ and KCPQ where the maximum branching factor (Cmax) is large enough (even better than DFBnB and very close to BFS), and it is a good alternative when main memory is limited due to high process load on the system, since its space consumption is linear with respect to the height of the R-trees. Nevertheless, RBFS is the worst alternative for ρDRQ and ρDJQ. DFBnB is also a linear space algorithm and it obtains the same behavior as BFS for ρDRQ and ρDJQ; it is the best when an LRU buffer is included. Finally, we have been able to check experimentally that BFS is the best for all DBQs, but it can consume a large amount of main memory to perform spatial queries.

20.
This paper discusses the issues involved in designing a query language for the Semantic Web and presents the OWL query language (OWL-QL) as a candidate standard language and protocol for query–answering dialogues among Semantic Web computational agents using knowledge represented in the W3C's Web Ontology Language (OWL). OWL-QL is a formal language and precisely specifies the semantic relationships among a query, a query answer, and the knowledge base(s) used to produce the answer. Unlike standard database and Web query languages, OWL-QL supports query–answering dialogues in which the answering agent may use automated reasoning methods to derive answers to queries, as well as dialogues in which the knowledge to be used in answering a query may be in multiple knowledge bases on the Semantic Web, and/or where those knowledge bases are not specified by the querying agent. In this setting, the set of answers to a query may be of unpredictable size and may require an unpredictable amount of time to compute.
