Similar literature
 Found 20 similar documents; search took 31 ms
1.
Active data warehouses: complementing OLAP with analysis rules   (Total citations: 2; self-citations: 0; citations by others: 2)
Conventional data warehouses are passive. All tasks related to analysing data and making decisions must be carried out manually by analysts. Today's data warehouse and OLAP systems offer little support for automating decision tasks that occur frequently and for which well-established decision procedures are available. Such functionality can be provided by extending the conventional data warehouse architecture with analysis rules, which mimic the work of an analyst during decision making. Analysis rules extend the basic event/condition/action (ECA) rule structure with mechanisms to analyse data multidimensionally and to make decisions. The resulting architecture is called an active data warehouse.
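The event/condition/action structure mentioned in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the `AnalysisRule` class, the `fire` helper, the cube layout, and all sample values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical mini cube: (region, product) -> monthly sales. Values are made up.
Cube = Dict[Tuple[str, str], float]


@dataclass
class AnalysisRule:
    event: str                         # E: what triggers the rule
    condition: Callable[[Cube], bool]  # C: a multidimensional check on the cube
    action: Callable[[Cube], str]      # A: the decision to record


def region_total(cube: Cube, region: str) -> float:
    """Roll sales up from (region, product) to region."""
    return sum(v for (r, _), v in cube.items() if r == region)


def fire(rules: List[AnalysisRule], event: str, cube: Cube) -> List[str]:
    """Run every rule registered for `event` whose condition holds."""
    return [r.action(cube) for r in rules if r.event == event and r.condition(cube)]


cube = {("north", "tea"): 40.0, ("north", "coffee"): 30.0, ("south", "tea"): 90.0}
rules = [
    AnalysisRule(
        event="end_of_month",
        condition=lambda c: region_total(c, "north") < 100.0,
        action=lambda c: "review pricing in north",
    )
]
decisions = fire(rules, "end_of_month", cube)
```

The condition here is a simple roll-up followed by a threshold test; the analysis rules described in the abstract are presumably richer than this.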

2.
3.
On-line analytical processing (OLAP) refers to the technologies that allow users to efficiently retrieve data from the data warehouse for decision-support purposes. Data warehouses tend to be extremely large; it is quite possible for a data warehouse to be hundreds of gigabytes to terabytes in size (Chaudhuri and Dayal, 1997). Queries tend to be complex and ad hoc, often requiring computationally expensive operations such as joins and aggregation. Given this, we are interested in developing strategies for improving query processing in data warehouses by exploring the applicability of parallel processing techniques. In particular, we exploit the natural partitionability of a star schema and render it even more efficient by applying DataIndexes, a storage structure that serves both as an index and as data and lends itself naturally to vertical partitioning of the data. DataIndexes are derived from the various special-purpose access mechanisms currently supported in commercial OLAP products. Specifically, we propose a declustering strategy which incorporates both task and data partitioning and present the Parallel Star Join (PSJ) Algorithm, which provides a means to perform a star join in parallel using efficient operations involving only rowsets and projection columns. We compare the performance of the PSJ Algorithm with two parallel query processing strategies. The first is a parallel join strategy utilizing the Bitmap Join Index (BJI), arguably the state-of-the-art OLAP join structure in use today. For the second strategy we choose a well-known parallel join algorithm, namely the pipelined hash algorithm. To assist in the performance comparison, we first develop a cost model of the disk access and transmission costs for all three approaches.
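As a rough illustration of a star join expressed only with rowsets and projection columns, the sketch below keeps each fact-table column as a separate list (vertical partitioning), turns every dimension predicate into a set of matching fact row ids, and intersects those sets. It is a single-process toy and not the PSJ Algorithm itself; the column names, helper names, and data are assumptions made for the example.

```python
from typing import Dict, List, Optional, Set

# Vertically partitioned fact table: one list per column, aligned by row id.
# All names and values are hypothetical.
fact_fk: Dict[str, List[int]] = {
    "store": [1, 2, 1, 3],
    "product": [10, 10, 20, 10],
}
sales: List[float] = [5.0, 7.0, 3.0, 4.0]


def rowset(fk_column: List[int], matching_keys: Set[int]) -> Set[int]:
    """Row ids of the fact table whose foreign key satisfies a dimension predicate."""
    return {i for i, key in enumerate(fk_column) if key in matching_keys}


def star_join_sum(dim_matches: Dict[str, Set[int]]) -> float:
    """Intersect one rowset per restricted dimension, then read only the measure column."""
    rows: Optional[Set[int]] = None
    for dim, keys in dim_matches.items():
        rs = rowset(fact_fk[dim], keys)
        rows = rs if rows is None else rows & rs
    if rows is None:  # no dimension restricted: every fact row qualifies
        rows = set(range(len(sales)))
    return sum(sales[i] for i in rows)


# Stores 1 and 3, product 10 -> fact rows 0 and 3.
total = star_join_sum({"store": {1, 3}, "product": {10}})
```

In a parallel setting each rowset could be computed on a different node before the intersection, which is the kind of task and data partitioning the abstract alludes to.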

4.
Spatial data warehouses (SDWs) allow for spatial analysis together with analytical multidimensional queries over huge volumes of data. The challenge is to retrieve data related to ad hoc spatial query windows according to spatial predicates, avoiding the high cost of joining large tables. Therefore, mechanisms to provide efficient query processing over SDWs are essential. In this paper, we propose two efficient indices for SDWs: the SB-index and the HSB-index. The proposed indices share the following characteristics. They enable multidimensional queries with spatial predicates for SDWs and also support predefined spatial hierarchies. Furthermore, they compute the spatial predicate and transform it into a conventional one, which can be evaluated together with other conventional predicates by accessing a star-join Bitmap index. While the SB-index has a sequential data structure, the HSB-index uses a hierarchical data structure to enable spatial object clustering and a specialized buffer pool to decrease the number of disk accesses. The advantages of the SB-index and the HSB-index over the DBMS resources for SDW indexing (i.e. star-join computation and materialized views) were investigated through performance tests, which issued roll-up operations extended with containment and intersection range queries. The performance results showed that improvements ranged from 68% up to 99% over both the star-join computation and the materialized view. Furthermore, the proposed indices proved to be very compact, adding less than 1% to the storage requirements. Therefore, both the SB-index and the HSB-index are excellent choices for SDW indexing. Choosing between the SB-index and the HSB-index mainly depends on the query selectivity of spatial predicates. While low query selectivity benefits the HSB-index, the SB-index provides better performance for higher query selectivity.

5.
The development of data warehouses begins with the definition of multidimensional models at the conceptual level in order to structure data, which makes data analysis easier for decision makers. Current proposals for conceptual multidimensional modelling focus on the design of static data warehouse structures, but few approaches model the queries which the data warehouse should support by means of OLAP (on-line analytical processing) tools. OLAP queries are, therefore, only defined once the rest of the data warehouse has been implemented, which prevents designers from verifying from the very beginning of the development whether the decision maker will be able to obtain the required information from the data warehouse. This article presents a solution to this drawback consisting of an extension to the object constraint language (OCL), which has been developed to include a set of predefined OLAP operators. These operators can be used to define platform-independent OLAP queries as a part of the specification of the data warehouse conceptual multidimensional model. Furthermore, OLAP tools require the implementation of queries to assure performance optimisations based on pre-aggregation. It is interesting to note that the OLAP queries defined by our approach can be automatically implemented in the rest of the data warehouse, in a coherent and integrated manner. This implementation is supported by a code-generation architecture aligned with model-driven technologies, in particular the MDA (model-driven architecture) proposal. Finally, our proposal has been validated by means of a set of sample data sets from a well-known case study.

6.
OLAP queries involve a lot of aggregations on a large amount of data in data warehouses. To process expensive OLAP queries efficiently, we propose a new method to rewrite a given OLAP query using various kinds of materialized views which already exist in data warehouses. We first define the normal forms of OLAP queries and materialized views based on the selection and aggregation granularities, which are derived from the lattice of dimension hierarchies. Conditions for usability of materialized views in rewriting a given query are specified by relationships between the components of their normal forms. We present a rewriting algorithm for OLAP queries that can effectively utilize materialized views having different selection granularities, selection regions, and aggregation granularities together. We also propose an algorithm to find a set of materialized views that results in a rewritten query which can be executed efficiently. We show the effectiveness and performance of the algorithm experimentally.
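The simplest case of answering a query from a materialized view is a roll-up: a view aggregated at a finer granularity, here (month, city), can answer a SUM query at a coarser granularity, here (year, country), without touching the base data. The sketch below shows only that one case and is not the rewriting algorithm of the paper; the hierarchies, the view contents, and the function name are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical dimension hierarchies: month -> year and city -> country.
MONTH_TO_YEAR = {"2023-01": "2023", "2023-02": "2023", "2024-01": "2024"}
CITY_TO_COUNTRY = {"Lyon": "FR", "Paris": "FR", "Bonn": "DE"}

# Materialized view: SUM(sales) grouped by (month, city). Values are made up.
view_month_city: Dict[Tuple[str, str], int] = {
    ("2023-01", "Lyon"): 10,
    ("2023-02", "Paris"): 5,
    ("2023-01", "Bonn"): 7,
    ("2024-01", "Paris"): 2,
}


def answer_year_country(view: Dict[Tuple[str, str], int]) -> Dict[Tuple[str, str], int]:
    """Answer SUM(sales) GROUP BY (year, country) by re-aggregating the finer view.

    This works because SUM is distributive: sums of partial sums equal the full sum.
    """
    result: Dict[Tuple[str, str], int] = defaultdict(int)
    for (month, city), subtotal in view.items():
        result[(MONTH_TO_YEAR[month], CITY_TO_COUNTRY[city])] += subtotal
    return dict(result)


rewritten_answer = answer_year_country(view_month_city)
```

The reverse direction does not work: a view grouped by (year, country) cannot answer a query grouped by (month, city), which is why usability conditions depend on the relative granularities of the view and the query.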

7.
Although many real-world phenomena are vague and characterized by having uncertain location or vague shape, existing spatial data warehouse models do not support spatial vagueness and therefore cannot properly represent these phenomena. In this paper, we propose the VSCube conceptual model to represent and manipulate shape vagueness in spatial data warehouses, allowing the analysis of business scores related to vague spatial data, and therefore improving the decision-making process. Our VSCube conceptual model is based on the cube metaphor and supports geometric shapes and the corresponding membership values, thus providing more expressiveness to represent vague spatial data. We also define vague spatial aggregation functions (e.g. vague spatial union) and vague spatial predicates to enable vague SOLAP queries (e.g. intersection range queries). Finally, we introduce the concept of vague SOLAP and its operations (e.g. drill-down and roll-up). We demonstrate the applicability of our model by describing an application concerning pest control in agriculture and by discussing the reuse of existing models in the VSCube conceptual model.

8.
Data warehouses (DW) form the backbone of data integration that is necessary for analytical applications, and play important roles in the information technology landscape of many industries. We introduce an approach for addressing the fundamental problem of semantic heterogeneity in the design of data integration requirements during DW development. In contrast to ontology-driven or schema-matching approaches, which propose the automatic resolution of differences ex-post, our approach addresses the core problem of data integration requirements: understanding and resolving different contextual meanings of data fields. We ground the approach firmly in communication theory and build on practices from agile software development. Besides providing relevant insights for the design of data integration requirements, our findings point to communication theory as a sound underlying foundation for a design theory of information systems development.

9.
10.
OLAP over uncertain and imprecise data   (Total citations: 2; self-citations: 0; citations by others: 2)
We extend the OLAP data model to represent data ambiguity, specifically imprecision and uncertainty, and introduce an allocation-based approach to the semantics of aggregation queries over such data. We identify three natural query properties and use them to shed light on alternative query semantics. While there is much work on representing and querying ambiguous data, to our knowledge this is the first paper to handle both imprecision and uncertainty in an OLAP setting.
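One way to picture an allocation-based treatment of imprecise facts is the sketch below: a fact recorded only at a coarse level (a region) is split across the finer cells it might belong to (states), and each piece carries a weight that aggregation then respects. The uniform weighting, the hierarchy, and all names and numbers are assumptions chosen for the example, not the semantics defined in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical hierarchy: a region covers several states.
REGION_TO_STATES: Dict[str, List[str]] = {"East": ["NY", "MA"]}

# (location, amount). The last fact is imprecise: only the region is known.
facts: List[Tuple[str, float]] = [("NY", 100.0), ("MA", 40.0), ("East", 60.0)]


def allocate(raw: List[Tuple[str, float]]) -> List[Tuple[str, float, float]]:
    """Split each fact over its candidate states with uniform weights (an assumption)."""
    allocated = []
    for location, amount in raw:
        states = REGION_TO_STATES.get(location, [location])
        weight = 1.0 / len(states)
        for state in states:
            allocated.append((state, amount, weight))
    return allocated


def weighted_sum(state: str, allocated: List[Tuple[str, float, float]]) -> float:
    """SUM(amount) for one state, counting each allocated piece by its weight."""
    return sum(amount * weight for s, amount, weight in allocated if s == state)


allocated_facts = allocate(facts)
ny_total = weighted_sum("NY", allocated_facts)
ma_total = weighted_sum("MA", allocated_facts)
```

With these weights the per-state totals still add up to the overall total, so rolling the states back up to the region gives the same answer as summing the raw facts.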

11.
Histogram feature representation is important in many classification applications for characterization of the statistical distribution of different pattern attributes, such as the color and edge orientation distribution in images. While the construction of these feature representations is simple, this very simplicity may compromise the classification accuracy in those cases where the original histogram does not provide adequate discriminative information for making a reliable classification. In view of this, we propose an optimization approach based on evolutionary computation (Back, Evolutionary algorithms in theory and practice, Oxford University Press, New York, 1996; Fogel, Evolutionary computation: toward a new philosophy of machine intelligence, 2nd edn. IEEE, Piscataway, NJ 1998) to identify a suitable transformation on the histogram feature representation, such that the resulting classification performance based on these features is maximally improved while the original simplicity of the representation is retained. To facilitate this optimization process, we propose a hierarchical classifier structure to demarcate the set of categories in such a way that the pair of category subsets with the highest level of dissimilarities is identified at each stage for partition. In this way, the evolutionary search process for the required transformation can be considerably simplified due to the reduced level of complexities in classification for two widely separated category subsets. The proposed approach is applied to two problems in multimedia data classification, namely the categorization of 3D computer graphics models and image classification in the JPEG compressed domain. Experimental results indicate that the evolutionary optimization approach, facilitated by the hierarchical classification process, is capable of significantly improving the classification performance for both applications based on the transformed histogram representations.
Hau-San Wong

12.
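To address the collection, integration, and application of hybrid data sources, this paper proposes a database private cloud design oriented to hybrid data sources. According to their degree of structure, the data sources are divided into structured, semi-structured, and unstructured sources. According to their differing temporal characteristics, the database private cloud provides two collection modes: scheduled collection and real-time collection. To improve the access efficiency of hybrid data sources, the database private cloud stores unstructured and semi-structured data in a distributed file system...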

13.
We focus exclusively on the issue of requirements engineering for Data Warehouses (DW). Our position is that the information content of a DW is found in the larger context of the goals of an organization. We refer to this context as the organizational perspective. Goals identify the set of relevant decisions, which in turn help determine the information needed to support them. The organizational perspective is converted into the technical perspective, which deals with the set of decisions to be supported and the information required. The latter defines the Data Warehouse contents. To elicit the technical perspective, we use the notion of an informational scenario. It is a typical interaction between a DW system and the decision maker and consists of a sequence of pairs of the form <information request, response>. We formulate an information request as a statement in an adapted form of SQL called Specification SQL. The proposals here are implemented in the form of an Informational Scenario Engine that processes informational scenarios and determines Data Warehouse Information Contents.

14.
In this paper we address the problem of integrating independent and possibly heterogeneous data warehouses, a problem that has received little attention so far, but that arises very often in practice. We start by tackling the basic issue of matching heterogeneous dimensions and provide a number of general properties that a dimension matching should fulfill. We then propose two different approaches to the problem of integration that try to enforce matchings satisfying these properties. The first approach refers to a scenario of loosely coupled integration, in which we just need to identify the common information between data sources and perform join operations over the original sources. The goal of the second approach is the derivation of a materialized view built by merging the sources, and refers to a scenario of tightly coupled integration in which queries are performed against the view. We also illustrate the architecture and functionality of a practical system that we have developed to demonstrate the effectiveness of our integration strategies. A preliminary version of this paper appeared, under the title “Integrating Heterogeneous Multidimensional Databases” [9], in 17th Int. Conference on Scientific and Statistical Database Management, 2005.

15.
This paper proposes a novel, probabilistic data model and algebra that improves the modeling and querying of uncertain data in spatial OLAP (SOLAP) to support location-based services. Data warehouses that support location-based services need to combine complex hierarchies, such as road networks or transportation infrastructures, with static and dynamic content, e.g., speed limits and vehicle positions, respectively. Both the hierarchies and the content are often uncertain in real-world applications. Our model supports the use of probability distributions within both facts and dimensions. We give an algebra that correctly aggregates uncertain data over uncertain hierarchies. This paper also describes an implementation of the model and algebra, gives a complexity analysis of the algebra, and reports on an empirical, experimental evaluation of the implementation. The work is motivated with a real-world case study, based on our collaboration with a leading Danish vendor of location-based services.

16.
Today, cluster-based computing is the mainstream architecture for high-end computer systems. Balanced system design is critical for large-scale cluster systems to achieve high efficiency. This paper addresses the practice on DeepComp high-end computer systems toward a balanced system design. Methodologies for designing balanced large-scale cluster systems are given. A method for balancing the central processing unit (CPU) and the memory hierarchy is addressed. For balancing computing nodes and I/O systems, two approaches are given: a maximum bandwidth criterion and the maximum number of computing nodes which can concurrently access I/O systems. Experience with Lenovo high-end cluster systems shows that the above methods are effective. The paper also covers Lenovo strategies toward a balanced system design for both peta and 10 peta scale high productivity computing systems (HPCSs).

17.
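Wang Hongwei (王红卫). Software (《软件》), 2012, 33(10): 55-56, 63
Against the background of the Taxinan (塔西南) company's human resources management system (HRMIS), this paper develops a tool that smoothly exports the existing data and makes the necessary extensions to the original system's data views, so that the cleaned base data can be reliably migrated to the social security reporting database.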

18.
Using OLAP and multidimensional data for decision making   (Total citations: 1; self-citations: 0; citations by others: 1)
Hasan, H.; Hyland, P. IT Professional, 2001, 3(5): 44-50
Managers see information as a critical resource and require systems that let them exploit it for competitive advantage. One way to better use organizational information is via online analytical processing and multidimensional databases (MDDBs). OLAP and MDDBs present summarized information from company databases. They use multidimensional structures that let managers slice and dice views of company performance data and drill down into trouble spots. For over a decade, proponents have touted these tools as the ultimate executive information system, but most of the hype comes from product vendors themselves. Based on our experience with several OLAP tools, we have developed a more pragmatic approach to the design of multidimensional information systems that lets managers make the most of their companies' information assets.

19.
20.
Materialized views and indexes are physical structures for accelerating data access that are commonly used in data warehouses. However, these data structures generate some maintenance overhead. They also share the same storage space. Most existing studies about materialized view and index selection consider these structures separately. In this paper, we adopt the opposite stance and couple materialized view and index selection to take view–index interactions into account and achieve efficient storage space sharing. Candidate materialized views and indexes are selected through a data mining process. We also exploit cost models that evaluate the respective benefit of indexing and view materialization, and help select a relevant configuration of indexes and materialized views among the candidates. Experimental results show that our strategy performs better than an independent selection of materialized views and indexes.
