Similar Documents
1.
赵群 《福建电脑》2006,(4):51-52
Dirty data is a hidden hazard for the entire data warehouse, so data cleaning is of great value for maintaining data warehouses and large databases. This work introduces and studies data cleaning methods and techniques, focusing on analytical methods for eliminating duplicate Chinese-language records, and validates, analyzes, and implements these methods.
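For illustration, a minimal Python sketch of one analysis method for spotting duplicate Chinese records, using the standard library's difflib; the sample records and the 0.9 threshold are assumptions for the example, not taken from the paper.

```python
# Sketch: detect likely duplicate Chinese records by string similarity.
# Assumptions: records are short text fields; the 0.9 threshold is illustrative.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; works on Chinese strings since it compares code points."""
    return SequenceMatcher(None, a, b).ratio()

def find_duplicates(records: list[str], threshold: float = 0.9):
    """Return index pairs whose similarity exceeds the threshold (O(n^2) scan)."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                pairs.append((i, j))
    return pairs

if __name__ == "__main__":
    rows = ["北京市海淀区中关村大街1号", "北京海淀区中关村大街1号", "上海市浦东新区世纪大道100号"]
    print(find_duplicates(rows))  # [(0, 1)] -- the first two rows are near-duplicates
```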

2.
This paper discusses strategies and methods for implementing statistical analysis in a massive-data system and, drawing on a concrete application of online analytical processing (OLAP) technology in a large population information system, proposes a general functional architecture for statistical analysis systems under the data warehouse model. It also proposes a practical data storage implementation scheme suited to the actual situation and examines the implementation strategy for the online analysis system and the design and optimization of the indicator-repository dimensions. The design achieves the ultimate application goal of flexible and convenient querying and statistics over massive data, converts the base data held in the system into decision knowledge, and provides a complete solution for large-scale statistical analysis and processing of data.

3.
Application and Research of Business Intelligence in Modern Enterprises (Cited by 6: 0 self, 6 others)
To address the problems in current enterprise information systems, this paper introduces a business intelligence (BI) framework, proposes an improved BI architecture, and presents a complete plan for building a BI system, studying and implementing the selection of data warehouse subject-area analyses, the ETCL (extract, transform, clean, load) process, and data warehouse query optimization. Practical application shows that the system improves both the enterprise's utilization of its existing information data and its capability for decision analysis.

4.
汪林林  焦慧敏 《计算机科学》2006,33(10):128-130
To address the problems in current enterprise information systems, this paper introduces a business intelligence framework, proposes an improved business intelligence architecture, presents a complete plan for building a business intelligence system, and studies and implements the selection of data warehouse subject-area analyses, the ETCL process, and data warehouse query optimization. Results from practical application show improved utilization of the enterprise's existing information data and a stronger capability for enterprise decision analysis.

5.
An Object-Oriented Extension of XML-Based Data Cubes (Cited by 3: 0 self, 3 others)
This paper presents an object-oriented implementation of an XML-based data cube model. By combining data warehouse technology, object-oriented technology, and XML technology, it extends XML Schema, thereby providing a new representation and implementation method for Web-based data warehouse applications and solving the reorganization problem caused by schema evolution in data warehouses. Used together with object-oriented techniques, the data warehouse becomes a repository of data cubes and OLAP methods. This implementation scheme guarantees the stability, flexibility, and extensibility of the data warehouse system and meets the needs of the new generation of Web applications.

6.
The data volume of commercial banks grows continually with the expansion of traditional business and the development of the Internet, placing ever higher demands on data storage, management, and application. By building a big-data platform based on Hadoop technology and using the distributed file system HDFS, the SQL analysis engine Inceptor, the NoSQL database tool Hyperbase, and the stream-processing tool Stream, this paper explores the construction of a Hadoop distributed data warehouse for a large commercial bank, ultimately accomplishing the migration from a traditional relational data warehouse on a centralized storage architecture to a distributed data warehouse. The distributed data warehouse supports storage of both structured and unstructured data, ETL job scheduling, historical data retrieval, interactive analysis, and stream data processing. Application shows that, compared with a traditional relational data warehouse on centralized storage, the distributed data warehouse greatly improves the efficiency of data storage and data services.
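For illustration only, a PySpark sketch of the kind of SQL-over-HDFS analysis described above; the HDFS path, table layout, and the use of PySpark (standing in for the Inceptor engine named in the abstract) are all assumptions.

```python
# Sketch: interactive SQL analysis over data stored on HDFS.
# Assumptions: a Spark cluster is available, and hdfs://nn:8020/dw/transactions
# (hypothetical path) holds Parquet files with columns (branch, amount, ts).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bank-dw-sketch").getOrCreate()

tx = spark.read.parquet("hdfs://nn:8020/dw/transactions")
tx.createOrReplaceTempView("transactions")

# A typical warehouse query: per-branch daily totals.
daily = spark.sql("""
    SELECT branch, to_date(ts) AS day, SUM(amount) AS total
    FROM transactions
    GROUP BY branch, to_date(ts)
""")
daily.show()
```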

7.
The Application of Parallel Technology in Data Warehouses (Cited by 2: 0 self, 2 others)
Because huge volumes of detail data must be stored, data warehouses are growing toward the terabyte scale, and much complex processing is added on top, so adopting parallel technology becomes inevitable. After analyzing the available parallel technologies, this paper argues that a shared-nothing MPP architecture is best suited to the current state and development trend of data warehouses, and points out the issues that deserve attention when designing large data warehouses with parallel technology.
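A toy illustration of the shared-nothing principle, with Python's multiprocessing standing in for separate MPP nodes: each worker aggregates only the partition it owns, and only small partial results cross node boundaries.

```python
# Sketch: shared-nothing parallel aggregation. Each "node" owns one data
# partition and computes a local partial sum; the coordinator merges them.
from multiprocessing import Pool

def local_sum(partition):
    return sum(partition)  # runs against one node's private data only

if __name__ == "__main__":
    partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]  # one list per node
    with Pool(processes=len(partitions)) as pool:
        partials = pool.map(local_sum, partitions)
    print(sum(partials))  # 45 -- merge step sees only the small partials
```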

8.
Existing call centers hold large volumes of historical data but lack the capability to analyze and process it, leaving enterprise decision-making without data support. Based on a study of call center and data warehouse technology, and taking into account the characteristics of the call center's call management system, this paper designs the architecture of a call center data warehouse and discusses in detail the design and concrete implementation of its architecture, logical model, physical model, and online analytical processing (OLAP) system.
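A minimal sketch of the star-schema-plus-OLAP pattern the abstract describes, using Python's built-in sqlite3; the fact and dimension columns are invented for illustration and are not the paper's actual models.

```python
# Sketch: a toy call-center star schema and one OLAP roll-up query.
# All table and column names are illustrative assumptions.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_agent (agent_id INTEGER PRIMARY KEY, team TEXT);
    CREATE TABLE dim_date  (date_id  INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE fact_call (agent_id INTEGER, date_id INTEGER,
                            duration_sec INTEGER, resolved INTEGER);
    INSERT INTO dim_agent VALUES (1, 'billing'), (2, 'support');
    INSERT INTO dim_date  VALUES (10, '2024-01'), (11, '2024-02');
    INSERT INTO fact_call VALUES (1, 10, 300, 1), (1, 11, 240, 0),
                                 (2, 10, 600, 1), (2, 11, 120, 1);
""")

# Roll up call volume and resolution rate by team and month.
for row in con.execute("""
    SELECT a.team, d.month, COUNT(*) AS calls, AVG(f.resolved) AS resolve_rate
    FROM fact_call f
    JOIN dim_agent a ON f.agent_id = a.agent_id
    JOIN dim_date  d ON f.date_id  = d.date_id
    GROUP BY a.team, d.month
"""):
    print(row)
```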

9.
This paper analyzes the work content and workflow of mobile network optimization, scientifically analyzes and processes the massive data generated in the network, and proposes building a GSM network optimization system with Web data warehouse and OLAP (online analytical processing) technology. It describes in detail the subject design of the data warehouse, its dimensions and hierarchies, its measures, and the multidimensional data model; implements the data warehouse with SQL Server 2000; and uses front-end analysis tools for multidimensional analysis, thereby providing data analysis for network optimization engineers.

10.
As China's economy and technology continue to develop, telecom operators' data platforms still show serious deficiencies: traditional techniques can no longer meet new social requirements, and both the difficulty of data integration and the strictness of classification standards keep growing. In response, many enterprises have applied new technologies when building large data warehouses and made further use of computing, yet many drawbacks remain unavoidable. Building large data warehouse platforms with the latest cloud computing approaches has therefore become the general trend. Starting from the current state of enterprise construction of large data warehouse platforms, this paper briefly analyzes the main technical problems faced, innovates on that basis with cloud computing technology, and proposes effective countermeasures that exploit the advantages of the new technology, laying a solid foundation for cloud computing applications of large data warehouse platforms in China.

11.
Multidimensional aggregation is a dominant operation on data warehouses for on-line analytical processing (OLAP). Many efficient algorithms to compute multidimensional aggregation on relational database based data warehouses have been developed. However, to our knowledge, there is nothing to date in the literature about aggregation algorithms on multidimensional data warehouses that store datasets in multidimensional arrays rather than in tables. This paper presents a set of multidimensional aggregation algorithms on very large and compressed multidimensional data warehouses. These algorithms operate directly on compressed datasets in multidimensional data warehouses without the need to first decompress them. They are applicable to a variety of data compression methods. The algorithms have different performance behavior as a function of dataset parameters, sizes of outputs and main memory availability. The algorithms are described and analyzed with respect to the I/O and CPU costs. A decision procedure to select the most efficient algorithm, given an aggregation request, is also proposed. The analytical and experimental results show that the algorithms are more efficient than the traditional aggregation algorithms.
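The core idea, aggregating directly over a compressed representation without first materializing the raw cells, can be sketched with run-length encoding (one simple compression scheme; the paper's own methods are not reproduced here):

```python
# Sketch: SUM aggregation computed directly on a run-length-encoded sequence.
# Each run is (value, count); the runs are never expanded back into raw cells.
def rle_sum(runs: list[tuple[float, int]]) -> float:
    return sum(value * count for value, count in runs)

# The sequence 1,1,1,1,5,5,0,0,0 compressed as three runs:
runs = [(1.0, 4), (5.0, 2), (0.0, 3)]
assert rle_sum(runs) == 14.0  # same result as summing the 9 decompressed cells
```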

12.
CUBE Algorithms on Very Large Compressed Data Warehouses (Cited by 9: 2 self, 7 others)
高宏  李建中 《软件学报》2001,12(6):830-839
Data compression is an important way to improve the performance of multidimensional data warehouses. Online analytical processing is the main application on data warehouses, and the Cube operation is one of the most frequently used OLAP operations. Research on Cube algorithms for compressed multidimensional data warehouses is an important and challenging task facing the database community. In recent years a great deal of work has been done on Cube algorithms, but it rarely touches multidimensional data warehouses, let alone compressed ones; to date, only one paper has proposed a Cube algorithm for compressed multidimensional data warehouses. Building on an in-depth study of Cube algorithms over compressed data warehouses, this paper proposes a heuristic algorithm for generating optimized Cube computation plans, together with three Cube algorithms for compressed multidimensional data warehouses. The proposed Cube algorithms directly…

13.
Many data warehouses contain massive amounts of data, accumulated over long periods of time. In some cases, it is necessary or desirable to either delete “old” data or to maintain the data at an aggregate level. This may be due to privacy concerns, in which case the data are aggregated to levels that ensure anonymity. Another reason is the desire to maintain a balance between the uses of data that change as the data age and the size of the data, thus avoiding overly large data warehouses. This paper presents effective techniques for data reduction that enable the gradual aggregation of detailed data as the data ages. With these techniques, data may be aggregated to higher levels as they age, enabling the maintenance of more compact, consolidated data and the compliance with privacy requirements. Special care is taken to avoid semantic problems in the aggregation process. The paper also describes the querying of the resulting data warehouses and an implementation strategy based on current database technology.
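A minimal sketch of age-based gradual aggregation, assuming timestamped detail rows and a cutoff after which detail collapses to monthly totals; the paper's semantic safeguards and query techniques are not modeled:

```python
# Sketch: roll detail rows older than a cutoff up to (year, month) totals,
# keeping newer rows at full detail. The row shape (date, amount) is illustrative.
from collections import defaultdict
from datetime import date

def age_based_reduce(rows, cutoff: date):
    detail = [(d, amt) for d, amt in rows if d >= cutoff]
    monthly = defaultdict(float)
    for d, amt in rows:
        if d < cutoff:
            monthly[(d.year, d.month)] += amt
    return detail, dict(monthly)

rows = [(date(2023, 1, 5), 10.0), (date(2023, 1, 20), 7.0), (date(2024, 6, 1), 3.0)]
detail, monthly = age_based_reduce(rows, cutoff=date(2024, 1, 1))
print(detail)   # [(datetime.date(2024, 6, 1), 3.0)] -- recent rows kept in full
print(monthly)  # {(2023, 1): 17.0} -- old detail collapsed to a monthly total
```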

14.
Using an example, this paper illustrates the data consistency problems that arise in maintaining materialized views in a data warehouse environment and analyzes the root cause of the problem. It introduces several representative materialized view maintenance algorithms that can solve the data consistency problem, compares the differences between them, and finally describes four levels of data consistency in a data warehouse environment.
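As a minimal sketch of the underlying maintenance idea, the following applies source-table deltas incrementally to a SUM-per-group materialized view instead of recomputing it; the consistency protocols the paper surveys are beyond this sketch.

```python
# Sketch: incremental maintenance of the view SELECT k, SUM(v) GROUP BY k.
# Deltas are (op, key, value) with op '+' for source inserts, '-' for deletes.
def apply_deltas(view: dict, deltas):
    for op, key, value in deltas:
        change = value if op == '+' else -value
        view[key] = view.get(key, 0) + change
        if view[key] == 0:
            # Simplifying assumption: a zero sum means the group is empty;
            # a real maintainer would track row counts per group as well.
            del view[key]
    return view

view = {"east": 100, "west": 40}           # current materialized view
deltas = [('+', 'east', 25), ('-', 'west', 40)]
print(apply_deltas(view, deltas))          # {'east': 125}
```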

15.
The rapidly increasing scale of data warehouses is challenging today’s data analytical technologies. A conventional data analytical platform processes data warehouse queries using a star schema: it normalizes the data into a fact table and a number of dimension tables, and during query processing it selectively joins the tables according to users’ demands. This model is space economical. However, it faces two problems when applied to big data. First, join is an expensive operation, which prohibits a parallel database or a MapReduce-based system from achieving efficiency and scalability simultaneously. Second, join operations have to be executed repeatedly, while numerous join results can actually be reused by different queries. In this paper, we propose a new query processing framework for data warehouses. It pushes the join operations partially to the pre-processing phase and partially to the post-processing phase, so that data warehouse queries can be transformed into massively parallelized filter-aggregation operations on the fact table. In contrast to the conventional query processing models, our approach is efficient, scalable and stable despite the large number of tables involved in the join. It is especially suitable for a large-scale parallel data warehouse. Our empirical evaluation on Hadoop shows that our framework exhibits linear scalability and outperforms some existing approaches by an order of magnitude.
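A toy sketch of the framework's central move, pre-joining dimension attributes into the fact table once so that each later query is a single filter-and-aggregate scan; all data below is invented.

```python
# Sketch: push the join to pre-processing, then answer queries with
# filter + aggregation over the widened fact table alone.
dim_store = {1: "CN", 2: "US"}                    # store_id -> country
facts = [(1, 10.0), (1, 5.0), (2, 8.0)]           # (store_id, sales)

# Pre-processing: materialize the join result once.
wide_facts = [(store_id, dim_store[store_id], sales) for store_id, sales in facts]

# Query time: no join needed -- just filter and aggregate.
cn_total = sum(sales for _, country, sales in wide_facts if country == "CN")
print(cn_total)  # 15.0
```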

16.
Incremental maintenance of data warehouses has attracted a lot of research attention for the past few years. Nevertheless, most of the previous work is confined to the relational setting. Recently, object-oriented data warehouses have been regarded as a better means to integrate data from modern heterogeneous data sources. However, existing approaches to incremental maintenance of data warehouses do not directly apply to object-oriented data warehouses. In this paper, therefore, we propose an approach to incremental maintenance of object-oriented data warehouses. We focus on two primary issues specifically. First, we identify six categories of potential updates to an object-oriented view and propose an algorithm to find potential updates from the definition of the view. Second, we propose an incremental view maintenance algorithm for maintaining object-oriented data warehouses. We have implemented a prototype system for incremental maintenance of object-oriented data warehouses. Performance evaluation has been conducted, which indicates that our approach is correct and efficient.

17.
Research on Materialized View Selection (Cited by 1: 0 self, 1 other)
This paper defines the view selection problem in the data warehouse field and discusses related topics including cost models, benefit functions, cost computation, constraints, and view indexing. It then surveys the three broad classes of view selection methods, namely static, dynamic, and hybrid methods, along with representative research results in each class, and concludes with an outlook on future research directions.
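As a sketch of one representative static method, here is a greedy selection loop in the spirit of the classic benefit-per-unit-space heuristic; the benefit and size numbers are illustrative inputs, not a real cost model.

```python
# Sketch: greedy view selection under a space budget. Each candidate view
# maps to (benefit, size), where benefit is total query cost saved if the
# view is materialized. A full greedy algorithm would recompute benefits
# after each pick; this sketch treats them as fixed inputs.
def select_views(candidates: dict, budget: float):
    chosen, used = [], 0.0
    remaining = dict(candidates)
    while remaining:
        view, (benefit, size) = max(remaining.items(),
                                    key=lambda kv: kv[1][0] / kv[1][1])
        del remaining[view]
        if used + size <= budget:
            chosen.append(view)
            used += size
    return chosen

views = {"by_day": (90.0, 30.0), "by_month": (50.0, 5.0), "by_region": (20.0, 10.0)}
print(select_views(views, budget=20.0))  # ['by_month', 'by_region']
```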

18.
New Algorithm for Computing Cube on Very Large Compressed Data Sets (Cited by 2: 0 self, 2 others)
Data compression is an effective technique to improve the performance of data warehouses. Since the cube operation represents the core of online analytical processing in data warehouses, it is a major challenge to develop efficient algorithms for computing cube on compressed data warehouses. To our knowledge, very few cube computation techniques have been proposed for compressed data warehouses to date in the literature. This paper presents a novel algorithm to compute cubes on compressed data warehouses. The algorithm operates directly on compressed data sets without the need of first decompressing them. The algorithm is applicable to a large class of mapping complete data compression methods. The complexity of the algorithm is analyzed in detail. The analytical and experimental results show that the algorithm is more efficient than all other existing cube algorithms. In addition, a heuristic algorithm to generate an optimal plan for computing cube is also proposed.
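To make the cube operation itself concrete (independent of the compression machinery that is the paper's contribution), here is a tiny reference implementation of CUBE over uncompressed tuples, where "*" marks a rolled-up dimension:

```python
# Sketch: naive CUBE -- every subset of dimensions becomes a group-by.
# This only shows what "cube" computes; it does not run on compressed data.
from itertools import combinations
from collections import defaultdict

ALL = "*"

def cube(rows, n_dims):
    """rows: list of (dim_1, ..., dim_n, measure). Returns {group_key: sum}."""
    result = defaultdict(float)
    for subset_size in range(n_dims + 1):
        for kept in combinations(range(n_dims), subset_size):
            for row in rows:
                key = tuple(row[i] if i in kept else ALL for i in range(n_dims))
                result[key] += row[-1]
    return dict(result)

rows = [("CN", "2024", 10.0), ("CN", "2025", 5.0), ("US", "2024", 8.0)]
agg = cube(rows, n_dims=2)
print(agg[("*", "*")])   # 23.0 -- grand total
print(agg[("CN", "*")])  # 15.0 -- CN rolled up over year
```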

19.
A Taxonomy of Dirty Data (Cited by 3: 0 self, 3 others)
Today large corporations are constructing enterprise data warehouses from disparate data sources in order to run enterprise-wide data analysis applications, including decision support systems, multidimensional online analytical applications, data mining, and customer relationship management systems. A major problem that is only beginning to be recognized is that the data in data sources are often dirty. Broadly, dirty data include missing data, wrong data, and non-standard representations of the same data. The results of analyzing a database/data warehouse of dirty data can be damaging and, at best, unreliable. In this paper, a comprehensive classification of dirty data is developed for use as a framework for understanding how dirty data arise, manifest themselves, and may be cleansed to ensure proper construction of data warehouses and accurate data analysis. The impact of dirty data on data mining is also explored.
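A small sketch of the taxonomy's three broad classes, missing data, wrong data, and non-standard representations of the same data, as record-level checks; the validation rules and sample values are invented:

```python
# Sketch: classify field values into the three broad dirty-data classes
# named in the abstract. The validation rules are illustrative assumptions.
def classify(value, valid=lambda v: True, canonical=lambda v: v):
    if value is None or value == "":
        return "missing"
    if not valid(value):
        return "wrong"
    if canonical(value) != value:
        return "non-standard representation"
    return "clean"

is_age = lambda v: isinstance(v, int) and 0 <= v <= 130
print(classify(None, valid=is_age))              # missing
print(classify(-5, valid=is_age))                # wrong
print(classify(" Alice ", canonical=str.strip))  # non-standard representation
print(classify("Alice", canonical=str.strip))    # clean
```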

20.
Real-world changes are generally discovered by computer systems only after a delay. The typical update patterns of traditional data warehouses, on an overnight or even weekly basis, increase this propagation delay until the information is available to knowledge workers. Typically, traditional data warehouses focus on summarized data (at some level) rather than detailed data. For active data warehouse environments, detailed data about entities is required for checking data conditions and triggering actions to automate routine decision tasks. Hence, keeping data current (by minimizing the latency from when data is captured until it is available to knowledge workers) and consistent in that context is a difficult task. In this paper we present an approach for modeling conceptual time-consistency problems and introduce a data model that accounts for these delays. It supports knowledge workers in finding out why (or why not) an active system responded to a certain state of the data. The model therefore enables analytical processing of detailed data (enhanced by valid time) based on the knowledge state at a specific time; all states that were not yet known by the system at that point in time are consistently ignored. This enables temporally consistent analyses by considering that the validity of detailed data and aggregates can be restricted to time intervals only, due to frequent updates and late-arriving information.
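A minimal sketch of the knowledge-state idea: each fact carries both a valid time and the time the system recorded it, and a query "as of" time T consistently ignores facts not yet known at T; the field layout is an assumption.

```python
# Sketch: analysis as of a past knowledge state. Facts not yet recorded by
# the system at `as_of` are ignored, so the analysis is reproducible even
# though late-arriving facts were inserted afterwards.
from datetime import datetime as dt

# (valid_time, recorded_time, amount) -- recorded_time may lag valid_time.
facts = [
    (dt(2024, 1, 1), dt(2024, 1, 1), 10.0),
    (dt(2024, 1, 2), dt(2024, 1, 5), 7.0),   # late-arriving fact
]

def total_as_known(facts, as_of):
    return sum(a for valid, recorded, a in facts if recorded <= as_of)

print(total_as_known(facts, dt(2024, 1, 3)))  # 10.0 -- late fact not yet known
print(total_as_known(facts, dt(2024, 1, 6)))  # 17.0 -- after it arrived
```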
