18 similar documents found; search took 203 ms.
1.
To improve the performance of iceberg cube computation, this paper proposes DPBUC_BI (Dynamic Pruning based BUC_BI), an improved algorithm based on bitmap indexes. Exploiting the column-oriented organization of bitmap indexes, the algorithm redefines the grouping operation of the BUC (Bottom-Up Computation) algorithm, which speeds up data loading and querying, and implements aggregation with logical bitwise operations, which improves computational performance. A dynamic pruning strategy is added to handle partially clustered data, further improving iceberg cube computation while preserving correctness. Finally, DPBUC_BI is applied to iceberg cube computation over airline ticket settlement data; experimental results show that the algorithm improves computation performance well and offers a measurable time improvement over the classic BUC algorithm.
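The bitmap-index grouping idea can be illustrated with a small sketch. This is a toy reconstruction, not the paper's DPBUC_BI implementation: each distinct value of each dimension keeps a bitmap of the rows containing it, a group-by cell is the bitwise AND of its value bitmaps, COUNT is the number of set bits, and cells below the iceberg threshold are pruned before recursing. All names and the sample ticket data are invented.

```python
from itertools import combinations

def build_bitmaps(rows, dim):
    """Map each distinct value of column `dim` to an integer bitmap of rows."""
    bitmaps = {}
    for i, row in enumerate(rows):
        bitmaps[row[dim]] = bitmaps.get(row[dim], 0) | (1 << i)
    return bitmaps

def iceberg_counts(rows, dims, min_sup):
    """Support counts of all dimension-value combinations, computed with
    bitwise AND and pruned by the iceberg threshold `min_sup`."""
    per_dim = {d: build_bitmaps(rows, d) for d in dims}
    result = {}

    def expand(combo, idx, bm, key):
        if bin(bm).count("1") < min_sup:      # prune: support already too low
            return
        if idx == len(combo):
            result[key] = bin(bm).count("1")
            return
        for val, vbm in per_dim[combo[idx]].items():
            expand(combo, idx + 1, bm & vbm, key + (val,))

    all_rows = (1 << len(rows)) - 1
    for k in range(1, len(dims) + 1):
        for combo in combinations(dims, k):
            expand(combo, 0, all_rows, ())
    return result

# Invented sample: three ticket-settlement records
rows = [{"route": "PEK-SHA", "cabin": "Y"},
        {"route": "PEK-SHA", "cabin": "Y"},
        {"route": "PEK-CAN", "cabin": "F"}]
cells = iceberg_counts(rows, ["route", "cabin"], min_sup=2)
# ("PEK-CAN",) occurs only once, so pruning drops it before aggregation
```

Because pruning happens before recursing into finer cells, a value whose bitmap already has too few set bits never generates any of its descendants, mirroring BUC-style a priori pruning.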
2.
3.
As the number of raw data records grows, a data cube consumes ever more storage space and computation time; the closed cube is an effective means of reducing a data cube's storage. This paper proposes a new algorithm for generating closed data cubes: for large raw data sets, after preprocessing, it recursively outputs closed cells top-down in a computation order similar to that of the BUC algorithm. Experiments on real data show that the algorithm effectively improves generation speed.
4.
Research on Closed Data Cube Technology    Cited: 14 (self: 1, others: 14)
A data cube contains much redundant information; removing it not only saves storage space but also speeds up computation. The tuples in a data cube can be divided into closed and non-closed tuples. For every non-closed tuple there exists a closed tuple such that both are obtained by aggregating the same group of tuples in the base table, and they therefore have identical aggregate values. Removing all non-closed tuples from a data cube yields a closed data cube. This paper presents generation, query, and incremental maintenance algorithms for closed data cubes, along with experiments on synthetic and real data. The results show that the closed data cube technique is effective.
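A toy illustration of the closed/non-closed distinction described above (a sketch, not the paper's generation algorithm): a cell's closure fixes every dimension on which all of its covering base-table tuples agree; a cell is closed exactly when it equals its own closure, and every non-closed cell shares its cover, hence its aggregate values, with that closure. Here `None` plays the role of the `*` wildcard, and the two-row base table is invented.

```python
from itertools import product

def cover(cell, rows):
    """Base rows matched by a cell; None acts as the '*' wildcard."""
    return [r for r in rows
            if all(c is None or c == v for c, v in zip(cell, r))]

def closure(cell, rows):
    """Most specific cell with the same cover: fix each dimension on which
    all covering rows agree, keep '*' (None) elsewhere."""
    cov = cover(cell, rows)
    return tuple(col[0] if len(set(col)) == 1 else None
                 for col in zip(*cov))

rows = [("a", "x"), ("a", "y")]
cells = [c for c in product(["a", None], ["x", "y", None]) if cover(c, rows)]
closed = [c for c in cells if closure(c, rows) == c]
# (None, 'x') covers only ('a', 'x'), so its closure is ('a', 'x'): not closed
```

Storing only `closed` loses nothing: a query on a non-closed cell can be answered from its closure, since both aggregate the same base rows.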
5.
Star Cube: An Efficient Data Cube Implementation Method    Cited: 3 (self: 2, others: 1)
A data cube with n dimensions has 2^n views, and the more views there are, the longer it takes to maintain the cube. By splitting the dimensions into partition dimensions and non-partition dimensions, a data cube can be transformed into a star cube. A star cube consists of a summary table and only those views that contain partition dimensions. Using prefix sharing and tuple sharing, the star cube not only reduces the required storage space but also greatly reduces computation and maintenance time. Under the constraint that one fragment is limited to one I/O unit, the query response time of a star cube is essentially the same as that of the full data cube. Experimental results also show that the star cube is a data cube implementation that is efficient in both time and space.
6.
7.
Data cube computation and frequent itemset mining are important techniques in data warehousing and data mining, respectively; both have been studied extensively with good progress. The two build similar algebraic lattice structures from their data cells and itemsets, and the upper-bound cells of a data cube's equivalence classes correspond to the closed itemsets of frequent itemset mining. If the unity of the two can be established, each field gains a broader range of research ideas and the techniques can reinforce each other, e.g., implementing frequent itemset mining inside a database via iceberg cube computation to avoid data migration, or using frequent itemset mining algorithms to optimize data cube computation. Previous work neither studied the two together systematically nor established a reasonably complete connection between them. After an in-depth study of data cube computation and frequent itemset mining, this paper combines the two effectively, proposes a unified computation framework, gives mappings between their many computational properties and methods, generalizes the relevant concepts, and concretely establishes correspondences between the main data cube computations (iceberg cube, condensed cube, and quotient cube) and the corresponding frequent itemset mining methods. Algorithms and experiments further demonstrate the effectiveness of the unified computation: (1) a transaction set for frequent itemset mining is imported into a relational database and mined in the iceberg cube style, so that frequent itemset mining over relational tables can be implemented in the database with standard or extended SQL; (2) it verifies the condensed…
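The database-resident mining described in point (1) can be mimicked with plain SQL: load each transaction as a row of boolean item columns, and an iceberg-style GROUP BY ... HAVING over those columns returns a frequent itemset's support. This is a hand-rolled illustration of the correspondence, not the paper's framework; the table name, column names, and transactions are invented.

```python
import sqlite3

# Invented transaction data over three items
txns = [{"milk", "bread"}, {"milk", "bread", "beer"}, {"bread"}]
items = ["milk", "bread", "beer"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (milk INT, bread INT, beer INT)")
for tx in txns:
    conn.execute("INSERT INTO t VALUES (?, ?, ?)",
                 [int(i in tx) for i in items])

# Frequent 2-itemset {milk, bread}: the iceberg cell of the (milk, bread)
# group-by where both flags are 1 and the support reaches min_sup = 2.
sup = conn.execute(
    "SELECT COUNT(*) FROM t GROUP BY milk, bread "
    "HAVING milk = 1 AND bread = 1 AND COUNT(*) >= 2").fetchall()
# sup holds the support of {milk, bread} if it is frequent, else nothing
```

The HAVING clause plays the role of the iceberg condition: group-by cells below the minimum support are never returned, exactly as infrequent itemsets are discarded.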
8.
An Efficient Method for Parallel Processing of Multidimensional Join and Aggregation Operations    Cited: 1 (self: 0, others: 1)
With the maturing of parallel algorithms and of cheap, powerful multiprocessor systems, using a multiprocessor system to process the join and aggregation operations of a multidimensional data warehouse in parallel has become the technique of choice for improving OLAP query performance. This paper proposes PJAMDDC (parallel join and aggregation for multi-dimensional data cube), a parallel algorithm that reduces the cost of join and aggregation. The algorithm fully considers the storage scheme of multidimensional data cubes and the structural characteristics of distributed multiprocessor systems. Building on the existing search-lattice logical structure for aggregating a data cube, it uses the data warehouse's hierarchy combined surrogates and a weighting of the cube's search lattice so that cube data are distributed across processors in an optimal way, partitioning the multidimensional data while improving the efficiency of parallel join and aggregation. Experimental evaluation shows that PJAMDDC is effective for parallel processing of the joins and aggregations of a multidimensional data warehouse.
9.
A Closed Frequent Itemset Mining Algorithm Based on the Apposition Assembly of Iceberg Concept Lattices    Cited: 2 (self: 0, others: 2)
Because concept lattices are complete, the time and space complexity of constructing them has long been the main obstacle to their application in concept-lattice-based data mining. Combining the semilattice property of iceberg concept lattices with the idea of lattice assembly, this paper first proves that the iceberg concept lattice obtained by apposition assembly is isomorphic to the iceberg lattice obtained by pruning the complete concept lattice, then analyzes the mappings involved in assembling iceberg concept lattices, and finally proposes Icegalamera, a novel closed frequent itemset mining algorithm based on the apposition of iceberg concept lattices. The algorithm avoids computing the complete concept lattice and applies assembly and pruning strategies during construction, markedly improving mining efficiency. Experiments confirm the completeness of the closed frequent itemsets it produces. Performance tests in a single-site setting on dense and sparse data sets show a clear performance advantage on sparse data sets.
10.
Multiway array aggregation is mainly used for full-cube computation over data cubes whose underlying data structure is a high-dimensional array. With the arrival of the big data era, efficient data analysis has become especially important, yet the sheer volume of data makes the computation challenging. Against this background, this paper describes a method that combines the widely used MapReduce framework with multiway array aggregation to improve the efficiency of full cube computation.
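A minimal single-process imitation of that combination (not actual Hadoop MapReduce, and not the paper's method): the map phase emits every group-by cell a fact contributes to, and the reduce phase sums the measure per cell, which together amount to a full-cube computation. The dimension values and measures are invented.

```python
from collections import defaultdict
from itertools import combinations

def map_phase(fact, dims, measure):
    """Emit (cell, measure) for every subset of dimensions (every cuboid);
    '*' marks an aggregated-away dimension."""
    for k in range(len(dims) + 1):
        for combo in combinations(range(len(dims)), k):
            cell = tuple(fact[i] if i in combo else "*"
                         for i in range(len(dims)))
            yield cell, fact[measure]

def reduce_phase(pairs):
    """Sum the measure for each distinct cell key, as a reducer would."""
    agg = defaultdict(int)
    for cell, v in pairs:
        agg[cell] += v
    return dict(agg)

# Invented facts: (year, region, sales)
facts = [("2023", "north", 10), ("2023", "south", 5), ("2024", "north", 7)]
pairs = (p for f in facts for p in map_phase(f, dims=(0, 1), measure=2))
cube = reduce_phase(pairs)
# ("*", "*") is the apex cell; ("2023", "*") aggregates region away
```

In a real MapReduce job the shuffle stage groups pairs by cell key across machines; here a single dictionary stands in for that step.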
11.
Xing Li, Howard J. Hamilton, Kamran Karimi, Liqiang Geng. Journal of Intelligent Information Systems, 2009, 33(2): 179-208
The computation of data cubes is one of the most expensive operations in on-line analytical processing (OLAP). To improve efficiency, an iceberg cube represents only the cells whose aggregate values are above a given threshold (minimum support). Top-down and bottom-up approaches are used to compute the iceberg cube for a data set, but both have performance limitations. In this paper, a new algorithm, called Multi-Tree Cubing (MTC), is proposed for computing an iceberg cube. The Multi-Tree Cubing algorithm is an integrated top-down and bottom-up approach. Overall control is handled in a top-down manner, so MTC features shared computation. By processing the orderings in the opposite order from the Top-Down Computation algorithm, the MTC algorithm is able to prune attributes. The Bottom Up Computation (BUC) algorithm and its variations also perform pruning by relying on the processing of intermediate partitions. The MTC algorithm, however, prunes without processing such partitions. The MTC algorithm is based on a specialized type of prefix tree data structure, called an Attribute–Partition tree (AP-tree), consisting of attribute and partition nodes. The AP-tree facilitates fast, in-memory sorting and APRIORI-like pruning. We report on five series of experiments, which confirm that MTC is consistently as fast or faster than BUC, while finding the same iceberg cubes.
12.
Computing Iceberg Cubes by Top-Down and Bottom-Up Integration: The StarCubing Approach    Cited: 1 (self: 0, others: 1)
Dong Xin, Jiawei Han, Xiaolei Li, Zheng Shao, B.W. Wah. IEEE Transactions on Knowledge and Data Engineering, 2007, 19(1): 111-126
Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down versus bottom-up. The former, represented by the multiway array cube (called the multiway) algorithm, aggregates simultaneously on multiple dimensions; however, it cannot take advantage of a priori pruning when computing iceberg cubes (cubes that contain only aggregate cells whose measure values satisfy a threshold, called the iceberg condition). The latter, represented by BUC, computes the iceberg cube bottom-up and facilitates a priori pruning. BUC explores fast sorting and partitioning techniques; however, it does not fully explore multidimensional simultaneous aggregation. In this paper, we present a new method, star-cubing, that integrates the strengths of the previous two algorithms and performs aggregations on multiple dimensions simultaneously. It utilizes a star-tree structure, extends the simultaneous aggregation methods, and enables the pruning of the group-bys that do not satisfy the iceberg condition. Our performance study shows that star-cubing is highly efficient and outperforms the previous methods.
13.
IEEE Transactions on Knowledge and Data Engineering, 2007, 19(7): 903-918
The iceberg cubing problem is to compute the multidimensional group-by partitions that satisfy given aggregation constraints. Pruning unproductive computation for iceberg cubing when nonantimonotone constraints are present is a great challenge because the aggregate functions do not increase or decrease monotonically along the subset relationship between partitions. In this paper, we propose a novel bound prune cubing (BP-Cubing) approach for iceberg cubing with nonantimonotone aggregation constraints. Given a cube over n dimensions, an aggregate for any group-by partition can be computed from aggregates for the most specific n-dimensional partitions (MSPs). The largest and smallest aggregate values computed this way become the bounds for all partitions in the cube. We provide efficient methods to compute tight bounds for base aggregate functions and, more interestingly, arithmetic expressions thereof, from bounds of aggregates over the MSPs. Our methods produce tighter bounds than those obtained by previous approaches. We present iceberg cubing algorithms that combine bounding with efficient aggregation strategies. Our experiments on real-world and artificial benchmark data sets demonstrate that BP-Cubing algorithms achieve more effective pruning and are several times faster than state-of-the-art iceberg cubing algorithms, and that BP-Cubing achieves the best performance with the top-down cubing approach.
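The bounding idea can be seen in a small numeric sketch (simplified here to AVG; the paper handles general arithmetic expressions of base aggregates, and this is not its exact procedure): any coarser partition is a union of MSPs, so its AVG is a weighted mean of the MSP averages and must lie between the smallest and largest of them. If that upper bound already fails an iceberg constraint such as AVG >= threshold, every coarser partition built from these MSPs can be pruned without being materialized.

```python
def msp_avg_bounds(msp_stats):
    """Bounds on AVG for any union of MSPs.
    msp_stats: list of (sum, count) pairs, one per most specific partition."""
    avgs = [s / c for s, c in msp_stats]
    return min(avgs), max(avgs)

# Invented MSP statistics: averages 5.0, 3.0, 2.0
msps = [(10, 2), (9, 3), (8, 4)]
lo, hi = msp_avg_bounds(msps)

def prunable(threshold):
    """True if no partition over these MSPs can satisfy AVG >= threshold,
    because even the upper bound falls short."""
    return hi < threshold

# Sanity check: the AVG over all three MSPs (a weighted mean of the MSP
# averages) stays inside [lo, hi]
total_avg = sum(s for s, _ in msps) / sum(c for _, c in msps)
```

Note that AVG is nonantimonotone: it can rise or fall as partitions merge, so simple BUC-style support pruning does not apply, which is exactly why such bounds are needed.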
14.
Cui-Ping Li, Kum-Hoe Tung, Shan Wang. Journal of Computer Science and Technology, 2004, 19(3)
Data cube computation is a well-known expensive operation and has been studied extensively. It is often not feasible to compute a complete data cube due to the huge storage requirement. The recently proposed quotient cube addressed this fundamental issue through a partitioning method that groups cube cells into equivalence partitions. The effectiveness and efficiency of the quotient cube for cube compression and computation have been proved. However, as changes are made to the data sources, maintaining such a quotient cube is non-trivial since the equivalence classes in it must be split or merged. In this paper, incremental algorithms are designed to update an existing quotient cube efficiently based on the Galois lattice. Performance study shows that these algorithms are efficient and scalable for large databases.
15.
Efficient Incremental Maintenance for Distributive and Non-Distributive Aggregate Functions
Data cube pre-computation is an important concept for supporting OLAP (Online Analytical Processing) and has been studied extensively. It is often not feasible to compute a complete data cube due to the huge storage requirement. The recently proposed quotient cube addressed this issue through a partitioning method that groups cube cells into equivalence partitions. Such an approach not only is useful for distributive aggregate functions such as SUM but also can be applied to the maintenance of holistic aggregate functions like MEDIAN, which require the storage of a set of tuples for each equivalence class. Unfortunately, as changes are made to the data sources, maintaining the quotient cube is non-trivial since the partitioning of the cube cells must also be updated. In this paper, the authors design incremental algorithms to update a quotient cube efficiently for both the SUM and MEDIAN aggregate functions. For SUM, concepts are borrowed from the principle of the Galois lattice to develop CPU-efficient algorithms to update a quotient cube. For MEDIAN, the concept of a pseudo class is introduced to further reduce the size of the quotient cube. Coupled with a novel sliding window technique, an efficient algorithm is developed for maintaining a MEDIAN quotient cube that takes up reasonably small storage space. Performance study shows that the proposed algorithms are efficient and scalable over large databases.
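The distributive/holistic contrast that motivates storing tuple sets per equivalence class can be sketched as follows. This toy class is not the paper's algorithm (no pseudo classes, no sliding window): SUM needs only one scalar per class and absorbs each insert in O(1), while MEDIAN forces the class to retain its values and re-derive the answer after every change.

```python
import bisect

class ClassAggregates:
    """One equivalence class of cube cells under incremental inserts."""
    def __init__(self):
        self.total = 0      # SUM is distributive: a single scalar suffices
        self.values = []    # MEDIAN is holistic: the tuples must be kept

    def insert(self, v):
        self.total += v                 # O(1) delta update for SUM
        bisect.insort(self.values, v)   # keep values sorted for MEDIAN

    def median(self):
        n = len(self.values)
        if n % 2:
            return float(self.values[n // 2])
        a, b = self.values[n // 2 - 1], self.values[n // 2]
        return (a + b) / 2

c = ClassAggregates()
for v in (3, 1, 2, 10):
    c.insert(v)
# after the inserts: total is 16, median is the mean of 2 and 3
```

Deletions make the asymmetry starker: SUM subtracts a delta, while MEDIAN must locate and remove the value from the stored set, which is why shrinking that set (via pseudo classes, in the paper) matters.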
16.
This paper proposes a computation method for holistic multi-feature cube (MF-Cube) queries based on the characteristics of MF-Cubes. Three simple yet efficient strategies are designed to optimize the dependent complex aggregates at multiple granularities for a complex data-mining query within data cubes. The first strategy computes holistic MF-Cube queries using the PDAP (Part Distributive Aggregate Property). Further efficiency is gained by a second strategy, dynamic subset data selection (the iceberg query technique), which reduces the size of the materialized data cubes. The third strategy adopts chunk-based caching, which reuses the output of previous queries. Combining these three strategies, we design an algorithm called PDIC (Part Distributive Iceberg Chunk). We experimentally evaluate this algorithm using synthetic and real-world datasets and demonstrate that our approach delivers up to approximately twice the performance efficiency of traditional computation methods.
17.
Hongjun Lu, Jeffrey Xu Yu, Ling Feng, Zhixian Li. Distributed and Parallel Databases, 2003, 13(2): 181-202
Parallel data processing is a promising approach for efficiently computing data cubes in relational databases, because most aggregate functions used in OLAP (On-Line Analytical Processing) are distributive. This paper studies the issues of handling data skew in parallel data cube computation. We present a fully dynamic partitioning approach that can effectively distribute the workload among processing nodes without a priori knowledge of the data distribution. As a supplement, a simple and effective dynamic load-balancing mechanism is also incorporated into our algorithm, further improving overall performance. Our experimental results indicate that the proposed techniques are effective even under high data skew. The results of scale-up and speedup tests are also satisfactory.
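The paper's fully dynamic partitioning is not reproduced here; the following is only a generic greedy sketch of the load-balancing idea such approaches rely on: assign each (possibly skewed) partition to the currently least-loaded node rather than by a static hash, so one hot partition cannot be stacked with others on the same node. The partition sizes and node count are invented.

```python
import heapq

def balance(partition_sizes, n_nodes):
    """Greedy least-loaded assignment; returns the total workload per node.
    Sorting largest-first is the classic longest-processing-time heuristic."""
    heap = [(0, node) for node in range(n_nodes)]   # (current load, node id)
    heapq.heapify(heap)
    loads = [0] * n_nodes
    for size in sorted(partition_sizes, reverse=True):
        load, node = heapq.heappop(heap)            # least-loaded node
        loads[node] = load + size
        heapq.heappush(heap, (loads[node], node))
    return loads

# One heavily skewed partition (100) plus four small ones across 3 nodes:
# the hot partition claims one node, and the small ones spread over the rest
loads = balance([100, 10, 10, 10, 10], n_nodes=3)
```

A static hash could by bad luck send several small partitions to the node holding the hot one; tracking live loads, as the dynamic mechanism in the paper does at finer granularity, avoids that pile-up.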