
1.

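Zhou Bo (周波) 《Chinese Journal of Computers》1999,22(6):620-626

To integrate MOLAP and ROLAP and achieve good storage and operation efficiency, this paper proposes a new data cube organization based on dense regions and gives a precise definition of the dense regions in a data cube. It analyzes the feasibility of existing related algorithms and, on that basis, proposes ScanChunk, an algorithm for finding dense regions in a data cube. The accuracy and complexity of the algorithm are analyzed and detailed experiments are carried out. The results show that, when the cube has no more than 6 dimensions, ScanChunk is a ...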
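To make the notion of a dense region concrete, the following is a minimal sketch that splits a sparse cube into fixed-size chunks and keeps the chunks whose fill ratio reaches a threshold. It is not the ScanChunk algorithm from the paper; the function name `dense_chunks`, the chunk shape, and the threshold are all assumptions chosen for illustration.

```python
from collections import Counter


def dense_chunks(cells, chunk_shape, threshold):
    """Return {chunk_index: density} for chunks whose density >= threshold.

    cells: iterable of coordinate tuples of the non-empty cells.
    chunk_shape: side length of a chunk along each dimension (assumed fixed).
    """
    capacity = 1
    for side in chunk_shape:
        capacity *= side  # number of cells one chunk can hold
    # Map every distinct non-empty cell to the chunk that contains it.
    counts = Counter(
        tuple(c // s for c, s in zip(cell, chunk_shape)) for cell in set(cells)
    )
    return {
        chunk: n / capacity
        for chunk, n in counts.items()
        if n / capacity >= threshold
    }


# Hypothetical 2-D cube: four filled cells in one chunk, one isolated cell.
cells = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
print(dense_chunks(cells, chunk_shape=(2, 2), threshold=0.75))  # {(0, 0): 1.0}
```

In a hybrid design, chunks reported as dense could be stored as arrays (MOLAP style) and the remaining sparse cells as relational tuples (ROLAP style).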

2.

Revisiting the cube lifecycle in the presence of hierarchies

Konstantinos Morfonios Yannis Ioannidis 《The VLDB Journal The International Journal on Very Large Data Bases》2010,19(2):257-282

On-line analytical processing (OLAP) typically involves complex aggregate queries over large datasets. The data cube has been proposed as a structure that materializes the results of such queries in order to accelerate OLAP. A significant fraction of the related work has been on Relational-OLAP (ROLAP) techniques, which are based on relational technology. Existing ROLAP cubing solutions mainly focus on “flat” datasets, which do not include hierarchies in their dimensions. Nevertheless, as shown in this paper, the nature of hierarchies introduces several complications into the entire lifecycle of a data cube including the operations of construction, storage, indexing, query processing, and incremental maintenance. This fact renders existing techniques essentially inapplicable in a significant number of real-world applications and mandates revisiting the entire cube lifecycle under the new perspective. In order to overcome this problem, the CURE algorithm has been recently proposed as an efficient mechanism to construct complete cubes over large datasets with arbitrary hierarchies and store them in a highly compressed format, compatible with the relational model. In this paper, we study the remaining phases in the cube lifecycle and introduce query-processing and incremental-maintenance algorithms for CURE cubes. These are significantly different from earlier approaches, which have been proposed for flat cubes constructed by other techniques and are inadequate for CURE due to its high compression rate and the presence of hierarchies. Our methods address issues such as cube indexing, query optimization, and lazy update policies. Especially regarding updates, such lazy approaches are applied for the first time on cubes. We demonstrate the effectiveness of CURE in all phases of the cube lifecycle through experiments on both real-world and synthetic datasets. 
Among the experimental results, we distinguish those that have made CURE the first ROLAP technique to complete the construction and usage of the cube of the highest-density dataset in the APB-1 benchmark (12 GB). CURE was in fact quite efficient on this, showing great promise with respect to the potential of the technique overall.

3.

Approximate range–sum query answering on data cubes with probabilistic guarantees

Alfredo Cuzzocrea Wei Wang 《Journal of Intelligent Information Systems》2007,28(2):161-197

Approximate range aggregate queries are one of the most frequent and useful kinds of queries for Decision Support Systems (DSS), as they are widely used in many data analysis tasks. Traditionally, sampling-based techniques have been proposed to tackle this problem. However, their effectiveness degrades when the underlying data distribution is skewed. Another approach based on outlier management can limit the effect of data skew but fails to address other requirements of approximate range aggregate queries, such as error guarantees and query processing efficiency. In this paper, we present a technique that provides approximate answers to range aggregate queries on OLAP data cubes efficiently, with theoretical guarantees on the errors. Our basic idea is to build different data structures to manage outliers and the rest of the data. Carefully chosen outliers are organized in a quad-tree based indexing data structure to provide efficient access for query processing. A query-workload adaptive, tree-like synopsis data structure, called Tunable Partition-Tree (TP-Tree), is proposed to organize samples extracted from non-outlier data. Our experiments clearly demonstrate the merits of our technique, by comparing with previous well-known techniques.

4.

Extending the data warehouse for service provisioning data

《Data & Knowledge Engineering》2007,60(3):700-724


5.

Improving the performance and functionality of Mondrian open‐source OLAP systems

Pablo Sendín‐Raña Francisco J. González‐Castaño Enrique Pérez‐Barros Pedro S. Rodríguez‐Hernández Felipe Gil‐Castiñeira José M. Pousada‐Carballo 《Software》2009,39(3):279-298

For a long time, the design of relational databases has focused on the optimization of atomic transactions (insert, select, update or delete). Currently, relational databases store tactical information of data warehouses, mainly for select‐like operations. However, the database paradigm has evolved, and nowadays on‐line analytical processing (OLAP) systems handle strategic information for further analysis. These systems enable fast, interactive and consistent information analysis of data warehouses, including shared calculations and allocations. OLAP and data warehouses jointly allow multidimensional data views, turning raw data into knowledge. OLAP allows ‘slice and dice’ navigation and a top‐down perspective of data hierarchies. In this paper, we describe our experience in the migration from a large relational database management system to an OLAP system on top of a relational layer (the data warehouse), and the resulting contributions in open‐source ROLAP optimization. Existing open‐source ROLAP technologies rely on summarized tables with materialized aggregate views to improve system performance (in terms of response time). The design and maintenance of those tables are cumbersome. Instead, we intensively exploit cache memory, where key data reside, yielding low response times. A cold start process brings summarized data from the relational database to cache memory, subsequently reducing the response time. We ensure concurrent access to the summarized data, as well as consistency when the relational database updates data. We also improve the OLAP functionality, by providing new features for automating the creation of calculated members. This makes it possible to define new measures on the fly using virtual dimensions, without re‐designing the multidimensional cube. We have chosen the XML/A de facto standard for service provision. Copyright © 2008 John Wiley & Sons, Ltd.

6.

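Research on Methods for Improving OLAP System Performance  Cited by: 3 (self-citations: 0, other citations: 3)

Xin Zhi (辛志) Liu Shaohui (刘少辉) Shi Zhongzhi (史忠植) 《Computer Science》2003,30(5):59-62

1. Introduction. As market competition grows ever fiercer, enterprises in recent years have placed more emphasis on timely and accurate decision-making, which has led to the rapid rise of applications whose main purpose is to support decision-management analysis. Such applications are called on-line analytical processing (OLAP); an OLAP application performs statistical analyses over the data stored in a data warehouse through a variety of complex ad hoc queries. Because OLAP queries involve large amounts of data, are highly complex, and require fairly fast responses, it is necessary to study not only how to describe OLAP queries and operations formally but also effective methods for OLAP query processing. This paper first introduces the concept and classification of OLAP and then studies the main techniques for improving the performance of OLAP applications. The fourth part of the paper analyzes the advantages and disadvantages of ROLAP and MOLAP and gives a query solution that combines the two. Finally, conclusions and directions for further research are given.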

7.

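Star Cube: An Efficient Data Cube Implementation Method  Cited by: 3 (self-citations: 2, other citations: 1)

Li Sheng'en (李盛恩) Wang Shan (王珊) 《Journal of Computer Research and Development》2004,41(4):587-593

A data cube with n dimensions has 2^n views; the more views there are, the longer it takes to maintain the cube. By dividing the dimensions into partitioning and non-partitioning dimensions, a data cube can be converted into a star cube. A star cube consists of a summary table and those views that contain only partitioning dimensions. By using prefix sharing and tuple sharing, the star cube not only reduces the required storage space but also greatly reduces computation and maintenance time. Provided that each slice is confined to one I/O unit, the query response time of a star cube is essentially the same as that of a data cube. Experimental results also show that the star cube is a data cube implementation technique that is efficient in both time and space.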
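The 2^n figure follows from the fact that every subset of the dimensions defines one group-by view. A minimal sketch that enumerates these views is given here; the dimension names are hypothetical, and the code illustrates only the view count, not the star cube structure itself.

```python
from itertools import combinations


def cube_views(dimensions):
    """Return every group-by (view) of a data cube as a tuple of dimension names."""
    views = []
    for k in range(len(dimensions) + 1):
        # All ways of choosing k dimensions to group by.
        views.extend(combinations(dimensions, k))
    return views


dims = ["product", "store", "month"]  # hypothetical dimension names
views = cube_views(dims)
print(len(views))  # 8, i.e. 2^3
```

The empty tuple corresponds to the grand total, and the full tuple to the base (finest-grained) view.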

8.

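A Hierarchical Cube Storage Structure for Data Warehouse Systems  Cited by: 11 (self-citations: 0, other citations: 11)

Gao Hong (高宏) Li Jianzhong (李建中) Li Jinbao (李金宝) 《Journal of Software》2003,14(7):1258-1266

Range queries are an important operation for supporting on-line analytical processing (OLAP) on data warehouses. In recent years, several cube storage structures that support range queries and data updates have been proposed. However, their space and time complexities are high, which makes them hard to use in practice. To address this, a hierarchical cube storage structure, HDC (hierarchical data cube), and its associated algorithms are proposed. Both the range-query cost and the update cost on HDC are O(log^d n), and the overall performance is O((log n)^(2d)) (under the C_q C_u model) or O(K(log n)^d) (under the C_q n_q + C_u n_u model). Theoretical analysis and experiments show that HDC outperforms all existing cube storage structures in range-query cost, update cost, space cost, and overall performance.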
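For intuition about O(log^d n) query and update costs, a d-dimensional binary indexed (Fenwick) tree is a well-known structure with the same bounds. The sketch below shows the d = 2 case. It is not the HDC structure described in the abstract, and the class name `Fenwick2D` and its methods are assumptions made for this illustration.

```python
class Fenwick2D:
    """2-D binary indexed tree: point update and range sum in O(log^2 n)."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.tree = [[0] * (cols + 1) for _ in range(rows + 1)]

    def update(self, r, c, delta):
        """Add delta to cell (r, c), using 0-based coordinates."""
        i = r + 1
        while i <= self.rows:
            j = c + 1
            while j <= self.cols:
                self.tree[i][j] += delta
                j += j & -j  # move to the next node covering this column
            i += i & -i

    def _prefix(self, r, c):
        """Sum of the cells in rows [0, r) and columns [0, c)."""
        total = 0
        i = r
        while i > 0:
            j = c
            while j > 0:
                total += self.tree[i][j]
                j -= j & -j
            i -= i & -i
        return total

    def range_sum(self, r1, c1, r2, c2):
        """Sum over the inclusive rectangle (r1, c1)-(r2, c2)."""
        return (self._prefix(r2 + 1, c2 + 1) - self._prefix(r1, c2 + 1)
                - self._prefix(r2 + 1, c1) + self._prefix(r1, c1))


cube = Fenwick2D(4, 4)
cube.update(0, 0, 5)
cube.update(1, 2, 3)
cube.update(3, 3, 7)
print(cube.range_sum(1, 1, 3, 3))  # 10
```

Each operation touches O(log n) nodes per dimension, which is where the product log^d n comes from.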

9.

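An OLAP-Oriented Multidimensional Hierarchical Clustering Storage Scheme

Yuan Lin (袁霖) Zou Hengming (邹恒明) Li Zhanhuai (李战怀) 《Computer Science》2007,34(9):110-113

The multidimensional hierarchical clustering (MHC) storage scheme proposed for ROLAP in reference [2] greatly improves query efficiency. Compared with ROLAP, however, MOLAP often has higher storage and query efficiency. This naturally suggests that implementing MHC in a hybrid OLAP system that combines the strengths of both might improve system performance further. As an exploratory study of this idea, this paper uses the extensibility of an ORDBMS to implement such a prototype system: multidimensional data are partitioned into chunks and clustered by dimension hierarchy, each chunk is stored as an array ADT, and the chunks are clustered through a B+-tree index. Experiments show that the proposed MHC implementation effectively reduces storage space and further improves query performance.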

10.

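An OLAP Aggregate Query Algorithm Based on Dimension Hierarchy Encoding  Cited by: 8 (self-citations: 2, other citations: 8)

Hu Kongfa (胡孔法) Dong Yisheng (董逸生) Xu Lizhen (徐立臻) Yang Kehua (杨科华) 《Journal of Computer Research and Development》2004,41(4):608-614

On-line analytical processing (OLAP) queries often require complex ad hoc grouping and aggregation over massive data, and their SQL statements usually contain multi-table joins and group-by aggregation. Reducing multi-table joins, compressing keys, and grouping and aggregating the queried data efficiently are therefore the key problems in ROLAP query processing. A new pre-grouping aggregation algorithm based on dimension hierarchy encoding, DHEPGA, is proposed. DHEPGA makes full use of the short dimension hierarchy codes and their prefixes to quickly retrieve the codes that match the query keywords and to determine the query ranges of the dimension hierarchy attributes, which reduces I/O cost and improves OLAP query efficiency. Theoretical analysis and experimental results show that DHEPGA is highly effective.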
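The idea of matching members by code prefix can be sketched as follows. This is only an illustration of hierarchy encoding in general, not the DHEPGA algorithm; the fixed 4-bit width per level, the function names, and the sample hierarchy are assumptions.

```python
def encode_hierarchy(paths, bits=4):
    """Give every node on each root-to-leaf path a fixed-width binary code.

    paths: list of tuples such as ("Asia", "China", "Beijing").
    Returns {node_path: code}; descendants share their ancestor's code as a prefix.
    """
    codes = {}
    child_ids = {}  # parent path -> {child name: local id among its siblings}
    for path in paths:
        code = ""
        for depth, name in enumerate(path):
            siblings = child_ids.setdefault(path[:depth], {})
            local = siblings.setdefault(name, len(siblings))
            if local >= 2 ** bits:
                raise ValueError("too many siblings for the assumed code width")
            code += format(local, f"0{bits}b")
            codes[path[:depth + 1]] = code
    return codes


def members_under(codes, ancestor, leaf_depth):
    """Return the leaf paths whose code starts with the ancestor's code."""
    prefix = codes[ancestor]
    return sorted(
        p for p, c in codes.items()
        if len(p) == leaf_depth and c.startswith(prefix)
    )


paths = [
    ("Asia", "China", "Beijing"),
    ("Asia", "China", "Shanghai"),
    ("Asia", "Japan", "Tokyo"),
    ("Europe", "France", "Paris"),
]
codes = encode_hierarchy(paths)
print(codes[("Asia", "China")])  # 00000000
print(members_under(codes, ("Asia", "China"), leaf_depth=3))
```

Because all codes at one level have the same length, a query on a higher level reduces to a prefix (or contiguous range) test on the leaf codes, with no join against the dimension table.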

11.

Optimizing in-network aggregate queries in wireless sensor networks for energy saving

Chih-Chieh Hung Wen-Chih Peng 《Data & Knowledge Engineering》2011,70(7):617-641

This study proposes a method of in-network aggregate query processing to reduce the number of messages incurred in a wireless sensor network. When aggregate queries are issued to a resource-constrained wireless sensor network, it is important to perform these queries efficiently. Given a set of multiple aggregate queries, the proposed approach shares intermediate results among queries to reduce the number of messages. When the sink receives multiple queries, it propagates them to the wireless sensor network via existing routing protocols. The sink can then obtain the corresponding topology of the queries and view each query as a query tree. With a set of query trees collected at the sink, it is necessary to determine a set of backbones that share intermediate results with other query trees (called non-backbones). First, it is necessary to formulate the objective cost function for backbones and non-backbones. Using this objective cost function, it is possible to derive a reduction graph that reveals possible cases of sharing intermediate results among query trees. Using the reduction graph, this study first proposes a heuristic algorithm BM (standing for Backbone Mapping). This study also develops algorithm OOB (standing for Obtaining Optimal Backbones) that exploits a branch-and-bound strategy to obtain the optimal solution efficiently. This study tests the performance of these algorithms on both synthetic and real datasets. Experimental results show that by sharing the intermediate results, the BM and OOB algorithms significantly reduce the total number of messages incurred by multiple aggregate queries, thereby extending the lifetime of sensor networks.

12.

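Storage of Hierarchical Dimensions in MOLAP for Port Data

Pan Mingxia (潘明霞) Wang Yu (王宇) 《Computer Engineering and Applications》2010,46(3):211-214

Starting from the decision-making needs of port enterprises, existing port data are analyzed to build a port data cube. Multidimensional on-line analytical processing (MOLAP) consists of queries applied to a data cube, and support for dimension hierarchies is an important feature of MOLAP. Hierarchical dimensions are usually stored as arrays, but array storage fails to reflect the hierarchical nature of a dimension and also introduces redundant data cells. To overcome these shortcomings, a dimension hierarchy storage tree is used to hold the hierarchical dimension information; it reflects the hierarchy of the dimension, eliminates redundant data, and makes hierarchical dimensions easy to query and update. In addition, the dimension members at each level are binary encoded, which both saves storage space and improves query efficiency.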

13.

A Genetic Selection Algorithm for OLAP Data Cubes  Cited by: 1 (self-citations: 0, other citations: 1)

Wen-Yang Lin I-Chung Kuo 《Knowledge and Information Systems》2004,6(1):83-102

Multidimensional data analysis, as supported by OLAP (online analytical processing) systems, requires the computation of many aggregate functions over a large volume of historically collected data. To decrease the query time and to provide various viewpoints for the analysts, these data are usually organized as a multidimensional data model, called data cubes. Each cell in a data cube corresponds to a unique set of values for the different dimensions and contains the metric of interest. The data cube selection problem is, given the set of user queries and a storage space constraint, to select a set of materialized cubes from the data cubes to minimize the query cost and/or the maintenance cost. This problem is known to be an NP-hard problem. In this study, we examined the application of genetic algorithms to the cube selection problem. We proposed a greedy-repaired genetic algorithm, called the genetic greedy method. According to our experiments, the solution obtained by our genetic greedy method is superior to that found using the traditional greedy method. That is, within the same storage constraint, the solution can greatly reduce the amount of query cost as well as the cube maintenance cost.
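A plain greedy baseline for the cube selection problem can be sketched as below. It is not the genetic greedy method of the paper. The cost model (a query is answered from its smallest materialized superset view, at a cost equal to that view's size), the function names, and the sample sizes are assumptions, and maintenance cost is ignored.

```python
def query_cost(queries, materialized, size):
    """Total cost when each query uses its smallest materialized superset view."""
    return sum(min(size[v] for v in materialized if q <= v) for q in queries)


def greedy_select(size, queries, base, budget):
    """Repeatedly add the view with the best cost reduction per unit of space.

    size: {view (frozenset of dimensions): row count}; base is always available
    and is not charged against the budget.
    """
    chosen = {base}
    used = 0
    while True:
        current = query_cost(queries, chosen, size)
        best, best_gain = None, 0.0
        for v in size:
            if v in chosen or used + size[v] > budget:
                continue
            gain = (current - query_cost(queries, chosen | {v}, size)) / size[v]
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # nothing fits or nothing helps any more
            return chosen
        chosen.add(best)
        used += size[best]


# Hypothetical two-dimensional cube with made-up view sizes.
AB, A, B, NONE = frozenset("AB"), frozenset("A"), frozenset("B"), frozenset()
size = {AB: 100, A: 10, B: 50, NONE: 1}
queries = [A, B, NONE]
chosen = greedy_select(size, queries, base=AB, budget=15)
print(query_cost(queries, chosen, size))  # 111
```

With a budget of 15 the view on B (size 50) never fits, so queries on B still fall back to the base view; a larger budget would change the selection.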

14.

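Design of a Strongly Correlated OLAP Query Solution Based on Clustered Regions

Qian Qiong (钱琼) Lei Lin (雷霖) Tuo Jianyong (脱建勇) Liu Wenhuang (刘文煌) 《Computer Engineering and Applications》2003,39(36):193-196,226

The clustered-region OLAP query solution is based on an understanding of the clustered regions of a data cube in a data warehouse. It combines the characteristics of MOLAP and ROLAP, effectively identifies the clustered regions and sparse points in a data cube, and improves query efficiency by storing them in different ways. Based on this idea, the paper also proposes a clustered-region query (DSR) algorithm and a strictly constrained clustered-region query (SDSR) algorithm, and compares the two by simulation.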

15.

Compiling Source Descriptions for Efficient and Flexible Information Integration

José Luis Ambite Craig A. Knoblock Ion Muslea Andrew G. Philpot 《Journal of Intelligent Information Systems》2001,16(2):149-187


16.

PMC: Select Materialized Cells in Data Cubes  Cited by: 1 (self-citations: 0, other citations: 1)

Hong-Song Li Hou-Kuan Huang 《Journal of Computer Science and Technology》2006,21(2):297-305

QC-Tree is one of the most storage-efficient structures for data cubes in an MOLAP system. Although QC-Tree can achieve a high compression ratio, it is still a fully materialized data cube. In this paper, an improved structure, PMC, is presented that allows us to materialize only a part of the cells in a QC-Tree to save more storage space. There is a notable difference between our partial materialization algorithm and traditional materialized view selection algorithms. In a traditional algorithm, when a view is selected, all the cells in this view are materialized; if a view is not selected, none of its cells are materialized. This strategy results in unstable query performance. The presented algorithm, however, selects and materializes data at the cell level and, along with further reduced space and update cost, can ensure stable query performance. A series of experiments are conducted on both synthetic and real data sets. The results show that PMC can further reduce the storage space occupied by the data cube and can shorten the time to update the cube.

17.

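Constructing a Multidimensional Data Model on a Relational Database  Cited by: 8 (self-citations: 0, other citations: 8)

Liu Yi (刘义) Li Liang (李亮) 《Computer Engineering》2000,26(9):21-22,114

This paper focuses on how to organize data in a relational database to meet the needs of multidimensional analysis, that is, how to split the dimension tables of a star schema into multiple normalized, low-granularity dimension tables with hierarchical relationships, so as to facilitate top-down (drill-down) and bottom-up (roll-up) data mining.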

18.

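A Highly Condensed and Semantics-Preserving Data Cube

Xiang Longgang (向隆刚) Gong Jianya (龚健雅) 《Journal of Computer Research and Development》2007,44(5):837-844

Quotient Cube and QC-tree attempt to condense a data cube while preserving the semantics it contains. However, the former does not store the semantic relationships, and the semantic relationships stored by the latter are obscure. A drill-down cube structure is therefore proposed. It is the first to approach data cube storage from a semantic perspective: what it stores is not the contents of the classes but the direct drill-down relationships between them. The drill-down cube not only greatly reduces the storage size of a data cube but also clearly expresses the drill-down semantics of the original cube. It also offers good query response performance, particularly for range queries. Experiments and analysis show that the drill-down cube clearly outperforms QC-tree in storage size and query response and is well suited to organizing and storing data cubes.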

19.

Accuracy vs. Lifetime: Linear Sketches for Aggregate Queries in Sensor Networks

Vasundhara Puttagunta Konstantinos Kalpakis 《Algorithmica》2007,49(4):357-385

The in-network aggregation paradigm in sensor networks provides a versatile approach for evaluating aggregate queries. Traditional approaches need a separate aggregate to be computed and communicated for each query and hence do not scale well with the number of queries. Since approximate query results are sufficient for many applications, we use an alternate approach based on summary data structures. We consider two kinds of aggregate queries: location range queries that compute the sum of values reported by sensors in a given location range, and value range queries that compute the number of sensors that report values in a given range. We construct summary data structures called linear sketches over the sensor data using in-network aggregation and use them to answer aggregate queries in an approximate manner at the base station. There is a trade-off between accuracy of the query results and lifetime of the sensor network that can be exploited to achieve increased lifetimes for a small loss in accuracy. Most commonly occurring sets of range queries are highly correlated and display rich algebraic structure. Our approach takes full advantage of this by constructing linear sketches that depend on queries. Experimental results show that linear sketching achieves significant improvements in lifetime of sensor networks for only a small loss in accuracy of the queries. Further, our approach achieves more accurate query results than the other classical techniques using Discrete Fourier Transform and Discrete Wavelet Transform. This work was supported in part by NASA under Cooperative Agreement NCC5-315.

20.

Progressive evaluation of nested aggregate queries

Kian-Lee Tan Cheng Hian Goh Beng Chin Ooi 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(3):261-278

In many decision-making scenarios, decision makers require rapid feedback to their queries, which typically involve aggregates. The traditional blocking execution model can no longer meet the demands of these users. One promising approach in the literature, called online aggregation, evaluates an aggregation query progressively as follows: as soon as certain data have been evaluated, approximate answers are produced with their respective running confidence intervals; as more data are examined, the answers and their corresponding running confidence intervals are refined. In this paper, we extend this approach to handle nested queries with aggregates (i.e., at least one inner query block is an aggregate query) by providing users with (approximate) answers progressively as the inner aggregation query blocks are evaluated. We address the new issues posed by nested queries. In particular, the answer space begins with a superset of the final answers and is refined as the aggregates from the inner query blocks are refined. For the intermediary answers to be meaningful, they have to be interpreted with the aggregates from the inner queries. We also propose a multi-threaded model in evaluating such queries: each query block is assigned to a thread, and the threads can be evaluated concurrently and independently. The time slice across the threads is nondeterministic in the sense that the user controls the relative rate at which these subqueries are being evaluated. For enumerative nested queries, we propose a priority-based evaluation strategy to present answers that are certainly in the final answer space first, before presenting those whose validity may be affected as the inner query aggregates are refined. We implemented a prototype system using Java and evaluated our system. Results for nested queries with one level and with multiple levels of nesting are reported. 
Our results show the effectiveness of the proposed mechanisms in providing progressive feedback that reduces the initial waiting time of users significantly without sacrificing the quality of the answers. Received April 25, 2000 / Accepted June 27, 2000
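The basic online aggregation idea that the abstract builds on (a running estimate with a confidence interval that tightens as more rows are read) can be sketched for a single, non-nested AVG query. This is an illustration only, not the nested-query or multi-threaded mechanism of the paper; the function name, the normal-approximation interval with z = 1.96, and the assumption that rows arrive in random order are all choices made for this sketch.

```python
import math
import random


def progressive_mean(values, report_every, z=1.96):
    """Yield (rows_seen, running_mean, half_width) while scanning the rows.

    The interval mean +/- half_width is a normal-approximation confidence
    interval and is only meaningful if rows arrive in random order.
    """
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # Welford's running sum of squared deviations
        if n > 1 and n % report_every == 0:
            std_err = math.sqrt(m2 / (n - 1) / n)
            yield n, mean, z * std_err


random.seed(0)
rows = [random.gauss(50, 10) for _ in range(10_000)]  # synthetic measure values
for seen, estimate, half_width in progressive_mean(rows, report_every=2_000):
    print(f"{seen} rows: {estimate:.2f} +/- {half_width:.2f}")
```

A user can stop the scan as soon as the reported interval is narrow enough, which is the source of the reduced initial waiting time.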