首页 | 官方网站   微博 | 高级检索  
  按 检索   检索词:      
出版年份:   被引次数:   他引次数: 提示:输入*表示无穷大
  收费全文   6篇
  完全免费   4篇
工业技术   10篇
  2020年   2篇
  2009年   2篇
  2008年   1篇
  2007年   1篇
  2006年   1篇
  2005年   1篇
  2004年   1篇
  2002年   1篇
排序方式: 共有10条查询结果,搜索用时 31 毫秒
Compressed Data Cube for Approximate OLAP Query Processing   总被引:4,自引:0,他引:4       下载免费PDF全文
Approximate query processing has emerged as an approach to dealing with the huge data volume and complex queries in the environment of data warehouse.In this paper,we present a novel method that provides approximate answers to OLAP queries.Our method is based on building a compressed (approximate) data cube by a clustering technique and using this compressed data cube to provide answers to queries directly,so it improves the performance of the queries.We also provide the algorithm of the OLAP queries and the confidence intervals of query results.An extensive experimental study with the OLAP council benchmark shows the effectiveness and scalability of our cluster-based approach compared to sampling.  相似文献   
We consider the problem of using sampling to estimate the result of an aggregation operation over a subset-based SQL query, where a subquery is correlated to an outer query by a NOT EXISTS, NOT IN, EXISTS or IN clause. We design an unbiased estimator for our query and prove that it is indeed unbiased. We then provide a second, biased estimator that makes use of the superpopulation concept from statistics to minimize the mean squared error of the resulting estimate. The two estimators are tested over an extensive set of experiments. Material in this paper is based upon work supported by the National Science Foundation via grants 0347408 and 0612170.  相似文献   
张安珍  李建中  高宏 《软件学报》2020,31(2):406-420
本文研究了基于符号语义的不完整数据聚集查询处理问题.不完整数据又称为缺失数据,缺失值包括可填充的和不可填充的两种类型.现有的缺失值填充算法不能保证填充后查询结果的准确度,为此,本文给出不完整数据聚集查询结果的区间估计.本文在符号语义中扩展传统关系数据库模型,提出一种通用不完整数据库模型,该模型可以处理可填充的和不可填充的两种类型缺失值.在该模型下,提出一种新的不完整数据聚集查询结果语义:可靠结果.可靠结果是真实查询结果的区间估计,可以保证真实查询结果很大概率在该估计区间范围内.本文给出线性时间求解SUM、COUNT和AVG查询可靠结果的方法.真实数据集和合成数据集上的扩展实验验证了本文所提方法的有效性.  相似文献   
实时交互式分析针对多目标和多角度的分析任务,通过多轮次的用户-数据库交互过程,逐步明确分析任务与分析目标,全方位地了解相关领域信息,最终得到科学的、全面的分析结果.相比传统数据库"提交查询-返回结果"的单轮次交互查询方式,实时交互式分析更强调交互的实时性与查询结果的时效性.对实时交互式分析的研究已成为近几年研究的热点.本文针对当前实时交互式分析面临的若干关键问题,对现有的实时交互式分析研究的理论基础、数据模型与系统构架进行了综述.  相似文献   
This paper studies aggregate search in transaction time databases. Specifically, each object in such a database can be modeled as a horizontal segment, whose y-projection is its search key, and its x-projection represents the period when the key was valid in history. Given a query timestamp q t and a key range , a count-query retrieves the number of objects that are alive at q t , and their keys fall in . We provide a method that accurately answers such queries, with error less than , where N alive(q t ) is the number of objects alive at time q t , and ɛ is any constant in (0, 1]. Denoting the disk page size as B, and nN / B, our technique requires O(n) space, processes any query in O(log B n) time, and supports each update in O(log B n) amortized I/Os. As demonstrated by extensive experiments, the proposed solutions guarantee query results with extremely high precision (median relative error below 5%), while consuming only a fraction of the space occupied by the existing approaches that promise precise results.  相似文献   
In recent years there has been a significant interest in peer-to-peer (P2P) environments in the community of data management. However, almost all work, so far, is focused on exact query processing in current P2P data systems. The autonomy of peers also is not considered enough. In addition, the system cost is very high because the information publishing method of shared data is based on each document instead of document set. In this paper, abstract indices (AbIx) are presented to implement content-based approximate queries in centralized, distributed and structured P2P data systems. It can be used to search as few peers as possible but get as many returns satisfying users' queries as possible on the guarantee of high autonomy of peers. Also, abstract indices have low system cost, can improve the query processing speed, and support very frequent updates and the set information publishing method. In order to verify the effectiveness of abstract indices, a simulator of 10,000 peers, over 3 million documents is made, and several metrics are proposed. The experimental results show that abstract indices work well in various P2P data systems.  相似文献   
针对数据流上近似查询中的梗概计算,提出了一种新的基于最小误差的维压缩小波变换算法(MEDC).MEDC算法通过映射流数据时间戳,快速无冗余地维护流数据的有序性;基于最小误差,高效压缩小波变换阵列,最大化MEDC算法时间效率及近似查询实时处理能力;引入小波系数与查询准确度之间的数值性关联规则,支持小波系数梗概上的查询多级共享,整体查询执行性能最佳.实验表明,与传统小波变换、直方图和采样等算法相比,MEDC算法在数据流近似查询处理的响应速度、查询结果质量等方面具有更为优越的性能.  相似文献   
There is growing interest in algorithms for processing and querying continuous data streams (i.e., data seen only once in a fixed order) with limited memory resources. In its most general form, a data stream is actually an update stream, i.e., comprising data-item deletions as well as insertions. Such massive update streams arise naturally in several application domains (e.g., monitoring of large IP network installations or processing of retail-chain transactions). Estimating the cardinality of set expressions defined over several (possibly distributed) update streams is perhaps one of the most fundamental query classes of interest; as an example, such a query may ask what is the number of distinct IP source addresses seen in passing packets from both router R 1 and R 2 but not router R 3?. Earlier work only addressed very restricted forms of this problem, focusing solely on the special case of insert-only streams and specific operators (e.g., union). In this paper, we propose the first space-efficient algorithmic solution for estimating the cardinality of full-fledged set expressions over general update streams. Our estimation algorithms are probabilistic in nature and rely on a novel, hash-based synopsis data structure, termed 2-level hash sketch. We demonstrate how our 2-level hash sketch synopses can be used to provide low-error, high-confidence estimates for the cardinality of set expressions (including operators such as set union, intersection, and difference) over continuous update streams, using only space that is significantly sublinear in the sizes of the streaming input (multi-)sets. Furthermore, our estimators never require rescanning or resampling of past stream items, regardless of the number of deletions in the stream. We also present lower bounds for the problem, demonstrating that the space usage of our estimation algorithms is within small factors of the optimal. Finally, we propose an optimized, time-efficient stream synopsis (based on 2-level hash sketches) that provides similar, strong accuracy-space guarantees while requiring only guaranteed logarithmic maintenance time per update, thus making our methods applicable for truly rapid-rate data streams. Our results from an empirical study of our synopsis and estimation techniques verify the effectiveness of our approach.Received: 20 October 2003, Accepted: 16 April 2004, Published online: 14 September 2004Edited by: J. Gehrke and J. Hellerstein.Sumit Ganguly: sganguly@cse.iitk.ac.in Current affiliation: Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur, India  相似文献   
通过分析在线聚集与在线动态重排序技术,结合近似查询处理和国会抽样方法,提出了在线分组聚集方案,该方案具有广泛的应用前景。  相似文献   
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号