
1.

李阳高鹏马骏《计算机工程与设计》2007,28(18):4325-4328,4332

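Predicate selectivity estimation is an important basis for the decisions made by the query optimizer of a relational database management system. This paper proposes a predicate selectivity estimation method based on compressed histograms. A compressed histogram that combines most common values (MCV) with an equi-height histogram stores the distribution characteristics of the data; the paper gives the construction method for this compressed histogram and studies the selectivity estimation algorithm. The method's effectiveness has been demonstrated in practice: it yields accurate selectivity estimates at a low construction cost.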
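The idea of a compressed histogram can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names, the dictionary layout, and the linear interpolation inside a bucket are all assumptions made for illustration. Exact frequencies are kept for the most common values, and the remaining rows are summarized by equi-height bucket boundaries.

```python
from collections import Counter


def build_compressed_histogram(values, num_mcv, num_buckets):
    """Keep exact counts for the MCVs; summarize the rest with equi-height bounds."""
    counts = Counter(values)
    mcv = dict(counts.most_common(num_mcv))  # value -> exact row count
    rest = sorted(v for v in values if v not in mcv)
    bounds = []
    if rest:
        # num_buckets + 1 boundaries; each bucket covers roughly the same number of rows.
        last = len(rest) - 1
        bounds = [rest[(i * last) // num_buckets] for i in range(num_buckets + 1)]
    return {"total": len(values), "mcv": mcv, "rest": len(rest), "bounds": bounds}


def estimate_le(hist, x):
    """Estimated selectivity of the predicate `column <= x` (illustrative only)."""
    # MCV part: exact, because the frequencies are stored.
    mcv_rows = sum(c for v, c in hist["mcv"].items() if v <= x)
    # Histogram part: whole buckets below x, plus a linear share of the bucket containing x.
    bounds = hist["bounds"]
    frac = 0.0
    if bounds:
        if x >= bounds[-1]:
            frac = 1.0
        elif x >= bounds[0]:
            buckets = len(bounds) - 1
            for i in range(buckets):
                lo, hi = bounds[i], bounds[i + 1]
                if x >= hi:
                    frac += 1.0 / buckets
                else:
                    # Here lo <= x < hi, so hi - lo is strictly positive.
                    frac += (x - lo) / (hi - lo) / buckets
                    break
    return (mcv_rows + frac * hist["rest"]) / hist["total"]
```

For a column where the value 1 fills half the rows and the rest are spread evenly, the MCV entry captures the spike exactly and the buckets only have to describe the evenly spread remainder, which is the motivation for combining the two structures.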

2.

Research on Selectivity Estimation Methods for Distance Joins in Spatial Databases  Cited by: 1 (self-citations: 0, by others: 1)

熊伟廖巍陈宏盛景宁《计算机学报》2006,29(1):45-53

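Through a comprehensive analysis and comparison of existing selectivity estimation techniques, this paper proposes a distance-join estimation method based on minimum bounding rectangle (MBR) buffers and histograms. Working from the MBR buffers of spatial objects, the method needs only a few simple histogram statistics to obtain a fairly accurate estimate quickly. To address the tendency of the MBR-buffer method to overestimate, a new method is proposed that uses basic laws of line-segment distribution characteristics to estimate distance-join selectivity. This method is based on line-segment buffers and does not need to traverse the line-segment dataset, which improves estimation efficiency and reduces the relative error of the estimate. Experimental results show that the proposed methods estimate the selectivity of spatial distance queries accurately and efficiently and are better suited to real datasets.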

3.

Research on Histogram-Based Selectivity Estimation for Spatial Queries

朱焰炉程昌秀陈荣国颜勋《计算机科学》2010,37(12):125-129

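Spatial query optimization is one of the key problems in spatial databases. Query optimization based on query cost estimation is an important way to improve query efficiency, and the main problem in cost estimation is estimating the size of the query result (the selectivity). For the two most common queries in spatial databases, spatial selection and spatial join, the paper describes several of the main histogram algorithms used for query selectivity estimation, analyzes the strengths and weaknesses of each, and finally looks ahead to research directions in spatial query selectivity estimation.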

4.

Predicate Selectivity Estimation in Object-Oriented Database Systems  Cited by: 1 (self-citations: 0, by others: 1)

王国仁于戈张斌郑怀远《计算机学报》1998,21(Z1):171-177

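Selectivity estimation is a very important problem in database query optimization. Using a parameterized uniform distribution method, this paper proposes a complete set of selectivity estimation methods, based on database statistics, for query optimization in object-oriented database systems, chiefly including selectivity estimation for complex nested predicates (path expressions in the WHERE clause).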

5.

Application Analysis of Cost Estimation for Data Join Operations Using Histograms

冯凯平陈衡冯超颖《计算机系统应用》2012,21(10):194-197

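The histogram is an important tool for data queries. Query operations in a database require cost estimation, and there are many ways to estimate operation cost. Of all the estimation methods, the histogram method is the easiest to use and to maintain: it sorts all the tuples of a relation into buckets, counts the size of each bucket, and then sums the counts, which brings the cost statistics closer to the true values. Drawing on a practical project, the paper gives methods for converting a join of relations into several kinds of histograms and shows how to use histograms for cost estimation.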
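The bucket, count, and sum idea can be sketched for an equi-join as follows. This is a common textbook estimator rather than the method from the paper: it assumes both histograms share the same bucket boundaries, that values are uniformly distributed within a bucket, and that the smaller set of distinct values in a bucket is contained in the larger one. The function name and histogram layout are hypothetical.

```python
def estimate_join_size(hist_r, hist_s):
    """Estimate |R JOIN S| on one attribute from two aligned histograms.

    Each histogram maps a bucket (lo, hi) to (row_count, distinct_count).
    """
    total = 0.0
    for bucket, (rows_r, distinct_r) in hist_r.items():
        rows_s, distinct_s = hist_s.get(bucket, (0, 0))
        distinct = max(distinct_r, distinct_s)
        if distinct:
            # Per-bucket estimate: each row pair matches with probability 1 / distinct.
            total += rows_r * rows_s / distinct
    return total
```

Summing per-bucket estimates limits the uniformity assumption to the inside of each bucket, which is why the result tends to be closer to the true size than a single whole-table estimate.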

6.

Constructing an Adaptive Histogram in Compressed Databases  Cited by: 1 (self-citations: 0, by others: 1)

骆吉洲李建中王宏志《软件学报》2009,20(7):1785-1799

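Histograms play an important role in query optimization. How to exploit the characteristics of query processing in a compressed database to build adaptive histograms for query optimization or approximate query answering remains an open problem. The paper tracks hot data by scheduling the queries in the query buffer pool, and builds the adaptive histogram from feedback in query results so as to speed up its convergence. It also proposes a parameterized method for estimating the number of tuples in regions not covered by any bucket. The histogram can be maintained incrementally. Experimental results show that it has good average accuracy, faster convergence, and stronger adaptability.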

7.

Estimating Similarity Join Result Size on Low-Quality Databases Based on Compressed Histograms

张岩杨忠胜王宏志高宏李建中《小型微型计算机系统》2012,(10):2113-2120

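Low-quality data is widespread in modern data management systems; it degrades data quality and poses new challenges for data management. Many data models for managing low-quality data already exist, and the entity relational data model is one of them: it permits low-quality data, provides a way to measure data quality, and can return query results according to the required result quality. Given the characteristics of this model, traditional optimization methods for estimating query cost are hard to apply, and new cost estimation techniques are needed. This paper proposes a new method for estimating join result size. A weighted min-hash function is used to obtain the min-hash signature of an attribute, which gives attributes the same dimensionality and makes fast histogram-based estimation convenient; a histogram is then built, and finally an improved discrete cosine transform compresses the histogram information, with the compressed information used directly for cost estimation. This keeps the error rate and storage cost low even for high-dimensional data. In addition, the method supports dynamic data updates well, eliminating the time cost of periodically rebuilding the histogram.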

8.

Research on Cost Estimation for Object-Oriented XML Data Queries  Cited by: 2 (self-citations: 0, by others: 2)


张晓琳戴华忠《计算机工程与应用》2007,43(18):181-183

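XML offers rich expressiveness, self-description, and flexibility, and object-oriented concepts have strong modeling power, so introducing object-oriented concepts into XML can improve the modeling power of XML schema languages. Cost estimation for value-matching conditions in object-oriented XML data queries is a typical multi-element query condition cost estimation problem. The value distribution of XML data depends on the distribution of other value information and also on the structural information of the XML data, so it is hard to rely on any single cost estimation method. To address these problems, the paper proposes a histogram-based cost estimation method that incorporates the structure of the query tree during estimation.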

9.

Cost Estimation Algorithm for Value-Matching Queries in XML Data Queries  Cited by: 6 (self-citations: 0, by others: 6)

曲卫民孙乐孙玉芳《软件学报》2005,16(4):561-569

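Cost estimation for value-matching query conditions in XML data queries is a typical multi-element query condition cost estimation problem. It differs from multi-element query conditions in traditional relational databases, because the distribution of value information in XML data is correlated not only with the distribution of other value information but also with the structural information in the XML data, and when the XML data structure is complex, high-dimensional element correlations may arise. To address these problems, the paper proposes a wavelet-based multidimensional histogram algorithm for query cost estimation on XML data, together with a method for determining the interdependent tuples keyed on a given value element in XML data, a method for rewriting value-matching conditions as multi-element query conditions, and a method for converting structural information into values. Experimental results show that the proposed method yields fairly accurate query cost estimates.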

10.

Differentially Private Histogram Publication Algorithm for Arbitrary Interval Tree Structures

吴英杰陈鸿王一蕾孙岚《模式识别与人工智能》2015,28(12):1084-1092

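A currently effective approach to differentially private histogram publication first maps the histogram to a full m-ary interval tree and then uses query consistency constraints to improve query accuracy. However, not every histogram can be mapped to a full m-ary interval tree. To address this, the paper first proposes the k-interval tree, which can map any histogram to a tree structure. It then shows theoretically that, for differentially private histogram publication under any interval tree structure, the error of range count queries can still be reduced further under consistency constraints by using the best linear unbiased estimator. Finally, it proposes LBLUE, a differentially private histogram publication algorithm for arbitrary interval tree structures based on locally best linear unbiased estimation. Experiments comparing LBLUE with similar algorithms, in terms of the range count query accuracy of the published data and algorithm efficiency, show that LBLUE is effective and feasible.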

11.

Spatial Query Optimization  Cited by: 4 (self-citations: 1, by others: 4)

蒋苏蓉石青青黄志良《计算机工程与应用》2004,40(9):188-190

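Because spatial data is complex, spatial queries need their own cost model. The paper first describes building a quadtree histogram to estimate the selectivity of spatial queries, then on that basis estimates the query cost of DM-SDB, and uses this cost model to optimize multi-join queries in DM-SDB.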

12.

HEDC++: An Extended Histogram Estimator for Data in the Cloud


史英杰孟小峰 Pusheng Wang 干艳桃《计算机科学技术学报》2013,28(6):973-988

With increasing popularity of cloud-based data management, improving the performance of queries in the cloud is an urgent issue to solve. Summary of data distribution and statistical information has been commonly used in traditional databases to support query optimization, and histograms are of particular interest. Naturally, histograms could be used to support query optimization and efficient utilization of computing resources in the cloud. Histograms could provide helpful reference information for generating optimal query plans, and generate basic statistics useful for guaranteeing the load balance of query processing in the cloud. Since it is too expensive to construct an exact histogram on massive data, building an approximate histogram is a more feasible solution. This problem, however, is challenging to solve in the cloud environment because of the special data organization and processing mode in the cloud. In this paper, we present HEDC++, an extended histogram estimator for data in the cloud, which provides efficient approximation approaches for both equi-width and equi-depth histograms. We design the histogram estimate workflow based on an extended MapReduce framework, and propose novel sampling mechanisms to leverage the sampling efficiency and estimate accuracy. We experimentally validate our techniques on Hadoop and the results demonstrate that HEDC++ can provide promising histogram estimate for massive data in the cloud.
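The difference between the two histogram types, and the idea of approximating them from a sample, can be sketched in a few lines. This is a single-machine illustration only and does not reproduce HEDC++'s MapReduce workflow or its sampling mechanisms; the function names and parameters are hypothetical, and all sampled values are assumed to lie in [lo, hi].

```python
def equi_width_counts(sample, num_buckets, lo, hi, population_size):
    """Equi-width: fixed-width buckets over [lo, hi]; counts scaled up from the sample."""
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in sample:
        # Clamp so that v == hi falls in the last bucket.
        i = min(int((v - lo) / width), num_buckets - 1)
        counts[i] += 1
    scale = population_size / len(sample)
    return [c * scale for c in counts]


def equi_depth_bounds(sample, num_buckets):
    """Equi-depth: boundaries at evenly spaced quantiles of the sample."""
    ordered = sorted(sample)
    last = len(ordered) - 1
    return [ordered[(i * last) // num_buckets] for i in range(num_buckets + 1)]
```

An equi-width histogram fixes the boundaries and estimates the counts, while an equi-depth histogram fixes the (approximate) count per bucket and estimates the boundaries, so sampling error shows up in different places for the two.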

13.

Histogram-Based Selectivity Cost Estimation for XPath Paths with Value Predicates

王宇孟小峰王珊《计算机研究与发展》2006,43(2):288-294

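Path selectivity cost estimation is the foundation of XML query optimization and an active research topic. Current methods rely heavily on normal-distribution and independence assumptions, which is the root cause of their errors. The paper defines a novel value-position histogram to capture the distribution of structure and values in XML data, and proposes six histogram operations. On this basis, it gives a method for using histograms to estimate the selectivity of any node on a path. Experiments show that the method needs no independence assumption and can accurately estimate path selectivity cost even when the data structure and value distribution are non-uniform.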

14.

Efficiently adapting graphical models for selectivity estimation

Kostas Tzoumas Amol Deshpande Christian S. Jensen 《The VLDB Journal The International Journal on Very Large Data Bases》2013,22(1):3-27

Query optimizers rely on statistical models that succinctly describe the underlying data. Models are used to derive cardinality estimates for intermediate relations, which in turn guide the optimizer to choose the best query execution plan. The quality of the resulting plan is highly dependent on the accuracy of the statistical model that represents the data. It is well known that small errors in the model estimates propagate exponentially through joins, and may result in the choice of a highly sub-optimal query execution plan. Most commercial query optimizers make the attribute value independence assumption: all attributes are assumed to be statistically independent. This reduces the statistical model of the data to a collection of one-dimensional synopses (typically in the form of histograms), and it permits the optimizer to estimate the selectivity of a predicate conjunction as the product of the selectivities of the constituent predicates. However, this independence assumption is more often than not wrong, and is considered to be the most common cause of sub-optimal query execution plans chosen by modern query optimizers. We take a step towards a principled and practical approach to performing cardinality estimation without making the independence assumption. By carefully using concepts from the field of graphical models, we are able to factor the joint probability distribution over all the attributes in the database into small, usually two-dimensional distributions, without a significant loss in estimation accuracy. We show how to efficiently construct such a graphical model from the database using only two-way join queries, and we show how to perform selectivity estimation in a highly efficient manner. We integrate our algorithms into the PostgreSQL DBMS. Experimental results indicate that estimation errors can be greatly reduced, leading to orders of magnitude more efficient query execution plans in many cases. Optimization time is kept in the range of tens of milliseconds, making this a practical approach for industrial-strength query optimizers.
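The cost of the attribute value independence assumption is easy to see on a toy example. The sketch below is not taken from the paper; the table, the predicates, and the function names are hypothetical. It compares the product-of-selectivities estimate with the true selectivity of a conjunction on perfectly correlated attributes.

```python
def selectivity(rows, predicate):
    """Fraction of rows that satisfy the predicate."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


def independence_estimate(rows, predicates):
    """Product of per-predicate selectivities (attribute value independence)."""
    estimate = 1.0
    for predicate in predicates:
        estimate *= selectivity(rows, predicate)
    return estimate


def true_selectivity(rows, predicates):
    """Selectivity of the conjunction, computed directly from the data."""
    return selectivity(rows, lambda row: all(p(row) for p in predicates))


# Hypothetical toy table: the second attribute is fully determined by the first.
ROWS = [("Honda", "Civic")] * 50 + [("Toyota", "Corolla")] * 50
PREDICATES = [lambda row: row[0] == "Honda", lambda row: row[1] == "Civic"]
```

Here `independence_estimate(ROWS, PREDICATES)` gives 0.25 while `true_selectivity(ROWS, PREDICATES)` gives 0.5, a factor-of-two underestimate from a single pair of predicates; with more correlated predicates or joins the error compounds.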

15.

Proactive and reactive multi-dimensional histogram maintenance for selectivity estimation

Zhen He, Byung Suk Lee 《Journal of Systems and Software》2008,81(3):414-430

Many state-of-the-art selectivity estimation methods use query feedback to maintain histogram buckets, thereby using the limited memory efficiently. However, they are “reactive” in nature, that is, they update the histogram based on queries that have come to the system in the past for evaluation. In some applications, future occurrences of certain queries may be predicted and a “proactive” approach can bring much needed performance gain, especially when combined with the reactive approach. For these applications, this paper provides a method that builds customized proactive histograms based on query prediction and merges them into reactive histograms when the predicted future arrives. Thus, the method is called the proactive and reactive histogram (PRHist). Two factors affect the usefulness of the proactive histograms and are dealt with during the merge process: the first is the predictability of queries and the second is the extent of data updates. PRHist adjusts itself to be more reactive or more proactive depending on these two factors. Through extensive experiments using both real and synthetic data and query sets, this paper shows that in most cases, PRHist outperforms STHoles, the state-of-the-art reactive method, even when only a small portion of the queries are predictable and a significant portion of data is updated.

16.

Histograms based on the minimum description length principle

Hai Wang Kenneth C. Sevcik 《The VLDB Journal The International Journal on Very Large Data Bases》2008,17(3):419-442


17.

Applying Histogram Methods to Cost Estimation in Query Optimization

冯凯平张华冯超颖陈衡《计算机与数字工程》2012,40(6):27-29

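The histogram is the most commonly used tool for estimating query cost in database systems: it sorts all the tuples of a relation into buckets, counts the size of each bucket, and then sums the counts, which brings the cost statistics closer to the true values. Of all the cost estimation methods, the histogram method is the easiest to use and to maintain; each kind of histogram differs in estimation accuracy and has its own characteristics in use. Drawing on a practical project, the paper gives methods for converting a join of relations into several kinds of histograms and shows how to use histograms for cost estimation.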

18.

Consistent selectivity estimation via maximum entropy

V. Markl P. J. Haas M. Kutsch N. Megiddo U. Srivastava T. M. Tran 《The VLDB Journal The International Journal on Very Large Data Bases》2007,16(1):55-76

Cost-based query optimizers need to estimate the selectivity of conjunctive predicates when comparing alternative query execution plans. To this end, advanced optimizers use multivariate statistics to improve information about the joint distribution of attribute values in a table. The joint distribution for all columns is almost always too large to store completely, and the resulting use of partial distribution information raises the possibility that multiple, non-equivalent selectivity estimates may be available for a given predicate. Current optimizers use cumbersome ad hoc methods to ensure that selectivities are estimated in a consistent manner. These methods ignore valuable information and tend to bias the optimizer toward query plans for which the least information is available, often yielding poor results. In this paper we present a novel method for consistent selectivity estimation based on the principle of maximum entropy (ME). Our method exploits all available information and avoids the bias problem. In the absence of detailed knowledge, the ME approach reduces to standard uniformity and independence assumptions. Experiments with our prototype implementation in DB2 UDB show that use of the ME approach can improve the optimizer’s cardinality estimates by orders of magnitude, resulting in better plan quality and significantly reduced query execution times. For almost all queries, these improvements are obtained while adding only tens of milliseconds to the overall time required for query optimization.
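The claim that maximum entropy reduces to the independence assumption when nothing else is known can be checked on the smallest case. The worked derivation below is an illustration under assumed notation, not the paper's formulation: two predicates with known single-predicate selectivities $s_1$ and $s_2$, and $x_{b_1 b_2}$ denoting the fraction of rows on which predicate $i$ evaluates to $b_i \in \{0, 1\}$.

```latex
\begin{align*}
&\text{maximize } -\sum_{b_1, b_2 \in \{0,1\}} x_{b_1 b_2} \log x_{b_1 b_2}
\quad \text{subject to} \\
&\qquad x_{10} + x_{11} = s_1, \qquad
        x_{01} + x_{11} = s_2, \qquad
        \sum_{b_1, b_2} x_{b_1 b_2} = 1 . \\[4pt]
&\text{Setting the derivative of the Lagrangian to zero gives a product form:} \\
&\qquad x_{b_1 b_2} = a \, u^{b_1} v^{b_2}, \qquad a, u, v > 0 . \\[4pt]
&\text{Substituting into the constraints:} \\
&\qquad a (1 + u)(1 + v) = 1, \qquad
        a \, u \, (1 + v) = s_1, \qquad
        a \, v \, (1 + u) = s_2 , \\
&\qquad \text{hence } \frac{u}{1 + u} = s_1, \qquad \frac{v}{1 + v} = s_2 . \\[4pt]
&\text{The selectivity of the conjunction is therefore} \\
&\qquad x_{11} = a \, u \, v = \frac{u}{1 + u} \cdot \frac{v}{1 + v} = s_1 s_2 .
\end{align*}
```

With only the two marginal selectivities as constraints, the maximum entropy solution is exactly the product estimate; any additional multivariate statistic adds a constraint and moves the solution away from independence.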

19.

Research on Histogram Methods for Estimating Query Result Size  Cited by: 11 (self-citations: 0, by others: 11)

吴胜利《软件学报》1998,9(4):285-289

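The histogram is the most commonly used method for estimating query result size in many commercial database systems. From a practical standpoint, previously proposed histogram methods have limitations, chiefly that they cannot guarantee the accuracy of their estimates. This paper proposes two new histogram methods that are not only easy to use but also guarantee that every estimate falls within a given error bound. It also examines how different data distributions affect histograms, using a few key parameters to characterize the data distribution and help generate more effective histograms.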