首页 | 官方网站   微博 | 高级检索  
     

基于树型门控循环单元的基数和代价估计器
引用本文:乔少杰,杨国平,韩楠,屈露露,陈浩,毛睿,元昌安,Louis Alberto GUTIERREZ. 基于树型门控循环单元的基数和代价估计器[J]. 软件学报, 2022, 33(3): 797-813
作者姓名:乔少杰  杨国平  韩楠  屈露露  陈浩  毛睿  元昌安  Louis Alberto GUTIERREZ
作者单位:成都信息工程大学软件工程学院,四川成都610225;成都信息工程大学管理学院,四川成都610225;北京华为数字技术有限公司,北京100085;深圳大学计算机与软件学院,广东深圳518060;广西教育学院,广西南宁530023;Department of Computer Science,Rensselaer Polytechnic Institute,New York,USA
基金项目:国家自然科学基金(61772091,61802035,61962006,61962038,U1802271,U2001212,62072311);CCF-华为数据库创新研究计划(CCF-HuaweiDBIR2020004A);四川省科技计划项目(2021JDJQ0021,2020YJ0481,2020YJ0430);成都市重大科技创新项目(2021-YF08-00156-GX);成都市技术创新研发项目(2021-YF05-00491-SN);四川音乐学院数字媒体艺术四川省重点实验室资助项目(21DMAKL02);广西自然科学基金项目(2018GXNSFDA138005);广东省基础与应用基础研究基金(2020B1515120028)
摘    要:基数估计和代价估计可以引导执行计划的选择,估计准确性对查询优化器至关重要.然而,传统数据库的代价和基数估计技术无法提供准确的估计,因为现有技术没有考虑多个表之间的相关性.将人工智能技术应用于数据库(artificial intelligence for databases, AI4DB)近期得到广泛关注,研究结果表明,基于学习的估计方法优于传统方法.然而,现有基于学习的方法仍然存在不足:首先,大部分的方法只能估计基数,但忽略了代价估计;其次,这些方法只能处理一些简单的查询语句,对于多表查询、嵌套查询等复杂查询则无能为力;同时,对字符串类型的值也很难处理.为了解决上述问题,提出了一种基于树型门控循环单元, Tree-GRU (tree-gated recurrent unit)的基数和代价估计方法,可以同时对基数和代价进行估计.此外,采用了有效的特征提取和编码技术,在特征提取中兼顾查询和执行计划,将特征嵌入到Tree-GRU中.对于字符串类型的值,使用神经网络自动提取子串与整串的关系,并进行字符串嵌入,从而使具有稀疏性的字符串变得容易被估计器处理.在JOB、Synthetic等数据集上进...

关 键 词:AI4DB  基数估计  代价估计  查询优化器  Tree-GRU  执行计划
收稿时间:2021-06-30
修稿时间:2021-07-31

Cardinality and Cost Estimator Based on Tree Gated Recurrent Unit
QIAO Shao-Jie,YANG Guo-Ping,HAN Nan,QU Lu-Lu,CHEN Hao,MAO Rui,YUAN Chang-An,Louis Alberto GUTIERREZ. Cardinality and Cost Estimator Based on Tree Gated Recurrent Unit[J]. Journal of Software, 2022, 33(3): 797-813
Authors:QIAO Shao-Jie  YANG Guo-Ping  HAN Nan  QU Lu-Lu  CHEN Hao  MAO Rui  YUAN Chang-An  Louis Alberto GUTIERREZ
Affiliation:School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China;School of Management, Chengdu University of Information Technology, Chengdu 610225, China;Beijing Huawei Digital Technologies Co., Ltd, Beijing 100085, China;College of Computer Science and Software Engineering, Shenzhen 518060, China;Guangxi College of Education, Nanning 530023, China; Department of Computer Science, Rensselaer Polytechnic Institute, New York, USA
Abstract:Cardinality estimation and cost estimation can guide the selection of execution plan, and cardinality accuracy is very important for query optimizer. However, the cost and cardinality estimation techniques in traditional databases cannot provide accurate estimations because they do not consider the correlation across multiple tables. Recently, the application of artificial intelligence technology to databases (AI4DB, Artificial Intelligence for Databases) has recently attracted wide attention, and the results show that the learning-based estimation method is superior to the traditional methods. However, the existing learning based methods have some drawbacks. Firstly, most methods can only estimate the cardinality, but ignore the cost estimation. Secondly, these methods can only deal with some simple query statements, and do not work for complex queries, such as multi-table query and nested query. At the same time, they are also difficult to deal with string type of values. In order to solve the above problems, a novel method of estimating the cardinality and cost based on Tree-GRU (Tree-Gated Recurrent Unit), which can estimate the cardinality as well as the cost. In addition, an effective feature extraction and coding technology is applied, both query and execution plan are considered in feature extraction. These features are embedded into Tree-GRU. For the value of string type, the neural network is used to automatically extract the relationship between the substring and the whole string, embedding the string, so that the sparse string can be easily processed by the estimator. Extensive experiments were conducted on the JOB and Synthetic datasets and the results show that the performance of the proposed model outperforms the state-of-the-art algorithms in all aspects.
Keywords:AI4DB  cardinality estimation  cost estimation  query optimizer  Tree-GRU  execution plan
本文献已被 万方数据 等数据库收录!
点击此处可从《软件学报》浏览原始摘要信息
点击此处可从《软件学报》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号