
Recursive Causal Inference Algorithm Based on Partial Correlation Test
Citation: CHEN Mingjie, ZHANG Hao, PENG Yuzhong, XIE Feng, PANG Yue. Recursive Causal Inference Algorithm Based on Partial Correlation Test[J]. Computer Engineering, 2022, 48(10): 123-129.
Authors: CHEN Mingjie  ZHANG Hao  PENG Yuzhong  XIE Feng  PANG Yue
Affiliations: 1. School of Computer Science and Technology, Dongguan University of Technology, Dongguan, Guangdong 523808; 2. School of Computer Science, Guangdong University of Petrochemical Technology, Maoming, Guangdong 525099; 3. School of Computer Science, Fudan University, Shanghai 200433; 4. School of Computer and Information Engineering, Nanning Normal University, Nanning 530001; 5. School of Mathematical Sciences, Peking University, Beijing 100871; 6. Postdoctoral Research Station, China UnionPay, Shanghai 201201
Funding: National Natural Science Foundation of China (62006051); China Postdoctoral Science Foundation (2020M680225); Guangdong Provincial Young Innovative Talents Project for Universities (2020KQNCX049).
Abstract: Causal inference is an important way to uncover relationships among entities. In high-dimensional settings, however, the Conditional Independence (CI) tests that causal inference algorithms rely on involve many redundant tests and run inefficiently, which limits the application of causal inference to high-dimensional datasets. This study proposes a recursive causal inference algorithm based on the partial correlation test. A divide-and-conquer strategy recursively partitions the variable set by causal segmentation into low-dimensional sub-datasets that are easier to handle, which improves processing efficiency. Local causal inference is then performed on each sub-dataset, reducing the computation required per inference and speeding up the algorithm. The sub-results are integrated into a complete causal structure using a merging strategy that compares significance values, which preserves the accuracy of the overall causal structure. Throughout the divide-and-conquer process, an efficient partial correlation test is used in place of high-complexity kernel density estimation, further improving efficiency. Experiments on ten classical datasets show that, with accuracy on par with the classical inference algorithm CAPA, the proposed algorithm runs 2 to 10 times faster, and the speedup grows with sample size. These results demonstrate that the recursive causal inference algorithm can handle high-dimensional datasets effectively, improving computational efficiency while maintaining accuracy.

Keywords: causal inference  causal network  Conditional Independence (CI) test  partial correlation test  recursive algorithm
Received: 2021-08-28
Revised: 2021-10-29
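
The abstract names the partial correlation test as the efficient replacement for kernel-based CI testing but does not give its form. The sketch below shows one common textbook variant, assuming linear-Gaussian data: the partial correlation is read off the inverse of the correlation matrix and tested with Fisher's z-transform. The function names `partial_correlation` and `ci_test_pvalue`, the simulated chain X -> Z -> Y, and the sample size are illustrative assumptions, not the authors' implementation.

```python
import math
import numpy as np


def partial_correlation(data, i, j, cond=()):
    """Partial correlation of columns i and j given the columns in cond.

    Assumption: linear relationships, so the value can be read from the
    inverse (precision form) of the correlation matrix of the selected columns.
    """
    cols = [i, j, *cond]
    corr = np.corrcoef(data[:, cols], rowvar=False)
    prec = np.linalg.pinv(corr)
    return float(-prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1]))


def ci_test_pvalue(data, i, j, cond=()):
    """Two-sided p-value for H0: columns i and j are independent given cond.

    Uses Fisher's z-transform, which assumes approximately Gaussian data.
    """
    n = data.shape[0]
    r = partial_correlation(data, i, j, cond)
    r = min(max(r, -0.999999), 0.999999)  # keep the log finite
    z = 0.5 * math.log((1.0 + r) / (1.0 - r))
    stat = math.sqrt(n - len(cond) - 3) * abs(z)
    # 2 * (1 - Phi(stat)) for a standard normal equals erfc(stat / sqrt(2))
    return math.erfc(stat / math.sqrt(2.0))


# Hypothetical data: a causal chain X -> Z -> Y with Gaussian noise.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
z = x + rng.normal(size=n)
y = z + rng.normal(size=n)
data = np.column_stack([x, y, z])  # columns: 0 = X, 1 = Y, 2 = Z

p_marginal = ci_test_pvalue(data, 0, 1)               # X vs Y, no conditioning
p_conditional = ci_test_pvalue(data, 0, 1, cond=(2,))  # X vs Y given Z
```

In this simulated chain, X and Y are strongly correlated marginally but become nearly uncorrelated once Z is conditioned on, which is the kind of evidence a constraint-based method uses to drop the X-Y edge. Each p-value would typically be compared against a chosen significance level; the test costs only a small matrix inversion per query, which is why it is far cheaper than a kernel-based CI test.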