
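Random forest algorithm under differential privacy based on out-of-bag estimate
Cite this article: LI Yuqiang, CHEN Junhao, LI Qi, LIU Aihua. Random forest algorithm under differential privacy based on out-of-bag estimate[J]. Journal of Harbin Institute of Technology, 2021, 53(2): 146-154.
Authors: LI Yuqiang  CHEN Junhao  LI Qi  LIU Aihua
Affiliation: School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China; School of Energy and Power Engineering, Wuhan University of Technology, Wuhan 430063, China
Abstract: Differentially private random forest algorithms achieve unsatisfactory accuracy when classifying high-dimensional data. To address this problem, this paper introduces the out-of-bag estimate under differential privacy to compute decision tree weights and feature weights, and proposes a random forest algorithm under differential privacy based on the out-of-bag estimate (RFDP_OOB). The algorithm first generates part of the random forest under differential privacy protection and uses the out-of-bag estimate under differential privacy to evaluate the importance of the decision trees and features, from which the decision tree weights and feature weights are computed. The features are then partitioned according to their weights to obtain a set of non-important features. Next, while the remaining part of the random forest is generated, any node whose best feature is a non-important feature is pre-pruned and turned into a leaf node, which reduces noise, improves the classification accuracy of the decision trees, and gives good execution efficiency. Finally, at prediction time, the classification result whose corresponding decision tree weight is largest is taken as the classification result of the random forest, which improves the classification accuracy of the forest. The paper also analyzes the effectiveness and privacy of the algorithm theoretically and verifies its effectiveness through experiments. The algorithm improves classification accuracy while protecting data privacy.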

Keywords: differential privacy  random forest  out-of-bag estimate  high-dimensional data  data mining
Received: 2019/12/26
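
The abstract states that decision tree weights are derived from an out-of-bag estimate computed under differential privacy, but it does not give the formula. The following is a minimal sketch of one way this could look, assuming the Laplace mechanism is applied to the count of correctly classified out-of-bag samples (a counting query with sensitivity 1) and that the number of out-of-bag samples is treated as public. The function names `laplace_noise`, `noisy_oob_accuracy`, and `tree_weights` are hypothetical and do not come from the paper.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_oob_accuracy(correct_flags, epsilon, rng):
    """Return a Laplace-perturbed out-of-bag accuracy for one tree.

    correct_flags holds 1 for each out-of-bag sample the tree classified
    correctly and 0 otherwise. This is an assumed mechanism, not the
    paper's stated formula.
    """
    n = len(correct_flags)
    if n == 0:
        return 0.0
    # Changing one record changes the count by at most 1, so sensitivity is 1
    # and the Laplace scale is 1 / epsilon.
    noisy_count = sum(correct_flags) + laplace_noise(1.0 / epsilon, rng)
    # Clamp so the weight remains a valid accuracy in [0, 1].
    return min(1.0, max(0.0, noisy_count / n))


def tree_weights(oob_correct_per_tree, epsilon, seed=0):
    """Compute one noisy weight per tree from its out-of-bag results."""
    rng = random.Random(seed)
    return [noisy_oob_accuracy(flags, epsilon, rng) for flags in oob_correct_per_tree]
```

A smaller `epsilon` adds more noise to each weight. How the total privacy budget is divided between tree construction and the out-of-bag estimate is not covered by this sketch.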

Random forest algorithm under differential privacy based on out-of-bag estimate
LI Yuqiang, CHEN Junhao, LI Qi, LIU Aihua. Random forest algorithm under differential privacy based on out-of-bag estimate[J]. Journal of Harbin Institute of Technology, 2021, 53(2): 146-154.
Authors: LI Yuqiang  CHEN Junhao  LI Qi  LIU Aihua
Affiliation: School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China; School of Energy and Power Engineering, Wuhan University of Technology, Wuhan 430063, China
Abstract: Random forest algorithms under differential privacy achieve unsatisfactory accuracy when classifying high-dimensional data. To address this, the out-of-bag estimate under differential privacy is introduced to calculate the weights of decision trees and features, and a random forest algorithm under differential privacy based on the out-of-bag estimate (RFDP_OOB) is proposed. First, the algorithm generates part of the random forest under differential privacy and evaluates the importance of the decision trees and features using the out-of-bag estimate under differential privacy, from which the decision tree weights and feature weights are calculated. Then, the features are partitioned by feature weight to obtain a set of non-important features. Next, while the remaining part of the random forest is generated, nodes whose best feature is a non-important feature are pre-pruned and turned into leaf nodes, which reduces noise and improves the classification accuracy of the decision trees with good execution efficiency. Finally, when predicting, the classification result whose corresponding decision tree weight is largest is taken as the classification result of the random forest, thereby improving its classification accuracy. The privacy and effectiveness of the algorithm are analyzed theoretically, and experimental results verify its effectiveness. The proposed algorithm improves classification accuracy while protecting data privacy.
Keywords: differential privacy  random forest  out-of-bag estimate  high-dimensional data  data mining
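
The pre-pruning rule and the final prediction step described in the abstract can be sketched as follows. The abstract does not spell out how tree weights are aggregated at prediction time. This sketch assumes weighted voting, where each class accumulates the weights of the trees that predicted it and the class with the largest total wins. The names `should_preprune` and `weighted_vote` are hypothetical.

```python
from collections import defaultdict


def should_preprune(best_feature, non_important_features):
    """Return True when a node should become a leaf instead of being split.

    Follows the rule in the abstract: a node whose best split feature is in
    the non-important feature set is pre-pruned.
    """
    return best_feature in non_important_features


def weighted_vote(tree_predictions, tree_weights):
    """Pick the class label with the largest total tree weight.

    Assumed aggregation: sum the weights of all trees voting for each label.
    """
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    if len(tree_predictions) != len(tree_weights):
        raise ValueError("predictions and weights must have the same length")
    totals = defaultdict(float)
    for label, weight in zip(tree_predictions, tree_weights):
        totals[label] += weight
    return max(totals, key=totals.get)
```

Under this reading, a single high-weight tree can outvote several low-weight trees, which is the intended effect of weighting trees by their out-of-bag performance.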