Federated gradient boosting decision tree for non-IID dataset
Citation: Zhao Xue, Li Xiaohui. Federated gradient boosting decision tree for non-IID dataset[J]. Application Research of Computers, 2023, 40(7).
Authors: Zhao Xue, Li Xiaohui
Affiliation: School of Electronics and Information Engineering, Liaoning University of Technology
Funding: Young Scientists Fund of the National Natural Science Foundation of China (61802161); Applied Basic Research Program of Liaoning Province (2022JH2/101300278, 2022JH2/101300279)
Abstract: With the rise of federated learning, the gradient boosting decision tree (GBDT), a classical machine learning method, has increasingly been applied in federated settings to achieve strong classification performance. Existing horizontal federated GBDT models suffer from accuracy loss on non-IID data, information leakage, and high communication cost. To address these problems, this paper proposes a federated gradient boosting decision tree for non-IID datasets (nFL-GBDT). First, the algorithm uses locality-sensitive hashing (LSH) to identify similar samples across participants and builds the first tree from weighted gradients. Second, a trusted third party computes global leaf weights to update the tree model, requiring only one round of communication. Finally, experimental analysis shows that the algorithm protects the privacy of the raw data and incurs lower communication cost than simFL and FederBoost. In addition, three public datasets are partitioned by imbalance ratio; compared with Individual, TFL, and F-GBDT-G, the proposed algorithm improves accuracy by 3.53%, 5.46%, and 4.43%, respectively.

Keywords: federated learning; gradient boosting decision tree (GBDT); non-independent and identically distributed (non-IID); locality-sensitive hashing (LSH)
Received: 2022-12-05
Revised: 2023-06-11
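
The first step of nFL-GBDT, as summarized in the abstract, uses LSH so that each participant can estimate how many similar samples the other parties hold, then builds the first tree from similarity-weighted gradients. The sketch below is a minimal illustration, assuming random-projection (SimHash-style) LSH, a squared-error loss, and a "1 + match count" weighting; the function names and the use of scikit-learn's DecisionTreeRegressor are illustrative assumptions, not the paper's exact construction.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def simhash_signatures(X, n_planes=16, seed=0):
    """Random-projection (SimHash-style) LSH: map each sample to an n_planes-bit signature.
    All parties must share the same seed so their signatures are comparable."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_planes))
    return (X @ planes > 0).astype(np.uint8)

def similarity_weights(sig_local, sig_other):
    """Weight each local sample by how many other-party samples share its LSH bucket
    (hypothetical weighting). Only bit signatures cross the party boundary, never raw features."""
    buckets = {}
    for s in map(bytes, sig_other):
        buckets[s] = buckets.get(s, 0) + 1
    return np.array([1.0 + buckets.get(bytes(s), 0) for s in sig_local])

def first_tree(X, y, weights, base_score=0.5, max_depth=4):
    """Fit the first tree to the negative gradients of a squared-error loss,
    with similarity weights emphasizing samples shared widely across parties."""
    grad = base_score - y                        # d/dpred of (1/2)*(pred - y)^2 at the base score
    tree = DecisionTreeRegressor(max_depth=max_depth)
    tree.fit(X, -grad, sample_weight=weights)
    return tree

# Example (party A, after receiving party B's signatures over the network):
#   w = similarity_weights(simhash_signatures(X_A), sig_from_B)
#   t = first_tree(X_A, y_A, w)

Because only the bit signatures are exchanged, each party can estimate cross-party similarity without exposing its raw feature values, which matches the privacy claim in the abstract.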

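
The second step gathers per-leaf gradient statistics at the trusted third party, which merges them and broadcasts global leaf weights, so a single round of communication updates the whole tree. Below is a minimal sketch continuing the one above, assuming XGBoost-style second-order leaf weights w_j = -G_j / (H_j + lambda); the function names and the unprotected upload of per-leaf sums are simplifications, not the paper's exact protocol.

def local_leaf_stats(tree, X, grad, hess):
    """Party side: sum gradients and Hessians per leaf of the shared tree structure.
    This per-leaf summary is the only message uploaded to the third party."""
    stats = {}
    for leaf, g, h in zip(tree.apply(X), grad, hess):  # tree.apply gives each sample's leaf id
        G, H = stats.get(int(leaf), (0.0, 0.0))
        stats[int(leaf)] = (G + float(g), H + float(h))
    return stats

def global_leaf_weights(party_stats, lam=1.0):
    """Third-party side: merge every party's sums, compute w_j = -G_j / (H_j + lam),
    and broadcast the result back (the single communication round)."""
    merged = {}
    for stats in party_stats:
        for leaf, (G, H) in stats.items():
            mG, mH = merged.get(leaf, (0.0, 0.0))
            merged[leaf] = (mG + G, mH + H)
    return {leaf: -G / (H + lam) for leaf, (G, H) in merged.items()}

# For squared error the Hessian is constant, so hess = np.ones_like(grad); each party
# calls local_leaf_stats once, and the third party calls global_leaf_weights on the list.

Since only aggregated per-leaf sums leave each party, the upload reveals far less than per-sample gradients, and one gather-and-broadcast round suffices to set every leaf weight of the tree model.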