
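Multi-Instance Ensemble Algorithm Combined with Fuzzy Clustering
Cite this article: HAN Haiyun, YANG Youlong, SUN Liqin. Multi-Instance Ensemble Algorithm Combined with Fuzzy Clustering[J]. Computer Engineering and Applications, 2022, 58(7): 87-96.
Authors: HAN Haiyun  YANG Youlong  SUN Liqin
Affiliation: School of Mathematics and Statistics, Xidian University, Xi'an 710126, China
Funding: Natural Science Basic Research Program of Shaanxi Province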
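Abstract: Many multi-instance algorithms make assumptions about the instances in positive bags. To address this, a multi-instance ensemble algorithm combined with fuzzy clustering (ISFC) is proposed. Combining fuzzy clustering with the characteristics of negative bags in multi-instance learning, the concept of a "positive score" is introduced to measure how likely an instance's label is to be positive, which reduces the ambiguity of instance labels. Because misclassifying a negative instance is more costly in multi-instance learning, a strategy for selecting representative instances from each bag is designed, and the selected representative instances serve as the training subsets of the base classifiers. The results of the base classifiers are then combined to determine the final label of each bag. ISFC makes no assumption about the proportion of positive instances in positive bags, and it can handle the class imbalance that arises when positive bags are many and negative bags are few. Experimental results show that ISFC achieves good classification performance on drug molecule activity prediction, image classification, and text classification tasks.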

Keywords: multi-instance learning; fuzzy clustering; random subspace; instance selection; ensemble learning

Multi-Instance Ensemble Algorithm Combined with Fuzzy Clustering
HAN Haiyun, YANG Youlong, SUN Liqin. Multi-Instance Ensemble Algorithm Combined with Fuzzy Clustering[J]. Computer Engineering and Applications, 2022, 58(7): 87-96.
Authors: HAN Haiyun  YANG Youlong  SUN Liqin
Affiliation: School of Mathematics and Statistics, Xidian University, Xi'an 710126, China
Abstract: Many multi-instance algorithms make assumptions about the proportion of positive instances in positive bags. To avoid such assumptions, a multi-instance ensemble algorithm combined with fuzzy clustering (ISFC) is proposed. First, by combining fuzzy clustering with the characteristics of negative bags in multi-instance learning, the concept of a positive score is introduced to measure how likely an instance's label is to be positive, which reduces the ambiguity of instance labels. Then, because misclassifying negative instances is more costly in multi-instance learning, a strategy for selecting representative instances from each bag is designed, and the selected representative instances are used as the training subsets of the base classifiers. Finally, the results of the base classifiers are combined to determine the final label of each bag. ISFC makes no assumption about the proportion of positive instances in positive bags, and it can handle the class imbalance that arises when the number of positive bags is large and the number of negative bags is small. Experimental results show that ISFC achieves good classification performance on drug molecule activity prediction, image classification, and text classification tasks.
Keywords: multi-instance learning; fuzzy clustering; random subspace; instance selection; ensemble learning
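
The abstract does not give the formula for the positive score, so the following is only a minimal sketch of one way such a score could be built, not the authors' definition. It assumes plain fuzzy c-means for the clustering step and relies on the fact that every instance in a negative bag is negative: a cluster that carries a lot of membership mass from negative bags is treated as unlikely to be positive. The function names `fuzzy_c_means` and `positive_scores`, the parameter defaults, and the scoring rule itself are all hypothetical.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means; returns the (n_samples, n_clusters) membership matrix."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the instances.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance of every instance to every center (small offset avoids 0).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U


def positive_scores(X, from_negative_bag, n_clusters=2):
    """Illustrative score in [0, 1]; higher means 'more likely positive'.

    ASSUMPTION: this rule is a stand-in, not the definition used by ISFC.
    """
    U = fuzzy_c_means(X, n_clusters)
    # Share of each cluster's membership mass that comes from negative bags.
    neg_share = U[from_negative_bag].sum(axis=0) / U.sum(axis=0)
    cluster_positivity = 1.0 - neg_share
    # An instance's score is its membership-weighted cluster positivity.
    return U @ cluster_positivity
```

Under these assumptions, instances from a positive bag that fall into clusters dominated by negative-bag instances receive low scores, which is one way to flag likely positive instances without assuming their proportion in the bag.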
This article is indexed in Wanfang Data and other databases.