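Multi-source Outlier Detection Algorithm Based on Relevant Subspace
Cite this article: MA Yang, ZHAO Xujun. Multi-source Outlier Detection Algorithm Based on Relevant Subspace[J]. Computer Engineering and Applications (计算机工程与应用), 2021, 57(17): 88-95.
Authors: MA Yang (马洋), ZHAO Xujun (赵旭俊)
Affiliation: School of Computer Science and Technology, Taiyuan University of Science and Technology (太原科技大学), Taiyuan 030024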
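Abstract (translated from the Chinese): Most traditional outlier detection methods work on a single dataset, or on a single dataset obtained by fusing multiple data sources, so their results ignore both the association knowledge among multi-source data and the key information within each individual source. To detect outlier association knowledge across multi-source data, this paper proposes RSMOD, a multi-source outlier detection algorithm based on relevant subspaces. Combining the two-way influence of the k-nearest-neighbor set and the reverse nearest-neighbor set, it defines an object influence space for multi-source data, which improves the accuracy of outlier measurement. On the basis of this influence space, it proposes a sparse factor and a sparse difference factor for multi-source data, which characterize how sparse a data object is in multi-source data; it then redefines the relevant-subspace measure so that it applies to multi-source datasets, and gives an outlier detection algorithm based on relevant subspaces. Experiments on synthetic datasets and a real US census dataset verify the performance of RSMOD and analyze the outlier association knowledge derived from multiple datasets.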

Keywords (translated from the Chinese): outlier detection; multi-source data; subspace; data mining; sparse factor

Multi-source Outlier Detection Algorithm Based on Relevant Subspace
MA Yang, ZHAO Xujun. Multi-source Outlier Detection Algorithm Based on Relevant Subspace[J]. Computer Engineering and Applications, 2021, 57(17): 88-95.
Authors: MA Yang, ZHAO Xujun
Affiliation: School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
Abstract: Most traditional outlier detection methods operate on a single dataset, or on a single dataset obtained by fusing multiple sources. Their results therefore ignore the association knowledge among multi-source datasets and some key information within each individual data source. To detect outlier association knowledge among multi-source datasets, this paper proposes a Multi-source Outlier Detection algorithm based on Relevant Subspace (RSMOD). Firstly, it defines an object influence space for multi-source data, which combines the k-nearest-neighbor set and the reverse nearest-neighbor set to improve the accuracy of outlier measurement. Secondly, building on the influence space, it presents a sparse factor and a sparse difference factor for multi-source data, which describe how sparse a data object is in a multi-source dataset. Thirdly, it redefines the measurement of the relevant subspace so that it applies to multi-source datasets, and gives an outlier detection algorithm based on the relevant subspace. Finally, the performance of RSMOD is verified on synthetic datasets and real US census datasets, and the experimental results are analyzed to obtain outlier association knowledge from multiple datasets.
Keywords: outlier detection; multi-source data; subspace; data mining; sparse factor
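
The abstract describes an influence space built from the k-nearest-neighbor set and the reverse nearest-neighbor set, but does not give its formal definition. The following is only a minimal sketch of the general idea on a single dataset, assuming Euclidean distance and assuming the influence space of an object is the union of its k nearest neighbors and its reverse k nearest neighbors. The function names `knn_sets` and `influence_space` are hypothetical, and the paper's multi-source definition, sparse factor, and sparse difference factor are not reproduced here.

```python
import numpy as np


def knn_sets(X, k):
    """For each row of X, return the index set of its k nearest other rows."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)             # a point is not its own neighbor
    order = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return [set(row.tolist()) for row in order]


def influence_space(X, k):
    """Assumed definition: kNN(p) united with the reverse kNN of p."""
    knn = knn_sets(X, k)
    rnn = [set() for _ in range(len(X))]
    for p, neighbours in enumerate(knn):
        for q in neighbours:
            rnn[q].add(p)                      # p counts q among its k nearest
    space = [knn[i] | rnn[i] for i in range(len(X))]
    return space, knn, rnn


if __name__ == "__main__":
    # Toy one-dimensional data: three nearby points and one isolated point.
    X = np.array([[0.0], [1.0], [2.5], [10.0]])
    space, knn, rnn = influence_space(X, k=1)
    for i in range(len(X)):
        print(i, "kNN:", knn[i], "reverse kNN:", rnn[i], "influence:", space[i])
```

In this toy input the isolated point at 10.0 has a nearest neighbor but an empty reverse-neighbor set, which illustrates why using both directions can separate outlying objects more clearly than the k-nearest-neighbor set alone.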
This article is indexed by Wanfang Data (万方数据) and other databases.