
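Outlier detection algorithm based on random walks on a graph
Cite this article: DU Xusheng, YU Jiong, YE Lele, CHEN Jiaying. Outlier detection algorithm based on graph random walk[J]. Journal of Computer Applications, 2020, 40(5): 1322-1328.
Authors: DU Xusheng  YU Jiong  YE Lele  CHEN Jiaying
Affiliations: 1. School of Software, Xinjiang University, Urumqi 830008; 2. College of Information Science and Engineering, Xinjiang University, Urumqi 830046; 3. School of Software, Xi’an Jiaotong University, Xi’an 710049
Funding: Supported by the National Natural Science Foundation of China (61862060, 61462079, 61562086, 61562078).
Abstract: Outlier detection algorithms are widely used in fields such as network intrusion detection and computer-aided medical diagnosis. To address the long execution time and low detection rate of the LDOF, CBOF and LOF algorithms on large-scale and high-dimensional datasets, an outlier detection algorithm Based on Graph Random Walk (BGRW) is proposed. First, the number of iterations, the damping factor and the outlier score of every object in the dataset are initialized. Second, the transition probabilities of the walker between objects are derived from the Euclidean distances between the objects. Next, the outlier score of every object in the dataset is computed iteratively. Finally, the objects with the highest outlier scores are identified as outliers and output. Experiments on real UCI datasets and on synthetic datasets with complex distributions compare BGRW with LDOF, CBOF and LOF in terms of execution time, detection rate and false positive rate. The results show that BGRW effectively reduces execution time and outperforms the compared algorithms in both detection rate and false positive rate.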

Keywords: data mining  outlier detection  Markov chain  random walk  LDOF  CBOF  LOF
Received: 2019-10-10
Revised: 2019-12-12

Outlier detection algorithm based on graph random walk
DU Xusheng, YU Jiong, YE Lele, CHEN Jiaying. Outlier detection algorithm based on graph random walk[J]. Journal of Computer Applications, 2020, 40(5): 1322-1328.
Authors:DU Xusheng  YU Jiong  YE Lele  CHEN Jiaying
Affiliation: 1. School of Software, Xinjiang University, Urumqi, Xinjiang 830008, China
2. College of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China
3. School of Software Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China
Abstract: Outlier detection algorithms are widely used in fields such as network intrusion detection and medical aided diagnosis. The Local Distance-Based Outlier Factor (LDOF), Cohesiveness-Based Outlier Factor (CBOF) and Local Outlier Factor (LOF) algorithms are classic outlier detection algorithms, but they suffer from long execution times and low detection rates on large-scale and high-dimensional datasets. To address these problems, an outlier detection algorithm Based on Graph Random Walk (BGRW) was proposed. Firstly, the number of iterations, the damping factor and the outlier degree of every object in the dataset were initialized. Secondly, the transition probability of the walker between objects was derived from the Euclidean distance between the objects. Then, the outlier degree of every object in the dataset was calculated by iteration. Finally, the objects with the highest outlier degree were output as outliers. On real datasets from the UCI (University of California, Irvine) repository and on synthetic datasets with complex distributions, BGRW was compared with the LDOF, CBOF and LOF algorithms in terms of execution time, detection rate and false positive rate. The experimental results show that BGRW reduces execution time and achieves a higher detection rate and a lower false positive rate than the compared algorithms.
Keywords: data mining  outlier detection  Markov chain  random walk  Local Distance-Based Outlier Factor (LDOF)  Cohesiveness-Based Outlier Factor (CBOF)  Local Outlier Factor (LOF)
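
The four steps named in the abstract (initialization, distance-based transition probabilities, iterative scoring, selection of the highest-scoring objects) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything specific here is an assumption and not the authors' published method: the transition probability is taken to be proportional to Euclidean distance, the update is a PageRank-style damped iteration, and the names `bgrw_scores` and `top_outliers`, the default damping factor 0.85 and the default of 100 iterations are hypothetical.

```python
import math


def bgrw_scores(points, damping=0.85, iterations=100):
    """Damped random-walk outlier scores; higher means more outlying.

    Illustrative only: the transition and update rules are assumptions,
    not formulas taken from the paper.
    """
    n = len(points)
    # Step 1: give every object the same initial outlier score.
    scores = [1.0 / n] * n

    # Step 2: build transition probabilities from Euclidean distances.
    # Assumption: P[i][j] is proportional to dist(i, j), so the walker
    # tends to move towards objects that are far from the others.
    trans = []
    for p in points:
        row = [math.dist(p, q) for q in points]
        total = sum(row)
        # If every distance is zero (all points identical), use a uniform row.
        trans.append([d / total if total > 0 else 1.0 / n for d in row])

    # Step 3: iterate the damped random-walk update.
    for _ in range(iterations):
        scores = [
            (1.0 - damping) / n
            + damping * sum(trans[i][j] * scores[i] for i in range(n))
            for j in range(n)
        ]
    return scores


def top_outliers(points, k=1, **kwargs):
    """Step 4: indices of the k objects with the highest outlier score."""
    scores = bgrw_scores(points, **kwargs)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return order[:k]


if __name__ == "__main__":
    # Four clustered points and one isolated point (made-up toy data).
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
    print(top_outliers(data, k=1))
```

Under the distance-proportional assumption, an isolated object receives most of the probability mass leaving every clustered object, so its score grows over the iterations. Because each row of the transition matrix sums to 1, the scores keep summing to 1 after every update.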
This article is indexed in VIP, Wanfang Data and other databases.