Anomaly Detection Based on Second-order Proximity
Cite this article: LU Meng-Ru, ZHOU Chang-Jun, LIU Hua-Wen, XU Xiao-Dan. Anomaly Detection Based on Second-order Proximity[J]. Computer Systems & Applications, 2023, 32(2): 160-169.
Authors: LU Meng-Ru (卢梦茹), ZHOU Chang-Jun (周昌军), LIU Hua-Wen (刘华文), XU Xiao-Dan (徐晓丹)
Affiliation: College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua 321004, China
Funding: National Natural Science Foundation of China (61976195)
Abstract: Analyzing large and intricate data sets is a highly challenging task, and techniques for detecting outliers in the data play a pivotal role in it. Among the increasingly popular anomaly detection techniques, capturing anomalies through clustering is the most commonly used class of methods. This study proposes an anomaly detection algorithm based on second-order proximity (SOPD), which consists of two stages: clustering and anomaly detection. In the clustering stage, the similarity matrix is obtained from second-order proximity. In the anomaly detection stage, the distance from every point in each cluster produced by clustering to that cluster's center is computed to capture anomalous states, and the density of each data point is also taken into account to rule out points that merely lie on cluster boundaries. Using second-order proximity allows both the local and the global structure of the data to be considered, which reduces the number of clusters obtained and improves the accuracy of anomaly detection. Extensive experiments comparing the algorithm with several classical anomaly detection algorithms show that SOPD performs well overall.
Keywords: anomaly detection; second-order proximity; similarity matrix; density; globality; machine learning; data mining
Received: 2022-07-14
Revised: 2022-09-07
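The two stages summarized in the abstract can be illustrated with a minimal Python/NumPy sketch. The abstract does not give SOPD's exact formulas, so every modelling choice here is an assumption: first-order proximity is taken as a Gaussian-weighted, symmetrised k-nearest-neighbour graph; second-order proximity is taken as the cosine similarity of two points' neighbourhood vectors (two points are similar when they share neighbours); and density is the inverse mean distance to the k nearest neighbours. The clustering algorithm itself is not reproduced: the hypothetical `anomaly_scores` function accepts cluster labels from any clustering run on the similarity matrix (spectral clustering would be one option). The function names `second_order_similarity`, `knn_density` and `anomaly_scores` are illustrative and do not come from the paper.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix of shape (n, n)."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


def second_order_similarity(X, k=3):
    """Hypothetical second-order proximity: i and j are similar when they share neighbours.

    First-order weights W live on a symmetrised kNN graph with a Gaussian kernel;
    the second-order similarity of i and j is the cosine of rows W[i] and W[j].
    """
    n = len(X)
    d = pairwise_distances(X)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    sigma = np.median(d[np.isfinite(d)])        # assumed bandwidth heuristic
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest neighbours of each point
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d[rows, cols] ** 2 / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                      # first-order proximity, made symmetric
    U = W / np.maximum(np.linalg.norm(W, axis=1, keepdims=True), 1e-12)
    return U @ U.T                              # cosine similarity of neighbourhood vectors


def knn_density(X, k=3):
    """Assumed density estimate: inverse mean distance to the k nearest neighbours."""
    d = pairwise_distances(X)
    np.fill_diagonal(d, np.inf)
    return 1.0 / (np.sort(d, axis=1)[:, :k].mean(axis=1) + 1e-12)


def anomaly_scores(X, labels, k=3):
    """Score each point by its distance to its cluster centre, damped by relative density."""
    dens = knn_density(X, k)
    scores = np.zeros(len(X))
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        centre = X[idx].mean(axis=0)
        dist = np.linalg.norm(X[idx] - centre, axis=1)
        dist = dist / (np.median(dist) + 1e-12)          # comparable across cluster sizes
        rel_density = dens[idx] / np.median(dens[idx])   # about 1 for typical members
        # Boundary points are far from the centre but about as dense as the rest of the
        # cluster, so they stay moderate; isolated points are far and sparse.
        scores[idx] = dist / rel_density
    return scores


if __name__ == "__main__":
    grid = np.array([[x, y] for x in range(3) for y in range(3)], dtype=float)
    X = np.vstack([grid, [[8.0, 8.0]], grid + 30.0])    # two 3x3 grids plus one isolated point
    labels = np.array([0] * 10 + [1] * 9)               # assumed output of some clustering step
    S = second_order_similarity(X, k=3)
    print(S[0, 10])                                     # 0.0: the two grids share no neighbours
    print(int(np.argmax(anomaly_scores(X, labels, k=3))))  # 9: the isolated point
```

Dividing the normalised centre distance by relative density is one possible reading of "excluding cluster boundaries": a point on the edge of a cluster is far from the centre but has ordinary density, so its score stays moderate, whereas an isolated point is both far and sparse, so its score grows on both counts.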