Parallel anomaly detection algorithm based on MapReduce
Cite this article: QI Xiaogang, HU Qiuqiu, LIU Lifang. Parallel anomaly detection algorithm based on MapReduce[J]. CAAI Transactions on Intelligent Systems (智能系统学报), 2019, 14(2): 224-230.
Authors: QI Xiaogang (齐小刚)  HU Qiuqiu (胡秋秋)  LIU Lifang (刘立芳)
Affiliation: 1. School of Mathematics and Statistics, Xidian University, Xi'an 710071, China; 2. School of Computer Science and Technology, Xidian University, Xi'an 710071, China
Abstract: To improve the accuracy, sensitivity, and execution efficiency of anomaly detection algorithms in data mining as the data volume grows, this paper proposes MR-DLOF, a parallel anomaly detection algorithm based on the MapReduce framework and the local outlier factor (LOF) algorithm. First, the dataset, stored in the Hadoop distributed file system (HDFS), is logically divided into multiple data blocks. Then, MapReduce processes the blocks in parallel, so that the k-distance and LOF value of each data point are computed within a single block only, which improves execution efficiency. The concept of k-distance is also redefined to avoid the case where the local density becomes infinite because the dataset contains k or more duplicate points. Finally, the data points with large LOF values (those above a threshold) are merged and their LOF values are recalculated, which improves accuracy and sensitivity. Experiments on real-world datasets verify the effectiveness, efficiency, and scalability of MR-DLOF.

Keywords: data mining  anomaly detection  local outlier factor  Hadoop  MapReduce  distributed file system  parallel computing  local density
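
The per-block LOF computation described in the abstract can be illustrated with a small Python sketch (Python 3.8+ for `math.dist`). It is not the authors' implementation: the abstract does not state how the k-distance is redefined, so the code assumes one plausible variant, in which zero distances (exact duplicates) are skipped when picking the k-th nearest distance. The names `lof_scores` and `high_lof_points` are hypothetical, and the blocks are plain in-memory lists rather than HDFS splits.

```python
import math

def lof_scores(points, k):
    """LOF of every point in one block, using a duplicate-safe k-distance.

    ASSUMPTION (not taken from the paper): the k-distance is the k-th
    smallest strictly positive distance, so exact duplicates never force it
    to zero and the local reachability density stays finite.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    kdist, neigh = [], []
    for i in range(n):
        positive = sorted(dist[i][j] for j in range(n)
                          if j != i and dist[i][j] > 0)
        if not positive:
            raise ValueError("block needs at least two distinct points")
        kd = positive[min(k, len(positive)) - 1]
        kdist.append(kd)
        # Neighbourhood: every other point within the k-distance,
        # duplicates of the point itself included.
        neigh.append([j for j in range(n) if j != i and dist[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(kdist[j], dist[i][j]) for j in neigh[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i])
            for i in range(n)]

def high_lof_points(blocks, k, threshold):
    """Score each block independently and merge the points above threshold."""
    merged = []
    for block in blocks:
        scores = lof_scores(block, k)
        merged += [p for p, s in zip(block, scores) if s > threshold]
    return merged

if __name__ == "__main__":
    # Three copies of (0, 0) would give a zero k-distance for k = 2 under
    # the classic definition; here every score stays finite.
    block = [(0, 0), (0, 0), (0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    for point, score in zip(block, lof_scores(block, 2)):
        print(point, round(score, 3))
```

In the two-stage scheme the abstract outlines, a function like `lof_scores` would run once per block in the map phase, and once more on the merged high-LOF points returned by something like `high_lof_points`. In the example block the isolated point (10, 10) receives the largest score, while the clustered points, duplicates included, score 1.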