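Multi-node System Abnormal Log Flow Mode Detection Method
Cite this article: WANG Xiao-Dong, ZHAO Yi-Ning, XIAO Hai-Li, CHI Xue-Bin, WANG Xiao-Ning. Multi-node System Abnormal Log Flow Mode Detection Method[J]. Journal of Software, 2020, 31(10): 3295-3308.
Authors: WANG Xiao-Dong  ZHAO Yi-Ning  XIAO Hai-Li  CHI Xue-Bin  WANG Xiao-Ning
Affiliations: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
Funding: National Key Research and Development Program of China (2018YFB0204002); National Natural Science Foundation of China (61702477)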
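Abstract: As the number of logs generated by each node of the national high performance computing environment keeps growing, traditional manual analysis of abnormal logs can no longer meet day-to-day analysis needs. This paper proposes a definition of an abnormal log flow pattern: the ordered arrangement of log types within the same time slice on the same node represents one log flow pattern. Taking this definition as the starting point, it implements an abnormal log flow pattern detection method that automatically mines abnormal log flow patterns. The method starts from system logs and classifies them automatically by the text similarity of their content. The number of occurrences of each log type within the same time slice is then used as the input feature, and an anomaly detection method based on principal component analysis is applied to this input, yielding a large number of abnormal log type sequences. These sequences are then hierarchically clustered using a distance metric based on the longest common subsequence, and the adaptive K-itemset algorithm is applied to the clustering results to obtain a representative sequence for each abnormal log flow pattern. Logs generated by the national high performance computing environment over half a year were analyzed with this method by time period (morning, evening, night), yielding the abnormal log flow patterns of each period and the relationships among them. The method can also be extended to the system logs of other distributed systems.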

Keywords: abnormal log flow  principal component analysis  hierarchical clustering  longest common subsequence  adaptive K-itemset algorithm
Received: 2018/6/8
Revised: 2018/9/10
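
The pattern definition and the input feature described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes log lines have already been classified into integer type IDs, and the record layout `(node, timestamp_seconds, log_type)`, the function name `slice_features`, and the parameters `slice_seconds` and `num_types` are all hypothetical. For each (node, time slice) pair it builds the ordered sequence of log types and the per-type occurrence counts that would be passed to the PCA-based detector.

```python
from collections import Counter, defaultdict


def slice_features(records, slice_seconds, num_types):
    """Group classified log records by (node, time slice).

    records: iterable of (node, timestamp_seconds, log_type) tuples
             (hypothetical layout; log_type is an int in [0, num_types)).
    Returns (sequences, counts):
      sequences[(node, slice_index)] -> log types in time order
      counts[(node, slice_index)]    -> occurrences of each log type
    """
    sequences = defaultdict(list)
    # Sort by timestamp so each sequence keeps the order of arrival.
    for node, ts, log_type in sorted(records, key=lambda r: r[1]):
        sequences[(node, ts // slice_seconds)].append(log_type)

    counts = {}
    for key, seq in sequences.items():
        tally = Counter(seq)
        counts[key] = [tally[t] for t in range(num_types)]
    return dict(sequences), counts


# Toy data: two nodes, 60-second slices, three log types.
records = [
    ("n1", 5, 0), ("n1", 20, 2), ("n2", 8, 1), ("n1", 70, 2), ("n1", 10, 2),
]
sequences, counts = slice_features(records, slice_seconds=60, num_types=3)
```

In this toy input, node `n1` in slice 0 produces the ordered sequence `[0, 2, 2]` and the count vector `[1, 0, 2]`. The count vectors are the kind of input the abstract feeds to anomaly detection; the ordered sequences are what is later clustered.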

Multi-node System Abnormal Log Flow Mode Detection Method
WANG Xiao-Dong,ZHAO Yi-Ning,XIAO Hai-Li,CHI Xue-Bin,WANG Xiao-Ning.Multi-node System Abnormal Log Flow Mode Detection Method[J].Journal of Software,2020,31(10):3295-3308.
Authors:WANG Xiao-Dong  ZHAO Yi-Ning  XIAO Hai-Li  CHI Xue-Bin  WANG Xiao-Ning
Affiliation:Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China;University of Chinese Academy of Sciences, Beijing 100049, China
Abstract:As the number of logs produced by the nodes of CNGrid keeps growing, traditional manual analysis of abnormal logs can no longer meet the needs of daily analysis. This study proposes a definition of the abnormal log traffic pattern: the ordered arrangement of log types on the same node within the same time slice represents one log traffic pattern. Based on this definition, an abnormal log traffic pattern detection method is implemented to mine abnormal log traffic patterns automatically. The method starts from system logs and classifies them automatically according to the text similarity of their content. The frequency of each log type within the same time slice is then taken as the input feature, and an anomaly detection method based on principal component analysis (PCA) is applied to this input, yielding a large number of abnormal log type sequences. These sequences are grouped by hierarchical clustering with a distance metric based on the longest common subsequence. The adaptive K-itemset algorithm is then applied to the clustering results to obtain a representative sequence for each abnormal log traffic pattern. The method was used to analyze the logs generated over half a year in the national high performance computing environment CNGrid, split by time period (morning, evening, night), and yielded the abnormal log traffic patterns of each period and the relationships among them. The method can also be extended to the system logs of other distributed systems.
Keywords:abnormal log flow  principal component analysis  hierarchical clustering  longest common subsequence  adaptive K-itemset algorithm
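
The clustering step in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the exact distance formula or linkage rule, so the normalization `1 - LCS / max(len)`, the average linkage, and the stopping `threshold` are assumptions, and the function names are hypothetical.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a, b):
    """Assumed normalization: 1 - LCS / max length (0.0 for identical sequences)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 1.0 - lcs_length(a, b) / longest


def cluster(sequences, threshold):
    """Naive average-linkage agglomerative clustering (assumed linkage).

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold`. Returns lists of sequence indices.
    """
    clusters = [[i] for i in range(len(sequences))]

    def linkage(c1, c2):
        total = sum(lcs_distance(sequences[i], sequences[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Toy abnormal log type sequences (integers stand for log types).
sequences = [[1, 2, 3, 4], [1, 2, 3, 5], [7, 8, 9], [7, 8, 9, 9]]
groups = cluster(sequences, threshold=0.5)
```

With this toy input the first two sequences share a common subsequence of length 3 and are merged, as are the last two, while the two groups share no log types and stay separate. A production version would more likely precompute the pairwise distance matrix and hand it to a library clustering routine.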