Research on Template Extraction Based on Large-scale Network Log
Cite this article: CUI Yuan, ZHANG Zhuo. Research on Template Extraction Based on Large-scale Network Log[J]. Computer Science (计算机科学), 2017, 44(Z11): 448-452.
Authors: CUI Yuan (崔元) and ZHANG Zhuo (张琢)
Affiliation: School of Information and Software Engineering, Northeast Normal University, Changchun 130117, China, and School of Information and Software Engineering, Northeast Normal University, Changchun 130117, China; Digital Learning Support Technology Engineering Research Center, Ministry of Education, Changchun 130117, China
Abstract: Extracting network events directly from large-scale network logs is difficult. To address this problem, a template extraction method for large-scale network logs is proposed. The method automatically converts massive, raw network logs into log templates, providing important groundwork for understanding the root causes of network events and preventing network failures. First, the structure of the logs is analyzed, and the words in a log are divided into two classes: template words and parameter words. Then, template extraction is studied from three different perspectives. Finally, real production data from an Internet company is used, and the Rand_index method is applied to evaluate the accuracy and effectiveness of the three extraction methods. The results show that, across four different message types collected from a service cluster, the log templates extracted by the tag recognition tree (signature tree) model reach an average accuracy of 99.57%, which is higher than the accuracy of the statistics-based template extraction model and of the online template extraction model.

Keywords: Word segmentation  Template extraction  Statistical clustering  Signature tree  Online clustering
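
The abstract names two building blocks: separating template words from parameter words, and scoring the extracted templates with the Rand index. The sketch below illustrates both under loud assumptions. It is not any of the three models studied in the paper; the positional rule in `extract_templates`, the helper `assign_templates`, and the sample log lines are all hypothetical and chosen only for illustration. The Rand index itself is the standard pairwise measure: the fraction of line pairs on which two groupings agree (both place the pair together, or both place it apart).

```python
from collections import defaultdict
from itertools import combinations


def extract_templates(lines):
    """Naive positional extraction (an assumption, not the paper's method).

    Lines are grouped by token count. Within a group, a position whose token
    is identical in every line is kept as a template word; any other position
    is treated as a parameter word and replaced by "*". A group containing a
    single line therefore yields that line unchanged.
    """
    groups = defaultdict(list)
    for line in lines:
        tokens = line.split()
        if tokens:  # skip blank lines
            groups[len(tokens)].append(tokens)

    templates = {}
    for length, rows in groups.items():
        # zip(*rows) iterates over token positions (columns) of the group.
        words = [col[0] if len(set(col)) == 1 else "*" for col in zip(*rows)]
        templates[length] = " ".join(words)
    return templates


def assign_templates(lines):
    """Label each non-blank line with the template of its token-count group."""
    templates = extract_templates(lines)
    return [templates[len(line.split())] for line in lines if line.split()]


def rand_index(labels_true, labels_pred):
    """Rand index: share of item pairs on which the two labelings agree."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:  # fewer than two items: nothing to disagree on
        return 1.0
    agreements = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical log lines and hand-assigned ground-truth event types.
SAMPLE_LOGS = [
    "Interface eth0 changed state to up",
    "Interface eth1 changed state to down",
    "Login failed for user alice from 10.0.0.1",
    "Login failed for user bob from 10.0.0.2",
]
SAMPLE_TRUTH = ["link", "link", "login", "login"]

if __name__ == "__main__":
    for template in extract_templates(SAMPLE_LOGS).values():
        print(template)
    print(rand_index(SAMPLE_TRUTH, assign_templates(SAMPLE_LOGS)))
```

Grouping by token count is a deliberately crude stand-in for clustering: two unrelated messages with the same number of tokens would be merged into one template, which is exactly the kind of error the Rand index penalizes when compared against ground-truth labels.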