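A template extraction method for composite log
Cite this article:Qi WU,Xiao-hong HUANG,Yan MA,Qun CONG.A template extraction method for composite log[J].Journal of Zhejiang University(Engineering Science),2020,54(8):1557-1561.
Authors:Qi WU  Xiao-hong HUANG  Yan MA  Qun CONG
Affiliations:1. Information Network Center, Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China; 2. Beijing Wangruida Technology Co., Ltd. (北京网瑞达科技有限公司), Beijing 100876, China
Funding:Fundamental Research Funds for the Central Universities (2019RC53); National CNGI Special Project (CNGI-12-03-001)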
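Abstract:Existing template extraction algorithms cannot correctly parse composite logs. To solve this problem, a new template extraction algorithm, CLEA, was designed to handle template extraction for composite logs. The algorithm uses symbols to divide all logs into clusters, extracts the log template of each cluster based on the Drain template extraction algorithm, stores and caches the extraction results, and updates the cached templates whenever a cluster is updated. A difference calculation is introduced into the simple common word algorithm to make it more sensitive to differing words in the templates when computing the similarity between templates. The BMerge algorithm is designed to merge templates whose similarity exceeds a threshold, and the merged, lossless log is output as the final result. The proposed method is suitable for processing composite logs and achieves high accuracy.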

Keywords:template extraction  composite log  simple common word  similarity  Json  log extraction  

A template extraction method for composite log
Qi WU,Xiao-hong HUANG,Yan MA,Qun CONG.A template extraction method for composite log[J].Journal of Zhejiang University(Engineering Science),2020,54(8):1557-1561.
Authors:Qi WU  Xiao-hong HUANG  Yan MA  Qun CONG
Abstract:Existing template extraction algorithms cannot correctly parse composite logs. To solve this problem, a new template extraction algorithm, the composite-log extraction algorithm (CLEA), was designed. Symbols are used to divide all logs into clusters, and the log template of each cluster is extracted based on the Drain extraction method. Template extraction results are stored and cached, and the cached templates are updated whenever a cluster is updated. A difference calculation is introduced into the simple common word algorithm to make it more sensitive to differing words in the templates when computing the similarity between templates. The BMerge algorithm is designed to merge templates whose similarity is greater than a threshold, and the merged, lossless log is output as the final result. The proposed method is suitable for processing composite logs and achieves high accuracy.
Keywords:template extraction  composite log  simple common word  similarity  Json  log extraction  
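
The similarity-and-merge step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the `<*>` wildcard token, the scoring rule (common words divided by common plus differing words), the position-wise merge, and the function names `similarity`, `merge_pair`, and `merge_templates`. It is not the paper's BMerge algorithm or its exact similarity measure.

```python
from typing import List

WILDCARD = "<*>"  # assumed placeholder for variable fields in a template


def similarity(a: List[str], b: List[str]) -> float:
    """Common-word similarity penalized by differing words (assumed formula)."""
    words_a = set(a) - {WILDCARD}
    words_b = set(b) - {WILDCARD}
    if not words_a and not words_b:
        return 1.0
    common = len(words_a & words_b)      # words shared by both templates
    different = len(words_a ^ words_b)   # words found in only one template
    return common / (common + different)


def merge_pair(a: List[str], b: List[str]) -> List[str]:
    """Position-wise merge: keep equal tokens, replace the rest with a wildcard."""
    return [x if x == y else WILDCARD for x, y in zip(a, b)]


def merge_templates(templates: List[List[str]], threshold: float) -> List[List[str]]:
    """Greedily merge equal-length templates whose similarity exceeds the threshold."""
    merged: List[List[str]] = []
    for tpl in templates:
        for i, existing in enumerate(merged):
            if len(existing) == len(tpl) and similarity(existing, tpl) > threshold:
                merged[i] = merge_pair(existing, tpl)
                break
        else:
            # No sufficiently similar template found: keep this one as-is.
            merged.append(list(tpl))
    return merged
```

In this sketch, counting the differing words in the denominator lowers the score as soon as two templates diverge in even one word, which is one plausible way to read the "difference calculation" mentioned in the abstract. The threshold is a free parameter to be tuned on real data.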
This article is indexed in CNKI and other databases.