
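A Keyword Variant Matching Method Based on the Stroke-Order Features of Chinese Characters
Cite this article: Wang Hongyu, Du Gang, Zhu Yanyun, Zhang Chen, Du Xuetao. A Keyword Variant Matching Method Based on the Stroke-Order Features of Chinese Characters [J]. Telecom Engineering Technics and Standardization, 2020(12).
Authors: Wang Hongyu  Du Gang  Zhu Yanyun  Zhang Chen  Du Xuetao
Affiliation: China Mobile Group Design Institute Co., Ltd. (all authors)
Abstract: In recent years, spam short messages have come to contain large numbers of split characters (a character written as its separate components) and visually similar characters, which allows them to bypass the keyword screening of monitoring systems. Because split and similar characters are numerous and vary flexibly, adding them to the keyword database would make it redundant. To address this, this paper proposes a keyword variant matching method based on the stroke-order features of Chinese characters. Using these features, the method first merges the split characters in a spam message, then locates the suspected keywords in the message by building an index, and finally matches keywords with a proposed "pyramid matching method". The proposed method effectively reduces the redundancy of the keyword database and improves keyword matching efficiency.

Keywords: keyword variant matching  merging split characters  pyramid matching method
Received: 2020/11/5
Revised: 2020/11/11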

A Variant Keyword Matching Method Based on the Stroke Order Features of Chinese Characters
Wang Hongyu, Du Gang, Zhu Yanyun, Zhang Chen and Du Xuetao. A Variant Keyword Matching Method Based on the Stroke Order Features of Chinese Characters[J]. Telecom Engineering Technics and Standardization, 2020(12).
Authors: Wang Hongyu  Du Gang  Zhu Yanyun  Zhang Chen and Du Xuetao
Affiliation: China Mobile Group Design Institute Co., Ltd. (all authors)
Abstract: In recent years, spam short messages have come to contain large numbers of split characters and visually similar characters, which lets them bypass keyword filtering and reach users. Because these variants are numerous and change flexibly, adding them to the keyword database would make the database redundant. This paper proposes a variant keyword matching method based on the stroke order features of Chinese characters. First, the split characters in a spam message are merged using stroke order features. Second, the suspected keywords contained in the message are located through an index table built from the characters of the keywords. Finally, a pyramid matching method is proposed to match keywords. The proposed method effectively reduces the redundancy of the keyword database and improves the efficiency of keyword matching.
Keywords: variant keyword matching  merging split Chinese characters  pyramid matching method
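
The first two steps named in the abstract (merging split characters, then finding suspected keywords through an index) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the merging rule, the index layout, or the pyramid matching method. The sketch assumes that a left-right character's stroke sequence is the concatenation of its components' sequences, uses a tiny hand-written table `STROKES` with the common five-class stroke coding (1 horizontal, 2 vertical, 3 left-falling, 4 dot or right-falling, 5 turning), and stands in plain exact matching for the final step. All function and variable names are hypothetical.

```python
# Hypothetical stroke-order table (assumption, not from the paper).
# Coding: 1=horizontal, 2=vertical, 3=left-falling, 4=dot/right-falling, 5=turning
STROKES = {
    "女": "531", "子": "521", "好": "531521",
    "日": "2511", "月": "3511", "明": "25113511",
    "木": "1234", "林": "12341234",
}

# Reverse lookup: stroke sequence -> character
BY_STROKES = {code: ch for ch, code in STROKES.items()}


def merge_split_chars(text, keyword_chars):
    """Merge adjacent character pairs whose concatenated stroke sequences
    equal the stroke sequence of a character used in some keyword."""
    out = []
    i = 0
    while i < len(text):
        if i + 1 < len(text):
            a = STROKES.get(text[i])
            b = STROKES.get(text[i + 1])
            if a and b:
                merged = BY_STROKES.get(a + b)
                if merged in keyword_chars:
                    out.append(merged)
                    i += 2
                    continue
        out.append(text[i])
        i += 1
    return "".join(out)


def build_index(keywords):
    """Index keywords by their first character (one simple index layout)."""
    index = {}
    for kw in keywords:
        index.setdefault(kw[0], []).append(kw)
    return index


def find_candidates(text, index):
    """Return (position, keyword) pairs found via the index.
    Exact matching is used here as a stand-in for the paper's pyramid matching."""
    hits = []
    for pos, ch in enumerate(text):
        for kw in index.get(ch, []):
            if text.startswith(kw, pos):
                hits.append((pos, kw))
    return hits


keywords = ["明月"]
keyword_chars = {ch for kw in keywords for ch in kw}
message = "今晚日月月很圆"  # "明" has been split into "日" + "月"
merged = merge_split_chars(message, keyword_chars)
print(merged)
print(find_candidates(merged, build_index(keywords)))
```

Restricting merges to characters that occur in keywords keeps the keyword database free of split variants, which is the benefit the abstract claims. A greedy merge like this one can still join two characters that were legitimately adjacent, so a real system would need a further check.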