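Design and Implementation of an SVM-Based Multiple-Choice Text Classification System
Cite this article: DING Shitao, LU Jun, HONG Honghui, HUANG Ao, GUO Zhiyuan. Design and Implementation of an SVM-Based Multiple-Choice Text Classification System[J]. Computer and Digital Engineering, 2020, 48(1): 147-152.
Authors: DING Shitao  LU Jun  HONG Honghui  HUANG Ao  GUO Zhiyuan
Affiliation (all authors): Wuhan Research Institute of Posts and Telecommunications, Wuhan 430074
Abstract: With the spread of the Internet and the growing need to obtain specific information, quickly retrieving information of a particular category is a problem that search engines, web portals, and similar services must solve. Web page classification is currently handled by machine-learning text classification algorithms, but traditional machine-learning classification methods largely ignore the characteristics of the text data and provide an undifferentiated classification service. This system takes the characteristics of web page text into account and uses the text title as its entry point, offering both fast classification and ordinary SVM-based classification. Fast classification uses a trained word segmentation model to map the text title directly to a category label. If fast classification does not succeed, the text body is segmented with the jieba tokenizer, word vectors are trained with word2vec, and an SVM then classifies the text according to the classification requirements, which completes ordinary classification. Providing two different services lets the system meet different needs.

Keywords: machine learning  title  fast classification  word2vec  SVM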

Design and Implementation of Chinese Web Page Multiple Choice Classification System Based on Support Vector Machine
DING Shitao, LU Jun, HONG Honghui, HUANG Ao, GUO Zhiyuan. Design and Implementation of Chinese Web Page Multiple Choice Classification System Based on Support Vector Machine[J]. Computer and Digital Engineering, 2020, 48(1): 147-152.
Authors:DING Shitao  LU Jun  HONG Honghui  HUANG Ao  GUO Zhiyuan
Affiliation:(Wuhan Research Institute of Posts and Telecommunications,Wuhan 430074)
Abstract: With the popularization of the Internet, the demand for specific information has increased, and quickly obtaining information of a given category is a problem that search engines, portal websites, and similar services must solve. Many web page categorization tasks are now handled by machine-learning text categorization algorithms. However, traditional machine-learning categorization methods do not take the characteristics of text data into account and provide an undifferentiated categorization service. This system takes the features of web text data into account and provides both fast categorization and general SVM categorization. Fast categorization uses a trained word segmentation model to map the text title directly to a categorization label. If fast categorization is not successful, the text is segmented with jieba, word vectors are trained with word2vec, and an SVM performs the categorization, so that the system provides differentiated services.
Keywords:machine learning  title  fast decision  word2vec  SVM
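
The two-path flow described in the abstract (try the title first, fall back to full-text classification) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword-to-label table, the function names `fast_classify` and `classify`, and the whitespace tokenizer are all hypothetical, and the `fallback` callable merely stands in for the jieba + word2vec + SVM path, which the abstract does not specify in detail.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def fast_classify(
    title: str,
    keyword_labels: Dict[str, str],
    tokenize: Callable[[str], Iterable[str]] = str.split,
) -> Optional[str]:
    """Return a label only when the title tokens point to exactly one label.

    keyword_labels is a hypothetical token -> label table; the paper's trained
    title model is not described in the abstract.
    """
    labels = {keyword_labels[tok] for tok in tokenize(title) if tok in keyword_labels}
    # Zero matches or conflicting matches both count as "fast path failed".
    return labels.pop() if len(labels) == 1 else None


def classify(
    title: str,
    body: str,
    keyword_labels: Dict[str, str],
    fallback: Callable[[str], str],
    tokenize: Callable[[str], Iterable[str]] = str.split,
) -> Tuple[str, str]:
    """Try the title-based fast path, then fall back to the full-text classifier.

    Returns (label, path), where path is "fast" or "general".
    """
    label = fast_classify(title, keyword_labels, tokenize)
    if label is not None:
        return label, "fast"
    # Assumption: fallback wraps segmentation, word vectors, and an SVM.
    return fallback(body), "general"


if __name__ == "__main__":
    table = {"football": "sports", "stocks": "finance"}
    toy_fallback = lambda body: "other"  # placeholder for the SVM path
    print(classify("football final tonight", "match report", table, toy_fallback))
    print(classify("weather report", "rain expected", table, toy_fallback))
```

For Chinese titles, a real tokenizer would need to replace `str.split`, since Chinese text is not whitespace-delimited; the abstract names jieba for segmenting the text body.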
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).