
基于支持向量机的中文极短文本分类模型 (Classification model based on support vector machine for Chinese extremely short text)

Cite this article:	Wang Yang. 基于支持向量机的中文极短文本分类模型 [J]. 计算机应用研究 (Application Research of Computers), 2020, 37(2): 347-350.

Author:	Wang Yang (王杨)

Affiliation:	School of Computer and Information, Anhui Normal University (安徽师范大学计算机与信息学院), Wuhu, Anhui 241000

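Funding:	Natural Science Foundation of Anhui Province; Anhui Province Humanities and Social Sciences Fund project; National Natural Science Foundation of China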

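Abstract:	To extract the key feature information in extremely short texts effectively, this paper proposes an extremely short text classification model based on the support vector machine (SVM). The raw data are first cleaned and then segmented with jieba. The processed data are stored in a database, and text features are extracted with TF-IDF. An SVM then classifies the extremely short texts. A (1-0) test verifies the validity of the model. The experiments use 9906 extremely short texts from the Wuhu city social management platform as samples for testing and analysing the algorithm. The results show that the method achieves higher classification accuracy than traditional methods such as naive Bayes, logistic regression, and decision trees, and that its results are more balanced in terms of misclassification and precision.
Keywords:	support vector machine; jieba segmentation; extremely short text classification; TF-IDF
Received:	2018/6/29
Revised:	2018/8/28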
Classification model based on support vector machine for Chinese extremely short text

Wang Yang. Classification model based on support vector machine for Chinese extremely short text [J]. Application Research of Computers, 2020, 37(2): 347-350.

Authors:	Wang Yang

Affiliation:	School of Computer and Information, Anhui Normal University, Wuhu, Anhui 241000

Abstract:	To extract key features from extremely short texts effectively, this paper proposes an extremely short text classification model based on SVM. First, the original data are cleaned, and the cleaned data are segmented with jieba and processed with TF-IDF. A (1-0) test then verifies the validity of the model. Finally, 9906 extremely short texts from the Wuhu city community management platform are used as samples in the experiment. The results show that the proposed method improves classification accuracy compared with traditional methods such as naive Bayes, logistic regression, and decision trees. The results are also more balanced in terms of misclassification and precision.

Keywords:	support vector machine (SVM); jieba segmentation; extremely short text; TF-IDF
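
The abstract outlines a pipeline of cleaning, segmentation, TF-IDF feature extraction, and SVM classification. The following is a minimal, standard-library sketch of that flow, not the authors' implementation. Several parts are assumptions made for illustration: the `tokenize` function uses character bigrams as a stand-in for jieba segmentation, the classifier is a binary linear SVM without a bias term trained by Pegasos-style subgradient descent, the hyperparameters (`lam`, `epochs`) are arbitrary, and the eight training texts are invented examples rather than data from the Wuhu platform.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Stand-in segmenter (assumption): character bigrams after dropping
    punctuation and whitespace. The paper uses jieba segmentation instead."""
    cleaned = "".join(ch for ch in text if ch.isalnum())
    return [cleaned[i:i + 2] for i in range(len(cleaned) - 1)] or [cleaned]


def fit_idf(token_lists):
    """Smoothed inverse document frequency for every term seen in training."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse, L2-normalised TF-IDF vector {term: weight}; unseen terms are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def score(vec, weights):
    """Linear decision value w . x for a sparse vector."""
    return sum(weights.get(t, 0.0) * w for t, w in vec.items())


def train_linear_svm(vectors, labels, lam=0.01, epochs=200, seed=0):
    """Stochastic subgradient descent on the L2-regularised hinge loss
    (Pegasos-style). Labels must be +1 or -1."""
    rng = random.Random(seed)
    weights = {}
    order = list(range(len(vectors)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)           # decaying learning rate
            margin = labels[i] * score(vectors[i], weights)
            shrink = 1.0 - 1.0 / step          # regularisation step: 1 - eta * lam
            for term in weights:
                weights[term] *= shrink
            if margin < 1.0:                   # inside the margin: hinge-loss update
                for term, value in vectors[i].items():
                    weights[term] = weights.get(term, 0.0) + eta * labels[i] * value
    return weights


def predict(text, idf, weights):
    """Return +1 or -1 for a raw text."""
    return 1 if score(tfidf_vector(tokenize(text), idf), weights) >= 0 else -1


# Invented toy data (hypothetical): +1 = road/lighting issue, -1 = noise complaint.
TRAIN = [
    ("路灯坏了", 1), ("路面破损", 1), ("路灯不亮", 1), ("道路积水", 1),
    ("噪音扰民", -1), ("夜间施工噪音", -1), ("广场舞噪音大", -1), ("装修噪音扰民", -1),
]

token_lists = [tokenize(text) for text, _ in TRAIN]
idf = fit_idf(token_lists)
vectors = [tfidf_vector(tokens, idf) for tokens in token_lists]
weights = train_linear_svm(vectors, [label for _, label in TRAIN])

for text in ("小区路灯坏了", "楼上装修噪音"):
    print(text, predict(text, idf, weights))
```

If jieba is installed, replacing the body of `tokenize` with a call to `jieba.lcut(text)` would bring the sketch closer to the segmentation step the abstract describes. For more than two categories, training one such classifier per category (one-vs-rest) is a common extension.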
This article is indexed in Wanfang Data and other databases.