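Applied Research on Multi-Class Classification of Chinese Text Based on LSTM
Cite this article: LIANG Dengyu. Applied Research on Multi-Class Classification of Chinese Text Based on LSTM [J]. Journal of Shanghai University of Electric Power, 2020, 36(6): 598-602.
Author: LIANG Dengyu
Affiliation: School of Computer Science and Technology, Shanghai University of Electric Power
Abstract: With the growth of the Internet, online shopping has become a mainstream way to shop, generating large volumes of product text data whose products must be classified accurately and efficiently. Text classification with machine learning requires a complex process of manually designing and extracting features. As deep learning has developed, text classification based on deep learning has shown notable results. This paper designs a multi-class classifier for Chinese text based on a long short-term memory (LSTM) network. The data are first preprocessed: Tokenizer word segmentation converts the text into word vectors that a computer can process, and these are fed into the LSTM network. Dropout is added to prevent overfitting, yielding the final classification model. A comparison with logistic regression, multinomial naive Bayes, linear support vector machine, and random forest models shows that the LSTM-based method for multi-class classification of Chinese text achieves good results.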

Keywords: multi-class text classification  deep learning  long short-term memory network  natural language processing
Received: 2019/12/13
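
The preprocessing step described in the abstract (turning segmented text into integer sequences of equal length before it reaches the network) can be sketched as follows. This is a minimal standard-library illustration, not the paper's code: the abstract does not say which Tokenizer implementation was used, so the helper names `fit_word_index`, `texts_to_sequences`, and `pad_sequences`, the example corpus, and the choice of left-padding with 0 are all assumptions. The texts are assumed to be already segmented into whitespace-separated words.

```python
from collections import Counter


def fit_word_index(texts):
    """Map each token to an integer id, most frequent first; id 0 is reserved for padding."""
    counts = Counter(tok for text in texts for tok in text.split())
    # Sort by descending count, then by token so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {tok: i for i, (tok, _) in enumerate(ranked, start=1)}


def texts_to_sequences(texts, word_index):
    """Replace tokens by their ids, dropping tokens that were never seen during fitting."""
    return [[word_index[t] for t in text.split() if t in word_index] for text in texts]


def pad_sequences(seqs, maxlen):
    """Left-pad with 0, or keep only the last `maxlen` ids, so every row has length `maxlen` (> 0)."""
    return [[0] * (maxlen - len(s)) + s[-maxlen:] for s in seqs]


# Hypothetical, pre-segmented product titles (not taken from the paper's dataset).
corpus = ["手机 壳 苹果", "苹果 手机 充电器", "连衣裙 夏季"]
word_index = fit_word_index(corpus)
sequences = texts_to_sequences(corpus, word_index)
padded = pad_sequences(sequences, maxlen=4)
```

Each row of `padded` has the same length, which is what lets a batch of titles be passed to an embedding layer and then to the LSTM.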

MultiClassification of Chinese Text Based on LSTM
LIANG Dengyu. MultiClassification of Chinese Text Based on LSTM [J]. Journal of Shanghai University of Electric Power, 2020, 36(6): 598-602.
Authors: LIANG Dengyu
Affiliation: School of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090, China
Abstract: With the development of the Internet, online shopping has become mainstream, producing a large amount of commodity text data that must be classified accurately and efficiently. Text classification with machine learning requires a complex process of manually designing and extracting features. With the development of deep learning, text classification based on deep learning has achieved remarkable results. This paper therefore designs a multi-class classifier for Chinese text based on Long Short-Term Memory (LSTM). The data are first preprocessed, and Tokenizer word segmentation converts the text into word vectors that the computer can process and that are fed to the LSTM network. The Dropout algorithm is added to prevent overfitting, giving the final classification model. The model is compared with logistic regression, naive Bayes, support vector machine, and random forest. The experimental results show that the LSTM-based multi-classification method achieves good results.
Keywords: multi-classification  deep learning  long short-term memory  natural language processing
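
To make the model described in the abstract more concrete, the sketch below runs a sequence of word vectors through a single LSTM layer, applies dropout to the final hidden state, and maps it to class probabilities with a softmax layer. It is a pure-Python illustration under stated assumptions, not the paper's implementation: the abstract gives no layer sizes, weights, or dropout placement, so `EMBED`, `HIDDEN`, `CLASSES`, the random untrained weights, and applying dropout just before the output layer are all hypothetical choices.

```python
import math
import random

GATES = ("i", "f", "o", "g")  # input, forget, output gates and candidate cell state


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step; W[g], U[g], b[g] are input weights, recurrent weights, bias of gate g."""
    def affine(gate):
        return [
            sum(w * xi for w, xi in zip(W[gate][j], x))
            + sum(u * hi for u, hi in zip(U[gate][j], h_prev))
            + b[gate][j]
            for j in range(len(h_prev))
        ]
    i = [sigmoid(v) for v in affine("i")]
    f = [sigmoid(v) for v in affine("f")]
    o = [sigmoid(v) for v in affine("o")]
    g = [math.tanh(v) for v in affine("g")]
    # New cell state keeps part of the old state and adds gated new information.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def dropout(vec, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` and rescale the survivors."""
    if not training or rate == 0.0:
        return list(vec)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in vec]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embedded_seq, W, U, b, W_out, b_out, hidden, rate=0.0, rng=None, training=False):
    """Run the sequence through the LSTM, then dropout and a softmax output layer."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in embedded_seq:
        h, c = lstm_step(x, h, c, W, U, b)
    h = dropout(h, rate, rng or random.Random(0), training)
    logits = [sum(w * hj for w, hj in zip(row, h)) + bk for row, bk in zip(W_out, b_out)]
    return softmax(logits)


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes and untrained random weights, for illustration only.
rng = random.Random(42)
EMBED, HIDDEN, CLASSES = 3, 4, 5
W = {g: random_matrix(HIDDEN, EMBED, rng) for g in GATES}
U = {g: random_matrix(HIDDEN, HIDDEN, rng) for g in GATES}
b = {g: [0.0] * HIDDEN for g in GATES}
W_out = random_matrix(CLASSES, HIDDEN, rng)
b_out = [0.0] * CLASSES
sentence = [[rng.uniform(-1, 1) for _ in range(EMBED)] for _ in range(6)]
probs = classify(sentence, W, U, b, W_out, b_out, HIDDEN)
```

Because the weights are random, `probs` only demonstrates the shape of the computation: one probability per class, summing to 1. Dropout is active only when `training=True`, so predictions at inference time are deterministic.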