
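A Chinese Word Segmentation Method Using Convolutional Neural Networks Without Pooling Layers
Cite this article: TU Wenbo, YUAN Zhenming, YU Kai. A Chinese word segmentation method using convolutional neural networks without pooling layers[J]. Computer Engineering and Applications, 2020, 56(2): 120-126.
Authors: TU Wenbo, YUAN Zhenming, YU Kai
Affiliations: 1. College of Information Engineering, Hangzhou Normal University, Hangzhou 311121; 2. Engineering Research Center of Mobile Health Management System, Ministry of Education, Hangzhou 311121
Abstract: In Chinese information processing, word segmentation is a very common and critical task. Many Chinese natural language processing tasks first segment the text into words and then carry out their subsequent steps on the segmented words. Recently, more and more Chinese word segmentation methods have adopted machine learning and deep learning. However, most models suffer, to varying degrees, from defects such as excessive complexity, heavy reliance on hand-crafted features, and poor performance on out-of-vocabulary (OOV) words. This paper proposes PCNN (Pure CNN), a Chinese word segmentation model based on Convolutional Neural Networks (CNN). The model assigns a label to each character using a context window of character vectors, and it offers a simple structure, no dependence on manual feature engineering, good stability, and high accuracy. Given the inherent properties of distributed character vectors, the PCNN model does not need the pooling operation that usually follows convolution; the features extracted by the convolutional layer are therefore preserved, and training speed improves considerably. Experimental results on public datasets show that the model's accuracy reaches the level of current mainstream neural network models, and comparison experiments further verify that the network without a pooling layer outperforms the network with a pooling layer.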

Keywords: natural language processing; Chinese word segmentation; convolutional neural network; character vector
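The abstract says only that PCNN labels each character from a context window of character vectors; it gives neither the tag set nor the window size. The sketch below assumes the widely used four-tag BMES scheme (Begin, Middle, End, Single) and a symmetric window of `2 * half_width + 1` characters padded with a hypothetical `<PAD>` symbol. In a real model each character in a window would be replaced by its character vector before entering the network. The names `context_windows`, `tags_from_words` and `words_from_tags` are illustrative, not the authors' code.

```python
from typing import List

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def context_windows(chars: List[str], half_width: int = 2) -> List[List[str]]:
    """For each character, return the 2*half_width+1 characters centred on it."""
    padded = [PAD] * half_width + list(chars) + [PAD] * half_width
    width = 2 * half_width + 1
    return [padded[i:i + width] for i in range(len(chars))]


def tags_from_words(words: List[str]) -> List[str]:
    """Turn a gold segmentation into per-character BMES training labels (assumed tag set)."""
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")                                   # single-character word
        elif len(w) > 1:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])  # begin, middle..., end
    return tags


def words_from_tags(chars: List[str], tags: List[str]) -> List[str]:
    """Rebuild words from characters and (predicted) BMES tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:  # a new word starts before the old one closed
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):              # the word ends at this character
            words.append(current)
            current = ""
    if current:                            # flush an unfinished word at sentence end
        words.append(current)
    return words


if __name__ == "__main__":
    words = ["中文", "分词", "很", "关键"]
    chars = list("".join(words))
    tags = tags_from_words(words)
    print(tags)                          # ['B', 'E', 'B', 'E', 'S', 'B', 'E']
    print(context_windows(chars)[0])     # ['<PAD>', '<PAD>', '中', '文', '分']
    print(words_from_tags(chars, tags))  # ['中文', '分词', '很', '关键']
```

Segmentation thus reduces to one classification per character: the classifier sees a fixed-size window, and decoding is a linear scan over the predicted tags.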

Convolutional Neural Networks Without Pooling Layer for Chinese Word Segmentation
TU Wenbo, YUAN Zhenming, YU Kai. Convolutional Neural Networks Without Pooling Layer for Chinese Word Segmentation[J]. Computer Engineering and Applications, 2020, 56(2): 120-126.
Authors: TU Wenbo, YUAN Zhenming, YU Kai
Affiliations: 1. College of Information Engineering, Hangzhou Normal University, Hangzhou 311121, China; 2. Engineering Research Center of Mobile Health Management System, Ministry of Education, Hangzhou 311121, China
Abstract: In Chinese information processing, word segmentation is a very common and critical task. Usually, the first step of a Chinese Natural Language Processing (NLP) task is word segmentation. Over the years, Chinese word segmentation methods have evolved from conventional machine learning to deep learning. However, most models suffer from deficiencies such as excessive complexity, heavy reliance on hand-crafted features, and poor performance on Out-of-Vocabulary (OOV) words. This paper proposes PCNN (Pure CNN), a Chinese word segmentation model based on Convolutional Neural Networks (CNN). The model labels each character using a context window of character vectors. It has a simple structure, does not rely on hand-crafted features, and offers good stability and high accuracy. Given the characteristics of distributed character vectors, the PCNN model needs no pooling: the features extracted by the convolutional layer are preserved, and the training speed of the model is greatly improved. Experimental results on public datasets show that the model's accuracy reaches the level of current mainstream neural network models. Comparison experiments also verify that the network model without a pooling layer outperforms the one with a pooling layer.
Keywords: Natural Language Processing (NLP); Chinese word segmentation; Convolutional Neural Networks (CNN); character vector
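To make the pooling question concrete, here is a minimal pure-Python sketch. It is not the PCNN architecture itself, whose layer sizes, kernel widths and activation the abstract does not give. It assumes an odd kernel width with zero "same" padding, a ReLU activation, an assumed BMES tag set, and, for comparison only, non-overlapping max pooling along the sequence axis. Without pooling, the convolution returns one feature vector per character, so a linear layer can score a tag for every character directly; with pooling, the sequence is shortened and the features no longer line up one-to-one with characters. All function names and sizes are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]
TAGS = ["B", "M", "E", "S"]  # assumed BMES tag set


def conv1d_same(x: Matrix, kernels: List[Matrix], bias: List[float]) -> Matrix:
    """Convolve a (seq_len x emb_dim) sequence with (k x emb_dim) filters.

    Zero 'same' padding keeps seq_len output rows (k assumed odd);
    no pooling follows, so each character keeps its own feature vector.
    """
    emb_dim = len(x[0])
    k = len(kernels[0])
    half = k // 2
    zero = [0.0] * emb_dim
    padded = [zero] * half + x + [zero] * half
    out = []
    for t in range(len(x)):
        window = padded[t:t + k]  # the k character vectors centred on position t
        feats = []
        for kern, b in zip(kernels, bias):
            s = b
            for row, krow in zip(window, kern):
                s += sum(a * w for a, w in zip(row, krow))
            feats.append(max(0.0, s))  # ReLU
        out.append(feats)
    return out


def max_pool1d(x: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling along the sequence axis (comparison only)."""
    return [[max(col) for col in zip(*x[i:i + size])] for i in range(0, len(x), size)]


def predict_tags(features: Matrix, weights: Matrix, bias: List[float]) -> List[str]:
    """Linear layer + argmax per position; weights[j] scores TAGS[j]."""
    tags = []
    for feat in features:
        scores = [b + sum(w * f for w, f in zip(wrow, feat))
                  for wrow, b in zip(weights, bias)]
        tags.append(TAGS[scores.index(max(scores))])
    return tags


if __name__ == "__main__":
    random.seed(0)
    seq_len, emb_dim, n_filters, k = 7, 8, 16, 3
    x = [[random.gauss(0, 1) for _ in range(emb_dim)] for _ in range(seq_len)]
    kernels = [[[random.gauss(0, 0.1) for _ in range(emb_dim)] for _ in range(k)]
               for _ in range(n_filters)]
    h = conv1d_same(x, kernels, [0.0] * n_filters)
    print(len(h), len(h[0]))   # 7 16: one feature vector per character
    print(len(max_pool1d(h)))  # 4: pooling shortens the sequence
    w = [[random.gauss(0, 0.1) for _ in range(n_filters)] for _ in range(len(TAGS))]
    print(predict_tags(h, w, [0.0] * len(TAGS)))  # 7 tags; weights are untrained
```

The shape check shows only that skipping pooling keeps per-character resolution. The abstract's claims about accuracy and training speed come from the authors' experiments and are not reproduced by this toy example.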
This article is indexed in databases including VIP and Wanfang Data.