     

Chinese short text classification model with multi-head self-attention mechanism
Cite this article: ZHANG Xiaochuan, DAI Xuyao, LIU Lu, FENG Tianshuo. Chinese short text classification model with multi-head self-attention mechanism[J]. Journal of Computer Applications, 2020, 40(12): 3485-3489.
Authors: ZHANG Xiaochuan  DAI Xuyao  LIU Lu  FENG Tianshuo
Affiliation: 1. College of Liangjiang Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China; 2. College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
Funding: National Natural Science Foundation of China (61702063); Natural Science Foundation of Chongqing (cstc2019jcyj-msxmX0544)
Abstract: To address the feature sparsity caused by semantic ambiguity in Chinese short texts, which lack contextual information, a text classification model combining a Convolutional Neural Network and a Multi-Head self-Attention mechanism (CNN-MHA) was proposed. First, the pre-trained Bidirectional Encoder Representations from Transformers (BERT) language model was used to represent sentence-level short texts as character-level vectors. Then, to reduce noise, the Multi-Head self-Attention mechanism (MHA) was used to learn the word dependencies within the text sequence and to generate hidden-layer vectors carrying global semantic information; these hidden-layer vectors were fed into a Convolutional Neural Network (CNN) to produce text classification feature vectors. Finally, to improve classification, the output of the convolutional layer was fused with the sentence features extracted by the BERT model, and the fused features were fed into the classifier. The CNN-MHA model was compared with the TextCNN, BERT, and TextRCNN models. Experimental results show that the F1 score of the improved model on the Sohu news (SogouCS) dataset is 3.99%, 0.76%, and 2.89% higher than those of the comparison models respectively, which verifies the effectiveness of the improved model.
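The multi-head self-attention step described in the abstract can be sketched in plain NumPy. This is an illustrative toy, not the authors' implementation: the dimensions, random weights, and input are assumptions for demonstration (the paper feeds BERT character-level vectors into this step).

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Scaled dot-product self-attention with num_heads heads.

    X: (seq_len, d_model) input sequence, e.g. character vectors.
    Wq, Wk, Wv, Wo: (d_model, d_model) projection matrices.
    Returns a (seq_len, d_model) sequence of context-aware vectors.
    """
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv

    # Split each projection into heads: (num_heads, seq_len, d_head).
    def split(M):
        return M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    Qh, Kh, Vh = split(Q), split(K), split(V)
    # Attention weights between all position pairs, per head.
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)           # rows sum to 1
    heads = attn @ Vh                         # (num_heads, seq_len, d_head)
    # Concatenate heads back to (seq_len, d_model) and project.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

# Toy usage with assumed sizes (5 characters, 8-dimensional vectors, 2 heads).
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 5, 8, 2
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) for _ in range(4))
H = multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads)
# H has shape (seq_len, d_model); in the CNN-MHA pipeline such a sequence
# would next be passed to the convolutional layer.
```

In the model described above, the output sequence `H` plays the role of the hidden-layer vectors with global semantic information that are fed into the CNN.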

Keywords: Chinese short text; text classification; multi-head self-attention mechanism; convolutional neural network; feature fusion
Received: 2020-06-19
Revised: 2020-08-26
