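Chinese Automatic Semantic Role Labeling Based on a Fuzzy Mechanism and Semantic Density Clustering
Cite this article: Wang Xuyang, Zhu Pengfei. Chinese Automatic Semantic Role Labeling Based on a Fuzzy Mechanism and Semantic Density Clustering[J]. Computer Applications and Software (计算机应用与软件), 2019, 36(9): 76-82,92
Authors: Wang Xuyang (王旭阳), Zhu Pengfei (朱鹏飞)
Affiliation (both authors): School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, Gansu, China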
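Abstract: Based on the Chinese Proposition Bank (CPB), this paper proposes an LSTM-Bi-LSTM method for Chinese automatic semantic role labeling, together with semantic density clustering for data preprocessing and a "fuzzy" mechanism applied during word vector transformation. Semantic density clustering uses the concept of density to cluster predicates in a globally uniform way and replaces each sparse predicate with a common predicate from the cluster it belongs to. Using the concept of semantic distance, the "fuzzy" mechanism is introduced into the word vector transformation, which appropriately weakens a word vector's own semantics and strengthens its correlation with the predicate's word vector. A Bi-LSTM network learns feature representations automatically, and a CRF with the IOBES labeling strategy then casts the task as a word sequence labeling problem. A part-of-speech learning method is also introduced: part-of-speech feature vectors learned by an LSTM network are fused with the "fuzzified" word vectors, and the result serves as the model's input vector. Training uses mini-batch gradient descent and Dropout regularization, which speeds up training, makes the global optimum easier to reach, and prevents parameter overfitting. Multiple groups of comparison experiments show that the F value of the labeling results reaches as high as 81.24%.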

Keywords: SRL  Fuzzy mechanism  Semantic density clustering  Neural network  Word vector  CRF  Dropout
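
The preprocessing step described in the abstract, replacing each sparse predicate with a common predicate from its cluster, can be illustrated with a minimal sketch. The paper's actual density-based clustering procedure is not given here, so the sketch assumes the cluster assignment is already available as a dictionary; the names `replace_sparse_predicates`, `cluster_of`, and `min_count`, the frequency threshold, and the choice of "most frequent predicate in the cluster" as the replacement are all assumptions made for illustration.

```python
from collections import Counter


def replace_sparse_predicates(predicates, cluster_of, min_count=2):
    """Replace rare predicates with the most frequent predicate of their cluster.

    predicates: list of predicate strings, one per training instance.
    cluster_of: dict mapping predicate -> cluster id (assumed to come from
                some clustering step that is not implemented here).
    min_count:  hypothetical threshold; predicates seen fewer times than
                this are treated as sparse.
    """
    counts = Counter(predicates)

    # Pick the most frequent predicate of each cluster as its representative.
    # Ties are broken in favour of the predicate seen first.
    representative = {}
    for pred, n in counts.items():
        cid = cluster_of.get(pred)
        if cid is None:
            continue  # predicate was not assigned to any cluster
        best = representative.get(cid)
        if best is None or n > counts[best]:
            representative[cid] = pred

    replaced = []
    for pred in predicates:
        cid = cluster_of.get(pred)
        if counts[pred] < min_count and cid in representative:
            replaced.append(representative[cid])
        else:
            replaced.append(pred)  # frequent or unclustered: keep as is
    return replaced


if __name__ == "__main__":
    preds = ["buy", "buy", "buy", "purchase", "sell", "sell", "vend", "run"]
    clusters = {"buy": 0, "purchase": 0, "sell": 1, "vend": 1}
    print(replace_sparse_predicates(preds, clusters))
```

In this toy run, "purchase" and "vend" each occur once and are mapped to "buy" and "sell", while "run" is left unchanged because it has no cluster.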

CHINESE AUTOMATIC SEMANTIC ROLE LABELING BASED ON FUZZY MECHANISM AND SEMANTIC DENSITY CLUSTERING
Wang Xuyang, Zhu Pengfei. CHINESE AUTOMATIC SEMANTIC ROLE LABELING BASED ON FUZZY MECHANISM AND SEMANTIC DENSITY CLUSTERING[J]. Computer Applications and Software, 2019, 36(9): 76-82,92
Authors: Wang Xuyang, Zhu Pengfei
Affiliation: School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, Gansu, China
Abstract: Based on the Chinese Proposition Bank (CPB), this paper proposes a Chinese automatic semantic role labeling method built on an LSTM-Bi-LSTM architecture. Semantic density clustering is proposed for data preprocessing, and a fuzzy mechanism is applied to the word vector transformation process. Semantic density clustering uses the concept of density to cluster the predicates globally and then replaces each sparse predicate with a common predicate from the cluster to which it belongs. Using the concept of semantic distance, the fuzzy mechanism is introduced into the word vector transformation, which appropriately weakens the word vector's own semantics and strengthens its correlation with the predicate word vector. A Bi-LSTM network automatically learns feature representations, and a CRF with the IOBES labeling strategy then casts the task as a word sequence labeling problem. A part-of-speech learning method is also introduced: the part-of-speech feature vectors learned by an LSTM network are fused with the fuzzified word vectors and used together as the input vectors of the model. Training uses mini-batch gradient descent and Dropout regularization, which speeds up training, makes it easier to reach the global optimum, and prevents overfitting of the parameters. Multiple groups of comparison experiments show that the F value of the labeling results reaches as high as 81.24%.
Keywords: SRL  Fuzzy mechanism  Semantic density clustering  Neural network  Word embedding  CRF  Dropout
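
The IOBES labeling strategy mentioned in the abstract turns argument spans into one tag per word, which is what lets a CRF treat role labeling as sequence labeling. The sketch below shows the standard IOBES encoding only; it is not taken from the paper. The function name `spans_to_iobes`, the inclusive-span input format, and the role labels `A0` and `A1` are assumptions for illustration.

```python
def spans_to_iobes(n_tokens, spans):
    """Encode labeled spans as IOBES tags, one tag per token.

    n_tokens: number of tokens in the sentence.
    spans:    list of (start, end, role) with inclusive token indices;
              spans are assumed not to overlap.
    """
    tags = ["O"] * n_tokens  # O = outside any argument
    for start, end, role in spans:
        if start == end:
            tags[start] = f"S-{role}"      # S = single-token argument
        else:
            tags[start] = f"B-{role}"      # B = beginning of the argument
            for i in range(start + 1, end):
                tags[i] = f"I-{role}"      # I = inside the argument
            tags[end] = f"E-{role}"        # E = end of the argument
    return tags


if __name__ == "__main__":
    # Five tokens: token 0 is a one-word A0, tokens 2..4 form an A1.
    print(spans_to_iobes(5, [(0, 0, "A0"), (2, 4, "A1")]))
    # ['S-A0', 'O', 'B-A1', 'I-A1', 'E-A1']
```

Compared with plain IOB tags, the extra `E-` and `S-` tags mark span ends and single-word spans explicitly, which gives the sequence model more boundary information to learn from.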
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).