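Design and Implementation of Chinese Address Segmentation Method Based on LSTM Networks
Cite this article: Zhang Wenhao (张文豪). Design and Implementation of Chinese Address Segmentation Method Based on LSTM Networks[J]. Application Research of Computers (计算机应用研究), 2018, 35(12).
Author: Zhang Wenhao (张文豪)
Affiliation: Wuhan Research Institute of Posts and Telecommunications
Funding: National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China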
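Abstract: Current Chinese address segmentation methods rely mainly on rules or traditional machine learning. These methods require long-term manual work to maintain dictionaries and extract features. To avoid feature engineering and reduce manual maintenance, this paper applies long short-term memory (LSTM) networks and bidirectional long short-term memory (Bi-LSTM) networks separately to the Chinese address segmentation task, and uses four-tag position labeling together with additional unlabeled data to improve segmentation performance. Experiments on a self-built dataset show that the Bi-LSTM architecture achieves relatively good performance on this task, and that adding unlabeled data effectively improves the model's performance.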

Keywords: Chinese address; word segmentation; LSTM; unlabeled dataset
Received: 2017/8/28
Revised: 2018/11/5

Design and Implementation of Chinese Address Segmentation Method Based on LSTM Networks
Affiliation: Wuhan Research Institute of Posts and Telecommunications
Abstract: Most current methods for Chinese address segmentation are based on rules or traditional machine learning. However, these methods require long-term manual effort to maintain dictionaries and extract features. To avoid feature engineering and reduce manual maintenance, this paper compares the performance of LSTM (long short-term memory) and bidirectional LSTM networks applied to Chinese address segmentation, using a four-tag set and character embeddings. It also adds a large amount of unlabeled Chinese address data to improve performance. Results on a self-built dataset show that both LSTM and bidirectional LSTM networks work well, with bidirectional LSTM performing slightly better. Adding the extra unlabeled data also greatly improves performance.
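The four-tag scheme mentioned in the abstract turns segmentation into per-character sequence labeling, which is the form an LSTM or Bi-LSTM tagger consumes. The abstract does not name the tags, so the sketch below assumes the commonly used B/M/E/S set (begin, middle, end, single). The function names and the sample address are hypothetical and chosen only for illustration; this is not the paper's code.

```python
from typing import List, Tuple


def words_to_tags(words: List[str]) -> List[Tuple[str, str]]:
    """Label each character of a pre-segmented address with B/M/E/S.

    Assumed tag set (not confirmed by the abstract):
    B = first character of a multi-character word, M = inner character,
    E = last character, S = single-character word.
    """
    labeled: List[Tuple[str, str]] = []
    for word in words:
        if not word:
            continue  # skip empty tokens
        if len(word) == 1:
            labeled.append((word, "S"))
            continue
        labeled.append((word[0], "B"))
        for ch in word[1:-1]:
            labeled.append((ch, "M"))
        labeled.append((word[-1], "E"))
    return labeled


def tags_to_words(chars: List[str], tags: List[str]) -> List[str]:
    """Rebuild words from characters and their tags (e.g. tagger output)."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends without E or S
        words.append(current)
    return words


# Hypothetical address, segmented by hand for illustration only.
segmented = ["湖北省", "武汉市", "洪山区", "邮科院路", "88", "号"]
pairs = words_to_tags(segmented)
chars = [c for c, _ in pairs]
tags = [t for _, t in pairs]
```

A tagger trained on such pairs predicts one tag per character; `tags_to_words` then recovers the segmentation from the predicted tag sequence.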