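A denoising method based on n-gram representations and its application to cross-lingual projection
Cite this article: Yu Mo, Zhao Tiejun. A denoising method based on n-gram representations and its application to cross-lingual projection [J]. Intelligent Computer and Applications, 2016(2):94-97.
Authors: Yu Mo  Zhao Tiejun
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Funding: National Natural Science Foundation of China (61173073).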
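Abstract (translated from the Chinese): Learning tasks with structured outputs (structured learning) are widespread in natural language processing. In recent years, researchers have shown theoretically that noise in data labels has a large impact on structured learning, which creates a need for denoising algorithms suited to structured learning tasks. Inspired by recent progress in representation learning, this paper proposes introducing low-dimensional representations of natural-language substructures into sample denoising algorithms for structured learning tasks. The new denoising algorithm uses n-gram representations to find nearest neighbors for each node in a sequence labeling problem, and denoises according to the consistency between a node's label and the labels of its neighbors. The paper evaluates this method on cross-lingual projection for named entity recognition and part-of-speech tagging, and the results demonstrate its effectiveness.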

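Keywords (translated from the Chinese): representation learning  semi-supervised learning  denoising algorithm  natural language processing  cross-lingual projection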

Noise removing based on n-gram representations and its applications to cross-lingual projection
Abstract: Problems with structured predictions (structured learning) are common in natural language processing. Recent research found that, compared with classification problems, structured learning problems are affected more seriously by label noise, which suggests the importance of noise-removing algorithms for these problems. Inspired by the development of representation learning methods, the paper proposes a noise-removing algorithm for structured learning based on low-dimensional representations of sub-structures. The algorithm finds neighbors of each node in a sequence labeling task based on its associated n-gram representation, and then performs noise removing on the label of a node according to its consistency with the labels of its neighbors. The paper evaluates the proposed algorithm on the cross-lingual projection of named entity recognition and POS tagging tasks and demonstrates its effectiveness.
Keywords:representation learning  semi-supervised learning  noise removing  natural language processing  cross-lingual projection
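
The neighbor-consistency idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how n-gram representations are built or how disagreement is resolved. The sketch assumes that an n-gram vector is the concatenation of word embeddings in a window around the node, that neighbors are ranked by cosine similarity, and that a node is merely flagged when too few of its neighbors share its label. The names `ngram_vector`, `flag_noisy_nodes`, `emb`, `k`, and `min_agreement` are hypothetical.

```python
import math


def ngram_vector(tokens, i, emb, dim, n=3):
    """Concatenate the embeddings of the n-gram window centred on position i.

    Assumption: concatenation stands in for whatever n-gram representation
    the paper actually learns. Unknown words and out-of-range positions
    are padded with zero vectors.
    """
    half = n // 2
    vec = []
    for j in range(i - half, i + half + 1):
        if 0 <= j < len(tokens):
            vec.extend(emb.get(tokens[j], [0.0] * dim))
        else:
            vec.extend([0.0] * dim)
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def flag_noisy_nodes(sentences, emb, dim, n=3, k=3, min_agreement=0.5):
    """Flag nodes whose label disagrees with most of their k nearest neighbors.

    sentences: list of (tokens, labels) pairs of equal length.
    Returns a set of (sentence_index, token_index) pairs.
    """
    # One entry per node: its position, its n-gram vector, its (possibly noisy) label.
    nodes = []
    for s, (tokens, labels) in enumerate(sentences):
        for i, label in enumerate(labels):
            nodes.append(((s, i), ngram_vector(tokens, i, emb, dim, n), label))

    flagged = set()
    for a, (key, vec, label) in enumerate(nodes):
        # Brute-force search over all other nodes, most similar first.
        scored = sorted(
            ((cosine(vec, other_vec), other_label)
             for b, (_, other_vec, other_label) in enumerate(nodes) if b != a),
            key=lambda pair: pair[0],
            reverse=True,
        )[:k]
        if not scored:
            continue
        agreement = sum(1 for _, lab in scored if lab == label) / len(scored)
        if agreement < min_agreement:
            flagged.add(key)
    return flagged
```

The brute-force neighbor search is quadratic in the number of nodes, which is fine for a toy corpus. A larger projected corpus would likely need an approximate nearest-neighbor index. Whether flagged nodes should be dropped or relabeled is left open here, since the abstract does not specify it.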
This article is indexed in databases including Wanfang Data.