
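An Unsupervised Narrow-Domain Entity Recognition Method Fusing Domain Relevance and Context Information
Cite this article: ZHONG Ning, DONG Guangchang, CHEN Jianhui. An Unsupervised Narrow-Domain Entity Recognition Method Fusing Domain Relevance and Context Information [J]. 北京工业大学学报 (Journal of Beijing Polytechnic University), 2018, 44(6): 862-869.
Authors: ZHONG Ning, DONG Guangchang, CHEN Jianhui
Affiliations: Faculty of Information Technology, Beijing University of Technology, Beijing 100124; Beijing Key Laboratory of MRI and Brain Informatics, Beijing 100053; Department of Life Science and Informatics, Maebashi Institute of Technology, Maebashi 371-0816, Japan; Faculty of Information Technology, Beijing University of Technology, Beijing 100124; Faculty of Information Technology, Beijing University of Technology, Beijing 100124; Beijing Key Laboratory of MRI and Brain Informatics, Beijing 100053
Funding: General Program of the Science and Technology Plan of the Beijing Municipal Education Commission (KM201710005026); National Key Basic Research Development Program of China (2014CB744600); Key International Cooperation Project of the National Natural Science Foundation of China (61420106005)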
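Abstract: Entity recognition in fine-grained domains faces two challenges: the number of entities is limited, and corpus samples are relatively scarce. To address them, an unsupervised narrow-domain entity recognition method that fuses domain relevance and context information is proposed. First, word frequency and context information are fused to design a term-corpus relevance hypothesis, and the log-likelihood ratio is used to compute the likelihood of the hypothesis, yielding the domain discrimination degree of each candidate entity. Next, a domain dependence function is constructed from the relative domain share of the candidate entity's head word in the corpus, in order to identify the domain tendency of the candidate entity. Finally, the domain discrimination degree and the domain dependence are combined to compute the domain relevance of each candidate entity, and candidates whose domain relevance exceeds a threshold are selected as the recognized narrow-domain entities. Experimental results show that the method effectively improves the accuracy of narrow-domain entity recognition while reducing manual intervention in the recognition process.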

Keywords: entity recognition; unsupervised learning; domain relevance; log-likelihood ratio

Unsupervised Method for Narrow-domain Entity Recognition by Fusing Domain Relevance Measurement and Word Features of Context
ZHONG Ning, DONG Guangchang, CHEN Jianhui. Unsupervised Method for Narrow-domain Entity Recognition by Fusing Domain Relevance Measurement and Word Features of Context [J]. Journal of Beijing Polytechnic University, 2018, 44(6): 862-869.
Authors: ZHONG Ning, DONG Guangchang, CHEN Jianhui
Abstract: To address the challenges of entity recognition in fine-grained domains, namely the limited number of domain entities and the relative lack of corpus samples, an unsupervised method for narrow-domain entity recognition is proposed that integrates domain relevance and context information. First, by fusing word frequency and context information, a term-corpus relevance hypothesis is designed, and the likelihood of the hypothesis is computed with the log-likelihood ratio to obtain the domain discrimination degree of candidate entities. Second, based on the relative domain ratio of the head word of each candidate entity in the corpus, a domain dependence function is constructed to identify the domain tendency of the candidate entities. Finally, the domain discrimination degree and the domain dependence are combined to compute the domain relevance of each candidate entity, and the candidates whose domain relevance exceeds a threshold are selected as narrow-domain entities. Experimental results show that the proposed method improves the accuracy of narrow-domain entity recognition while reducing manual intervention in the recognition process.
Keywords: entity recognition; unsupervised learning; domain relevance; log-likelihood ratio
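The three-step pipeline summarized in the abstract (discrimination via a log-likelihood ratio, dependence via the head word's domain share, then thresholding a combined relevance score) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption for illustration only: the discrimination score uses the standard binomial log-likelihood ratio between a domain corpus and a reference corpus, the dependence function is a simple ratio of relative frequencies, the two are combined by multiplication, and all function and parameter names are hypothetical.

```python
import math


def _binom_log_likelihood(k, n, p):
    """Log of p^k * (1-p)^(n-k), treating 0 * log(0) as 0."""
    total = 0.0
    if k > 0:
        total += k * math.log(p)
    if n - k > 0:
        total += (n - k) * math.log(1.0 - p)
    return total


def domain_discrimination(k_dom, n_dom, k_ref, n_ref):
    """Binomial log-likelihood ratio of a term's frequency in a domain
    corpus (k_dom of n_dom tokens) versus a reference corpus.
    Assumption: terms not over-represented in the domain score 0."""
    p_dom = k_dom / n_dom
    p_ref = k_ref / n_ref
    if p_dom <= p_ref:
        return 0.0
    p_all = (k_dom + k_ref) / (n_dom + n_ref)
    return 2.0 * (
        _binom_log_likelihood(k_dom, n_dom, p_dom)
        + _binom_log_likelihood(k_ref, n_ref, p_ref)
        - _binom_log_likelihood(k_dom, n_dom, p_all)
        - _binom_log_likelihood(k_ref, n_ref, p_all)
    )


def domain_dependence(h_dom, n_dom, h_ref, n_ref):
    """Hypothetical dependence function: the head word's relative
    frequency in the domain corpus as a share of its relative
    frequency in both corpora (a value in [0, 1])."""
    r_dom = h_dom / n_dom
    r_ref = h_ref / n_ref
    if r_dom + r_ref == 0:
        return 0.0
    return r_dom / (r_dom + r_ref)


def recognize(candidates, n_dom, n_ref, threshold):
    """Keep candidates whose relevance exceeds the threshold.
    candidates maps a term to (k_dom, k_ref, h_dom, h_ref), where the
    h_* values are counts of the term's head word.
    Assumption: relevance = discrimination * dependence."""
    selected = []
    for term, (k_dom, k_ref, h_dom, h_ref) in candidates.items():
        relevance = (domain_discrimination(k_dom, n_dom, k_ref, n_ref)
                     * domain_dependence(h_dom, n_dom, h_ref, n_ref))
        if relevance > threshold:
            selected.append(term)
    return selected
```

Multiplying the two scores means a candidate must be both statistically over-represented in the domain corpus and built on a domain-leaning head word to pass the threshold; a weighted sum would be an equally plausible choice under these assumptions.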
This article is indexed in CNKI, Wanfang Data, and other databases.