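Research on Chinese Keywords Extraction Based on Characters Sequence Annotation
Cite this article: Wang Hao, Deng Sanhong, Su Xinning. Research on Chinese Keywords Extraction Based on Characters Sequence Annotation [J]. 现代图书情报技术 (New Technology of Library and Information Service), 2011(12): 39-45.
Authors: Wang Hao, Deng Sanhong, Su Xinning
Affiliation: Department of Information Management, Nanjing University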
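Abstract: Taking the entire book catalog of a university library as the research object, and building on an analysis of the books' keyword indexing information, this paper summarizes the basic characteristics of Chinese keywords and the regularities of their extraction, constructs a Chinese keyword extraction model based on character sequence annotation, and proposes a basic approach and implementation scheme for Chinese keyword extraction. Experiments demonstrate the rationality, correctness, and practicality of the model. The paper concludes that character sequence annotation outperforms word sequence annotation and can largely solve the problem of Chinese keyword extraction without word segmentation.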

Keywords: Sequence annotation  Conditional random fields  Keywords extraction  Machine learning  Characters sequence  Words sequence

Research on Chinese Keywords Extraction Based on Characters Sequence Annotation
Wang Hao, Deng Sanhong, Su Xinning. Research on Chinese Keywords Extraction Based on Characters Sequence Annotation [J]. New Technology of Library and Information Service, 2011(12): 39-45.
Authors: Wang Hao, Deng Sanhong, Su Xinning
Affiliation: Department of Information Management, Nanjing University, Nanjing 210093, China
Abstract: Using the complete Chinese book catalog of a university library and an analysis of its book indexing information, the paper summarizes the features of Chinese keywords and the regularities of their extraction, and establishes a Chinese keyword extraction model based on character sequence annotation, proposing a basic idea and implementation scheme for extracting keywords. Large-scale experiments verify the feasibility, rationality, and practicality of the model. The results show that character sequence annotation outperforms word sequence annotation and can basically solve the problem of Chinese keyword extraction without word segmentation.
Keywords: Sequence annotation  Conditional random fields  Keywords extraction  Machine learning  Characters sequence  Words sequence
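
The abstract does not spell out the tag set or features the authors used, so the following is only a minimal sketch of the general idea behind character sequence annotation: each character of a title receives a tag, and keywords are read back from runs of tagged characters, with no word segmentation step. The B/I/O tag set and the function names `label_characters` and `decode_keywords` are assumptions chosen for illustration, not the paper's scheme. In a full pipeline, tagged sequences like these would serve as training data for a sequence model such as a conditional random field.

```python
from typing import List, Tuple


def label_characters(text: str, keywords: List[str]) -> List[Tuple[str, str]]:
    """Assign a B/I/O tag to every character of `text` (assumed tag set).

    B = first character of a keyword, I = later character of a keyword,
    O = character outside any keyword. Longer keywords are matched first
    so that a shorter keyword does not split a longer one.
    """
    tags = ["O"] * len(text)
    for kw in sorted(keywords, key=len, reverse=True):
        if not kw:
            continue
        start = text.find(kw)
        while start != -1:
            end = start + len(kw)
            # Only label spans that are still entirely unlabeled.
            if all(t == "O" for t in tags[start:end]):
                tags[start] = "B"
                for i in range(start + 1, end):
                    tags[i] = "I"
            start = text.find(kw, start + 1)
    return list(zip(text, tags))


def decode_keywords(labeled: List[Tuple[str, str]]) -> List[str]:
    """Recover keyword strings from a B/I/O-tagged character sequence."""
    keywords: List[str] = []
    current = ""
    for ch, tag in labeled:
        if tag == "B":
            if current:
                keywords.append(current)
            current = ch
        elif tag == "I" and current:
            current += ch
        else:
            if current:
                keywords.append(current)
            current = ""
    if current:
        keywords.append(current)
    return keywords


# Illustrative input: the paper's own title and two of its listed keywords.
title = "基于字序列标注的中文关键词抽取研究"
labeled = label_characters(title, ["序列标注", "关键词抽取"])
print(labeled)
print(decode_keywords(labeled))
```

Because each character carries exactly one tag in this sketch, overlapping keywords (for example 字序列 and 序列标注 in the title above) cannot both be labeled; the longer match wins and the overlapping one is skipped.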
This article is indexed in CNKI and other databases.