
A Machine-Learning-Based Method for Automatic Keyword Extraction from Scientific Abstracts

Cite this article:	刘佳宾陈超正荣吉翔华. 基于机器学习的科技文摘关键词自动提取方法[J]. 计算机工程与应用 (Computer Engineering and Applications), 2007, 43(14): 170-172

Authors:	刘佳宾陈超正荣吉翔华

Affiliation:	Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027 (the same affiliation is listed four times, once per author entry)

Funding:	National Natural Science Foundation of China; Microsoft fund

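Abstract:	This paper proposes a machine-learning-based technique for automatic keyword extraction, aimed mainly at the abstracts of academic papers in digital libraries. It proposes, for the first time, taking the sentence as the basic unit of keyword extraction. Candidate keywords are generated by combining the n_grams method with part-of-speech information. The features considered include the frequency of a phrase, its position within the whole abstract, its position within the sentence that contains it, and the number of words in the phrase. Experimental results show that the method adapts to keyword extraction for papers from a range of fields and achieves good results.
Keywords:	automatic information extraction; decision tree; part-of-speech analysis; n_grams method
Article ID:	1002-8331(2007)14-0170-03
Received:	2006-06-06
Revised:	2006-09-01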
Automatic Extraction of Keyphrases from Scientific Articles based on machine learning method

JiaBin Liu. Automatic Extraction of Keyphrases from Scientific Articles based on machine learning method[J]. Computer Engineering and Applications, 2007, 43(14): 170-172

Authors:	JiaBin Liu

Affiliation:	Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027, China

Abstract:	To extract key phrases automatically from scientific articles, this paper proposes a method based on supervised machine learning. Candidate terms are defined by combining the n_grams method with part-of-speech (POS) information. Four features represent each term: term frequency, the relative position of its first occurrence, its relative position within the sentence, and the number of tokens in the term. Experimental results show that the method performs well and is general enough to apply to any field.

Keywords:	information retrieval; decision tree; Part Of Speech (POS); n_grams method
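
The candidate generation and the four features named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tokenizer, and the maximum n-gram length are assumptions, and a small hypothetical stopword list stands in for the part-of-speech filter, which would need a real tagger. N-grams are formed within one sentence at a time, and positions are normalized by token counts, which is one possible reading of "relative position".

```python
import re
from collections import Counter

# Hypothetical stopword list; a stand-in for the POS filter the abstract describes.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "to",
             "is", "are", "from", "with", "by", "this", "that"}

def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (a rough heuristic)."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]

def tokenize(sentence):
    """Lowercase word tokens; underscores are kept so 'n_grams' stays whole."""
    return re.findall(r"[a-z0-9_]+", sentence.lower())

def candidate_features(text, max_n=3):
    """Return {phrase: features} for n-gram candidates built sentence by sentence."""
    sentences = [t for t in (tokenize(s) for s in split_sentences(text)) if t]
    total_tokens = sum(len(s) for s in sentences)
    counts = Counter()
    features = {}
    offset = 0  # number of tokens in the sentences already processed
    for tokens in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Stand-in filter: drop candidates that begin or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                phrase = " ".join(gram)
                counts[phrase] += 1
                if phrase not in features:  # record positions of the first occurrence only
                    features[phrase] = {
                        "first_pos_in_abstract": (offset + i) / total_tokens,
                        "first_pos_in_sentence": i / len(tokens),
                        "num_tokens": n,
                    }
        offset += len(tokens)
    for phrase, feats in features.items():
        feats["frequency"] = counts[phrase]
    return features

if __name__ == "__main__":
    sample = "Decision tree learning is simple. We train a decision tree on abstracts."
    for phrase, feats in sorted(candidate_features(sample).items()):
        print(phrase, feats)
```

Each candidate's four values could then serve as one feature vector for a supervised classifier such as the decision tree named in the keywords, trained on abstracts whose author-assigned keywords provide the labels.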
This article is indexed in databases including CNKI, 维普 (VIP), and 万方数据 (Wanfang Data).