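Automatic Labeling of Semantic Roles on Chinese FrameNet
Cite this article: LI Ji-Hong, WANG Rui-Bo, WANG Wei-Lin, LI Guo-Chen. Automatic Labeling of Semantic Roles on Chinese FrameNet[J]. Journal of Software, 2010, 21(4): 597-611. DOI: 10.3724/SP.J.1001.2010.03756
Authors: LI Ji-Hong  WANG Rui-Bo  WANG Wei-Lin  LI Guo-Chen
Affiliations: 1. Computer Center, Shanxi University, Taiyuan, Shanxi 030006
2. School of Mathematical Sciences, Shanxi University, Taiyuan, Shanxi 030006
3. School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006
Funding: Supported by the National Natural Science Foundation of China under Grant No.60873128; the National High-Tech Research and Development Plan of China (863 Program) under Grant No.2006AA01Z142
Abstract: Using the Chinese FrameNet (CFN) semantic knowledge base developed at Shanxi University, this paper casts semantic role labeling as a word-sequence labeling problem through the IOB strategy and studies the automatic labeling of Chinese frame semantic roles with a conditional random field (CRF) model. The model takes the word as its basic labeling unit and uses as features the word, its part of speech, its position relative to the target word, the target word, and combinations of these. Several candidate windows are set for each feature, and their combinations form the model's feature templates; a method for selecting a near-optimal template is given, based on orthogonal arrays from statistics. All experiments are run on a corpus of 6,692 example sentences from 25 selected frames. For each frame, one model is trained on that frame's example sentences, performing boundary identification and classification of semantic roles simultaneously, with 2-fold cross-validation. Given the target word in a sentence and the frame that the target word belongs to, the cross-validated precision, recall, and F1 over the 25 frames reach 74.16%, 52.70%, and 61.62%, respectively.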

Keywords: Chinese FrameNet; semantic role labeling; orthogonal array; feature selection; conditional random fields
Received: 2008-11-22
Revised: 2009-10-14
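
The IOB reduction mentioned in the abstract can be illustrated with a minimal sketch. It assumes role spans are given as half-open word-index ranges; the function names, the English toy sentence, and the role labels (Buyer, Goods, Time) are hypothetical and not taken from the paper or from CFN.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, role); an assumed representation


def spans_to_iob(n_words: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping role spans as one IOB tag per word."""
    tags = ["O"] * n_words
    for start, end, role in spans:
        tags[start] = f"B-{role}"          # first word of the role
        for i in range(start + 1, end):
            tags[i] = f"I-{role}"          # remaining words of the role
    return tags


def iob_to_spans(tags: List[str]) -> List[Span]:
    """Decode an IOB tag sequence back into role spans."""
    spans: List[Span] = []
    start, role = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and tag[2:] == role
        if start is not None and not continues:
            spans.append((start, i, role))
            start, role = None, None
        if tag.startswith("B-"):
            start, role = i, tag[2:]
    return spans


# Hypothetical example: the target word "bought" carries no role and is tagged O.
words = ["She", "bought", "a", "book", "yesterday"]
gold = [(0, 1, "Buyer"), (2, 4, "Goods"), (4, 5, "Time")]
tags = spans_to_iob(len(words), gold)
for word, tag in zip(words, tags):
    print(word, tag)
```

Because each word receives exactly one tag, a sequence labeler such as a CRF predicts role boundaries and role types in a single pass, which matches the abstract's statement that identification and classification are performed simultaneously.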

Automatic Labeling of Semantic Roles on Chinese FrameNet
LI Ji-Hong, WANG Rui-Bo, WANG Wei-Lin and LI Guo-Chen. Automatic Labeling of Semantic Roles on Chinese FrameNet[J]. Journal of Software, 2010, 21(4): 597-611. DOI: 10.3724/SP.J.1001.2010.03756
Authors: LI Ji-Hong  WANG Rui-Bo  WANG Wei-Lin  LI Guo-Chen
Affiliation: LI Ji-Hong1, WANG Rui-Bo1, WANG Wei-Lin2, LI Guo-Chen3 1(Computer Center, Shanxi University, Taiyuan 030006, China) 2(School of Mathematical Sciences, Shanxi University, Taiyuan 030006, China) 3(School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China)
Abstract: Based on the Chinese FrameNet (CFN) semantic knowledge base developed at Shanxi University, automatic labeling of Chinese FrameNet semantic roles is turned into a word-level sequence tagging problem by applying the IOB (inside/outside/begin) strategy to the example sentences in the CFN corpus, and a Conditional Random Fields (CRF) model is adopted. The basic unit of tagging is the word. The word, its part of speech, its position relative to the target word, the target word, and their combinations are chosen as features. Candidate feature templates are formed by choosing among several window sizes for each feature, and orthogonal arrays from statistics are used to screen for a better template. All experiments are based on the 6,692 example sentences of 25 frames selected from the CFN corpus. A separate model is trained for each frame on its example sentences and evaluated by 2-fold cross-validation, with identification and classification of the semantic roles performed simultaneously. Given the target word in a sentence and the frame of that target word, the precision, recall, and F1-measure over all 25 frames are 74.16%, 52.70%, and 61.62%, respectively.
Keywords: Chinese FrameNet; semantic role labeling; orthogonal array; feature selection; conditional random fields
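
As a quick sanity check on the reported figures, the sketch below recomputes F1 as the harmonic mean of precision and recall. It assumes the abstract's F1 uses this standard definition over the aggregate precision and recall; the paper may aggregate across frames differently, and the function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Percentages as reported in the abstract.
precision, recall = 74.16, 52.70
f1 = f1_score(precision, recall)
print(f"F1 from rounded P and R: {f1:.2f}%")
```

With the rounded inputs this gives roughly 61.61%, close to the reported 61.62%; the small gap is plausibly due to rounding of the published precision and recall.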
This article is indexed in CNKI, Wanfang Data, and other databases.