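Chinese question answering pattern learning based on the Internet and self-training
Cite this article: LI Zhi-sheng, SUN Yue-heng, HE Pi-lian, HOU Yue-xian. Chinese question answering pattern learning based on the Internet and self-training[J]. Journal of Computer Applications, 2008, 28(6): 1575-1577.
Authors: LI Zhi-sheng  SUN Yue-heng  HE Pi-lian  HOU Yue-xian
Affiliation: College of Computer Science and Technology, Tianjin University (all four authors)
Funding: National Natural Science Foundation of China; Tianjin Applied Basic Research Project
Abstract: In existing question answering (QA) pattern learning, the pattern definition and the scoring of candidate answers are overly simple, and the learning process depends on manually labeled corpora. Skeleton patterns of verb and noun sequences are mined from Web text to extend the pattern definition. The self-training mechanism is introduced into QA pattern learning: initial learning uses a single pair of training data, after which QA pairs of high reliability are selected automatically through Internet search and used for retraining. The heuristic rules are extended to improve the scoring of candidate answers. Experimental results show that the proposed QA pattern learning method effectively improves the performance of a Chinese QA system.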

Keywords: Internet; QA pattern; self-training; machine learning
Article ID: 1001-9081(2008)06-1575-03
Received: 2007-12-14
Revised: 2007-12-01

Chinese question answering pattern learning based on self-training mechanism and Web
LI Zhi-sheng, SUN Yue-heng, HE Pi-lian, HOU Yue-xian. Chinese question answering pattern learning based on self-training mechanism and Web[J]. Journal of Computer Applications, 2008, 28(6): 1575-1577.
Authors: LI Zhi-sheng  SUN Yue-heng  HE Pi-lian  HOU Yue-xian
Affiliation: College of Computer Science and Technology, Tianjin University, Tianjin 300072, China
Abstract: Existing approaches to QA pattern learning rely on labeled data, and their pattern definitions and candidate-answer scoring methods are oversimplified. Verb and noun sequences were extracted as skeleton patterns to extend the definition of a QA pattern. A learning mechanism based on self-training was established: initial learning was completed on one labeled QA pair, after which the running system automatically selected reliable data through Web search and used them for self-training. The scoring of candidate answers was also improved by applying several heuristic rules. The experimental results show that the performance of a Chinese QA system based on the proposed method is significantly improved.
Keywords: Web; QA pattern; self-training; machine learning
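
The self-training loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. All names (`learn_patterns`, `extract_answer`, `self_train`, `toy_search`), the confidence threshold, and the toy corpus are assumptions made for illustration. A pattern here is simply the literal text between a question term and its answer, whereas the paper uses verb and noun skeleton patterns and heuristic scoring rules that the abstract does not spell out. Web search is replaced by a lookup in a small in-memory dictionary.

```python
import re
from collections import Counter


def learn_patterns(pairs, search):
    """Collect the text between a question term and its answer as a pattern."""
    patterns = Counter()
    for term, answer in pairs:
        for snippet in search(term):
            q, a = snippet.find(term), snippet.find(answer)
            if q == -1 or a == -1 or q >= a:
                continue  # keep only snippets where the term precedes the answer
            patterns[snippet[q + len(term):a]] += 1
    return patterns


def extract_answer(term, patterns, search):
    """Return the best candidate answer and its share of the total pattern weight."""
    scores = Counter()
    for snippet in search(term):
        for middle, weight in patterns.items():
            regex = re.escape(term + middle) + r"(\w+)"
            for match in re.finditer(regex, snippet):
                scores[match.group(1)] += weight
    if not scores:
        return None, 0.0
    best, best_score = scores.most_common(1)[0]
    return best, best_score / sum(scores.values())


def self_train(seed_pairs, unlabeled_terms, search, threshold=0.8, rounds=3):
    """Start from seed pairs, add confidently answered terms, then retrain."""
    pairs = list(seed_pairs)
    remaining = list(unlabeled_terms)
    patterns = learn_patterns(pairs, search)
    for _ in range(rounds):
        added = []
        for term in remaining:
            answer, confidence = extract_answer(term, patterns, search)
            if answer is not None and confidence >= threshold:
                added.append((term, answer))
        if not added:
            break  # nothing reliable enough was found; stop early
        pairs.extend(added)
        done = {term for term, _ in added}
        remaining = [term for term in remaining if term not in done]
        patterns = learn_patterns(pairs, search)  # retrain on the enlarged set
    return patterns, pairs


# Hypothetical stand-in for Web search: a fixed snippet table.
CORPUS = {
    "Mozart": ["Mozart was born in 1756 in Salzburg."],
    "Newton": ["Newton was born in 1643.", "Newton died in 1727."],
    "Darwin": ["Darwin studied finches."],
}


def toy_search(term):
    return CORPUS.get(term, [])


patterns, pairs = self_train([("Mozart", "1756")], ["Newton", "Darwin"], toy_search)
print(pairs)
```

In this toy run a single seed pair yields one pattern, which answers the "Newton" query with full confidence; that pair is added and the patterns are relearned, while "Darwin" stays unlabeled because no pattern matches its snippet.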
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.