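Study on Topic Partition in Automatic Abstracting System (original title: 自动文摘系统中的主题划分问题研究)
Cite this article:FU Jian-lian, CHEN Qun-xiu. Study on Topic Partition in Automatic Abstracting System[J]. Journal of Chinese Information Processing, 2005, 19(6): 30-37.
Authors:FU Jian-lian  CHEN Qun-xiu
Affiliation:State Key Lab of Intelligent Technology and System, Department of Computer Science, Tsinghua University, Beijing 100084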
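Abstract:With the growth of the Internet, electronic texts have appeared in large numbers, and automatic abstracting, whose speed, convenience, effectiveness, and objectivity manual abstracting cannot match, has fully demonstrated its practical value. Topic partition is an important problem to be solved in the text-structure analysis stage of an automatic abstracting system. This paper proposes an algorithm that builds a paragraph vector space model and partitions a text into topics according to the similarity of consecutive paragraphs. The algorithm addresses the analysis of an article's discourse structure, so that abstracts of multi-topic articles are more comprehensive in content and more balanced in structure. Experimental results show that the algorithm's topic partition accuracy is 92.2% for multi-topic articles and 99.1% for single-topic articles.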

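Keywords:computer application  Chinese information processing  automatic abstracting  vector space model  paragraph similarity  topic partition
Article ID:1003-0077(2005)06-0028-08
Received:2005-01-11
Revised:2005-07-11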

Study on Topic Partition in Automatic Abstracting System
FU Jian-lian,CHEN Qun-xiu.Study on Topic Partition in Automatic Abstracting System[J].Journal of Chinese Information Processing,2005,19(6):30-37.
Authors:FU Jian-lian  CHEN Qun-xiu
Affiliation:State Key Lab of Intelligent Technology and System, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Abstract:With the development of networks, the volume of electronic text is growing rapidly. Because automatic abstracting surpasses manual abstracting in speed, convenience, efficiency, and objectivity, it has wide applications and has become an active research topic. Topic partition is a significant problem in the text-structure analysis stage of an automatic abstracting system. The paper builds a paragraph-based vector space model of the whole article and then proposes an algorithm that partitions multi-topic text according to the similarity of consecutive paragraphs. This addresses discourse-structure analysis for multi-topic articles and gives their abstracts more comprehensive content and a more balanced structure. In a closed test, the precision of topic partition reaches 92.2% for multi-topic text and 99.1% for single-topic text.
Keywords:computer application  Chinese information processing  automatic abstraction  vector space model  paragraphic similarity  topic segmentation
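
The abstract names two ingredients: a vector space model built over paragraphs, and topic boundaries derived from the similarity of consecutive paragraphs. The Python sketch below illustrates that general idea only. The abstract does not give the paper's term weighting, similarity measure, or boundary rule, so the smoothed TF-IDF weights, the cosine similarity, the fixed `threshold`, and every function name here are assumptions made for illustration. The input is assumed to be already tokenized; Chinese text would first need word segmentation.

```python
import math
from collections import Counter


def paragraph_vectors(paragraphs):
    """Build one sparse term-weight vector (a dict) per tokenized paragraph."""
    n = len(paragraphs)
    # Document frequency: in how many paragraphs each term occurs.
    df = Counter(term for p in paragraphs for term in set(p))
    vectors = []
    for p in paragraphs:
        tf = Counter(p)
        # Assumed weighting: term frequency times a smoothed IDF, so that
        # terms present in every paragraph still keep a nonzero weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def partition_topics(paragraphs, threshold=0.1):
    """Group paragraph indices into topics.

    A new topic starts wherever the similarity between a paragraph and its
    predecessor falls below `threshold` (a hypothetical, untuned value).
    """
    if not paragraphs:
        return []
    vectors = paragraph_vectors(paragraphs)
    segments = [[0]]
    for i in range(1, len(vectors)):
        if cosine(vectors[i - 1], vectors[i]) < threshold:
            segments.append([i])      # low similarity: open a new topic
        else:
            segments[-1].append(i)    # otherwise extend the current topic
    return segments


paragraphs = [
    ["stock", "market", "price", "rise"],
    ["market", "price", "fall", "investor"],
    ["football", "match", "goal", "team"],
    ["team", "goal", "coach", "win"],
]
print(partition_topics(paragraphs))  # [[0, 1], [2, 3]]
```

In this toy input the second and third paragraphs share no terms, so their similarity is 0 and a boundary is placed between them. A text whose paragraphs all stay above the threshold comes back as a single segment, which corresponds to the single-topic case mentioned in the abstract.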
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.