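An automatic text summarization method based on word-order information
Cite this article: REN Ji-sheng, ZHANG Chi, WANG Zuo-ying. An automatic text summarization method based on word-order information [J]. Computer Engineering and Design, 2007, 28(1): 178-181.
Authors: REN Ji-sheng  ZHANG Chi  WANG Zuo-ying
Affiliation: Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Funding: National High Technology Research and Development Program of China (863 Program)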
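Abstract: Automatic summarization should obtain similarity measures that are as accurate as possible in order to determine the weights of sentences or paragraphs, yet the commonly used computation based on the vector space model ignores the order of words within sentences, paragraphs, and texts. This paper proposes a new similarity measure based on groups of adjacent ordered words (word-order groups) and applies it to automatic text summarization. A clustering-based method yields vector representations of the word-order groups, which are then used to characterize sentences, paragraphs, and texts; the similarity results obtained from word-order groups of different lengths are combined by linear interpolation. A new weighting index for sentences or paragraphs, based on the cumulative importance of the word-order groups they contain, is also proposed. Experiments show that using word-order information can effectively improve the quality of automatic summaries.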
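
The interpolation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper represents word-order groups through clustering, whereas this sketch substitutes raw n-gram counts, and the function names (`ngram_counts`, `cosine`, `interpolated_similarity`) and the interpolation weights `lambdas` are assumptions chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def ngram_counts(tokens, n):
    """Count contiguous groups of n tokens, preserving word order."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def interpolated_similarity(x, y, lambdas=(0.5, 0.3, 0.2)):
    """Linearly interpolate similarities over group lengths 1..len(lambdas).

    The weights are hypothetical; they are assumed to sum to 1.
    """
    return sum(
        lam * cosine(ngram_counts(x, n), ngram_counts(y, n))
        for n, lam in enumerate(lambdas, start=1)
    )


a = "the cat chased the dog".split()
b = "the dog chased the cat".split()
print(cosine(ngram_counts(a, 1), ngram_counts(b, 1)))  # bag-of-words view
print(interpolated_similarity(a, b))                   # order-aware view
```

The two example sentences contain exactly the same words, so a pure bag-of-words cosine treats them as identical, while the interpolated score drops because their bigrams and trigrams differ.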

Keywords: automatic text summarization  word order  vector space model  similarity  weight
Article ID: 1000-7024(2007)01-0178-04
Revised: 2005-12-25

Automatic text summarization based on word order
REN Ji-sheng, ZHANG Chi, WANG Zuo-ying. Automatic text summarization based on word order [J]. Computer Engineering and Design, 2007, 28(1): 178-181.
Authors:REN Ji-sheng  ZHANG Chi  WANG Zuo-ying
Affiliation: Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Abstract: Automatic text summarization needs an accurate similarity measure to determine the weight of a sentence or paragraph, but the common algorithm based on the vector space model neglects the word order present in sentences, paragraphs, and texts. A new computational scheme based on combinations of neighboring words is proposed and applied to automatic text summarization. The vector representation of these combinations is obtained via clustering and is used to characterize sentences, paragraphs, or texts. The similarity results for combinations of different lengths are integrated through linear interpolation. A new weighting index for sentences or paragraphs is also proposed, based on the aggregate significance of the word combinations they contain. Experimental results show that using word order effectively improves the quality of summarization.
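
As a rough illustration of a weighting index built from the aggregate significance of word combinations, the following Python sketch scores each sentence by summing the importance of the bigrams it contains and then extracts the top-scoring sentences. The abstract does not say how significance is computed, so the measure used here (the number of sentences that contain a bigram) is an assumption, as are the function names `bigrams`, `sentence_weights`, and `summarize`.

```python
from collections import Counter


def bigrams(tokens):
    """Return the ordered pairs of neighboring words in a tokenized sentence."""
    return [tuple(tokens[i:i + 2]) for i in range(len(tokens) - 1)]


def sentence_weights(sentences):
    """Weight each sentence by the summed importance of its distinct bigrams.

    Importance is assumed to be the number of sentences containing the bigram.
    """
    doc_freq = Counter(g for s in sentences for g in set(bigrams(s)))
    return [sum(doc_freq[g] for g in set(bigrams(s))) for s in sentences]


def summarize(sentences, k):
    """Keep the k highest-weighted sentences, in their original order."""
    weights = sentence_weights(sentences)
    top = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


sentences = [
    "word order matters for similarity".split(),
    "vector space models ignore word order".split(),
    "the weather is nice today".split(),
]
for s in summarize(sentences, 2):
    print(" ".join(s))
```

An unnormalized sum favors longer sentences; dividing by the number of bigrams in the sentence is one possible variation.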
Keywords:automatic text summarization  word order  vector space model  similarity measure  weight
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.