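Chinese-Vietnamese Bilingual News Event Storyline Analysis Based on Global/Local Co-occurring Word Pair Distributions
Cite this article:GAO Shengxiang, YU Zhengtao, LONG Wenxu, DING Wei, YAN Chunting. Chinese-Vietnamese Bilingual News Event Storyline Analysis Based on Global/Local Co-occurring Word Pair Distributions[J]. Journal of Chinese Information Processing, 2015, 29(6): 90-97.
Authors:GAO Shengxiang  YU Zhengtao  LONG Wenxu  DING Wei  YAN Chunting
Affiliation:School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
Funding:National Natural Science Foundation of China (61472168, 61175068, 61163004); Key Project of the Yunnan Natural Science Foundation (2013FA130); Yunnan Science and Technology Innovation Talent Fund (2014HE001); Open Fund of the Key Laboratory of Software Engineering, Yunnan University (2011SE14)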
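Abstract:For storyline analysis of Chinese-Vietnamese bilingual news events, this paper proposes a method for generating bilingual event storylines based on the distribution of global/local co-occurring word pairs. The method first takes the word distribution of a news topic as global words that characterize the global event. It then takes the event elements specific to news segments at a given time granularity, such as time, persons, and places, as local words, and analyzes how global and local words co-occur within those segments. The co-occurrence patterns of global/local words serve as supervision and are combined with the RCRP algorithm and a Chinese-Vietnamese aligned news corpus to build a supervised topic model. The model yields, for each time span, a sub-topic distribution that represents how the event develops, and these sub-topic distributions trace the event's storyline, giving an online model for generating Chinese-Vietnamese bilingual event storylines. Experiments on a mixed Chinese-Vietnamese news dataset, including comparative storyline-generation experiments, demonstrate the effectiveness of the proposed method.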


Keywords:Chinese-Vietnamese  news event storyline  global/local co-occurring word pairs  sub-topic distribution  bilingual topic model
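
The co-occurrence analysis described in the abstract can be illustrated with a minimal sketch. It assumes that documents are already tokenized and grouped into time slices, and that the global (topic) words and local (event-element) words are supplied as sets. The function name `cooccurrence_by_slice` and the toy data are hypothetical and do not come from the paper; the paper's actual statistics may be defined differently.

```python
from collections import Counter
from itertools import product


def cooccurrence_by_slice(slices, global_words, local_words):
    """Count (global, local) word pairs that share a document, per time slice.

    slices: dict mapping a time-slice label to a list of tokenized documents.
    global_words / local_words: sets of words (assumed to be given).
    """
    counts = {}
    for label, docs in slices.items():
        pair_counts = Counter()
        for tokens in docs:
            vocab = set(tokens)
            found_global = sorted(vocab & global_words)
            found_local = sorted(vocab & local_words)
            # Each pair is counted once per document in which both words occur.
            pair_counts.update(product(found_global, found_local))
        counts[label] = pair_counts
    return counts


# Toy data, invented purely for illustration.
slices = {
    "2014-05": [["summit", "trade", "hanoi"], ["trade", "beijing", "talks"]],
    "2014-06": [["trade", "hanoi", "agreement"]],
}
result = cooccurrence_by_slice(slices, {"trade", "talks"}, {"hanoi", "beijing"})
```

Here `result["2014-05"]` holds one count each for the pairs `("trade", "hanoi")`, `("talks", "beijing")`, and `("trade", "beijing")`. Per-slice counts of this kind are one plausible form for the supervision signal the abstract mentions.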
  

Chinese-Vietnamese Bilingual News Event Storyline Analysis Based on Words Co-occurrence Distribution
GAO Shengxiang, YU Zhengtao, LONG Wenxu, DING Wei, YAN Chunting. Chinese-Vietnamese Bilingual News Event Storyline Analysis Based on Words Co-occurrence Distribution[J]. Journal of Chinese Information Processing, 2015, 29(6): 90-97.
Authors:GAO Shengxiang  YU Zhengtao  LONG Wenxu  DING Wei  YAN Chunting
Affiliation:School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
Abstract:Aiming at Chinese-Vietnamese bilingual news event storyline analysis, a generative model for event storyline is proposed based on global/local word pairs’ co-occurrence distribution. Firstly, the detected news topic word distribution was used as global words to characterize a global event, Then time, person, place and other event elements in the news segment divided by certain time granularity are used as local words. The are co-occurrence of global and local words is analyzed and used as supervised information, with RCRP algorithm and bilingual aligned words together, which are integrated into a bilingual topic model to get sub-topic distribution under corresponding time slice. Finally, by the sub-topic distribution representing the developing process of an event, a generative model to storyline was constructed. On Chinese-Vietnamese mixed news set crawled from the internet, the comparative experiments of storyline generation are conducted, proving that the proposed bilingual news storyline is model got better effect than the other methods.
Keywords:Chinese-Vietnamese  news event storyline  global/local co-occurrence words  sub-topic distribution  bilingual topic model  
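
The abstract does not expand the acronym RCRP. Assuming it denotes the recurrent Chinese restaurant process, a time-dependent clustering prior, the sketch below shows a simplified version in which the prior for the current time slice depends only on counts from the previous slice and from the current slice so far. The function `rcrp_prior`, its parameters, and the example counts are hypothetical and are not taken from the paper.

```python
def rcrp_prior(prev_counts, curr_counts, alpha):
    """Prior over sub-topics for the next item in the current time slice.

    prev_counts: dict topic -> number of items assigned in the previous slice.
    curr_counts: dict topic -> number of items assigned so far in this slice.
    alpha: positive concentration parameter, the mass reserved for a new sub-topic.
    """
    topics = set(prev_counts) | set(curr_counts)
    # An existing sub-topic is weighted by its popularity in both slices.
    weights = {k: prev_counts.get(k, 0) + curr_counts.get(k, 0) for k in topics}
    total = sum(weights.values()) + alpha
    probs = {k: w / total for k, w in weights.items()}
    # "<new>" is a placeholder key standing for a sub-topic not seen before.
    probs["<new>"] = alpha / total
    return probs


# Invented counts: sub-topic A had 3 items last slice and 1 so far this slice.
prior = rcrp_prior({"A": 3, "B": 1}, {"A": 1}, alpha=1.0)
```

With these counts the weights are 4 for `A`, 1 for `B`, and 1 for a new sub-topic, so popular sub-topics from the previous slice tend to persist while new ones can still emerge. In a full model this prior would be combined with a likelihood over words; that part is omitted here.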