

CNN image caption generation
Authors: LI Yong, CHENG Honghong, LIANG Xinyan, GUO Qian, QIAN Yuhua
Affiliation: 1. Research Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China; 2. Key Lab. of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China; 3. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
Abstract: The image caption generation task requires generating a meaningful sentence that accurately describes the content of an image. Existing research usually uses a convolutional neural network to encode image information and a recurrent neural network to encode text information; the “serial character” of the recurrent neural network results in low performance. To solve this problem, the proposed model is based entirely on convolutional neural networks: different convolutional neural networks process the data of the two modalities simultaneously. Benefiting from the “parallel character” of the convolution operation, the efficiency of the operation is significantly improved. Experiments were carried out on two public data sets, and the results also improve on the specified evaluation indexes, which indicates the effectiveness of the model for the image caption generation task.
Keywords: multi-modal data; image caption; long short-term memory; neural networks
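
The “parallel character” mentioned in the abstract can be illustrated with a minimal sketch of a causal 1D convolution over a token sequence. This is an assumption made for illustration only: the abstract does not describe the layer design, and causal (left-padded) convolution is simply one common way to replace a recurrent text encoder. The function name `causal_conv1d` and the kernel layout are hypothetical. The point is that each output position depends only on a fixed window of current and earlier tokens, so all positions could be computed independently, whereas a recurrent network must process tokens one after another.

```python
from typing import List

Vector = List[float]


def causal_conv1d(embeddings: List[Vector],
                  kernel: List[List[Vector]]) -> List[Vector]:
    """Causal 1D convolution over a token sequence (illustrative sketch).

    embeddings: T token vectors, each of size d_in.
    kernel: k weight matrices of shape d_out x d_in; kernel[j] is applied
            to the token j steps in the past (j = 0 is the current token).
    Positions before the start of the sequence are treated as zeros.
    """
    d_out = len(kernel[0])
    outputs: List[Vector] = []
    # Each iteration reads only the inputs, never earlier outputs, so the
    # positions are independent and could be evaluated in parallel.
    for t in range(len(embeddings)):
        out = [0.0] * d_out
        for j, weight in enumerate(kernel):
            if t - j < 0:
                continue  # implicit left zero padding
            x = embeddings[t - j]
            for o in range(d_out):
                out[o] += sum(w * xi for w, xi in zip(weight[o], x))
        outputs.append(out)
    return outputs


# Toy input: three tokens with 1-dimensional embeddings, kernel width 2.
tokens = [[1.0], [2.0], [3.0]]
weights = [[[1.0]], [[10.0]]]  # current token x1, previous token x10
features = causal_conv1d(tokens, weights)
```

In this sketch, changing a later token leaves the features of earlier positions unchanged, which is the property that would let such a text encoder be trained on all caption positions at once.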
Source: Journal of Xidian University (《西安电子科技大学学报》)