CNN image caption generation |
Authors: | LI Yong, CHENG Honghong, LIANG Xinyan, GUO Qian, QIAN Yuhua |
Affiliation: | 1. Research Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China; 2. Key Lab. of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China; 3. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China |
Abstract: | The image caption generation task requires generating a meaningful sentence that accurately describes the content of an image. Existing methods usually encode the image with a convolutional neural network and the text with a recurrent neural network; because a recurrent neural network processes a sentence serially, one step at a time, these methods run with low efficiency. To solve this problem, we propose a model built entirely on convolutional neural networks, in which separate convolutional networks process the data of the two modalities (image and text) simultaneously. Benefiting from the parallelism of the convolution operation, the model runs significantly more efficiently. Experiments on two public datasets show improvements on the specified evaluation metrics, which indicates that the model is effective for the image caption generation task. |
Keywords: | multi-modal data; image caption; long short-term memory; neural networks |
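The abstract's contrast between the "serial" recurrent text encoder and the "parallel" convolutional one can be illustrated with a toy sketch. The abstract gives no kernel widths, layer sizes, or fusion details, so this is not the authors' model: it uses scalar inputs instead of word-embedding vectors, and the functions `rnn_encode`, `causal_conv_at`, and `causal_conv_encode`, as well as the choice of a causal (left-zero-padded) convolution, are hypothetical assumptions for illustration only.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def rnn_encode(xs: Sequence[float], w_x: float, w_h: float) -> List[float]:
    """Toy scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1}).

    Step t cannot start until step t-1 has finished (the "serial" bottleneck).
    """
    h = 0.0
    states = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


def causal_conv_at(xs: Sequence[float], kernel: Sequence[float], t: int) -> float:
    """Output at position t of a causal 1D convolution: sum_j kernel[j] * x_{t-j}.

    Indices before the start of the sequence are treated as zero (left padding),
    so y_t never reads x_{t+1}, x_{t+2}, ... (assumed: no access to future words).
    """
    return sum(w * xs[t - j] for j, w in enumerate(kernel) if t - j >= 0)


def causal_conv_encode(xs: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Each y_t depends only on the inputs, never on another output, so all
    positions can be evaluated independently (here dispatched to a thread pool)."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda t: causal_conv_at(xs, kernel, t), range(len(xs))))


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    print(causal_conv_encode(xs, [0.5, 0.25]))  # [0.5, 1.25, 2.0, 2.75]
    print(rnn_encode(xs, 0.5, 0.5))
```

The thread pool only demonstrates that the convolution outputs have no mutual dependency; in CPython it gives no real speedup, and in practice the parallel evaluation happens on a GPU over embedding tensors. The RNN loop, by contrast, cannot be reordered or split, because each state feeds the next.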
Source: | Journal of Xidian University (《西安电子科技大学学报》) |