基于视觉特征引导融合的视频描述方法
引用本文:苗教伟,季怡,刘纯平.基于视觉特征引导融合的视频描述方法[J].计算机工程与应用,2022,58(20):124-131.
作者姓名:苗教伟  季怡  刘纯平
作者单位:苏州大学 计算机科学与技术学院,江苏 苏州 215006
摘    要:视频描述生成因其广泛的潜在应用场景而成为近年来的研究热点之一。针对模型解码过程中视觉特征和文本特征交互不足而导致描述中出现识别错误的情况,提出基于编解码框架下的视觉与文本特征交互增强的多特征融合视频描述方法。在解码过程中,该方法使用视觉特征辅助引导描述生成,不仅为每一步的生成过程提供了文本信息,同时还提供了视觉参考信息,引导其生成更准确的词,大幅度提升了模型产生的描述质量;同时,结合循环dropout缓解解码器存在的过拟合情况,进一步提升了评价分数。在该领域广泛使用的MSVD和MSRVTT数据集上的消融和对比实验结果证明,提出的方法可以有效生成视频描述,综合指标分别增长了17.2和2.1个百分点。

关 键 词:编解码框架  视频描述  特征融合  dropout  特征交互  

Video Captioning Method Based on Visual Feature Guided Fusion
MIAO Jiaowei, JI Yi, LIU Chunping. Video Captioning Method Based on Visual Feature Guided Fusion[J]. Computer Engineering and Applications, 2022, 58(20): 124-131.
Authors: MIAO Jiaowei, JI Yi, LIU Chunping
Affiliation: School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Abstract: Video captioning has become a research hotspot in recent years because of its wide range of potential applications. To address the recognition errors in generated captions that arise from insufficient interaction between visual and text features during decoding, this paper proposes a multi-feature fusion video captioning method that strengthens the interaction between visual and text features within the encoder-decoder framework. During decoding, the method uses visual features to guide caption generation: each generation step receives not only text information but also visual reference information, which steers the model toward more accurate words and substantially improves caption quality. In addition, recurrent dropout is applied to alleviate overfitting in the decoder, which further improves the evaluation scores. Ablation and comparison experiments on the widely used MSVD and MSRVTT datasets show that the proposed method generates video captions effectively, with the comprehensive metric increasing by 17.2 and 2.1 percentage points, respectively.
Keywords: encoder-decoder framework; video captioning; feature fusion; dropout; feature interaction
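
Illustrative sketch: the abstract names two ideas, visual features that guide every decoding step and recurrent dropout in the decoder, but gives no equations. The toy Python below is only a hypothetical illustration of those ideas and is not the authors' model. The scalar-gate fusion rule, the tanh update, and the names `gated_fusion`, `recurrent_dropout_mask`, and `decode_states` are all assumptions. For dropout it uses one common variant, in which a single mask is sampled per sequence and reused on the recurrent connection at every step; the abstract does not say which variant the paper uses.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(visual, text, gate_weights):
    """Blend a visual vector and a text vector of equal length.

    Hypothetical rule (not from the paper): a scalar gate g in (0, 1) is
    computed from both inputs, and the output is g * visual + (1 - g) * text.
    """
    score = sum(w * x for w, x in zip(gate_weights, visual + text))
    g = sigmoid(score)
    return [g * v + (1.0 - g) * t for v, t in zip(visual, text)]


def recurrent_dropout_mask(size, drop_prob, rng):
    """Sample one inverted-dropout mask to be reused at every decoding step."""
    keep = 1.0 - drop_prob
    return [(1.0 / keep) if rng.random() < keep else 0.0 for _ in range(size)]


def decode_states(visual, word_vectors, gate_weights,
                  drop_prob=0.5, seed=0, training=True):
    """Toy recurrent decoder that sees the visual vector at every step."""
    rng = random.Random(seed)
    size = len(visual)
    # Training: one mask for the whole sequence. Inference: no dropout.
    mask = (recurrent_dropout_mask(size, drop_prob, rng)
            if training else [1.0] * size)
    hidden = [0.0] * size
    states = []
    for text in word_vectors:
        # The visual vector is fused with the current word vector at each step.
        fused = gated_fusion(visual, text, gate_weights)
        # Toy update: the same mask scales the previous hidden state each step.
        hidden = [math.tanh(f + m * h)
                  for f, m, h in zip(fused, mask, hidden)]
        states.append(hidden)
    return states


if __name__ == "__main__":
    visual = [1.0, 0.0]
    words = [[0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]
    gate_weights = [0.1, -0.2, 0.3, 0.0]
    for step, state in enumerate(decode_states(visual, words, gate_weights)):
        print(step, [round(s, 3) for s in state])
```

Sampling the mask once per sequence, rather than once per step, keeps the same hidden units dropped throughout a caption, which is what distinguishes this variant from applying ordinary dropout independently at each time step.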