Application and Prospect of Deep Learning in Video Object Segmentation

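Cite this article:	CHEN Jia, CHEN Ya-Song, LI Wei-Hao, TIAN Yuan, LIU Zhi, HE Ying. Application and Prospect of Deep Learning in Video Object Segmentation[J]. Chinese Journal of Computers, 2021, 44(3): 609-631.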

Authors:	CHEN Jia, CHEN Ya-Song, LI Wei-Hao, TIAN Yuan, LIU Zhi, HE Ying

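Affiliations:	Department of Education and Information Technology, Central China Normal University, Wuhan 430079; Department of Education and Information Technology, Central China Normal University, Wuhan 430079; Visual Learning Lab, Heidelberg University, Heidelberg 69120, Germany; Department of Education and Information Technology, Central China Normal University, Wuhan 430079; National Engineering Laboratory for Educational Big Data, Central China Normal University, Wuhan 430079; Graduate School at Shenzhen, Tsinghua University, Shenzhen, Guangdong 518055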

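Funding:	This work was supported by the National Natural Science Foundation of China; the Fundamental Research Funds for the Central Universities (Central China Normal University); and the National Key Technology R&D Program of China.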

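Abstract:	Video object segmentation refers to finding, in each frame of a given video sequence, the positions of all pixels that belong to a particular foreground object. With the growth in computing power of hardware platforms, deep learning has attracted increasing attention and has made some progress in video object segmentation. This paper first introduces the main task of video object segmentation and summarizes the challenges the task faces. Second, it gives a brief overview of the commonly used open datasets for video object segmentation and introduces the general performance evaluation criteria. It then reviews the state of research on video object segmentation, analyzes the current methods in detail, and divides them into three categories: semi-supervised methods, in which a detailed manual ground-truth annotation of the object of interest is given for the first frame of the video and the object is segmented in the remaining frames; unsupervised methods, in which no manual annotation is given and the foreground objects in the video are identified and segmented automatically; and interactive methods, in which segmentation is guided by human interaction during the process, combined with rough manually annotated prior information. The setting of the third category is a compromise between the first two: compared with the first category, it requires human participation but only a small annotation workload; compared with the second, it adds some manual annotation to certain frames of the video sequence and is therefore more targeted. Finally, the paper summarizes the application of deep learning to video object segmentation and discusses future prospects.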
Keywords:	video object segmentation; deep learning; semi-supervised methods; unsupervised methods; interactive methods
Application and Prospect of Deep Learning in Video Object Segmentation

CHEN Jia, CHEN Ya-Song, LI Wei-Hao, TIAN Yuan, LIU Zhi, HE Ying. Application and Prospect of Deep Learning in Video Object Segmentation[J]. Chinese Journal of Computers, 2021, 44(3): 609-631.

Authors:	CHEN Jia, CHEN Ya-Song, LI Wei-Hao, TIAN Yuan, LIU Zhi, HE Ying

Affiliation:	(Department of Education and Information Technology, Central China Normal University, Wuhan 430079; Visual Learning Lab, Heidelberg University, Heidelberg 69120, Germany; National Engineering Laboratory for Educational Big Data, Central China Normal University, Wuhan 430079; Graduate School at Shenzhen, Tsinghua University, Guangdong 518055)

Abstract:	Video object segmentation is the task of locating and labeling, in every frame of a given video sequence, all pixels that belong to particular foreground objects. It is one of the most important research topics in computer vision and plays a key role in applications such as 3D reconstruction, autonomous driving, and video editing. With the growth of computing power, deep learning has attracted increasing attention and has made significant progress on this task. This paper first introduces the main task of video object segmentation and summarizes the challenges it faces. Second, it gives a brief overview of the open datasets for video object segmentation and introduces the relevant benchmarks and common performance evaluation criteria. Third, it summarizes the state of research, analyzes the relevant methods in detail, and divides them into three categories. The first category is semi-supervised methods: a detailed manual ground-truth annotation of the objects of interest is given for the first frame of the video, and those objects are then segmented automatically in the remaining frames. For single-instance video object segmentation, taking the DAVIS16 dataset as an example, semi-supervised methods currently reach a Jaccard score above 0.8; for multi-instance segmentation, on the widely used DAVIS18 dataset, they reach above 0.7. The second category is unsupervised methods, which identify and segment the foreground objects in a video using certain rules or models, without any manually labeled prior information. The third category is interactive methods, which rely on rough prior information supplied by a human through interaction. In these methods, rough annotations such as points, bounding boxes, and scribbles are obtained from an interactive module, and segmentation proceeds through several rounds of human participation, each requiring only a small amount of work. The setting of the third category can be regarded as a compromise between the first two: compared with the first, it requires human participation but only a small amount of annotation; compared with the second, it adds some manual annotations to certain frames of the video, which makes it more targeted at the objects of interest. The best Jaccard scores of both unsupervised and interactive methods reach 0.8 on the DAVIS16 dataset. However, few unsupervised methods address the multi-instance problem of the DAVIS18 dataset, and the best interactive methods reach a Jaccard score of only 0.64 on the DAVIS18 interactive dataset. Finally, the paper summarizes the applications of deep learning to video object segmentation and proposes some promising ideas from four different aspects.

Keywords:	video object segmentation; deep learning; semi-supervised methods; unsupervised methods; interactive methods
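
The Jaccard score quoted in the abstract is the intersection-over-union between a predicted segmentation mask and the ground-truth mask. The sketch below is only an illustration, not the paper's or any benchmark's evaluation code. It assumes binary masks stored as nested lists of 0/1 values, assumes that two empty masks count as a perfect match, and assumes that a sequence-level score is the plain average over frames. The names `jaccard` and `mean_jaccard` are hypothetical.

```python
from typing import Sequence

# Assumed representation: a binary mask is a list of rows of 0/1 values.
Mask = Sequence[Sequence[int]]


def jaccard(pred: Mask, truth: Mask) -> float:
    """Return |pred AND truth| / |pred OR truth| for a single frame."""
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_jaccard(preds: Sequence[Mask], truths: Sequence[Mask]) -> float:
    """Average the per-frame Jaccard score over a video sequence."""
    if len(preds) != len(truths) or len(preds) == 0:
        raise ValueError("need the same non-zero number of frames")
    scores = [jaccard(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


# Toy frame: 2 pixels overlap and 4 pixels are covered by either mask.
pred_frame = [[1, 1, 0], [0, 1, 0]]
truth_frame = [[1, 0, 0], [0, 1, 1]]
print(jaccard(pred_frame, truth_frame))  # 0.5
```

Under this definition, a score above 0.8 means that the overlap between predicted and true object regions covers more than 80% of their combined area.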
This article is indexed in databases including VIP and Wanfang Data.
