Indoor Scene Segmentation Algorithm Fusing Depth Information

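Cite this article:	WANG Liu, LIANG Ming-Ju. Indoor Scene Segmentation Algorithm Fusing Depth Information[J]. Computer Systems & Applications, 2024, 33(3): 111-117.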

Authors:	WANG Liu, LIANG Ming-Ju

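Affiliation:	School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China; Software Department, Guangdong Foshan Lianchuang Engineering Graduate School, Foshan 528300, China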

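Funding:	Guangdong Province Science and Technology Innovation Strategy Special Fund (Vertical Collaborative Management Direction) (2018FS05020102); Foshan High-Quality Patent Cultivation Project (1920025003148)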

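Abstract:	To address feature loss and the difficulty of effectively fusing the two modalities in semantic segmentation of complex indoor scenes, this paper proposes a lightweight semantic segmentation network that combines an encoder-decoder architecture with attention mechanisms. First, two residual networks serve as backbones that extract features from the RGB and depth images separately, and a polarized self-attention mechanism is introduced into the encoder. Next, a bimodal fusion module is designed to fuse RGB and depth features at different stages. Parallel aggregation pyramid pooling is then introduced to capture dependencies between regions. Finally, three decoders of different sizes combine the preceding multi-scale feature maps through skip connections and decode them jointly, so that the segmentation results retain more detail and texture. The proposed network is trained and tested on the NYUDv2 dataset and compared with several advanced RGB-D semantic segmentation networks; the experiments show that it achieves good segmentation performance.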
Keywords:	RGB-D image; attention mechanism; multimodal fusion; context aggregation
Received:	2023/8/31
Revised:	2023/9/26
Indoor Scene Segmentation Algorithm Based on Fusion of Deep Information

WANG Liu, LIANG Ming-Ju. Indoor Scene Segmentation Algorithm Based on Fusion of Deep Information[J]. Computer Systems & Applications, 2024, 33(3): 111-117.

Authors:	WANG Liu, LIANG Ming-Ju

Affiliation:	School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China; Software Department, Guangdong Foshan Lianchuang Engineering Graduate School, Foshan 528300, China

Abstract:	A lightweight semantic segmentation network that combines an encoder-decoder architecture with attention mechanisms is proposed to address feature loss and the difficulty of effective bimodal fusion in semantic segmentation of complex indoor scenes. First, two residual networks are used as backbones to extract features from the RGB and depth images separately, and a polarized self-attention (PSA) module is introduced into the encoder. Then, a bimodal fusion module is designed to fuse RGB and depth features at different stages. A context module is introduced to capture dependencies between regions. Finally, three decoders of different sizes fuse the preceding multi-scale feature maps through skip connections to improve the segmentation accuracy of small targets. The proposed network is trained and tested on the NYUDv2 dataset and compared with several advanced RGB-D semantic segmentation networks. The experiments show that the proposed network achieves good segmentation performance.

Keywords:	RGB-D image; attention mechanism; multimodal fusion; context aggregation
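
The data flow described in the abstract (two encoder branches, stage-wise fusion, a context module, and three decoder steps with skip connections) can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the abstract does not specify the layers, so every function below is a hypothetical stand-in. Average pooling replaces each residual stage, a parameter-free sigmoid gate replaces the attention and fusion modules, and both inputs are assumed to be already projected to the same channel width. Only the wiring between the components follows the abstract.

```python
import numpy as np

def channel_gate(x):
    """Illustrative channel attention: sigmoid of each channel's global mean."""
    s = x.mean(axis=(1, 2), keepdims=True)          # (C, 1, 1)
    return x * (1.0 / (1.0 + np.exp(-s)))

def fuse(rgb, depth):
    """Hypothetical bimodal fusion: gate each modality, then add element-wise."""
    return channel_gate(rgb) + channel_gate(depth)

def avg_pool(x, k):
    """Non-overlapping k x k average pooling on a (C, H, W) array."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def upsample(x, k):
    """Nearest-neighbour upsampling by an integer factor k."""
    return x.repeat(k, axis=1).repeat(k, axis=2)

def context(x, bins=(1, 2)):
    """Pyramid-pooling-style context: pool to coarse grids, upsample, average.

    Assumes a square feature map whose side is divisible by every bin size.
    """
    side = x.shape[1]
    branches = [x]
    for b in bins:
        k = side // b
        branches.append(upsample(avg_pool(x, k), k))
    return np.mean(branches, axis=0)

def forward(rgb, depth, stages=4):
    """Two-branch encoder with stage-wise fusion, context module, 3 decoder steps."""
    skips = []
    for _ in range(stages):
        # Stand-in for one residual stage per branch: halve the resolution.
        rgb, depth = avg_pool(rgb, 2), avg_pool(depth, 2)
        rgb = fuse(rgb, depth)      # fused features continue in the RGB branch
        skips.append(rgb)
    x = context(skips[-1])
    for skip in reversed(skips[:-1]):   # three skips -> three decoder steps
        x = upsample(x, 2) + skip
    return x

rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 32, 32))     # assumed: already 8 channels
depth = rng.standard_normal((8, 32, 32))   # assumed: already 8 channels
out = forward(rgb, depth)
print(out.shape)  # (8, 16, 16)
```

With four encoder stages, the three shallower fused feature maps serve as skip connections, which is one plausible reading of "three decoders of different sizes"; a real model would use learned convolutions and a final classification head in place of these fixed operations.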
