Semantics-aware transformer for 3D reconstruction from binocular images |
| |
Authors: | JIA Xin YANGShourui and GUAN Diyi |
| |
Affiliation: | The Engineering Research Center of Learning-Based Intelligent System and the Key Laboratory of Computer Vision and System of Ministry of Education, Tianjin University of Technology, Tianjin 300384, China,The Engineering Research Center of Learning-Based Intelligent System and the Key Laboratory of Computer Vision and System of Ministry of Education, Tianjin University of Technology, Tianjin 300384, China and Zhejiang University of Technology, Hangzhou 310014, China |
| |
Abstract: | Existing multi-view three-dimensional (3D) reconstruction methods can only capture single type of feature from input view, failing to obtain fine-grained semantics for reconstructing the complex shapes. They rarely explore the semantic association between input views, leading to a rough 3D shape. To address these challenges, we propose a semantics-aware transformer (SATF) for 3D reconstruction. It is composed of two parallel view transformer encoders and a point cloud transformer decoder, and takes two red, green and blue (RGB) images as input and outputs a dense point cloud with richer details. Each view transformer encoder can learn a multi-level feature, facilitating characterizing fine-grained semantics from input view. The point cloud transformer decoder explores a semantically-associated feature by aligning the semantics of two input views, which describes the semantic association between views. Furthermore, it can generate a sparse point cloud using the semantically-associated feature. At last, the decoder enriches the sparse point cloud for producing a dense point cloud with richer details. Extensive experiments on the ShapeNet dataset show that our SATF outperforms the state-of-the-art methods. |
| |
Keywords: | |
|
| 点击此处可从《光电子快报》浏览原始摘要信息 |
|
点击此处可从《光电子快报》下载全文 |
|