Augmented two-stream network for robust action recognition adaptive to various action videos |
| |
Affiliation: | 1. Department of Electrical Engineering, Indian Institute of Technology Patna, Bihta 801 106, Patna, India; 2. Department of Physics, Indian Institute of Technology Patna, Bihta 801 106, Patna, India |
| |
Abstract: | In video-based action recognition, training a two-stream network on videos with different frame numbers can cause data skew. Moreover, extracting key frames from a video is crucial for improving the training and recognition efficiency of action recognition systems. However, previous works suffer from information loss and optical-flow interference when handling videos with different frame numbers. In this paper, an augmented two-stream network (ATSNet) is proposed to achieve robust action recognition. A frame-number-unified strategy is first incorporated into the temporal stream network to unify the frame numbers of videos. Subsequently, grayscale statistics of the optical-flow images are extracted to filter out invalid optical-flow images and to produce dynamic fusion weights for the two branch networks, allowing the model to adapt to different action videos. Experiments conducted on the UCF101 dataset demonstrate that ATSNet outperforms previously reported methods, improving recognition accuracy by 1.13%. |
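The abstract describes filtering invalid optical-flow images by their grayscale statistics and deriving dynamic fusion weights for the spatial and temporal branches. A minimal sketch of that idea is given below; the paper does not specify its statistics or weighting formula, so the standard-deviation test, the threshold value, and the `fusion_weights` rule here are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch: a near-uniform grayscale optical-flow image (very low
# pixel standard deviation) is treated as carrying no motion information and
# is filtered out; the temporal branch's fusion weight is then scaled by the
# fraction of valid flow images in the clip. All names and thresholds are
# illustrative, not taken from the ATSNet paper.

def flow_statistics(gray_flow):
    """Mean and standard deviation of a grayscale optical-flow image
    given as a list of rows of pixel intensities."""
    pixels = [p for row in gray_flow for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    return mean, var ** 0.5

def is_valid_flow(gray_flow, std_threshold=2.0):
    """A flow image with almost no intensity variation is considered invalid."""
    _, std = flow_statistics(gray_flow)
    return std >= std_threshold

def fusion_weights(gray_flows, std_threshold=2.0, base_temporal=0.5):
    """Scale the temporal branch's weight by the fraction of valid flow
    images; the remainder goes to the spatial branch."""
    if not gray_flows:
        return 1.0, 0.0
    valid = sum(1 for f in gray_flows if is_valid_flow(f, std_threshold))
    w_temporal = base_temporal * (valid / len(gray_flows))
    return 1.0 - w_temporal, w_temporal  # (spatial, temporal)
```

For example, a clip containing one static (all pixels 128) and one high-contrast flow image would keep only the latter, halving the temporal branch's base weight.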
| |
Keywords: | Two-stream network; Action recognition; Data skew |
This article is indexed in ScienceDirect and other databases.
|