Action Recognition in Haze Using an Efficient Fusion of Spatial and Temporal Features

S.G. Tanneru; Snehasis Mukherjee

doi:10.1007/978-981-16-1092-9_3

Published in Springer Science and Business Media Deutschland GmbH

2021

Volume: 1377 CCIS

Pages: 29 - 38

Abstract

Action recognition in video sequences is an active research problem in computer vision. However, no significant effort has been made to recognize actions in hazy videos. This paper proposes a novel unified model for action recognition in hazy video that efficiently combines a Convolutional Neural Network (CNN) stage, which first dehazes the video and then extracts spatial features from each frame, with a deep bidirectional LSTM (DB-LSTM) network, which extracts temporal features over the course of the action. First, each frame of the hazy video is fed into the AOD-Net (All-in-One Dehazing Network) model to obtain a clear representation of the frame. Next, spatial features are extracted from every sampled dehazed frame (produced by the AOD-Net model) using a pre-trained VGG-16 architecture, which helps reduce redundancy and complexity. Finally, the temporal information across the frames is learnt using a DB-LSTM network, in which multiple LSTM layers are stacked together in both the forward and backward passes of the network. The proposed unified model is the first attempt to recognize human actions in hazy videos. Experimental results on a synthetic hazy video dataset show state-of-the-art performance in recognizing actions. © 2021, Springer Nature Singapore Pte Ltd.
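
The three stages described in the abstract (dehaze every frame, extract spatial features from sampled frames, learn temporal features in both directions) can be illustrated with a minimal, dependency-free sketch of the data flow. This is not the authors' implementation: `dehaze` is an identity stand-in for AOD-Net, `spatial_features` is a toy two-number summary standing in for VGG-16, and `recurrent_pass` is a plain tanh recurrence swapped in for an LSTM layer. All function names, the sampling step, and the weights are hypothetical and chosen only for illustration.

```python
import math

def dehaze(frame):
    # Stand-in for AOD-Net (assumption): returns the frame unchanged.
    return frame

def sample_frames(frames, step):
    # Hypothetical sampling rule: keep every `step`-th frame.
    return frames[::step]

def spatial_features(frame):
    # Stand-in for VGG-16 (assumption): mean and max pixel intensity.
    pixels = [p for row in frame for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]

def recurrent_pass(sequence, w_in=0.5, w_rec=0.5):
    # Simple tanh recurrence with a scalar state, used here in place of
    # an LSTM layer. Each step sees the current input and the previous state.
    state, states = 0.0, []
    for vector in sequence:
        drive = sum(vector) / len(vector)
        state = math.tanh(w_in * drive + w_rec * state)
        states.append(state)
    return states

def bidirectional_layer(sequence):
    # Run the recurrence forward and backward in time, then pair the
    # two states that belong to the same time step.
    forward = recurrent_pass(sequence)
    backward = recurrent_pass(sequence[::-1])[::-1]
    return [[f, b] for f, b in zip(forward, backward)]

def temporal_features(sequence, num_layers=2):
    # Stack several bidirectional layers: each layer consumes the
    # per-step outputs of the layer below it.
    for _ in range(num_layers):
        sequence = bidirectional_layer(sequence)
    return sequence

def describe_video(frames, step=2, num_layers=2):
    dehazed = [dehaze(f) for f in frames]        # stage 1: dehaze every frame
    sampled = sample_frames(dehazed, step)       # keep a subset of frames
    features = [spatial_features(f) for f in sampled]  # stage 2: spatial
    return temporal_features(features, num_layers)     # stage 3: temporal

# Toy "video": six 2x2 grayscale frames.
video = [[[0.1 * t, 0.2], [0.3, 0.4]] for t in range(6)]
out = describe_video(video)
print(len(out), len(out[0]))
```

In a real system the per-step outputs of the last layer would feed a classifier that predicts the action label. That step is omitted here because the abstract does not describe it.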

Topics: Convolutional neural network (54%) and Frame (networking) (51%)
