We propose a deep learning based technique to classify actions based on Long Short Term Memory (LSTM) networks. The proposed scheme first learns spatial temporal features from the video, using an extension of the Convolutional Neural Networks (CNN) to 3D. A Recurrent Neural Network (RNN) is then trained to classify each sequence considering the temporal evolution of the learned features for each time step. Experimental results on the CMU MoCap, UCF 101, Hollywood 2 dataset show the efficacy of the proposed approach. We extend the proposed framework with an efficient motion feature, to enable handling significant camera motion. The proposed approach outperforms the existing deep models for each dataset. © Springer Nature Singapore Pte Ltd. 2018.