We propose a unified method for recognizing human actions and human-related events in realistic videos. The method uses an efficient pipeline combining (a) a 3D representation of the improved Dense Trajectory Features (DTF) and (b) Fisher Vector (FV) encoding. We further propose a novel descriptor, built on the FV representation of the input video, that is capable of representing both human actions and human-related events. The proposed unified descriptor is a 168-dimensional vector obtained from each video sequence by statistically analyzing the motion patterns of the 3D joint locations of the human body. A binary Support Vector Machine (SVM) is trained on this descriptor to recognize human actions or human-related events. We evaluate the proposed approach on two challenging action recognition datasets, UCF Sports and CMU Mocap, and additionally on the Hollywood2 event recognition dataset. On all benchmark datasets, for both action and event recognition, the proposed approach compares favorably with state-of-the-art techniques. © 2017, Springer Science+Business Media, LLC.
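The pipeline the abstract describes (local trajectory descriptors → Fisher Vector encoding → linear SVM) can be sketched as follows. This is a minimal illustration of the standard FV formulation (gradients of a diagonal-covariance GMM with respect to means and variances, followed by power and L2 normalization), not the paper's implementation; the 16-dimensional synthetic descriptors and all variable names are assumptions standing in for the real iDT features.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import LinearSVC

def fisher_vector(descriptors, gmm):
    """Encode a set of local descriptors as a Fisher Vector.

    Uses the standard formulation: per-component gradients with respect to
    the GMM means and (diagonal) variances, concatenated, then power- and
    L2-normalized. Output dimensionality is 2 * K * D.
    """
    X = np.atleast_2d(descriptors)
    N, D = X.shape
    q = gmm.predict_proba(X)        # soft assignments, shape (N, K)
    means = gmm.means_              # (K, D)
    covs = gmm.covariances_         # diagonal covariances, (K, D)
    w = gmm.weights_                # mixture weights, (K,)
    parts = []
    for k in range(means.shape[0]):
        diff = (X - means[k]) / np.sqrt(covs[k])
        g_mu = (q[:, k:k + 1] * diff).sum(axis=0)            # gradient wrt mean
        g_sig = (q[:, k:k + 1] * (diff ** 2 - 1)).sum(axis=0)  # wrt variance
        parts.append(g_mu / (N * np.sqrt(w[k])))
        parts.append(g_sig / (N * np.sqrt(2 * w[k])))
    fv = np.concatenate(parts)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))   # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)  # L2 normalization

# --- illustrative usage on synthetic data (not the paper's real features) ---
rng = np.random.default_rng(0)
train_descs = rng.normal(size=(500, 16))      # stand-in for iDT descriptors
gmm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(train_descs)

# Encode two synthetic "videos" and train a linear SVM on the encodings.
videos = [rng.normal(size=(100, 16)), rng.normal(loc=2.0, size=(100, 16))]
X = np.stack([fisher_vector(v, gmm) for v in videos])
clf = LinearSVC().fit(X, [0, 1])
```

In this setup each video, regardless of its number of trajectories, maps to a fixed-length vector, which is what makes a linear SVM applicable as the final classifier.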