We propose a Bag-of-Words (BoW) based technique for human action recognition in videos that contain challenges such as illumination changes, background changes, and camera shake. We build the pose descriptors for the actions from a gradient-weighted optical flow (GWOF) measure, which minimizes the noise caused by camera shake. The pose descriptors are clustered and stored in a dictionary of poses. We then generate a reduced dictionary whose words are termed pose duplets. The pose duplets are constructed by a graph-based approach that considers the probability of two poses occurring sequentially during an action. The poses of the initial dictionary are treated as the nodes of a weighted directed graph called the duplet graph. The weight of each edge of the duplet graph is calculated from the probability that the destination node of the edge appears after its source node. The concatenation of the source and destination pose vectors is called a pose duplet. We rank the pose duplets by the weight of the edge between their two poses and form the reduced dictionary from the pose duplets with high edge weights (called dominant pose duplets). We construct an action descriptor for each action using the dominant pose duplets and use these descriptors to recognize the actions. The efficacy of the proposed approach is tested on standard datasets. © Springer International Publishing Switzerland 2015.
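
The duplet graph construction described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each video has already been converted to a sequence of pose-dictionary indices, that the edge weight is the empirical probability of the destination pose immediately following the source pose, and that "high edge weights" means the `top_k` highest ranked edges. The function names, the `top_k` parameter, and the toy data are hypothetical.

```python
import numpy as np

def duplet_graph(sequences, n_poses):
    """Edge weights of a directed graph over pose-dictionary indices.

    Assumption: weights[s, d] estimates P(next pose = d | current pose = s)
    from counts of immediately consecutive poses.
    """
    counts = np.zeros((n_poses, n_poses))
    for seq in sequences:
        for src, dst in zip(seq[:-1], seq[1:]):
            counts[src, dst] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no outgoing transitions keep weight 0.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

def dominant_duplets(weights, poses, top_k):
    """Rank edges by weight and concatenate the pose vectors of the top_k."""
    order = np.argsort(weights, axis=None)[::-1][:top_k]
    edges = [tuple(int(i) for i in np.unravel_index(idx, weights.shape))
             for idx in order]
    # Drop pairs that never occurred in sequence.
    edges = [(s, d) for s, d in edges if weights[s, d] > 0]
    vectors = np.array([np.concatenate([poses[s], poses[d]])
                        for s, d in edges])
    return edges, vectors

# Toy data (hypothetical): three dictionary poses with 2-D descriptors
# and two videos given as sequences of pose indices.
poses = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sequences = [[0, 1, 2, 0, 1], [0, 2, 1]]

weights = duplet_graph(sequences, n_poses=3)
edges, reduced_dictionary = dominant_duplets(weights, poses, top_k=2)
```

Under these assumptions each row of `reduced_dictionary` is one dominant pose duplet, twice the length of a single pose descriptor. Because the weights are normalized per source pose, a pose with a single observed successor receives weight 1.0; a different normalization may be preferable in practice.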