In spite of recent advancements in deep learning based techniques for facial expression recognition, the efficiency of state-of-the-art recognition methods in in-the-wild scenarios remains a challenge. Relatively little effort has gone into handling in-the-wild scenarios, for two main reasons: the cues available for identifying distinguishable patterns of features (spatial and temporal) are scarce and vary in level, and no large dataset has been available for training a deep learning model. Recently, a large dataset called AffectNet was introduced in the literature, providing a sufficient basis for training a deep learning model. This paper proposes an efficient combination of hand-crafted and deep-learned features for facial expression recognition in the wild. We use facial landmark points as the hand-crafted features and XceptionNet for the deep-learned features. We experiment with XceptionNet and DenseNet and propose the use of XceptionNet, as it performs better than DenseNet when applied to in-the-wild scenarios. The proposed fusion of the hand-crafted and XceptionNet features outperforms the state-of-the-art methods for facial expression recognition in the wild. © 2020 Elsevier B.V.