In this paper, we propose a graph theoretic approach for recognizing interactions between two human performers present in a video clip. We watch primarily the human poses of each performer and derive descriptors that capture the motion patterns of the poses. From an initial dictionary of poses (visual words), we extract key poses (or key words) by ranking the poses on the centrality measure of graph connectivity. We argue that the key poses are graph nodes which share a close semantic relationship (in terms of some suitable edge weight function) with all other pose nodes and hence are said to be the central part of the graph. We apply the same centrality measure on all possible combinations of the key poses of the two performers to select the set of'key pose doublets'that best represent the corresponding action. The results on standard interaction recognition dataset show the robustness of our approach when compared to the present state of the art method. Copyright 2011 ACM.