The paper presents a novel game-theoretic algorithm for optimizing controllers in the presence of additive exogenous disturbances and parametric uncertainties of the controlled plant. The controller design problem is framed as a zero-sum Markov game between the controller and the disturber (disturbances). The proposed algorithm employs an 'annealed' combination of a variation of the 'cautious fictitious play' approach and the 'min-max' solution methodology to generate an optimal policy via a simple linear program. In the Reinforcement Learning (RL) paradigm, controller optimization is typically done either in the Markov Decision Process (MDP) framework, which not only assumes a stationary environment but is also inadequate for modeling noise and disturbances, or via the min-max solution approach, which is computationally demanding and at times even infeasible. The proposed approach generates a 'safe' and 'universally consistent' controller by hybridizing the min-max strategy with a variation of cautious fictitious play. We empirically evaluate the approach on (i) the Inverted Pendulum swing-up task and (ii) a Non-holonomic Mobile Robot control task, and compare its performance against standard Q-learning. Copyright © IICAI 2005.
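The abstract states that the min-max (maximin) policy of the zero-sum game can be obtained via a simple linear program. As an illustration of that standard construction (not the paper's own implementation), the sketch below computes the row player's maximin mixed strategy for a one-shot matrix game using `scipy.optimize.linprog`; the function name and the matching-pennies payoff matrix are hypothetical examples:

```python
import numpy as np
from scipy.optimize import linprog

def maximin_strategy(A):
    """Row player's maximin (min-max) mixed strategy for payoff matrix A.

    Solves: maximize v subject to  sum_i A[i, j] * x_i >= v  for every
    opponent action j, with x a probability distribution over row actions.
    """
    m, n = A.shape
    # Decision variables: x_1..x_m (mixed strategy) and v (game value).
    # linprog minimizes, so we minimize -v to maximize the guaranteed value.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For each opponent column j:  v - sum_i A[i, j] * x_i <= 0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Strategy probabilities must sum to 1 (v has coefficient 0 here).
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = [1.0]
    bounds = [(0, None)] * m + [(None, None)]  # x_i >= 0, v unrestricted
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

# Matching pennies: the safe (min-max) strategy is uniform, with game value 0.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, v = maximin_strategy(A)
```

In the Markov-game setting of the paper, an LP of this form would be solved per state over the stage-game payoffs; this snippet only shows the single-matrix case.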