Online Optimal Adaptive Control: Real Time Solution of Optimal Control, Multi-Player Dynamic and Graphical Games

November 29, 2011, HFH 4164

Kyriakos Vamvoudakis

Automation & Robotics Research Institute (ARRI), UT Arlington, Advanced Controls & Sensors Group

Abstract

This talk will highlight some new methods for the design of automatic feedback controllers in differential games. Optimal feedback control design has been responsible for much of the successful performance of engineered systems in aerospace, industrial processes, vehicles, ships, robotics, and elsewhere since the 1960s. H-infinity robust control has been used for stabilization of systems with disturbances. However, optimal feedback control design is performed offline by solving optimal design matrix equations, including the algebraic Riccati equation (ARE) and the game ARE. Optimal design is difficult for nonlinear systems because it relies on solutions to complicated Hamilton-Jacobi (HJ) or HJ-Isaacs (HJI) equations. Moreover, offline solution does not allow performance objectives to be modified as the agents learn.

This talk will present online algorithms for learning continuous-time optimal control solutions for linear and nonlinear systems. This is a novel class of adaptive control algorithms that converge to optimal control solutions by online learning in real time. In the linear quadratic (LQ) case, the algorithms learn the solution to the ARE by adaptation along the system motion trajectories. In the case of nonlinear systems with general performance measures, the algorithms learn the (approximate smooth local) solutions of HJ or HJI equations. The algorithms are based on reinforcement learning (RL) techniques. The following topics will be covered.

Online algorithms to solve the continuous-time infinite-horizon optimal control problem. The talk will start with two online algorithms for learning the solution of the continuous-time infinite-horizon optimal control problem for nonlinear systems. They are implemented as actor/critic RL structures that involve simultaneous adaptation of both actor and critic neural networks. One of the two algorithms does not make any use of the drift dynamics of the system. The algorithms learn online an approximate solution to the Hamilton-Jacobi-Bellman (HJB) equation.

Online algorithm for zero-sum games for continuous-time systems. An online learning algorithm will be presented that solves the two-player zero-sum game (H-infinity problem). The algorithm learns online an approximate local solution to the HJI equation. Convergence to the optimal saddle-point solution is proven, and stability of the system is also guaranteed.

Online algorithm for non-zero-sum games for continuous-time systems. An adaptive algorithm will be presented that solves the continuous-time multi-player non-zero-sum game with infinite horizon for linear and nonlinear systems. Non-zero-sum games allow players to have a cooperative team component and an individual selfish component of strategy. This algorithm learns online the solution of coupled Riccati and coupled Hamilton-Jacobi equations and finds in real time approximations of the optimal value and the Nash equilibrium, while also guaranteeing closed-loop stability.

ADP using reduced output measurements. Approximate Dynamic Programming (ADP) algorithms that use only reduced output measurements, not full state measurements, will be presented for discrete-time dynamical systems. New algorithms will be developed that deliver real-time optimal performance for systems with unknown dynamics and output measurements. It will be seen that the methods converge to the optimal polynomial regulator.

Multi-agent differential graphical games. The notion of graphical games will be presented for dynamical systems in which the dynamics and performance index of each node depend only on local neighbor information. A derivation of coupled Riccati equations for the solution of graphical games is proposed. Furthermore, a policy iteration algorithm will be shown to converge to the best response when the policies of each agent's neighbors are held fixed, and to the Nash equilibrium when all agents update their policies simultaneously. Finally, an online adaptive learning algorithm is proposed to solve the graphical games online.
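
As a rough illustration of the LQ case mentioned in the abstract, the sketch below runs classical model-based policy iteration (Kleinman's algorithm) on a small example and checks that it reaches the ARE solution. It is not the online actor/critic method of the talk: it uses full knowledge of the system matrices and runs offline, whereas the talk's algorithms replace the model-based evaluation step with learning along system trajectories. The matrices A, B, Q, R, the initial gain K0, and the function name are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov


def kleinman_policy_iteration(A, B, Q, R, K0, iters=20):
    """Model-based policy iteration for the continuous-time LQR problem.

    K0 must be a stabilizing gain (A - B @ K0 Hurwitz).
    Returns the value matrix P and the feedback gain K after `iters` steps.
    """
    K = K0
    P = None
    for _ in range(iters):
        Ac = A - B @ K  # closed-loop dynamics under the current policy
        # Policy evaluation: solve Ac^T P + P Ac + Q + K^T R K = 0 for P.
        P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
        # Policy improvement: K = R^{-1} B^T P.
        K = np.linalg.solve(R, B.T @ P)
    return P, K


# Hypothetical second-order example (values are illustrative only).
A = np.array([[0.0, 1.0], [-1.0, -2.0]])  # open-loop stable, so K0 = 0 is stabilizing
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.zeros((1, 2))

P_pi, K_pi = kleinman_policy_iteration(A, B, Q, R, K0)
P_are = solve_continuous_are(A, B, Q, R)  # direct offline ARE solution for comparison

print("max |P_pi - P_are| =", np.max(np.abs(P_pi - P_are)))
```

The printed difference should be close to zero, since each iteration solves a Lyapunov equation for the current gain and the resulting sequence of value matrices converges to the stabilizing ARE solution when the initial gain is stabilizing.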

Speaker's Bio

Kyriakos G. Vamvoudakis was born in Athens, Greece. He received the Diploma degree in Electronic and Computer Engineering from the Technical University of Crete, Greece, in 2006 with highest honors, and the M.Sc. degree in Electrical Engineering in 2008 and the Ph.D. degree in 2011 from The University of Texas at Arlington, under the supervision of Dr. Frank L. Lewis. He is currently a faculty research associate/research scientist at the Automation and Robotics Research Institute, The University of Texas at Arlington, and an adjunct professor in the Electrical Engineering Department. His current research interests include approximate dynamic programming, game theory, neural network feedback control, optimal control, adaptive control, and nonlinear optimization. He has authored or coauthored over 35 technical publications, including the book Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles. He received the Best Paper Award for Autonomous/Unmanned Vehicles at the 27th Army Science Conference in 2010, the Best Presentation Award at the World Congress of Computational Intelligence in 2010, and the Best Researcher Award at the Automation and Robotics Research Institute in 2011. Dr. Vamvoudakis is a registered Electrical/Computer Engineer (PE) and a member of the Technical Chamber of Greece.
