Recent years have seen tremendous successes of reinforcement learning (RL) for decision-making, with prominent examples including Go-game playing, robotics, autonomous driving, and large-language-model post-training. Motivated by these empirical advances, substantial progress has been made on the theoretical understanding of RL, largely under the assumption of full state observability. However, many of the aforementioned applications operate under "information constraints": agents have only partial observability and limited information about the environment, conditions known to underlie fundamental hardness in both control theory and computational complexity theory. At the same time, a growing body of heuristics has been developed to address such constraints in practice, with strong empirical performance. In this talk, I will discuss our explorations toward better understanding these heuristics of RL with "asymmetric information".

First, we study the asymmetric information between "training" and "testing": certain privileged information is available and exploited during training, a common practice in robot learning and deep RL. We analyze both the pitfalls and the efficiency of this paradigm in partially observable RL, providing theoretical insights into when and why it helps. Second, we focus on the asymmetric information across "different agents", who make decisions under partial and decentralized observations. We formalize another popular heuristic that addresses such partial observability, learning-to-communicate (LTC), in which agents jointly learn communication protocols for information sharing alongside control policies. We analyze LTC through the lens of "information structures", a well-studied notion in decentralized stochastic control and dynamic games, and identify conditions under which LTC can be more statistically and computationally tractable.
Time permitting, I will conclude with additional thoughts on building principled RL agents under information constraints.
Kaiqing Zhang is currently an Assistant Professor in the Department of Electrical and Computer Engineering (ECE) and the Institute for Systems Research (ISR), with a joint appointment in the Department of Computer Science (CS), at the University of Maryland, College Park. He is also a member of the Center for Machine Learning, the Maryland Robotics Center, and the Artificial Intelligence Interdisciplinary Institute at Maryland. Prior to joining Maryland, he was a postdoctoral scholar affiliated with LIDS and CSAIL at MIT, and a Research Fellow at the Simons Institute for the Theory of Computing at Berkeley. He obtained his Ph.D. from the Department of ECE at the University of Illinois at Urbana-Champaign (UIUC). He also received M.S. degrees in both ECE and Applied Mathematics from UIUC, and a B.E. in Automation with a second degree in Economics from Tsinghua University. His research interests lie in systems and control theory, game theory, machine learning, robotics, computation, and their intersections. His work has been recognized by several awards, including the Simons-Berkeley Research Fellowship, the Coordinated Science Lab Thesis Award, an ICML Outstanding Paper Award, AAAI New Faculty Highlights, the NSF CAREER Award, the AFOSR YIP Award, faculty awards from Cisco Research, JP Morgan, and Open Philanthropy, and the George Corcoran Memorial Award for Teaching and Educational Leadership.