Reinforcement learning (RL), frequently modeled as sequential learning and decision making in the face of uncertainty, has garnered growing interest in recent years due to its remarkable success in practice. In contemporary RL applications, it is increasingly common to encounter environments with prohibitively large state and action spaces, imposing stringent requirements on the sample and computational efficiency of the RL algorithms in use. Despite this empirical success, however, the theoretical underpinnings of many popular RL algorithms remain highly inadequate even in the tabular setting.
In this talk, we present two vignettes on the effectiveness of RL algorithms. The first vignette demonstrates that a perturbed model-based RL approach is minimax optimal under a generative model, without suffering from the sample size barrier present in all prior work. The second vignette covers policy optimization in reinforcement learning. On the one hand, we demonstrate that the popular softmax policy gradient method can take exponential time to converge; on the other hand, employing natural policy gradients and enforcing entropy regularization provably achieves fast global convergence. These results cover two distinct RL paradigms, and might shed light on the efficacy of these algorithms in more complicated scenarios.
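For readers unfamiliar with the methods mentioned above, the following display sketches the updates in the tabular softmax setting. The notation (step size \eta, regularization weight \tau, soft Q-function Q_\tau^{(t)}) is standard in this literature but is not taken from the abstract itself; this is an illustrative sketch rather than the precise statements presented in the talk.

```latex
% Softmax parameterization of the policy (assumed, standard in this setting):
%   \pi_\theta(a \mid s) \propto \exp\big(\theta(s,a)\big).
\[
  \theta^{(t+1)} \;=\; \theta^{(t)} \;+\; \eta \,\nabla_\theta V^{\pi_{\theta^{(t)}}}(\mu)
  \qquad \text{(softmax policy gradient)}
\]
% One common form of the entropy-regularized natural policy gradient update,
% written multiplicatively up to per-state normalization:
\[
  \pi^{(t+1)}(a \mid s) \;\propto\;
  \big(\pi^{(t)}(a \mid s)\big)^{1-\eta\tau}
  \exp\!\big(\eta\, Q_\tau^{(t)}(s,a)\big),
\]
% where \eta is the step size, \tau > 0 is the entropy-regularization weight,
% and Q_\tau^{(t)} is the soft Q-function of the current policy \pi^{(t)}.
```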
Yuxin Chen is currently an assistant professor in the Department of Electrical Engineering at Princeton University, and is affiliated with Applied and Computational Mathematics, Computer Science, and the Center for Statistics and Machine Learning. Prior to joining Princeton, he was a postdoctoral scholar in the Department of Statistics at Stanford University, and he completed his Ph.D. in Electrical Engineering at Stanford University. His research interests include high-dimensional statistics, mathematical optimization, and reinforcement learning. He has received the Princeton Graduate Mentoring Award, the AFOSR Young Investigator Award, the ARO Young Investigator Award, and the ICCM Best Paper Award (Gold Medal), and was selected as a finalist for the Best Paper Prize for Young Researchers in Continuous Optimization.