The rapid development of low-cost sensors, smart devices, communication networks, and learning algorithms has enabled data-driven decision making in large-scale multi-agent systems. Prominent examples include mobile robotic networks and autonomous systems. The key challenge in these systems is handling the vast quantities of information shared between the agents in order to find an optimal policy that maximizes an objective function. Among potential approaches, distributed reinforcement learning, which admits low-cost implementation and can run in real time, has been recognized as an important approach to address this challenge.
This talk focuses on the policy evaluation problem in multi-agent reinforcement learning, one of the most fundamental problems in this area. In this problem, a group of agents operate in an unknown environment, and their goal is to cooperatively evaluate the global discounted cumulative reward, which is composed of the local rewards observed by the agents. To solve this problem, I consider a distributed variant of the popular temporal difference learning method, often referred to as TD(λ).
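To make the setting concrete, below is a minimal sketch of one synchronous step of consensus-based distributed TD learning with linear function approximation, shown for the λ = 0 special case (TD(0)); the TD(λ) variant additionally maintains eligibility traces. The mixing matrix W, feature map, step size, and all names here are illustrative assumptions, not the speaker's specific algorithm.

```python
import numpy as np

def distributed_td0_step(theta, W, phi_s, phi_s_next, rewards, gamma, alpha):
    """One synchronous step of consensus-based distributed TD(0) (a sketch).

    theta      : (N, d) array; row i is agent i's parameter vector
    W          : (N, N) doubly stochastic mixing matrix over the network
    phi_s      : (d,) feature vector of the current state
    phi_s_next : (d,) feature vector of the next state
    rewards    : (N,) local rewards, one per agent
    """
    # Consensus step: each agent averages its neighbors' parameters.
    mixed = W @ theta
    # Local TD errors, each computed with the agent's own observed reward.
    td_errors = rewards + gamma * (mixed @ phi_s_next) - (mixed @ phi_s)
    # Local TD update along the shared feature direction.
    return mixed + alpha * np.outer(td_errors, phi_s)
```

Averaging the rows of theta recovers an estimate of the global value, since each agent only ever sees its own local reward.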
Bio: Thinh T. Doan is a TRIAD postdoctoral fellow at the Georgia Institute of Technology, jointly appointed in the School of Electrical and Computer Engineering (ECE) and the School of Industrial and Systems Engineering. He was born in Vietnam, where he received his Bachelor's degree in Automatic Control from Hanoi University of Science and Technology in 2008. He obtained his Master's degree in ECE from the University of Oklahoma in 2013 and his Ph.D. in ECE from the University of Illinois at Urbana-Champaign (UIUC) in 2018. At Illinois, he was the recipient of the Harriett & Robert Perry Fellowship in ECE in 2016 and 2017. His research interests lie at the intersection of control theory, optimization, distributed algorithms, and applied probability, with main applications in machine learning, reinforcement learning, and multi-agent systems.