Congratulations to Mengrui Zhang for defending his thesis

Congrats to Dr. Mengrui Zhang for completing his PhD on Dec 3, 2025!

Title: Convergence of reinforcement learning algorithms for mean field game and mean field control problems

Committee: Fouque (Chair), Ichiba, Hu

Abstract: This dissertation establishes convergence results for several reinforcement learning algorithms designed to solve mean field game (MFG) and mean field control (MFC) problems posed in both discrete and continuous spaces. In the first part, we study the unified two-timescale reinforcement learning algorithm introduced in [Angiuli et al., 2022d], which yields either an MFG equilibrium or an MFC optimal control depending on the ratio of two learning rates, one associated with the value function and the other with the mean field term. Our convergence proof reveals that, in the MFC case, multiple mean field distributions must be updated simultaneously, which motivates presenting two distinct algorithms, one tailored to MFG and one to MFC. We work in a finite-state, finite-action, discrete-time, infinite-horizon setting, and our analysis relies on a generalization of the two-timescale stochastic approximation framework of [Borkar, 1997a]. The accuracy of the approximation to the true solutions depends on the smoothing of the policies, and we provide a numerical example that illustrates convergence.
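The two-timescale idea in the first part can be illustrated with a toy tabular sketch: the value function is updated at one rate and the mean field distribution at another, and which rate is slower determines whether the iterates track an MFG equilibrium or an MFC optimum. The environment, reward, and rate values below are illustrative assumptions for exposition, not the algorithm or benchmarks of [Angiuli et al., 2022d].

```python
import numpy as np

def two_timescale_mfq(n_states=2, n_actions=2, gamma=0.9,
                      rho_q=0.1, rho_mu=0.01, n_steps=5000, seed=0):
    """Toy sketch of a unified two-timescale mean field Q-learning update:
    Q is updated at rate rho_q, the mean field mu at rate rho_mu.
    rho_mu << rho_q freezes the mean field on the fast scale (MFG regime);
    reversing the ratio gives the MFC regime. The transition kernel and
    reward here are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    mu = np.full(n_states, 1.0 / n_states)   # running state-distribution estimate
    s = 0
    # fixed random transition kernel: P[s, a] is a distribution over next states
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    for _ in range(n_steps):
        # epsilon-greedy action from the current Q table
        a = rng.integers(n_actions) if rng.random() < 0.1 else int(np.argmax(Q[s]))
        s_next = rng.choice(n_states, p=P[s, a])
        # mean-field-dependent reward: penalize crowding in the current state
        r = 1.0 - mu[s] - 0.1 * a
        # fast timescale: tabular Q-learning step at rate rho_q
        Q[s, a] += rho_q * (r + gamma * Q[s_next].max() - Q[s, a])
        # slow timescale: move mu toward the indicator of the visited state
        e = np.zeros(n_states)
        e[s_next] = 1.0
        mu += rho_mu * (e - mu)
        s = s_next
    return Q, mu
```

Note that the update for mu preserves its total mass, so it remains a probability distribution throughout; swapping `rho_q` and `rho_mu` is the single change that moves the sketch from the MFG to the MFC regime.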

In the second part, we establish the convergence of the deep actor–critic reinforcement learning algorithm proposed in [Angiuli et al., 2023b] for MFG and MFC problems with continuous state and action spaces and an infinite discrete-time horizon. As in the finite-space setting, the algorithm’s limiting behavior is governed by the ratio between the learning rates for the value function and the mean field term. In the MFC case, to rigorously characterize the limit, we introduce a discretization of the state and action spaces, following the methodology developed in [Angiuli et al., 2023c]. The convergence proofs are again based on a suitable extension of the two-timescale framework of [Borkar, 1997a]. We further extend these results to mean field control games, which feature locally cooperative yet globally competitive populations. Finally, we present numerical experiments for one- and two-dimensional linear–quadratic models, for which explicit solutions are available and can be used to benchmark the performance of the algorithms.
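The continuous-space setting of the second part can likewise be sketched with a toy two-timescale actor–critic on a one-dimensional linear–quadratic model: a quadratic critic, a Gaussian policy with linear mean, and a scalar estimate of the population mean state updated at the slow rate. The dynamics, cost, clipping, and rate values below are illustrative assumptions, not the deep algorithm analyzed in the dissertation.

```python
import numpy as np

def two_timescale_actor_critic(rho_fast=0.01, rho_slow=0.001,
                               gamma=0.9, n_steps=5000, seed=0):
    """Toy sketch of a two-timescale actor-critic for a 1-D linear-quadratic
    mean field model. The critic uses features [1, s, s^2]; the policy is
    Gaussian with mean k*s + b; m tracks the population mean state on the
    slow timescale (rho_slow << rho_fast: MFG regime). State and TD error
    are clipped purely for numerical stability of this toy."""
    rng = np.random.default_rng(seed)
    w = np.zeros(3)        # critic weights
    k, b = 0.0, 0.0        # actor parameters: policy mean = k*s + b
    m = 0.0                # slow mean field estimate
    sigma = 0.3            # fixed exploration noise
    s = 0.0
    feats = lambda x: np.array([1.0, x, x * x])
    for _ in range(n_steps):
        a = k * s + b + sigma * rng.standard_normal()
        s_next = float(np.clip(0.8 * s + a + 0.1 * rng.standard_normal(), -10, 10))
        r = -a ** 2 - (s - m) ** 2   # cost of effort and of deviating from the mean
        # fast critic update: clipped TD error, semi-gradient step
        td = float(np.clip(r + gamma * w @ feats(s_next) - w @ feats(s), -10, 10))
        w += rho_fast * td * feats(s)
        # fast actor update: score-function gradient with TD error as advantage
        grad_log = (a - (k * s + b)) / sigma ** 2
        k += rho_fast * td * grad_log * s
        b += rho_fast * td * grad_log
        # slow timescale: update the mean field estimate
        m += rho_slow * (s_next - m)
        s = s_next
    return w, (k, b), m
```

Because the linear–quadratic model admits explicit solutions, a sketch of this form can in principle be benchmarked against the closed-form optimal control, which is the role the one- and two-dimensional experiments play in the dissertation.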