"Catastrophe Principle and Avoiding Sharp Minima with Heavy-Tailed SGD" by Chang-Han Rhee (Northwestern University)

Event Date: Monday, April 26, 2021 - 3:30pm to 4:30pm

Event Location: Virtual via Zoom
This talk presents ongoing work on heavy-tailed rare-event analysis and its connection to the success of deep neural networks.
 
While the typical behavior of a stochastic system is often deceptively oblivious to the tail behavior of the underlying uncertainties, rare events arise in vastly different ways depending on the tail distributions. Roughly speaking, in light-tailed settings, system-wide rare events arise because everything goes wrong a little bit (conspiracy principle), whereas in heavy-tailed settings, system-wide rare events arise because a small number of things go terribly wrong (catastrophe principle). This dichotomy was rigorously characterized in the form of a heavy-tailed large deviation principle in Rhee, Blanchet, and Zwart (2019) for processes with independent increments and independent spatial components.
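As a purely illustrative aside (not part of the talk), the contrast between the two principles can be seen in a few lines of Monte Carlo: conditioned on a random walk reaching an unusually high level, one can measure how much of the total is contributed by the single largest increment. The distributions and parameters below are arbitrary choices made only for illustration.

    # Hypothetical illustration (not from the talk): condition a random walk of 50
    # i.i.d. increments on exceeding a level well above its mean, then measure how
    # much of the total the single largest increment contributes.  With light tails,
    # every increment contributes a little ("conspiracy"); with heavy tails, one
    # huge increment does most of the work ("catastrophe").
    import numpy as np

    rng = np.random.default_rng(0)
    n, threshold, trials = 50, 70.0, 100_000   # illustrative parameters

    def max_increment_share(sampler):
        """Average share of the largest increment among walks crossing the threshold."""
        shares = []
        for _ in range(trials):
            x = sampler(n)
            total = x.sum()
            if total > threshold:
                shares.append(x.max() / total)
        return float(np.mean(shares)) if shares else float("nan")

    light_tailed = lambda size: rng.exponential(scale=1.0, size=size)   # Exp(1), mean 1
    heavy_tailed = lambda size: 0.5 * rng.pareto(1.5, size=size)        # Pareto-type tail, mean 1

    print("conspiracy  (light tails), largest-increment share:", max_increment_share(light_tailed))
    print("catastrophe (heavy tails), largest-increment share:", max_increment_share(heavy_tailed))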
 
In the first part of this talk, we will discuss recent progress in the theory of heavy-tailed large deviations for stochastic processes with spatial and temporal correlations, such as autoregressive models, stochastic differential/difference equations, and queue length processes. In the second part, we will uncover a surprising connection between the catastrophe principle and a central mystery of modern AI: the unreasonably good generalization performance of deep neural networks, which is often attributed to the stochastic gradient descent (SGD) algorithm's mysterious ability to avoid sharp local minima in the loss landscape. We show that under certain structural conditions, SGD is indeed guaranteed to avoid sharp local minima completely.
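Again as an illustrative aside rather than the analysis presented in the talk, the flavor of this result can be previewed with a toy experiment: run SGD on a one-dimensional loss with one sharp and one wide minimum, once with Gaussian gradient noise and once with heavy-tailed (Cauchy) noise. The loss, noise scales, and thresholds below are assumptions chosen only to make the contrast visible.

    # Purely illustrative sketch (assumed toy setup, not the analysis in the talk):
    # SGD on a 1-D loss with a narrow (sharp) well near x = -1 and a wide (flat)
    # well near x = +2.  Small Gaussian noise keeps the iterate trapped wherever it
    # starts; heavy-tailed (Cauchy) noise makes occasional large jumps that let it
    # leave the sharp basin, while the wide basin is much harder to jump out of.
    import numpy as np

    rng = np.random.default_rng(1)

    def grad(x):
        # Gradient of the toy loss -exp(-20(x+1)^2) - exp(-0.25(x-2)^2).
        sharp = 40.0 * (x + 1.0) * np.exp(-20.0 * (x + 1.0) ** 2)
        wide = 0.5 * (x - 2.0) * np.exp(-0.25 * (x - 2.0) ** 2)
        return sharp + wide

    def run_sgd(noise, steps=5_000, lr=0.01, x0=-1.0):
        """Run SGD from inside the sharp well; `noise()` draws one gradient-noise sample."""
        x = x0
        for _ in range(steps):
            x = x - lr * (grad(x) + noise())
            x = float(np.clip(x, -10.0, 10.0))   # keep heavy-tailed jumps from flying off
        return x

    gaussian = lambda: rng.normal(scale=1.0)    # light-tailed gradient noise
    cauchy = lambda: rng.standard_cauchy()      # heavy-tailed gradient noise

    runs = 100
    frac_wide = lambda noise: np.mean([run_sgd(noise) > 0.5 for _ in range(runs)])
    print("fraction of runs ending near the wide minimum:")
    print("  Gaussian noise:", frac_wide(gaussian))
    print("  Cauchy noise:  ", frac_wide(cauchy))

In this toy setting, a single large jump suffices to exit the narrow basin but not the wide one, mirroring the one-big-jump mechanism of the catastrophe principle described above.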
 

This talk is based on ongoing research in collaboration with Mihail Bazhba, Jose Blanchet, Bohan Chen, Sewoong Oh, Zhe Su, Xingyu Wang, and Bert Zwart.