The gap between optimization theory and machine learning practice is even wider than previously thought. I will talk about a surprising phenomenon we observed while training deep neural networks: the parameters do not converge to stationary points of the loss function! Despite this non-convergence of the weights, we observe that progress in minimizing the loss saturates. This unexpected behavior challenges existing convergence theory, and we offer an initial explanation based on the ergodic theory of dynamical systems. The phenomenon also opens a variety of research questions worthy of further investigation; I will highlight some of these in the talk. This talk is based on joint work with Jingzhao Zhang, Haochuan Li, and Ali Jadbabaie.
Suvrit Sra is an Associate Professor in the EECS Department at MIT, and also a core faculty member of the Laboratory for Information and Decision Systems (LIDS) and the Institute for Data, Systems, and Society (IDSS), as well as a member of the MIT-ML and Statistics groups. His research bridges machine learning with a number of mathematical areas, such as differential geometry, matrix analysis, convex analysis, probability theory, optimal transport, and optimization. He is also a co-founder and Chief Scientist of Macro-Eyes, an AI-driven startup.