By Liu Ziyin and others

Stochastic gradient descent (SGD) is the algorithm we use to train neural networks. However, it remains poorly understood how SGD navigates the highly nonlinear and degenerate loss landscape of a neural network. In this work, we prove that the minibatch noise of SGD regularizes the solution towards...

August 13, 2023

Law of Balance and Stationary Distribution of Stochastic Gradient Descent

