By Takashi Mori and others

Stochastic gradient descent (SGD) undergoes complicated multiplicative noise for the mean-square loss. We use this property of SGD noise to derive a stochastic differential equation (SDE) with simpler additive noise by performing a random time change. Using this formalism, we show that the log loss barrier $\Delta\log L=\log[L(\theta^s)/L(\theta^*)]$ between a...
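As a small illustration of the quantity named in the abstract, the following sketch evaluates the log loss barrier from two loss values. The function name `log_loss_barrier` and the numeric losses are hypothetical and chosen only for illustration; the sketch assumes both losses are strictly positive, as they must be for the logarithm to exist.

```python
import math


def log_loss_barrier(loss_at_theta_s: float, loss_at_theta_star: float) -> float:
    """Return Delta log L = log[L(theta^s) / L(theta^*)].

    Both arguments are loss values (hypothetical inputs here), assumed
    strictly positive so that the logarithm is defined.
    """
    if loss_at_theta_s <= 0 or loss_at_theta_star <= 0:
        raise ValueError("loss values must be strictly positive")
    return math.log(loss_at_theta_s / loss_at_theta_star)


# Two made-up landscapes with the same loss ratio but different loss differences.
barrier_small_losses = log_loss_barrier(0.02, 0.01)  # difference 0.01
barrier_large_losses = log_loss_barrier(2.0, 1.0)    # difference 1.0

print(barrier_small_losses, barrier_large_losses)
```

Because the barrier is a log ratio, it is unchanged when the whole loss is rescaled by a constant factor, unlike the plain difference $L(\theta^s)-L(\theta^*)$.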

January 29, 2022


Power-law escape rate of SGD