Power-law escape rate of SGD

By Takashi Mori and others
Stochastic gradient descent (SGD) is subject to complicated multiplicative noise for the mean-square loss. We use this property of SGD noise to derive a stochastic differential equation (SDE) with simpler additive noise by performing a random time change. Using this formalism, we show that the log loss barrier \Delta\log L=\log[L(\theta^s)/L(\theta^*)] between a...
January 29, 2022