Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems

By David Hoffmann and others at the University of Freiburg
In this work, we study rapid improvements of the training loss in transformers confronted with multi-step decision tasks. We find that transformers struggle to learn the intermediate task, and both training and validation loss saturate for hundreds of epochs. When transformers finally learn the intermediate task, they do...
June 6, 2024