Loss landscape geometry reveals stagewise development of transformers

George Wang

Timaeus

Matthew Farrugia-Roberts

Timaeus

Jesse Hoogland

Timaeus

Liam Carroll

Timaeus

Susan Wei

University of Melbourne

Daniel Murfet

University of Melbourne

June 16, 2024 · ICML HiLD Workshop · Best Paper

Abstract

The development of the internal structure of neural networks throughout training occurs in tandem with changes in the local geometry of the population loss. By quantifying the degeneracy of this geometry using the recently proposed Local Learning Coefficient, we show that the training process for a transformer language model can be decomposed into discrete developmental stages. We connect these stages to interpretable shifts in input–output behavior and developments in internal structure. These findings offer new insights into transformer development and underscore the crucial role of loss landscape geometry in understanding the dynamics of deep learning.

This is the workshop version of the Developmental Landscape paper. It won Best Paper at the 2024 ICML High-dimensional Learning Dynamics (HiLD) workshop.

Cite as

@inproceedings{wang2024stagewise,
  booktitle = {ICML 2024 Workshop on High-dimensional Learning Dynamics},
  title = {Loss landscape geometry reveals stagewise development of transformers},
  author = {George Wang and Matthew Farrugia-Roberts and Jesse Hoogland and Liam Carroll and Susan Wei and Daniel Murfet},
  year = {2024},
  abstract = {The development of the internal structure of neural networks throughout training occurs in tandem with changes in the local geometry of the population loss. By quantifying the degeneracy of this geometry using the recently proposed Local Learning Coefficient, we show that the training process for a transformer language model can be decomposed into discrete developmental stages. We connect these stages to interpretable shifts in input–output behavior and developments in internal structure. These findings offer new insights into transformer development and underscore the crucial role of loss landscape geometry in understanding the dynamics of deep learning.},
  url = {https://openreview.net/forum?id=2JabyZjM5H}
}
