Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition

Authors

Affiliations

Zhongtian Chen, University of Melbourne
Edmund Lau, University of Melbourne
Jake Mendel, University of Melbourne
Susan Wei, University of Melbourne
Daniel Murfet, University of Melbourne

Published

Oct 10, 2023

Abstract

We investigate phase transitions in a Toy Model of Superposition (TMS) using Singular Learning Theory (SLT). We derive a closed formula for the theoretical loss and, in the case of two hidden dimensions, discover that regular k-gons are critical points. We present supporting theory indicating that the local learning coefficient (a geometric invariant) of these k-gons determines phase transitions in the Bayesian posterior as a function of training sample size. We then show empirically that the same k-gon critical points also determine the behavior of SGD training. The picture that emerges adds evidence to the conjecture that the SGD learning trajectory is subject to a sequential learning mechanism. Specifically, we find that the learning process in TMS, be it through SGD or Bayesian learning, can be characterized by a journey through parameter space from regions of high loss and low complexity to regions of low loss and high complexity.
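To make the k-gon configurations mentioned above concrete, here is a minimal sketch, not the authors' implementation. The abstract does not spell out the model, so the sketch assumes the common TMS form ReLU(WᵀWx + b) with two hidden dimensions, and assumes inputs that are a random scalar in [0, 1] times a random basis vector. The function names, the feature count c = 6, the radius, and the bias values are hypothetical placeholders; in particular, the radius and bias used here are not claimed to be the values at which a k-gon is a critical point.

```python
import numpy as np


def kgon_weights(k, c=6, radius=1.0):
    """Return a 2 x c matrix whose first k columns are the vertices of a
    regular k-gon of the given radius; the remaining columns are zero.
    (c and radius are illustrative assumptions.)"""
    W = np.zeros((2, c))
    angles = 2 * np.pi * np.arange(k) / k
    W[0, :k] = radius * np.cos(angles)
    W[1, :k] = radius * np.sin(angles)
    return W


def tms_forward(W, b, x):
    """Assumed TMS model: ReLU(W^T W x + b), applied row-wise to x of shape (n, c)."""
    return np.maximum(x @ W.T @ W + b, 0.0)


def sample_inputs(n, c=6, seed=0):
    """Assumed input distribution: x = mu * e_i, with i uniform over the c
    features and mu uniform in [0, 1]."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, c, size=n)
    mu = rng.uniform(0.0, 1.0, size=n)
    x = np.zeros((n, c))
    x[np.arange(n), idx] = mu
    return x


def empirical_loss(W, b, x):
    """Mean squared reconstruction error over the sample."""
    return float(np.mean(np.sum((x - tms_forward(W, b, x)) ** 2, axis=1)))


if __name__ == "__main__":
    x = sample_inputs(10_000)
    b = np.zeros(6)  # placeholder bias, not a fitted value
    for k in (4, 5, 6):
        W = kgon_weights(k)
        print(k, empirical_loss(W, b, x))
```

Running the script prints an empirical loss for the 4-, 5- and 6-gon configurations under these placeholder settings, which gives a rough feel for how configurations with more polygon vertices represent more input features.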