Estimating the Local Learning Coefficient at Scale

Zach Furman

Timaeus

Edmund Lau

University of Melbourne

February 6, 2024

Abstract

The local learning coefficient (LLC) is a principled way of quantifying model complexity, originally derived in the context of Bayesian statistics using singular learning theory (SLT). Several methods are known for numerically estimating the local learning coefficient, but so far these methods have not been extended to the scale of modern deep learning architectures or data sets. Using a method developed in arXiv:2308.12108 [stat.ML], we empirically show how the LLC may be measured accurately and self-consistently for deep linear networks (DLNs) up to 100M parameters. We also show that the estimated LLC has the rescaling invariance that holds for the theoretical quantity.

Main contributions:

The estimated LLC is accurate. Estimated values of the LLC closely match theoretical predictions in DLNs (where the ground truth is known).
The estimated LLC is scalable. These estimates remain accurate for models with up to 100M parameters.
The estimated LLC is self-consistent. These estimates are invariant to rescaling symmetries as predicted by theory and as required for a suitable measure of model complexity.
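
To make the rescaling symmetry mentioned above concrete, here is a minimal sketch in plain Python. It assumes one simple kind of rescaling (multiplying one layer's weights by a constant and dividing the next layer's by the same constant); the paper's exact experimental setup may differ. The layer widths, the constant `alpha`, and all helper names are hypothetical and chosen only for illustration. The sketch shows only that the network's function is unchanged even though the parameters move, which is why a measure of model complexity would be expected to give the same value at both parameter settings. It does not estimate the LLC.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def random_matrix(rows, cols, rng):
    """Draw a rows x cols matrix with standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def scale(w, c):
    """Multiply every entry of matrix w by the constant c."""
    return [[c * v for v in row] for row in w]


def dln_forward(weights, x):
    """Apply a deep linear network: x -> W_L ... W_2 W_1 x (no nonlinearities)."""
    out = x
    for w in weights:
        out = matmul(w, out)
    return out


rng = random.Random(0)
widths = [4, 8, 8, 3]  # hypothetical layer widths, input to output
weights = [
    random_matrix(widths[i + 1], widths[i], rng) for i in range(len(widths) - 1)
]
x = random_matrix(widths[0], 5, rng)  # 5 input columns

# Rescale: W_1 -> alpha * W_1 and W_2 -> W_2 / alpha. The product W_2 W_1 is unchanged.
alpha = 3.0
rescaled = [scale(weights[0], alpha), scale(weights[1], 1.0 / alpha)] + weights[2:]

y = dln_forward(weights, x)
y_rescaled = dln_forward(rescaled, x)

# Largest absolute difference between the two outputs (floating-point noise only).
max_diff = max(
    abs(a - b) for row_a, row_b in zip(y, y_rescaled) for a, b in zip(row_a, row_b)
)
# The parameters themselves have changed.
params_changed = rescaled[0] != weights[0]
```

Here `max_diff` is at the level of floating-point round-off, while `params_changed` is true: two different points in parameter space implement the same function.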

Distillation coming soon.

Cite as

@article{furman2024estimating,
  title = {Estimating the Local Learning Coefficient at Scale},
  author = {Zach Furman and Edmund Lau},
  year = {2024},
  abstract = {The local learning coefficient (LLC) is a principled way of quantifying model complexity, originally derived in the context of Bayesian statistics using singular learning theory (SLT). Several methods are known for numerically estimating the local learning coefficient, but so far these methods have not been extended to the scale of modern deep learning architectures or data sets. Using a method developed in {\tt arXiv:2308.12108 [stat.ML]} we empirically show how the LLC may be measured accurately and self-consistently for deep linear networks (DLNs) up to 100M parameters. We also show that the estimated LLC has the rescaling invariance that holds for the theoretical quantity.},
  eprint = {2402.03698},
  archivePrefix = {arXiv},
  url = {https://arxiv.org/abs/2402.03698}
}
