Resources

SLT for Alignment

Start here:

SLT

Start here: Dialogue introduction to SLT

I want to learn theoretical SLT:

  • Assuming you have a background in mathematics (esp. algebraic geometry), physics, or statistical learning theory…
  • Read through the Theoretical SLT section.

I want to learn applied SLT:

  • Assuming you have a background in ML…
  • Read through the LLC estimation section.
  • Play around with the demo notebooks.
  • Read (at least) the distillations of the DevInterp papers.
  • Start applying LLC estimation to your own models.

Theoretical SLT

Essential reading:

Advanced materials:

  • Watanabe (2022): A good survey of the major results of SLT, by the master himself.
  • See the publications here

Textbooks

The textbooks in SLT are:

The Grey Book
Sumio Watanabe “Algebraic Geometry and Statistical Learning Theory” 2009

  • This is where all the details of the proofs of the main results of SLT are contained. It is a research monograph distilling the results proven over more than a decade. This is not an easy book to read.
  • Chapter 1 provides a coarse treatment of the underlying proof ideas and mechanics.
  • Chapters 2-5: The results of SLT depend on many results from other fields of mathematics (algebraic geometry, distribution theory, manifold theory, empirical processes, etc.). The book gives some background in each of these fields rather quickly. Scattered through these introductions is some material on how these fields relate to the core results of SLT.
  • Chapter 6 contains the main proofs of SLT.
  • Chapter 7 contains applications of the main results and examples of various learning phenomena in singular models.

The Green Book
Sumio Watanabe “Mathematical Theory of Bayesian Statistics” 2018

  • This more recent book is much more focused on learning in singular models (esp. Bayesian learning).
  • There are many exercises at the end of each chapter.
  • This is also where Watanabe handles the non-realisable case. This requires the introduction of a new technical condition known as “relatively finite variance”.
  • While not recapitulating the full proof given in the Grey Book, the Green Book does go through slightly different formulations of the theory and, by assuming some technical results in the Grey Book, it walks through the proofs of most results.

There is also an exercise textbook:
Joe Suzuki, “WAIC and WBIC with R Stan: 100 Exercises for Building Logic” 2019

Applied/Experimental SLT

LLC Estimation

Currently, the key experimental technique in applying SLT to real-world models is local learning coefficient (LLC) estimation, introduced in Lau et al. (2023).

Putting it into practice: Once you’ve read the above materials, get some hands-on practice with the example notebooks in devinterp, starting with this introductory notebook.
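For a feel of what LLC estimation computes before opening the notebooks, here is a minimal NumPy sketch. It assumes the estimator takes the form λ̂(w*) = nβ (E[L_n(w)] − L_n(w*)) with β = 1/log n, where the expectation is over a posterior localized around w* by a quadratic term of strength γ; check this against Lau et al. (2023) before relying on it. The sketch is not the devinterp API: `estimate_llc`, `quadratic_loss`, and `quadratic_grad` are hypothetical names, the sampler uses full gradients of an analytic toy loss rather than minibatch gradients (so it is plain Langevin dynamics, not true SGLD), and the hyperparameter values are illustrative only.

```python
import numpy as np


def estimate_llc(loss, grad, w_star, n, gamma=1.0, step=1e-4,
                 num_chains=4, num_steps=5000, burn_in=500, seed=0):
    """Toy LLC estimate: n * beta * (E[L(w)] - L(w_star)), beta = 1 / log(n).

    `loss` maps an array of shape (chains, d) to shape (chains,);
    `grad` maps shape (chains, d) to shape (chains, d).
    All names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    beta = 1.0 / np.log(n)
    w_star = np.asarray(w_star, dtype=float)
    # Start every chain at the point whose LLC we want to estimate.
    w = np.tile(w_star, (num_chains, 1))
    kept = []
    for t in range(num_steps):
        # Gradient of the localized, tempered negative log posterior.
        drift = n * beta * grad(w) + gamma * (w - w_star)
        # Langevin update: half-step along the drift plus Gaussian noise.
        w = w - 0.5 * step * drift + np.sqrt(step) * rng.standard_normal(w.shape)
        if t >= burn_in:
            kept.append(loss(w))
    expected_loss = np.mean(kept)
    return float(n * beta * (expected_loss - loss(w_star[None, :])[0]))


def quadratic_loss(w):
    # Regular (non-singular) toy loss with its minimum at the origin.
    return 0.5 * np.sum(w ** 2, axis=-1)


def quadratic_grad(w):
    return w


# For this regular d-dimensional loss the learning coefficient is d / 2,
# so with d = 4 the estimate should land near 2 (up to sampling noise and
# a small bias from the step size and the localization term).
llc_hat = estimate_llc(quadratic_loss, quadratic_grad, np.zeros(4), n=10_000)
print(f"estimated LLC: {llc_hat:.3f}")
```

A regular quadratic loss is used because its answer is known in closed form, which makes it a convenient sanity check for step size and chain length. For real models, the devinterp notebooks handle minibatching and hyperparameter selection.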

Developmental interpretability

Developmental interpretability proposes to study changes in neural network structure over the course of training (rather than trying to interpret isolated snapshots). This draws on ideas and methods from a range of areas of mathematics, statistics, and the (biological) sciences.

At the moment, the key techniques, such as applying LLC estimation over the course of training, come from Singular Learning Theory (SLT) and, to a lesser extent, from developmental biology and statistical physics.

The readings focus on SLT:

Bonus

Alignment

Basics

Interpretability

Bonus

  • If you want to get a higher-level overview of the alignment landscape, check out Alignment 101 and Alignment 201 by Richard Ngo and BlueDot Impact.
  • If you want more material on learning ML, see the ARENA program.
  • If you still haven’t had enough, check out the metauni AI-safety seminar reading list.