Developmental Interpretability

We are a research community studying how neural networks develop.

September 2025

Timaeus
Technical University of Munich
University of Colorado Boulder
University of Melbourne

Bayesian Influence Functions for Hessian-Free Data Attribution

September 30, 2025 | Kreer et al.

Classical influence functions face significant challenges when applied to deep neural networks, primarily...
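As background (our notation, not taken from the paper): the classical influence function of Koh and Liang approximates the effect of removing a training point $z_m$ on the loss at a test point $z$ through the inverse Hessian of the empirical risk, which is precisely the object that is singular and costly for deep networks:

```latex
\mathrm{IF}(z_m, z) \;=\; -\,\nabla_\theta \ell(z, \theta^\ast)^{\top} H^{-1}\, \nabla_\theta \ell(z_m, \theta^\ast),
\qquad
H \;=\; \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^2\, \ell(z_i, \theta^\ast).
```

A "Hessian-free" attribution method is one that avoids forming or inverting $H$.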

Timaeus
University of Melbourne

The Loss Kernel: A Geometric Probe for Deep Learning Interpretability

September 30, 2025 | Adam et al.

We introduce the loss kernel, an interpretability method for measuring similarity between data points...
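A minimal sketch of the idea as we read it (the function and the covariance estimator below are illustrative assumptions, not the paper's exact definition): treat two data points as similar when their losses fluctuate together as the weights are perturbed around the trained solution.

```python
import numpy as np

def loss_kernel(per_sample_losses: np.ndarray) -> np.ndarray:
    """Covariance-style loss kernel (illustrative sketch).

    per_sample_losses has shape (num_weight_samples, num_datapoints):
    row k holds every data point's loss under the k-th weight sample
    drawn near the trained parameters (e.g. with SGLD).
    Entry (i, j) of the output is large when data points i and j
    gain or lose loss together across those samples.
    """
    centered = per_sample_losses - per_sample_losses.mean(axis=0, keepdims=True)
    return centered.T @ centered / (per_sample_losses.shape[0] - 1)
```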

August 2025

Timaeus

Embryology of a Language Model

August 1, 2025 | Wang et al.

Understanding how language models develop their internal computational structure is a central problem...

July 2025

Timaeus
University of Melbourne

From Global to Local: A Scalable Benchmark for Local Posterior Sampling

July 29, 2025 | Hitchcock and Hoogland

Degeneracy is an inherent feature of the loss landscape of neural networks, but it is not well understood...
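For readers new to the setting: "local posterior sampling" here means sampling weights near a trained solution w*, typically with stochastic gradient Langevin dynamics (SGLD) against a tempered, localized posterior. A hedged sketch, with illustrative hyperparameters and names:

```python
import numpy as np

def sgld_local_samples(grad_loss, w_star, n, beta, gamma=100.0,
                       step_size=1e-6, num_steps=2000, seed=0):
    """Localized SGLD (illustrative sketch).

    Samples from a posterior tempered at inverse temperature beta and
    tethered to w_star by a Gaussian restraint of strength gamma.
    grad_loss(w) should return the gradient of the average loss over
    the n training points (a minibatch estimate in practice).
    """
    rng = np.random.default_rng(seed)
    w = w_star.copy()
    samples = []
    for _ in range(num_steps):
        drift = beta * n * grad_loss(w) + gamma * (w - w_star)
        noise = rng.normal(scale=np.sqrt(step_size), size=w.shape)
        w = w - 0.5 * step_size * drift + noise
        samples.append(w.copy())
    return np.stack(samples)
```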

April 2025

Timaeus
University of Melbourne

Modes of Sequence Models and Learning Coefficients

April 25, 2025 | Chen and Murfet

We develop a geometric account of sequence modelling that links patterns in the data to measurable properties...

Timaeus

Structural Inference: Interpreting Small Language Models with Susceptibilities

April 25, 2025 | Baker et al.

We develop a linear response framework for interpretability that treats a neural network as a Bayesian...
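Schematically (our notation, not necessarily the paper's): a susceptibility in this linear-response sense measures how the posterior expectation of an observable φ shifts when the loss is perturbed in a chosen direction Δ,

```latex
\chi \;=\; \left.\frac{\partial}{\partial \epsilon}\,
\mathbb{E}^{\beta}_{w}\!\left[\phi(w)\right]\right|_{\epsilon=0}
\qquad \text{under } L_n(w) \;\mapsto\; L_n(w) + \epsilon\,\Delta(w).
```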

Timaeus

Programs as Singularities

April 10, 2025 | Murfet and Troiani

We develop a correspondence between the structure of Turing machines and the structure of singularities...

February 2025

Timaeus
Gradient Institute
Monash University
University College London
University of Amsterdam
University of Melbourne
University of Oxford

You Are What You Eat – AI Alignment Requires Understanding How Data Shapes Structure and Generalisation

February 8, 2025 | Lehalleur et al.

In this position paper, we argue that understanding the relation between structure in the data distribution...

January 2025

Timaeus
ERA Fellowship
Independent

Structure Development in List-Sorting Transformers

January 30, 2025 | Urdshals and Urdshals | ICML SMUNN Workshop

We study how a one-layer attention-only transformer develops relevant structures while learning to sort...

Timaeus
Gradient Institute
University of Melbourne
University of Oxford

Dynamics of Transient Structure in In-Context Linear Regression Transformers

January 29, 2025 | Carroll et al.

Modern deep neural networks display striking examples of rich internal computational structure.

October 2024

Timaeus
University of Melbourne

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient

October 4, 2024 | Wang et al. | ICLR | Spotlight

We introduce refined variants of the Local Learning Coefficient (LLC), a measure of model complexity...

June 2024

Timaeus
University of Melbourne

Loss landscape geometry reveals stagewise development of transformers

June 16, 2024 | Wang et al. | ICML HiLD Workshop | Best Paper

The development of the internal structure of neural networks throughout training occurs in tandem with...

February 2024

Timaeus
University of Melbourne

Estimating the Local Learning Coefficient at Scale

February 6, 2024 | Furman and Lau

The local learning coefficient (LLC) is a principled way of quantifying model complexity, originally...
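For reference, the LLC estimator that this work scales up (introduced by Lau et al., below) takes the form, up to conventions,

```latex
\hat{\lambda}(w^\ast) \;=\; n\beta^\ast \left( \mathbb{E}^{\beta^\ast}_{w \mid w^\ast}\!\left[L_n(w)\right] - L_n(w^\ast) \right),
\qquad \beta^\ast = \frac{1}{\log n},
```

where the expectation is over the localized posterior around w*, estimated in practice with SGLD.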

Timaeus
University of Melbourne

Loss Landscape Degeneracy and Stagewise Development of Transformers

February 4, 2024 | Hoogland et al. | TMLR | Best Paper at 2024 ICML HiLD Workshop

We show that in-context learning emerges in transformers in discrete developmental stages, when they...

October 2023

University of Melbourne

Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition

October 10, 2023 | Chen et al.

We investigate phase transitions in a Toy Model of Superposition (TMS) using Singular Learning Theory...

August 2023

Timaeus
University of Melbourne

The Local Learning Coefficient: A Singularity-Aware Complexity Measure

August 23, 2023 | Lau et al. | AISTATS 2025

Deep neural networks (DNNs) are singular statistical models that exhibit complex degeneracies.
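For context on why singularity matters (standard singular learning theory, not specific to this paper): the learning coefficient λ is the invariant that replaces the parameter count in the asymptotic expansion of the Bayes free energy,

```latex
F_n \;=\; n L_n(w^\ast) \;+\; \lambda \log n \;+\; O(\log\log n),
```

so a regular model with d parameters has λ = d/2, while degenerate models have strictly smaller λ and hence lower effective complexity.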