Developmental Interpretability

We are a research community studying how neural networks develop.

September 2025

Timaeus
Technical University of Munich
University of Colorado Boulder
University of Melbourne

Bayesian Influence Functions for Hessian-Free Data Attribution

September 30, 2025 | Kreer et al.

Classical influence functions face significant challenges when applied to deep neural networks, primarily...
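As background (our notation, not taken from the paper): the classical influence function of Koh and Liang approximates the effect of removing a training point $z_m$ on the loss at a test point $z$ through the inverse Hessian of the empirical risk, which is precisely the object that is singular and costly for deep networks:

```latex
\mathrm{IF}(z_m, z) \;=\; -\,\nabla_\theta \ell(z, \theta^\ast)^{\top} H^{-1}\, \nabla_\theta \ell(z_m, \theta^\ast),
\qquad
H \;=\; \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^2\, \ell(z_i, \theta^\ast).
```

A "Hessian-free" attribution method is one that avoids forming or inverting $H$.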

Timaeus
University of Melbourne

The Loss Kernel: A Geometric Probe for Deep Learning Interpretability

September 30, 2025 | Adam et al.

We introduce the loss kernel, an interpretability method for measuring similarity between data points...
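A minimal sketch of the idea as we read it (the function and the covariance estimator below are illustrative assumptions, not the paper's exact definition): treat two data points as similar when their losses fluctuate together as the weights are perturbed around the trained solution.

```python
import numpy as np

def loss_kernel(per_sample_losses: np.ndarray) -> np.ndarray:
    """Covariance-style loss kernel (illustrative sketch).

    per_sample_losses has shape (num_weight_samples, num_datapoints):
    row k holds every data point's loss under the k-th weight sample
    drawn near the trained parameters (e.g. with SGLD).
    Entry (i, j) of the output is large when data points i and j
    gain or lose loss together across those samples.
    """
    centered = per_sample_losses - per_sample_losses.mean(axis=0, keepdims=True)
    return centered.T @ centered / (per_sample_losses.shape[0] - 1)
```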

August 2025

Timaeus

Embryology of a Language Model

August 1, 2025 | Wang et al.

Understanding how language models develop their internal computational structure is a central problem...

July 2025

Timaeus
University of Melbourne

From Global to Local: A Scalable Benchmark for Local Posterior Sampling

July 29, 2025 | Hitchcock and Hoogland

Degeneracy is an inherent feature of the loss landscape of neural networks, but it is not well understood...
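For readers new to the setting: "local posterior sampling" here means sampling weights near a trained solution w*, typically with stochastic gradient Langevin dynamics (SGLD) against a tempered, localized posterior. A hedged sketch, with illustrative hyperparameters and names:

```python
import numpy as np

def sgld_local_samples(grad_loss, w_star, n, beta, gamma=100.0,
                       step_size=1e-6, num_steps=2000, seed=0):
    """Localized SGLD (illustrative sketch).

    Samples from a posterior tempered at inverse temperature beta and
    tethered to w_star by a Gaussian restraint of strength gamma.
    grad_loss(w) should return the gradient of the average loss over
    the n training points (a minibatch estimate in practice).
    """
    rng = np.random.default_rng(seed)
    w = w_star.copy()
    samples = []
    for _ in range(num_steps):
        drift = beta * n * grad_loss(w) + gamma * (w - w_star)
        noise = rng.normal(scale=np.sqrt(step_size), size=w.shape)
        w = w - 0.5 * step_size * drift + noise
        samples.append(w.copy())
    return np.stack(samples)
```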

April 2025

Timaeus
University of Melbourne

Modes of Sequence Models and Learning Coefficients

April 25, 2025 | Chen and Murfet

We develop a geometric account of sequence modelling that links patterns in the data to measurable properties...

Timaeus

Structural Inference: Interpreting Small Language Models with Susceptibilities

April 25, 2025 | Baker et al.

We develop a linear response framework for interpretability that treats a neural network as a Bayesian...
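Schematically (our notation, not necessarily the paper's): a susceptibility in this linear-response sense measures how the posterior expectation of an observable φ shifts when the loss is perturbed in a chosen direction Δ,

```latex
\chi \;=\; \left.\frac{\partial}{\partial \epsilon}\,
\mathbb{E}^{\beta}_{w}\!\left[\phi(w)\right]\right|_{\epsilon=0}
\qquad \text{under } L_n(w) \;\mapsto\; L_n(w) + \epsilon\,\Delta(w).
```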

Timaeus

Programs as Singularities

April 10, 2025 | Murfet and Troiani

We develop a correspondence between the structure of Turing machines and the structure of singularities...

February 2025

Timaeus
Gradient Institute
Monash University
University College London
University of Amsterdam
University of Melbourne
University of Oxford

You Are What You Eat – AI Alignment Requires Understanding How Data Shapes Structure and Generalisation

February 8, 2025 | Lehalleur et al.

In this position paper, we argue that understanding the relation between structure in the data distribution...

January 2025

Timaeus
ERA Fellowship
Independent

Structure Development in List-Sorting Transformers

January 30, 2025 | Urdshals and Urdshals | ICML SMUNN Workshop

We study how a one-layer attention-only transformer develops relevant structures while learning to sort...

Timaeus
Gradient Institute
University of Melbourne
University of Oxford

Dynamics of Transient Structure in In-Context Linear Regression Transformers

January 29, 2025 | Carroll et al.

Modern deep neural networks display striking examples of rich internal computational structure.

October 2024

Timaeus
University of Melbourne

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient

October 4, 2024 | Wang et al. | ICLR | Spotlight

We introduce refined variants of the Local Learning Coefficient (LLC), a measure of model complexity...

June 2024

Timaeus
University of Melbourne

Loss landscape geometry reveals stagewise development of transformers

June 16, 2024 | Wang et al. | ICML HiLD Workshop | Best Paper

The development of the internal structure of neural networks throughout training occurs in tandem with...

February 2024

Timaeus
University of Melbourne

Estimating the Local Learning Coefficient at Scale

February 6, 2024 | Furman and Lau

The local learning coefficient (LLC) is a principled way of quantifying model complexity, originally...
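For reference, the LLC estimator that this work scales up (introduced by Lau et al., below) takes the form, up to conventions,

```latex
\hat{\lambda}(w^\ast) \;=\; n\beta^\ast \left( \mathbb{E}^{\beta^\ast}_{w \mid w^\ast}\!\left[L_n(w)\right] - L_n(w^\ast) \right),
\qquad \beta^\ast = \frac{1}{\log n},
```

where the expectation is over the localized posterior around w*, estimated in practice with SGLD.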

Timaeus
University of Melbourne

Loss Landscape Degeneracy and Stagewise Development of Transformers

February 4, 2024 | Hoogland et al. | TMLR | Best Paper at 2024 ICML HiLD Workshop

We show that in-context learning emerges in transformers in discrete developmental stages, when they...

October 2023

University of Melbourne

Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition

October 10, 2023 | Chen et al.

We investigate phase transitions in a Toy Model of Superposition (TMS) using Singular Learning Theory...

August 2023

Timaeus
University of Melbourne

The Local Learning Coefficient: A Singularity-Aware Complexity Measure

August 23, 2023 | Lau et al. | AISTATS 2025

Deep neural networks (DNNs) are singular statistical models that exhibit complex degeneracies.
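For context on why singularity matters (standard singular learning theory, not specific to this paper): the learning coefficient λ is the invariant that replaces the parameter count in the asymptotic expansion of the Bayes free energy,

```latex
F_n \;=\; n L_n(w^\ast) \;+\; \lambda \log n \;+\; O(\log\log n),
```

so a regular model with d parameters has λ = d/2, while degenerate models have strictly smaller λ and hence lower effective complexity.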