The Loss Kernel: A Geometric Probe for Deep Learning Interpretability
Maxwell Adam
Timaeus & University of Melbourne
Zach Furman
University of Melbourne
Jesse Hoogland
Timaeus
September 30, 2025
Abstract
We introduce the loss kernel, an interpretability method for measuring similarity between data points as seen by a trained neural network. The kernel is the covariance matrix of per-sample losses computed under a distribution of parameter perturbations that approximately preserve the trained model's low loss. We first validate our method on a synthetic multitask problem, showing that it separates inputs by task as predicted by theory. We then apply the kernel to Inception-v1 to visualize the structure of ImageNet, and we show that the kernel's structure aligns with the WordNet semantic hierarchy. This establishes the loss kernel as a practical tool for interpretability and data attribution.
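The construction described in the abstract can be sketched in a few lines: draw parameter perturbations around the trained weights, record each data point's loss under every perturbed model, and take the covariance across draws. The sketch below, a minimal illustration rather than the paper's implementation, uses a toy linear model with squared-error loss and isotropic Gaussian noise as a crude stand-in for the paper's distribution of loss-preserving perturbations; all names and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained network: a linear model y = X @ w with
# squared-error loss, whose "trained" weights w_star fit the data exactly.
d, n = 5, 8                          # parameter dim, number of data points
X = rng.normal(size=(n, d))          # inputs
w_star = rng.normal(size=d)          # trained parameters
y = X @ w_star                       # targets (zero loss at w_star)

def per_sample_losses(w):
    """Squared-error loss of each of the n data points under parameters w."""
    return (X @ w - y) ** 2

# Sample small isotropic Gaussian perturbations around the trained
# parameters. (The paper samples from a distribution of perturbations that
# approximately preserves the low loss; small isotropic noise is a proxy.)
num_draws, sigma = 2000, 0.05
L = np.stack([per_sample_losses(w_star + sigma * rng.normal(size=d))
              for _ in range(num_draws)])    # shape (num_draws, n)

# Loss kernel: covariance of per-sample losses across the perturbations.
K = np.cov(L, rowvar=False)          # shape (n, n), symmetric PSD
```

Entry `K[i, j]` is large when points `i` and `j` have losses that rise and fall together across perturbations, i.e. when the network treats them similarly; this is the similarity structure the paper visualizes for ImageNet.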