Developmental Interpretability
Developmental interpretability is an AI alignment research agenda studying how structure forms in neural networks.
- Devinterp: Learn more about developmental interpretability and its applications to AI safety.
- Conferences: Check out the 2023 SLT & alignment conferences for the necessary background and plan.
2023 SLT & alignment summit
The first SLT & alignment summit ("Singularities against the singularity") took place in June 2023. In the first week, we recorded more than 20 hours of lectures on the necessary background, all of which you can find here. In the second week, we started research collaborations on a dozen open problems.
A second summit is planned for November 2023. Stay tuned for more details.
Community
Join our Discord to ask questions, find collaborators, and stay up to date on the latest developments.