Mechanistic Interpretability Benchmark

university
AI & ML interests

Principled evaluation of mechanistic interpretability methods.

Recent Activity

mech-interp-bench's activity

atticusg authored 12 papers 7 months ago