Probed Vision Encoders with linear classifiers and trained patch-level SAEs to discover interpretable sparse dictionary features (SDFs) defining demographic traits, revealing the heavy influence of reconstruction error in SAE-based debiasing.
// May 11, 2025