# -*- coding: utf-8 -*-
"""Evaluation suite for the SENTINEL oversight architecture.

Modules:
- weak_to_strong: OpenAI-style Weak-to-Strong generalization testing
- transcript_export: METR MALT-style labeled transcript dataset generation
"""