REFUTE v1.0: science epistemics benchmark — skill, calibration, forced-choice, soundness.
BGPT
BGPT-OFFICIAL
AI & ML interests
None yet
Recent Activity
BGPT-OFFICIAL/refute: Call for stress tests: try to break REFUTE (Hard-60 first) (new activity about 18 hours ago)
BGPT-OFFICIAL/refute: Essay: the epistemics of REFUTE — falsification, calibration, and why skill ≠ truth (new activity about 18 hours ago)
BGPT-OFFICIAL/refute: 🚀 REFUTE v1.0 — GPT-5.4 wins skill, loses calibration (15 frontier models) (new activity about 19 hours ago)
Organizations
None yet