eval: submit GPQA Diamond result via PR (community badge)

#1
by terry-u - opened

Self-reported placeholder: the value 0.01 is not a measured eval result. The entry opens as community-provided.

terry-u changed pull request status to merged
