We find that OlympicCoder models outperform Claude 3.7 Sonnet, as well as others over 100x larger 💪
Together with the models, we are releasing:
- CodeForces-CoTs: a new dataset of code problems from the most popular competitive coding platform, with R1 traces in C++ and Python (open-r1/codeforces-cots)
- IOI'2024: a new benchmark of VERY hard programming problems where even frontier models struggle to match human performance (open-r1/ioi)