Update on GitHub


This metric is used to assess performance on the Mathematics Aptitude Test of Heuristics (MATH) dataset. It first canonicalizes the inputs (e.g., converting "1/2" to "\\frac{1}{2}") and then computes accuracy.

How to load this metric directly with the datasets library:

from datasets import load_metric
metric = load_metric("competition_math")


  title={Measuring Mathematical Problem Solving With the MATH Dataset},
  author={Dan Hendrycks
    and Collin Burns
    and Saurav Kadavath
    and Akul Arora
    and Steven Basart
    and Eric Tang
    and Dawn Song
    and Jacob Steinhardt},
  journal={arXiv preprint arXiv:2103.03874},