gsm8k lm-eval-harness accuracy: 34.34