Benchmarks?

by rombodawg - opened Jan 6

Discussion

rombodawg

Jan 6

Can we get this submitted to the Open LLM Leaderboard? A HumanEval score would be nice too.

jondurbin

Owner Jan 7

Looks like someone submitted it to the leaderboard. I can run some additional benchmarks once the DPO version finishes, to compare both. However, there seems to be some sort of issue with the model's performance on GSM8K.

jondurbin changed discussion status to closed Jan 16
