MMLU of ChatGPT/GPT3.5-turbo is 69~70, GSM8K 78.2

by JosephusCheung - opened Nov 2, 2023

Nov 2, 2023 • edited Nov 2, 2023

The OpenCompass leaderboard (https://opencompass.org.cn/leaderboard-llm, updated 2023/9/1) lists ChatGPT at MMLU 69.1 and GSM8K 78.2, and other sources report an MMLU score of 70.

JosephusCheung changed discussion title from MMLU of ChatGPT/GPT3.5-turbo is 69~70 to MMLU of ChatGPT/GPT3.5-turbo is 69~70, GSM8K 78.2 Nov 2, 2023

imone

OpenChat org Nov 2, 2023

Our MMLU and GSM8K results come from Chain-of-Thought Hub.

We use the same prompts and answer matching as Chain-of-Thought Hub, so the comparison should be fair.

JosephusCheung

Nov 2, 2023

Model	# Params	Average	MT-Bench	AGIEval	BBH MC	TruthfulQA	MMLU	HumanEval	BBH CoT	GSM8K
OpenChat-3.5	7B	61.6	7.81	47.4	47.6	59.1	64.3	55.5	63.5	77.3
ChatGPT (Yours)	?	61.5	7.94	47.1	47.6	57.7	67.3	48.1	70.1	74.9
ChatGPT (Other Sources*)	?	65.3	7.94	47.1	47.6	57.7	69.1*	73.2*	70.1	78.2*
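
For anyone checking the Average column: the numbers above appear consistent with a plain mean over the eight benchmarks, with MT-Bench (scored 0 to 10) multiplied by 10 first. That rescaling is inferred from the figures in the table, not stated anywhere in this thread, so treat this as a rough sketch under that assumption:

```python
# Scores copied from the table above, Average column excluded.
# Order: MT-Bench, AGIEval, BBH MC, TruthfulQA, MMLU, HumanEval, BBH CoT, GSM8K
rows = {
    "OpenChat-3.5": [7.81, 47.4, 47.6, 59.1, 64.3, 55.5, 63.5, 77.3],
    "ChatGPT (Yours)": [7.94, 47.1, 47.6, 57.7, 67.3, 48.1, 70.1, 74.9],
    "ChatGPT (Other Sources*)": [7.94, 47.1, 47.6, 57.7, 69.1, 73.2, 70.1, 78.2],
}


def average(scores):
    """Mean over all benchmarks, one decimal place.

    Assumption (inferred, not confirmed in the thread): MT-Bench is
    rescaled from 0-10 to 0-100 before averaging.
    """
    mt_bench, *rest = scores
    return round((mt_bench * 10 + sum(rest)) / len(scores), 1)


averages = {name: average(scores) for name, scores in rows.items()}

for name, value in averages.items():
    print(f"{name}: {value}")
# OpenChat-3.5: 61.6
# ChatGPT (Yours): 61.5
# ChatGPT (Other Sources*): 65.3
```

Under that assumption all three Average values in the table are reproduced, and swapping in the starred MMLU, HumanEval and GSM8K figures moves the ChatGPT average from 61.5 to 65.3.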

imone

OpenChat org Nov 2, 2023

Thank you for your interest in our results. As you've rightly pointed out, the performance of ChatGPT has evolved over time, and there are numerous reports from different time periods. For a clearer comparison, our reported results are based on the data available around March, which we label as ChatGPT (March), sourced from Chain-of-Thought Hub and OpenAI's technical report.

imone changed discussion status to closed Nov 2, 2023
