---
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/resolve/main/LICENSE
---

This model is an improved version of Qwen2-72B-Instruct for Korean.
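
The card does not include usage code; below is a minimal inference sketch, assuming the standard `transformers` chat-template API (the repository id is taken from the benchmark table below).

```python
# Minimal inference sketch (assumption: this card ships no usage code;
# Qwen2-Instruct models are typically loaded this way with transformers).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "denial07/Qwen2-72B-Instruct-kor-dpo"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick bf16/fp16 automatically where supported
    device_map="auto",    # shard the 72B weights across available GPUs
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "대한민국의 수도는 어디인가요?"},  # "What is the capital of South Korea?"
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```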

## LogicKor Benchmark (24.07.31)

- The following ranks are based on 1-shot evaluation.

| Rank | Model | Reasoning | Math | Writing | Coding | Understanding | Grammar | Single-turn | Multi-turn | Total | Parameters |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | openai/gpt-4o-2024-05-13 | 9.21 | 8.71 | 9.64 | 9.78 | 9.64 | 9.50 | 9.33 | 9.50 | 9.41 | ? |
| 2 | anthropic/claude-3-5-sonnet-20240620 | 8.64 | 8.42 | 9.85 | 9.78 | 9.92 | 9.21 | 9.26 | 9.35 | 9.30 | ? |
| 4 | mistralai/Mistral-Large-Instruct-2407 | 9.71 | 9.07 | 9.57 | 9.92 | 9.92 | 6.78 | 9.19 | 9.14 | 9.16 | 123B |
| 8 | meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 | 8.78 | 7.14 | 9.28 | 9.64 | 9.64 | 8.57 | 8.97 | 8.71 | 8.84 | 405B |
| 9 | denial07/Qwen2-72B-Instruct-kor-dpo | 8.85 | 8.21 | 9.14 | 9.71 | 9.64 | 7.21 | 8.88 | 8.71 | 8.79 | 72B |
| 10 | Qwen/Qwen2-72B-Instruct | 8.00 | 8.14 | 9.07 | 9.85 | 9.78 | 7.28 | 8.61 | 8.76 | 8.69 | 72B |
| 11 | google/gemini-1.5-pro-001 | 7.00 | 8.00 | 9.57 | 8.85 | 9.35 | 8.64 | 8.61 | 8.52 | 8.57 | ? |

## KMMLU Benchmark

- Accuracy on the HAERAE-HUB/KMMLU benchmark.

| Category | Qwen2-72B-it-kor-dpo | Qwen2-72B-it | Mistral-Large-it-2407 | Questions |
|---|---|---|---|---|
| HUMSS | 0.63 | 0.63 | 0.62 | 5130 |
| STEM | 0.59 | 0.59 | 0.57 | 9900 |
| Applied Science | 0.56 | 0.56 | 0.54 | 11600 |
| Other | 0.58 | 0.58 | 0.54 | 8400 |
| Overall Accuracy | 0.58 | 0.58 | 0.56 | 35030 |
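
The Overall Accuracy row appears consistent with a question-count-weighted average of the per-category accuracies; a quick check in Python, using only the kor-dpo figures from the table above:

```python
# Sanity check (not from the original card): weight each category's accuracy
# by its number of questions and compare the result to the reported overall.
per_category = {
    # category: (kor-dpo accuracy, number of questions)
    "HUMSS": (0.63, 5130),
    "STEM": (0.59, 9900),
    "Applied Science": (0.56, 11600),
    "Other": (0.58, 8400),
}

total_questions = sum(n for _, n in per_category.values())                    # 35030
overall = sum(acc * n for acc, n in per_category.values()) / total_questions
print(f"{overall:.2f}")  # 0.58, matching the Overall Accuracy row
```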