Edit model card

This model is an improved version for Korean, based on the Qwen2-72B-Instruct model.

LogicKor Benchmark (24.07.31)

  • The following benchmark ranks are based on 1-shot evaluation.
    Rank Model Reasoning Math Writing Coding Understanding Grammar Singleturn Multiturn Total Parameters
    1 openai/gpt-4o-2024-05-13 9.21 8.71 9.64 9.78 9.64 9.50 9.33 9.50 9.41 ?
    2 anthropic/claude-3-5-sonnet-20240620 8.64 8.42 9.85 9.78 9.92 9.21 9.26 9.35 9.30 ?
    4 mistralai/Mistral-Large-Instruct-2407 9.71 9.07 9.57 9.92 9.92 6.78 9.19 9.14 9.16 123B
    8 meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 8.78 7.14 9.28 9.64 9.64 8.57 8.97 8.71 8.84 405B
    9 denial07/Qwen2-72B-Instruct-kor-dpo 8.85 8.21 9.14 9.71 9.64 7.21 8.88 8.71 8.79 72B
    10 Qwen/Qwen2-72B-Instruct 8.00 8.14 9.07 9.85 9.78 7.28 8.61 8.76 8.69 72B
    11 google/gemini-1.5-pro-001 7.00 8.00 9.57 8.85 9.35 8.64 8.61 8.52 8.57 ?

KMMLU Benchmark

  • HAERAE-HUB/KMMLU benchmark accuracy score.
    Category Qwen2-72B-it-kor-dpo Qwen2-72B-it Mistral-Large-it-2407 Questions
    HUMSS 0.63 0.63 0.62 5130
    STEM 0.59 0.59 0.57 9900
    Applied Science 0.56 0.56 0.54 11600
    Other 0.58 0.58 0.54 8400
    Overall Accuracy 0.58 0.58 0.56 35030
Downloads last month
205
Safetensors
Model size
72.7B params
Tensor type
BF16
·
Inference API
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.