---
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/resolve/main/LICENSE
---

This model is an improved version of Qwen2-72B-Instruct for Korean.
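
The card does not include usage code; below is a minimal inference sketch, assuming the standard `transformers` chat-template API (the repository id is taken from the benchmark table below).

```python
# Minimal inference sketch (assumption: this card ships no usage code;
# Qwen2-Instruct models are typically loaded this way with transformers).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "denial07/Qwen2-72B-Instruct-kor-dpo"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick bf16/fp16 automatically where supported
    device_map="auto",    # shard the 72B weights across available GPUs
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "대한민국의 수도는 어디인가요?"},  # "What is the capital of South Korea?"
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```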

## LogicKor Benchmark (24.07.31)

- The following ranks are based on 1-shot evaluation.

| Rank | Model | Reasoning | Math | Writing | Coding | Understanding | Grammar | Single-turn | Multi-turn | Total | Parameters |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | openai/gpt-4o-2024-05-13 | 9.21 | 8.71 | 9.64 | 9.78 | 9.64 | 9.50 | 9.33 | 9.50 | 9.41 | ? |
| 2 | anthropic/claude-3-5-sonnet-20240620 | 8.64 | 8.42 | 9.85 | 9.78 | 9.92 | 9.21 | 9.26 | 9.35 | 9.30 | ? |
| 4 | mistralai/Mistral-Large-Instruct-2407 | 9.71 | 9.07 | 9.57 | 9.92 | 9.92 | 6.78 | 9.19 | 9.14 | 9.16 | 123B |
| 8 | meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 | 8.78 | 7.14 | 9.28 | 9.64 | 9.64 | 8.57 | 8.97 | 8.71 | 8.84 | 405B |
| 9 | denial07/Qwen2-72B-Instruct-kor-dpo | 8.85 | 8.21 | 9.14 | 9.71 | 9.64 | 7.21 | 8.88 | 8.71 | 8.79 | 72B |
| 10 | Qwen/Qwen2-72B-Instruct | 8.00 | 8.14 | 9.07 | 9.85 | 9.78 | 7.28 | 8.61 | 8.76 | 8.69 | 72B |
| 11 | google/gemini-1.5-pro-001 | 7.00 | 8.00 | 9.57 | 8.85 | 9.35 | 8.64 | 8.61 | 8.52 | 8.57 | ? |

## KMMLU Benchmark

- Accuracy on the HAERAE-HUB/KMMLU benchmark.

| Category | Qwen2-72B-it-kor-dpo | Qwen2-72B-it | Mistral-Large-it-2407 | Questions |
|---|---|---|---|---|
| HUMSS | 0.63 | 0.63 | 0.62 | 5130 |
| STEM | 0.59 | 0.59 | 0.57 | 9900 |
| Applied Science | 0.56 | 0.56 | 0.54 | 11600 |
| Other | 0.58 | 0.58 | 0.54 | 8400 |
| Overall Accuracy | 0.58 | 0.58 | 0.56 | 35030 |
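
The Overall Accuracy row appears consistent with a question-count-weighted average of the per-category accuracies; a quick check in Python, using only the kor-dpo figures from the table above:

```python
# Sanity check (not from the original card): weight each category's accuracy
# by its number of questions and compare the result to the reported overall.
per_category = {
    # category: (kor-dpo accuracy, number of questions)
    "HUMSS": (0.63, 5130),
    "STEM": (0.59, 9900),
    "Applied Science": (0.56, 11600),
    "Other": (0.58, 8400),
}

total_questions = sum(n for _, n in per_category.values())                    # 35030
overall = sum(acc * n for acc, n in per_category.values()) / total_questions
print(f"{overall:.2f}")  # 0.58, matching the Overall Accuracy row
```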