license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/resolve/main/LICENSE

LogicKor Benchmark (24.07.31)

| Rank (1-shot) | Model | Reasoning | Math | Writing | Coding | Understanding | Grammar | Singleturn | Multiturn | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | openai/gpt-4o-2024-05-13 | 9.21 | 8.71 | 9.64 | 9.78 | 9.64 | 9.50 | 9.33 | 9.50 | 9.41 |
| 2 | anthropic/claude-3-5-sonnet-20240620 | 8.64 | 8.42 | 9.85 | 9.78 | 9.92 | 9.21 | 9.26 | 9.35 | 9.30 |
| 7 | denial07/Qwen2-72B-Instruct-kor-dpo | 8.85 | 8.21 | 9.14 | 9.71 | 9.64 | 7.21 | 8.88 | 8.71 | 8.79 |
| 8 | Qwen/Qwen2-72B-Instruct | 8.00 | 8.14 | 9.07 | 9.85 | 9.78 | 7.28 | 8.61 | 8.76 | 8.69 |