---
license: other
license_name: qwen
license_link: LICENSE
model-index:
- name: Qwen-72B-Llama
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 64.85
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Qwen-72B-Llama
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 83.27
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Qwen-72B-Llama
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 73.66
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Qwen-72B-Llama
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 57.6
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Qwen-72B-Llama
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 81.53
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Qwen-72B-Llama
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 56.25
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Qwen-72B-Llama
      name: Open LLM Leaderboard
---
# 🦙 Qwen-72B-Llama
This is the 🦙 llamafied version of [Qwen/Qwen-72B](https://huggingface.co/Qwen/Qwen-72B).
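Because the converted weights follow the Llama architecture, the checkpoint should load with the stock `transformers` Llama code path. A minimal usage sketch (the `bfloat16` dtype and `device_map="auto"` settings are illustrative assumptions for multi-GPU inference, not requirements):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Weyaxi/Qwen-72B-Llama"

# Load the llamafied checkpoint; the dtype and device_map below are
# illustrative choices, not requirements of this repository.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```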
## 🛠️ Reproduction
I used the [llamafy_qwen.py](https://github.com/hiyouga/LLaMA-Factory/blob/main/tests/llamafy_qwen.py) script from [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) to convert the weights.
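For reference, a sketch of how the conversion might be invoked from a LLaMA-Factory checkout. The exact CLI flags are an assumption here (verify with `python tests/llamafy_qwen.py --help`), and the paths are placeholders:

```python
import subprocess

# Assumed CLI: the script exposes input/output directory arguments.
# Verify the flag names with `python tests/llamafy_qwen.py --help` first.
subprocess.run(
    [
        "python", "tests/llamafy_qwen.py",
        "--input_dir", "/path/to/Qwen-72B",         # original Qwen checkpoint
        "--output_dir", "/path/to/Qwen-72B-Llama",  # llamafied output
    ],
    check=True,
)
```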
## 🔠 Tokenizer
After converting the weights, I took the tokenizer from [KnutJaegersberg/Qwen-14B-Llamafied](https://huggingface.co/KnutJaegersberg/Qwen-14B-Llamafied) and uploaded it to this repository.
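A quick way to check that the bundled tokenizer round-trips text as expected (a sketch; the sample string is arbitrary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Weyaxi/Qwen-72B-Llama")

text = "Hello, Qwen-72B-Llama!"
ids = tokenizer(text)["input_ids"]

# A lossless round-trip suggests the tokenizer was copied over intact.
assert tokenizer.decode(ids, skip_special_tokens=True) == text
print(len(ids), "tokens:", ids)
```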
## 📊 Eval Scores Compared to Original Model
Here is a comparison of evaluation scores between the original model and this conversion, based on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
| Metric | Qwen-72B | **Qwen-72B-Llama** |
|-----------------------|---------------|--------------------|
| Avg.                  | 73.60         | **69.53**          |
| ARC (25-shot) | 65.19 | **64.85** |
| HellaSwag (10-shot) | 85.94 | **83.27** |
| MMLU (5-shot) | 77.37 | **73.66** |
| TruthfulQA (0-shot)   | 60.19         | **57.60**          |
| Winogrande (5-shot) | 82.48 | **81.53** |
| GSM8K (5-shot) | 70.43 | **56.25** |
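The averages can be verified directly from the per-task scores (a quick arithmetic check, using only the numbers in the table above):

```python
# Per-task scores from the comparison table above.
qwen_72b = [65.19, 85.94, 77.37, 60.19, 82.48, 70.43]
llamafied = [64.85, 83.27, 73.66, 57.60, 81.53, 56.25]

for name, scores in [("Qwen-72B", qwen_72b), ("Qwen-72B-Llama", llamafied)]:
    print(f"{name}: {sum(scores) / len(scores):.2f}")
# Qwen-72B: 73.60
# Qwen-72B-Llama: 69.53
```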
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6468ce47e134d050a58aa89c/hRQRMYVPc4LyavE3GaI_T.png)
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Weyaxi__Qwen-72B-Llama); a sketch for loading them programmatically follows the table below.
| Metric |Value|
|---------------------------------|----:|
|Avg. |69.53|
|AI2 Reasoning Challenge (25-Shot)|64.85|
|HellaSwag (10-Shot) |83.27|
|MMLU (5-Shot) |73.66|
|TruthfulQA (0-shot) |57.60|
|Winogrande (5-shot) |81.53|
|GSM8k (5-shot) |56.25|
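The per-sample details behind these numbers live in the linked dataset repository. A minimal sketch for pulling them with 🤗 `datasets`; the config names vary per task, so they are discovered rather than hard-coded, and `split="latest"` is an assumption based on how leaderboard details repositories are typically laid out:

```python
from datasets import get_dataset_config_names, load_dataset

repo = "open-llm-leaderboard/details_Weyaxi__Qwen-72B-Llama"

# Discover the available per-task configs instead of guessing their names.
configs = get_dataset_config_names(repo)
print(configs)

# "latest" is an assumption: details repos usually expose a timestamped
# split plus a "latest" alias. Inspect the repo if this split is missing.
details = load_dataset(repo, configs[0], split="latest")
print(details[0])
```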