---
license: other
license_name: qwen
license_link: LICENSE
model-index:
- name: Qwen-72B-Llama
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 64.85
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Qwen-72B-Llama
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 83.27
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Qwen-72B-Llama
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 73.66
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Qwen-72B-Llama
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 57.6
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Qwen-72B-Llama
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 81.53
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Qwen-72B-Llama
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 56.25
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Qwen-72B-Llama
      name: Open LLM Leaderboard
---

# 🦙 Qwen-72B-Llama

This is the 🦙 llamafied version of [Qwen/Qwen-72B](https://huggingface.co/Qwen/Qwen-72B).

## 🛠️ Reproduction

I used [LLaMA-Factory/tests/llamafy_qwen.py](https://github.com/hiyouga/LLaMA-Factory/blob/main/tests/llamafy_qwen.py) to convert the weights; a sketch of a typical invocation is shown below.
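For reference, this is roughly how the conversion might be driven from Python. The flag names (`--input_dir`, `--output_dir`) and the local paths are assumptions for illustration; check `llamafy_qwen.py` itself for its exact command-line interface.

```python
# Hypothetical driver for the conversion script; flag names and paths
# are assumptions -- consult llamafy_qwen.py for the exact interface.
import subprocess

subprocess.run(
    [
        "python", "tests/llamafy_qwen.py",
        "--input_dir", "./Qwen-72B",         # original Qwen checkpoint (assumed local path)
        "--output_dir", "./Qwen-72B-Llama",  # destination for the llamafied weights
    ],
    check=True,  # raise CalledProcessError if the conversion fails
)
```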
## 🔠 Tokenizer

After converting the weights, I took the tokenizer from [KnutJaegersberg/Qwen-14B-Llamafied](https://huggingface.co/KnutJaegersberg/Qwen-14B-Llamafied) and uploaded it to this repository.

## 📊 Eval Scores Compared to the Original Model

Here is a comparison of evaluation scores based on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

| Metric              | Qwen-72B | **Qwen-72B-Llama** |
|---------------------|----------|--------------------|
| Avg.                | 73.60    | **69.53**          |
| ARC (25-shot)       | 65.19    | **64.85**          |
| HellaSwag (10-shot) | 85.94    | **83.27**          |
| MMLU (5-shot)       | 77.37    | **73.66**          |
| TruthfulQA (0-shot) | 60.19    | **57.60**          |
| Winogrande (5-shot) | 82.48    | **81.53**          |
| GSM8K (5-shot)      | 70.43    | **56.25**          |

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6468ce47e134d050a58aa89c/hRQRMYVPc4LyavE3GaI_T.png)

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Weyaxi__Qwen-72B-Llama).

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 69.53 |
| AI2 Reasoning Challenge (25-Shot) | 64.85 |
| HellaSwag (10-Shot)               | 83.27 |
| MMLU (5-Shot)                     | 73.66 |
| TruthfulQA (0-shot)               | 57.60 |
| Winogrande (5-shot)               | 81.53 |
| GSM8k (5-shot)                    | 56.25 |
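## 💻 Usage

Because this checkpoint is llamafied, it should load through the standard Llama classes in 🤗 Transformers. Below is a minimal, untested sketch; the dtype, device placement, and generation settings are illustrative choices rather than recommendations, and depending on your environment the tokenizer may additionally need `trust_remote_code=True`.

```python
# Minimal sketch: load the llamafied checkpoint with plain Llama classes.
# Settings are illustrative; a 72B model needs substantial GPU memory.
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

model_id = "Weyaxi/Qwen-72B-Llama"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision; still ~144 GB of weights
    device_map="auto",           # shard across available GPUs via accelerate
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```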