---
library_name: peft
base_model: NousResearch/Meta-Llama-3-70B-Instruct
license: apache-2.0
---

# Model Card for radm/Llama-3-70B-Instruct-AH-lora

This is a LoRA adapter for NousResearch/Meta-Llama-3-70B-Instruct, fine-tuned to serve as a judge model for [Arena Hard](https://github.com/lm-sys/arena-hard-auto).

## Model Details

### Model Description

- **Developed by:** radm
- **Model type:** LoRA adapter for Llama-3-70B
- **Language(s) (NLP):** English
- **License:** apache-2.0
- **Finetuned from model:** NousResearch/Meta-Llama-3-70B-Instruct

## Uses

Use the [arena-hard-local](https://github.com/r4dm/arena-hard-local) repository to run evaluations with a local judge model. A minimal loading sketch is given in the "How to Get Started with the Model" section at the end of this card.

## Results

The leaderboards below compare rankings produced by the baseline quantized judge and by the judge fine-tuned with this adapter.

#### Llama-3-70B-Instruct-GPTQ as judge:

```console
Llama-3-Instruct-8B-SimPO                      | score: 78.3 | 95% CI: (-1.5, 1.2) | average #tokens: 545
SELM-Llama-3-8B-Instruct-iter-3                | score: 72.8 | 95% CI: (-2.1, 1.4) | average #tokens: 606
Meta-Llama-3-8B-Instruct-f16                   | score: 65.3 | 95% CI: (-1.8, 2.1) | average #tokens: 560
suzume-llama-3-8B-multilingual-orpo-borda-half | score: 63.5 | 95% CI: (-1.6, 2.1) | average #tokens: 978
Phi-3-medium-128k-instruct                     | score: 50.0 | 95% CI: (0.0, 0.0)  | average #tokens: 801
suzume-llama-3-8B-multilingual                 | score: 48.1 | 95% CI: (-2.2, 1.8) | average #tokens: 767
aya-23-8B                                      | score: 48.0 | 95% CI: (-2.0, 2.1) | average #tokens: 834
Vikhr-7B-instruct_0.5                          | score: 19.6 | 95% CI: (-1.3, 1.5) | average #tokens: 794
alpindale_gemma-2b-it                          | score: 11.2 | 95% CI: (-1.0, 0.8) | average #tokens: 425
```

#### Llama-3-70B-Instruct-AH-AWQ as judge:

```console
Llama-3-Instruct-8B-SimPO                      | score: 83.8 | 95% CI: (-1.4, 1.3) | average #tokens: 545
SELM-Llama-3-8B-Instruct-iter-3                | score: 78.8 | 95% CI: (-1.7, 1.9) | average #tokens: 606
suzume-llama-3-8B-multilingual-orpo-borda-half | score: 71.8 | 95% CI: (-1.7, 2.4) | average #tokens: 978
Meta-Llama-3-8B-Instruct-f16                   | score: 69.8 | 95% CI: (-1.9, 1.7) | average #tokens: 560
suzume-llama-3-8B-multilingual                 | score: 54.0 | 95% CI: (-2.1, 2.1) | average #tokens: 767
aya-23-8B                                      | score: 50.4 | 95% CI: (-1.7, 1.7) | average #tokens: 834
Phi-3-medium-128k-instruct                     | score: 50.0 | 95% CI: (0.0, 0.0)  | average #tokens: 801
Vikhr-7B-instruct_0.5                          | score: 14.2 | 95% CI: (-1.3, 1.0) | average #tokens: 794
alpindale_gemma-2b-it                          | score: 7.9  | 95% CI: (-0.9, 0.8) | average #tokens: 425
```

## Training Details

### Training Data

Datasets:

- radm/arenahard_gpt4vsllama3
- radm/truthy-dpo-v0.1-ru
- jondurbin/truthy-dpo-v0.1

#### Training Hyperparameters

- **Training regime:** bf16
- **Load in 4-bit:** True
- **Target modules:** all
- **LoRA rank:** 16
- **Max seq length:** 8192
- **Gradient checkpointing:** unsloth
- **Trainer:** ORPOTrainer
- **Batch size:** 1
- **Gradient accumulation steps:** 4
- **Epochs:** 1

A hedged reconstruction of this setup is sketched at the end of this card.

### Hardware

- **Hardware Type:** NVIDIA A100 80 GB
- **Hours used:** 11

### Framework versions

- PEFT 0.10.0
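
## How to Get Started with the Model

A minimal loading sketch, assuming standard Transformers + PEFT inference. The prompt shown is illustrative only; actual judging uses the Arena Hard judge templates from the arena-hard-local repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "NousResearch/Meta-Llama-3-70B-Instruct"
adapter_id = "radm/Llama-3-70B-Instruct-AH-lora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # adapter was trained in bf16
    device_map="auto",
)
# Attach the LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(model, adapter_id)

# Illustrative prompt only; not the real Arena Hard judge template
messages = [{"role": "user", "content": "Judge the two assistant answers below ..."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids=input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```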
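
## Training Setup Sketch

The exact training script is not published. The sketch below reconstructs the hyperparameters listed above with Unsloth and TRL's `ORPOTrainer`; the dataset mixing and column formatting are assumptions (only one of the three listed datasets is shown), and any name not taken from the hyperparameter list is hypothetical.

```python
from datasets import load_dataset
from trl import ORPOConfig, ORPOTrainer
from unsloth import FastLanguageModel

# Load the base model in 4-bit with an 8192-token context, as listed above
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="NousResearch/Meta-Llama-3-70B-Instruct",
    max_seq_length=8192,  # Max seq length: 8192
    load_in_4bit=True,    # Load in 4-bit: True
)

# Wrap with a rank-16 LoRA over all linear projection modules
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank: 16
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # "all"
    use_gradient_checkpointing="unsloth",  # Gradient checkpointing: unsloth
)

# Assumption: preference data with prompt/chosen/rejected columns;
# in practice the three listed datasets would be concatenated.
train_dataset = load_dataset("radm/arenahard_gpt4vsllama3", split="train")

trainer = ORPOTrainer(
    model=model,
    args=ORPOConfig(
        output_dir="outputs",           # hypothetical path
        bf16=True,                      # Training regime: bf16
        per_device_train_batch_size=1,  # Batch size: 1
        gradient_accumulation_steps=4,  # Gradient accumulation steps: 4
        num_train_epochs=1,             # Epochs: 1
        max_length=8192,
    ),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```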