metadata
library_name: peft
base_model: NousResearch/Meta-Llama-3-70B-Instruct
license: apache-2.0

Model Card for radm/Llama-3-70B-Instruct-AH-lora

This is a LoRA adapter for NousResearch/Meta-Llama-3-70B-Instruct, fine-tuned to act as a judge for Arena Hard (https://github.com/lm-sys/arena-hard-auto).

Model Details

Model Description

  • Developed by: radm
  • Model type: Llama-3-70B
  • Language(s) (NLP): English
  • License: apache-2.0
  • Finetuned from model: NousResearch/Meta-Llama-3-70B-Instruct

Uses

Use the arena-hard-local repository (https://github.com/r4dm/arena-hard-local) to run Arena Hard evaluations with a local judge model.
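For use outside that repository, the adapter can presumably be attached to the base model with PEFT. The following is a minimal sketch, not the author's documented procedure: the helper name `load_judge`, the bf16 dtype, and `device_map="auto"` are assumptions, and a 70B model needs substantial GPU memory to load this way.

```python
BASE_MODEL_ID = "NousResearch/Meta-Llama-3-70B-Instruct"
ADAPTER_ID = "radm/Llama-3-70B-Instruct-AH-lora"


def load_judge(base_model_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Hypothetical helper: load the base model and attach the LoRA adapter."""
    # Heavy dependencies are imported lazily so the module itself stays cheap to import.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        torch_dtype=torch.bfloat16,  # assumption: bf16 inference
        device_map="auto",           # assumption: let accelerate place the layers
    )
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `tokenizer, model = load_judge()` downloads both the base weights and the adapter, so it is only practical on a machine sized for a 70B model.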

Results

Llama-3-70B-Instruct-GPTQ as judge:

Llama-3-Instruct-8B-SimPO                          | score: 78.3  | 95% CI:   (-1.5, 1.2)   | average #tokens: 545
SELM-Llama-3-8B-Instruct-iter-3                    | score: 72.8  | 95% CI:   (-2.1, 1.4)   | average #tokens: 606
Meta-Llama-3-8B-Instruct-f16                       | score: 65.3  | 95% CI:   (-1.8, 2.1)   | average #tokens: 560
suzume-llama-3-8B-multilingual-orpo-borda-half     | score: 63.5  | 95% CI:   (-1.6, 2.1)   | average #tokens: 978
Phi-3-medium-128k-instruct                         | score: 50.0  | 95% CI:   (0.0, 0.0)    | average #tokens: 801
suzume-llama-3-8B-multilingual                     | score: 48.1  | 95% CI:   (-2.2, 1.8)   | average #tokens: 767
aya-23-8B                                          | score: 48.0  | 95% CI:   (-2.0, 2.1)   | average #tokens: 834
Vikhr-7B-instruct_0.5                              | score: 19.6  | 95% CI:   (-1.3, 1.5)   | average #tokens: 794
alpindale_gemma-2b-it                              | score: 11.2  | 95% CI:   (-1.0, 0.8)   | average #tokens: 425

Llama-3-70B-Instruct-AH-AWQ as judge:

Llama-3-Instruct-8B-SimPO                          | score: 83.8  | 95% CI:   (-1.4, 1.3)   | average #tokens: 545
SELM-Llama-3-8B-Instruct-iter-3                    | score: 78.8  | 95% CI:   (-1.7, 1.9)   | average #tokens: 606
suzume-llama-3-8B-multilingual-orpo-borda-half     | score: 71.8  | 95% CI:   (-1.7, 2.4)   | average #tokens: 978
Meta-Llama-3-8B-Instruct-f16                       | score: 69.8  | 95% CI:   (-1.9, 1.7)   | average #tokens: 560
suzume-llama-3-8B-multilingual                     | score: 54.0  | 95% CI:   (-2.1, 2.1)   | average #tokens: 767
aya-23-8B                                          | score: 50.4  | 95% CI:   (-1.7, 1.7)   | average #tokens: 834
Phi-3-medium-128k-instruct                         | score: 50.0  | 95% CI:   (0.0, 0.0)    | average #tokens: 801
Vikhr-7B-instruct_0.5                              | score: 14.2  | 95% CI:   (-1.3, 1.0)   | average #tokens: 794
alpindale_gemma-2b-it                              | score:  7.9  | 95% CI:   (-0.9, 0.8)   | average #tokens: 425
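
To compare the two judges programmatically, the result lines above can be parsed into records. This is a small sketch that assumes the exact line layout shown in this card; `parse_result_line` and `score_deltas` are illustrative names, not part of arena-hard-auto.

```python
import re

# Matches lines such as:
# "aya-23-8B | score: 48.0 | 95% CI: (-2.0, 2.1) | average #tokens: 834"
LINE_RE = re.compile(
    r"^(?P<model>\S+)\s*\|\s*score:\s*(?P<score>[\d.]+)\s*\|\s*95% CI:\s*"
    r"\((?P<lo>-?[\d.]+),\s*(?P<hi>-?[\d.]+)\)\s*\|\s*"
    r"average #tokens:\s*(?P<tokens>\d+)\s*$"
)


def parse_result_line(line):
    """Parse one result line into a dict; raise ValueError if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    return {
        "model": match["model"],
        "score": float(match["score"]),
        "ci": (float(match["lo"]), float(match["hi"])),
        "avg_tokens": int(match["tokens"]),
    }


def score_deltas(first_judge_lines, second_judge_lines):
    """Return second-judge score minus first-judge score for each shared model."""
    first = {r["model"]: r["score"] for r in map(parse_result_line, first_judge_lines)}
    second = {r["model"]: r["score"] for r in map(parse_result_line, second_judge_lines)}
    return {name: round(second[name] - first[name], 1) for name in first if name in second}
```

Passing the first table as `first_judge_lines` and the second as `second_judge_lines` gives the per-model score shift between the two judges.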

Training Details

Training Data

Datasets:

  • radm/arenahard_gpt4vsllama3
  • radm/truthy-dpo-v0.1-ru
  • jondurbin/truthy-dpo-v0.1

Training Hyperparameters

  • Training regime: bf16
  • Load in 4 bit: True
  • Target modules: all
  • LoRA rank: 16
  • Max seq length: 8192
  • Use gradient checkpointing: unsloth
  • Trainer: ORPOTrainer
  • Batch size: 1
  • Gradient accumulation steps: 4
  • Epochs: 1
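
The settings above can be collected into a small record, which also makes the effective batch size explicit. This is only a sketch: the class and field names are illustrative rather than the arguments of any particular trainer, and the single-device default is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSettings:
    """Hyperparameters as listed in this card; field names are illustrative."""
    precision: str = "bf16"
    load_in_4bit: bool = True
    target_modules: str = "all"
    lora_rank: int = 16
    max_seq_length: int = 8192
    gradient_checkpointing: str = "unsloth"
    trainer: str = "ORPOTrainer"
    batch_size: int = 1
    gradient_accumulation_steps: int = 4
    epochs: int = 1

    def effective_batch_size(self, num_devices: int = 1) -> int:
        # Examples contributing to each optimizer step.
        # num_devices=1 is an assumption, not a detail stated in the card.
        return self.batch_size * self.gradient_accumulation_steps * num_devices


settings = TrainingSettings()
print(settings.effective_batch_size())  # 1 * 4 * 1 = 4
```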

Hardware

  • Hardware Type: NVIDIA A100 80 GB
  • Hours used: 11

Framework versions

  • PEFT 0.10.0