
Llama 3 8B finetuned on mlabonne/orpo-dpo-mix-40k with ORPO.

The maximum sequence length was reduced to 1024 tokens, and LoRA (r=16) together with 4-bit quantization were used to improve memory efficiency.
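The setup above can be sketched with TRL's `ORPOTrainer`, PEFT LoRA, and bitsandbytes 4-bit quantization. Only the dataset, the max length of 1024, and LoRA r=16 come from this card; every other hyperparameter (learning rate, beta, batch size, LoRA alpha/dropout/target modules) is an illustrative assumption, not the actual training recipe:

```python
# Sketch of ORPO finetuning with LoRA (r=16) and 4-bit quantization.
# Only max_length=1024, r=16, and the dataset are from the card; the
# remaining hyperparameters are assumptions for illustration.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

model_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Load the base model in 4-bit (NF4) to fit finetuning on a single GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

peft_config = LoraConfig(
    r=16,                      # LoRA rank, as in the card
    lora_alpha=32,             # assumption
    lora_dropout=0.05,         # assumption
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
)

args = ORPOConfig(
    output_dir="llama3-8b-orpo",
    max_length=1024,           # reduced max length, as in the card
    max_prompt_length=512,     # assumption
    beta=0.1,                  # ORPO odds-ratio weight; assumption
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=8e-6,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=load_dataset("mlabonne/orpo-dpo-mix-40k", split="train"),
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
    peft_config=peft_config,
)
trainer.train()
```

ORPO needs no separate reference model, which is why a single 4-bit base model plus LoRA adapters is enough here, unlike DPO.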

| Benchmark | LLaMa 3 8B | LLaMa 3 8B Inst | LLaMa 3 8B ORPO V1 | LLaMa 3 8B ORPO V2 (WIP) |
|---|---|---|---|---|
| MMLU | 62.12 | 63.92 | 61.87 | — |
| BoolQ | 81.04 | 83.21 | 82.42 | — |
| Winogrande | 73.24 | 72.06 | 74.43 | — |
| ARC-Challenge | 53.24 | 56.91 | 52.90 | — |
| TriviaQA | 63.33 | 51.09 | 63.93 | — |
| GSM-8K (flexible) | 50.27 | 75.13 | 52.16 | — |
| SQuAD V2 (f1) | 32.48 | 29.68 | 33.68 | — |
| LogiQA | 29.23 | 32.87 | 30.26 | — |

All scores obtained with lm-evaluation-harness v0.4.2.
Model size: 8.03B params (FP16, safetensors)