Model Card for 4-bit RoLlama3.1-8b-Instruct-DPO
Built from RoLlama3.1-8b-Instruct-DPO, quantized to 4-bit.
This variant of RoLlama3.1-8b-Instruct-DPO provides a reduced footprint through 4-bit quantization, aimed at enabling usage on resource-constrained GPUs while preserving a high fraction of the model’s capabilities.
Model Details
Comparison to 16 bit
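Quantization has a small effect on most benchmarks (changes within about 5%), but MMLU, Hellaswag, and GSM8K drop by roughly 11% to 23%, while LaRoSeDa (multiclass) improves by about 16%: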
Task | Metric | FP16 Original | 4-bit | Absolute Diff. | % Change |
---|---|---|---|---|---|
ARC Challenge | Avg. Accuracy | 44.84 | 42.74 | -2.10 | -4.68% |
MMLU | Avg. Accuracy | 55.06 | 42.27 | -12.79 | -23.23% |
Winogrande | Avg. Accuracy | 65.87 | 64.94 | -0.93 | -1.41% |
Hellaswag | Avg. Accuracy | 58.67 | 52.39 | -6.28 | -10.70% |
GSM8K | Avg. Accuracy | 44.17 | 38.87 | -5.30 | -11.99% |
TruthfulQA | Avg. Accuracy | 47.82 | 48.67 | +0.85 | +1.78% |
LaRoSeDa (binary) | Macro-F1 | 96.10 | 97.47 | +1.37 | +1.43% |
LaRoSeDa (multiclass) | Macro-F1 | 55.37 | 64.05 | +8.68 | +15.68% |
WMT EN-RO | BLEU | 21.29 | 20.54 | -0.75 | -3.52% |
WMT RO-EN | BLEU | 21.86 | 21.16 | -0.70 | -3.20% |
XQuAD (avg) | EM / F1 | 21.58 / 36.54 | 21.45 / 37.73 | -0.13 / +1.19 | -0.60% / +3.26% |
STS (avg) | Spearman / Pearson | 78.01 / 77.98 | 77.08 / 76.93 | -0.93 / -1.05 | -1.19% / -1.35% |
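The last two columns follow directly from the two score columns. A minimal sketch of that arithmetic on three rows copied from the table; the helper name `compare_scores` is illustrative and not part of any evaluation harness:

```python
def compare_scores(fp16, four_bit):
    """Return (absolute difference, percent change) of the 4-bit score relative to FP16."""
    diff = four_bit - fp16
    pct = diff / fp16 * 100.0
    return round(diff, 2), round(pct, 2)


# (FP16, 4-bit) scores copied from the comparison table
scores = {
    "ARC Challenge": (44.84, 42.74),
    "MMLU": (55.06, 42.27),
    "LaRoSeDa (multiclass)": (55.37, 64.05),
}

for task, (fp16, four_bit) in scores.items():
    diff, pct = compare_scores(fp16, four_bit)
    print(f"{task}: {diff:+.2f} ({pct:+.2f}%)")
```

The percent change is relative to the FP16 score, so a negative value means the 4-bit variant scores lower.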
Model Description
- Developed by: OpenLLM-Ro
- Language(s): Romanian
- License: cc-by-nc-4.0
- Quantized from model: RoLlama3.1-8b-Instruct-DPO
- Quantization: 4-bit
Quantization reduces model size and improves inference speed but can lower benchmark scores. The table in the "Comparison to 16 bit" section above compares the original full-precision version with the 4-bit variant on the main benchmarks.
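As a rough illustration of the size reduction, weight storage scales with the number of bits per parameter. The sketch below assumes about 8 billion parameters (taken from the "8b" in the model name) and counts weights only; quantization metadata, any layers kept in higher precision, activations, and the KV cache are ignored, so real memory use will be higher.

```python
def weight_footprint_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (10^9 bytes), weights only."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8e9  # assumption: ~8 billion parameters, per the "8b" in the model name

fp16_gb = weight_footprint_gb(NUM_PARAMS, 16)
four_bit_gb = weight_footprint_gb(NUM_PARAMS, 4)

print(f"FP16 weights:  ~{fp16_gb:.0f} GB")
print(f"4-bit weights: ~{four_bit_gb:.0f} GB")
print(f"Reduction:     {fp16_gb / four_bit_gb:.0f}x")
```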
How to Use
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "OpenLLM-Ro/RoLlama3.1-8b-Instruct-DPO-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Pass the 4-bit setting through quantization_config; the bare load_in_4bit
# keyword is deprecated in recent transformers releases.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

instruction = "Ce jocuri de societate pot juca cu prietenii mei?"
chat = [
    {"role": "system", "content": "Ești un asistent folositor, respectuos și onest. Încearcă să ajuți cât mai mult prin informațiile oferite, excluzând răspunsuri toxice, rasiste, sexiste, periculoase și ilegale."},
    {"role": "user", "content": instruction},
]
# Render the chat into a single prompt string with the model's chat template.
prompt = tokenizer.apply_chat_template(chat, tokenize=False, system_message="")
# Move the inputs to wherever device_map="auto" placed the model.
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
outputs = model.generate(input_ids=inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
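For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so the decoded text printed above includes the prompt. A minimal sketch of keeping only the continuation, shown on plain Python lists with made-up token ids; `strip_prompt` is a hypothetical helper, not a transformers function:

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the ids that were generated after the prompt."""
    return list(output_ids[prompt_length:])


# Toy ids for illustration only; real ids come from the tokenizer and the model.
prompt_ids = [101, 7, 42, 9]
output_ids = prompt_ids + [55, 56, 57]  # generate() output = prompt + new tokens

new_ids = strip_prompt(output_ids, len(prompt_ids))
print(new_ids)
```

Applied to the snippet above, the same idea would likely be `tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)`.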
Model tree for OpenLLM-Ro/RoLlama3.1-8b-Instruct-DPO-4Bit-BB
- Base model: meta-llama/Llama-3.1-8B
- Finetuned: meta-llama/Llama-3.1-8B-Instruct
- Finetuned: OpenLLM-Ro/RoLlama3.1-8b-Instruct
Evaluation results
- Average accuracy on OpenLLM-Ro/ro_arc_challenge (self-reported): 42.740
- 0-shot on OpenLLM-Ro/ro_arc_challenge (self-reported): 40.790
- 1-shot on OpenLLM-Ro/ro_arc_challenge (self-reported): 40.360
- 3-shot on OpenLLM-Ro/ro_arc_challenge (self-reported): 43.360
- 5-shot on OpenLLM-Ro/ro_arc_challenge (self-reported): 44.040
- 10-shot on OpenLLM-Ro/ro_arc_challenge (self-reported): 43.870
- 25-shot on OpenLLM-Ro/ro_arc_challenge (self-reported): 44.040
- Average accuracy on OpenLLM-Ro/ro_mmlu (self-reported): 42.270
- 0-shot on OpenLLM-Ro/ro_mmlu (self-reported): 43.230
- 1-shot on OpenLLM-Ro/ro_mmlu (self-reported): 42.470