Model Card for RoLlama-3.2-1B
RoLlama-3.2-1B is a continually pretrained adaptation of meta-llama/Llama-3.2-1B for the Romanian language, produced under a constrained compute budget (a single RTX 3090). It was trained with QLoRA on 2.4B Romanian tokens (FineWeb2-Edu-Ro, quality-filtered) mixed with English data (80% Romanian / 20% English) to limit catastrophic forgetting.
The model is designed for 4-bit (QLoRA) inference under constrained VRAM. The adapter was trained on a 4-bit (nf4) base and is intended to run in that configuration. Reported metrics use 4-bit inference, which is the deployment precision.
Model Details
Model Description
- Developed by: OpenLLM-Ro
- Language(s): Romanian (ro), English (en)
- License: Llama 3.2 Community License
- Continually pretrained from: meta-llama/Llama-3.2-1B
- Training corpus: FineWeb2-Edu-Ro (educational quality score = 4 subset) + FineWeb (EN), 80/20 mix, 2.4B tokens
- Method: QLoRA (4-bit), LoRA rank 64 including embed_tokens + lm_head, WSD scheduler, no sequence packing
Model Sources
- Repository: https://github.com/pariidanDKE/TrainingRoLlama3.2-1B
- Write-up: https://dan1180627.substack.com/p/rollama32-1b-cpt-of-a-small-language
Intended Use
Intended Use Cases
Research on Romanian language adaptation and continual pretraining of small models. This is a base model (not instruction-tuned); it is intended for further fine-tuning (SFT/DPO) or text-completion research, not for direct chat/instruction following.
Out-of-Scope Use
Any use that violates the Llama 3.2 license or applicable laws; use in languages other than Romanian/English; deployment as an assistant without further alignment.
How to Get Started with the Model
This is a LoRA adapter meant to be loaded on top of the base model in 4-bit, which is its training and deployment configuration:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import PeftModel

    base_model = "meta-llama/Llama-3.2-1B"
    adapter = "OpenLLM-Ro/RoLlama-3.2-1B"

    # 4-bit NF4 quantization with bfloat16 compute, matching the training setup
    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    # The tokenizer is loaded from the adapter repository
    tok = AutoTokenizer.from_pretrained(adapter)
    # Load the quantized base model, then attach the LoRA adapter
    model = AutoModelForCausalLM.from_pretrained(base_model, quantization_config=bnb, device_map="auto")
    model = PeftModel.from_pretrained(model, adapter)

    # Plain text completion (base model, no chat template)
    inputs = tok("Capitala României este", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=40)
    print(tok.decode(out[0], skip_special_tokens=True))

Academic Benchmarks
Accuracy (%), 4-bit inference, averaged across the standard few-shot settings. Higher is better. Average is the mean over all six tasks.
| Model | Average | ARC | MMLU | Winogrande | HellaSwag | GSM8k | TruthfulQA |
|---|---|---|---|---|---|---|---|
| Llama-3.2-1B (base) | 31.25 | 29.45 | 24.50 | 51.82 | 35.75 | 1.21 | 44.76 |
| RoLlama-3.2-1B | 32.55 | 31.33 | 23.59 | 54.14 | 40.21 | 0.18 | 45.82 |
ARC/HellaSwag report acc_norm; MMLU/Winogrande/TruthfulQA report acc; GSM8k reports exact_match. Scores from the final 2.4B-token checkpoint of the FineWeb2-Edu-Ro run, 4-bit inference (deployment precision).
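As a quick sanity check on the Average column, the mean over the six tasks can be recomputed from the per-task scores. The snippet below is a minimal sketch: the variable and function names are illustrative, and the numbers are copied from the table above.

```python
# Per-task scores copied from the Academic Benchmarks table (accuracy, %),
# in the order ARC, MMLU, Winogrande, HellaSwag, GSM8k, TruthfulQA.
scores = {
    "Llama-3.2-1B (base)": [29.45, 24.50, 51.82, 35.75, 1.21, 44.76],
    "RoLlama-3.2-1B": [31.33, 23.59, 54.14, 40.21, 0.18, 45.82],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


averages = {name: mean(vals) for name, vals in scores.items()}

for name, avg in averages.items():
    print(f"{name}: {avg:.3f}")
```

Both recomputed means agree with the Average column to within rounding.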
Downstream Tasks
LaRoSeDa (sentiment) — Macro F1
| Model | Binary (Few-shot) | Binary (Finetuned) | Multiclass (Few-shot) | Multiclass (Finetuned) |
|---|---|---|---|---|
| Llama-3.2-1B (base) | 50.84 | - | 33.42 | - |
| RoLlama-3.2-1B | 67.54 | - | 25.63 | - |
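For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of its frequency. The sketch below is a plain-Python illustration on hypothetical toy labels (not LaRoSeDa data); it is not the evaluation harness used for the table, and the table reports the value multiplied by 100.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy labels: 1 = positive, 0 = negative.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
score = macro_f1(y_true, y_pred)

# A classifier that always predicts one class on balanced binary data.
constant_score = macro_f1([1, 1, 0, 0], [1, 1, 1, 1])
```

Because the mean is unweighted, a model that collapses onto a single class is penalized heavily: on balanced binary data the constant predictor above scores one third.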
WMT (translation) — BLEU
| Model | EN→RO (Few-shot) | EN→RO (Finetuned) | RO→EN (Few-shot) | RO→EN (Finetuned) |
|---|---|---|---|---|
| Llama-3.2-1B (base) | 6.11 | - | 15.60 | - |
| RoLlama-3.2-1B | 2.10 | - | 2.94 | - |
XQuAD (extractive QA)
| Model | EM (Few-shot) | F1 (Few-shot) | EM (Finetuned) | F1 (Finetuned) |
|---|---|---|---|---|
| Llama-3.2-1B (base) | 20.88 | 31.11 | - | - |
| RoLlama-3.2-1B | 13.51 | 25.21 | - | - |
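EM and F1 for extractive QA are commonly computed per example as an exact string match and a token-overlap F1 between the predicted and reference answers. The sketch below illustrates that idea under simplifying assumptions: the normalization (lowercasing and whitespace collapsing) is an assumption, since this card does not state how answers were normalized, and the example strings are hypothetical rather than taken from XQuAD.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace (an assumed, simplified normalization)."""
    return " ".join(text.lower().split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical prediction/reference pair.
em = exact_match("în București", "București")
f1 = token_f1("în București", "București")
```

A prediction with one extra token fails exact match but still earns partial F1 credit, which is why F1 is higher than EM in the table.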
STS (semantic textual similarity)
| Model | Spearman (Few-shot) | Pearson (Few-shot) | Spearman (Finetuned) | Pearson (Finetuned) |
|---|---|---|---|---|
| Llama-3.2-1B (base) | 0.019 | 0.018 | - | - |
| RoLlama-3.2-1B | -0.004 | -0.005 | - | - |
Additional Romanian Signals (supplementary, not in standard suite)
| Metric | Llama-3.2-1B (base) | RoLlama-3.2-1B | Note |
|---|---|---|---|
| RoWiki perplexity ↓ | 60.44 | 32.47 | primary Romanian fluency signal |
| RO Belebele (acc_norm) | 26.47 | 27.22 | reading comprehension |
| RO Grammar (acc) | 28.37 | 27.74 | |
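Perplexity is the exponential of the mean negative log-likelihood per token, so lower values mean the model assigns higher probability to the text. The sketch below shows only that definition on hypothetical log-probabilities; this card does not state the tokenization level or context window used for the RoWiki measurement, so it is not a reproduction of it.

```python
import math


def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood (natural log) per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)


# Hypothetical log-probabilities: every token is assigned probability 0.5,
# which corresponds to a perplexity of 2 (a coin flip per token).
uniform_half = [math.log(0.5)] * 4
ppl = perplexity(uniform_half)
```

In practice the per-token log-probabilities would come from the model's output logits over a held-out corpus.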
English Retention (catastrophic forgetting check)
| Metric | Llama-3.2-1B (base) | RoLlama-3.2-1B |
|---|---|---|
| WikiText perplexity ↓ | 12.35 | 14.87 |
| ARC-Challenge (acc_norm) | 34.64 | 33.70 |
| Winogrande (acc) | 61.25 | 59.67 |
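To put the trade-off in relative terms, the perplexity changes can be expressed as fractions of the base model's values. The snippet below is a small arithmetic sketch using numbers copied from the tables in this card; the helper name is illustrative.

```python
def relative_change(before, after):
    """Signed relative change from `before` to `after`, as a fraction."""
    return (after - before) / before


# Values copied from the tables in this card (base model -> RoLlama-3.2-1B).
rowiki_ppl = relative_change(60.44, 32.47)    # Romanian (RoWiki) perplexity
wikitext_ppl = relative_change(12.35, 14.87)  # English (WikiText) perplexity

print(f"RoWiki perplexity change:   {rowiki_ppl:+.1%}")
print(f"WikiText perplexity change: {wikitext_ppl:+.1%}")
```

Romanian perplexity drops by roughly 46% while English perplexity rises by roughly 20%, which is the cost side of the adaptation.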
Training Recipe (summary)
| Hyperparameter | Value |
|---|---|
| LoRA rank / alpha | 64 / 64 |
| Target modules | attn + MLP + embed_tokens + lm_head |
| Learning rate / embedding LR | 1e-4 / 2e-5 |
| Effective batch size | 128 (BS 4 × GA 32) |
| Data mix | 80% RO (FineWeb2-Edu-Ro) / 20% EN |
| Sequence packing | disabled |
| Scheduler | warmup_stable_decay (6% warmup, 5% decay) |
| Precision | BF16 + QLoRA 4-bit + FlashAttention 2 + grad checkpointing |
| Tokens | 2.4B (8 milestones × 300M) |
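The warmup-stable-decay scheduler in the table ramps the learning rate up, holds it at its peak for most of training, then ramps it down at the end. The sketch below is a plain-Python illustration, not the trainer's implementation. It takes the peak learning rate (1e-4) and the 6% / 5% fractions from the table, and assumes linear ramps and a hypothetical total of 1000 steps, since the card states neither the ramp shapes nor the step count.

```python
def wsd_lr(step, total_steps, peak_lr=1e-4, warmup_frac=0.06, decay_frac=0.05):
    """Learning rate at `step` (0-indexed) for a warmup-stable-decay schedule.

    Assumes linear warmup from near 0 and linear decay toward 0; the actual
    ramp shapes used in training are not stated in the card.
    """
    warmup_steps = round(total_steps * warmup_frac)
    decay_steps = round(total_steps * decay_frac)
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Warmup phase: ramp linearly up to the peak.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable phase: hold the peak learning rate.
        return peak_lr
    # Decay phase: ramp linearly down over the final steps.
    return peak_lr * (total_steps - step) / decay_steps


# Hypothetical run length, chosen only to make the three phases easy to read.
TOTAL_STEPS = 1000
schedule = [wsd_lr(s, TOTAL_STEPS) for s in range(TOTAL_STEPS)]
```

With these assumptions the first 60 steps are warmup, the last 50 are decay, and the remaining 890 run at the peak rate. The separate embedding learning rate (2e-5) would follow the same shape with a different peak if both parameter groups share the scheduler, which is also an assumption.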
Citation
@misc{parii2026rollama32,
title = {RoLlama-3.2-1B: Continual Pretraining of a Small Language Model for Romanian under Compute Constraints},
author = {Parii, Dan},
year = {2026},
howpublished = {\url{https://dan1180627.substack.com/p/rollama32-1b-cpt-of-a-small-language}}
}