# RoLlama-3.2-1B-ro-cpt-filtered
This adapter is part of the RoLLaMA-3.2-1B project, which adapts a small 1B-parameter Llama model to Romanian through compute-constrained continual pretraining (CPT) on a single consumer GPU. The full training recipe, ablations, and benchmark analysis are documented in the accompanying post, RoLLaMA-3.2-1B: CPT of a Small Language Model for Romanian.
LoRA adapter for Romanian continual pretraining of `unsloth/llama-3.2-1b-unsloth-bnb-4bit`.
It was exported from training checkpoint `checkpoint-16124` and is meant to be loaded on top of the base model with PEFT.
## Load
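```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Pre-quantized 4-bit base model (needs bitsandbytes; device_map="auto" needs accelerate)
base_model = "unsloth/llama-3.2-1b-unsloth-bnb-4bit"
adapter_repo = "danp27/RoLlama-3.2-1B-ro-cpt-filtered"

# Tokenizer files are loaded from the adapter repo
tokenizer = AutoTokenizer.from_pretrained(adapter_repo)
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
# Attach the LoRA weights and the saved embed_tokens/lm_head modules to the base model
model = PeftModel.from_pretrained(model, adapter_repo)
```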
## Notes
- Romanian continual pretraining adapter
- Trained with LoRA rank 64, with `embed_tokens` and `lm_head` saved as full modules
- Only the adapter artifacts are published; optimizer and trainer state are not included
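Saving `embed_tokens` and `lm_head` as full modules has a large effect on adapter size. The sketch below is a rough back-of-the-envelope count using the public Llama 3.2 1B dimensions (hidden size 2048, MLP size 8192, 16 layers, 8 key/value heads of dimension 64, vocabulary 128,256). The set of LoRA target modules is an assumption: the card states only the rank, so the sketch assumes all seven attention and MLP projections are adapted.

```python
# Rough parameter count for a rank-64 LoRA adapter on Llama 3.2 1B.
# ASSUMPTION: LoRA targets all seven attention/MLP projections; the card
# only states the rank and that embed_tokens/lm_head are saved as modules.

HIDDEN = 2048         # hidden_size
INTERMEDIATE = 8192   # intermediate_size
LAYERS = 16           # num_hidden_layers
KV_DIM = 8 * 64       # num_key_value_heads * head_dim (grouped-query attention)
VOCAB = 128256        # vocab_size
RANK = 64             # LoRA rank from the card

# (in_features, out_features) of each projection in one decoder layer
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) beside a frozen d_out x d_in weight."""
    return r * (d_in + d_out)


lora_total = LAYERS * sum(lora_params(i, o, RANK) for i, o in PROJECTIONS.values())
# embed_tokens and lm_head are each stored in full as VOCAB x HIDDEN matrices
full_modules = 2 * VOCAB * HIDDEN

print(f"LoRA matrices:        {lora_total:,}")
print(f"embed_tokens+lm_head: {full_modules:,}")
```

Under these assumptions the LoRA matrices come to about 45M parameters, while the two full modules hold about 525M (roughly 1 GB at 16-bit precision), so they dominate the adapter's size on disk.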
## Results
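Final evaluation of the filtered 2.4B-token run, compared against the base checkpoint. In both tables, benchmark scores are accuracies in percent (higher is better), while for word perplexity lower is better: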
| Metric | Base | Filtered CPT | Delta |
|---|---|---|---|
| RoHellaSwag | 35.75 | 40.21 | +4.47 |
| RoWinoGrande | 51.82 | 54.14 | +2.33 |
| RoARC Challenge | 29.45 | 31.33 | +1.89 |
| RoMMLU | 24.50 | 23.59 | -0.91 |
| RoWiki word perplexity | 60.44 | 32.47 | -27.98 |
English-side retention for the same checkpoint:
| Metric | Base | Filtered CPT | Delta |
|---|---|---|---|
| WikiText word perplexity | 12.35 | 14.87 | +2.52 |
| Winogrande | 61.25 | 59.67 | -1.58 |
| ARC Challenge (norm) | 34.64 | 33.70 | -0.94 |
These numbers reflect the final filtered run only, not the unfiltered comparison run.
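One way to compare the two perplexity rows on a common scale is to convert them to average negative log-likelihood (NLL) per word. The sketch assumes word perplexity is defined as exp(total NLL / number of words), the definition used by tools such as lm-evaluation-harness; the card does not say which evaluation tool produced these numbers. The scores are copied from the tables above.

```python
import math

# ASSUMPTION: word perplexity = exp(total negative log-likelihood / number of words).
# Scores copied from the tables: (base, filtered CPT).
RESULTS = {
    "RoWiki": (60.44, 32.47),
    "WikiText": (12.35, 14.87),
}


def nll_change_per_word(base_ppl: float, new_ppl: float) -> float:
    """Change in average negative log-likelihood per word, in nats."""
    return math.log(new_ppl) - math.log(base_ppl)


for name, (base, new) in RESULTS.items():
    delta = nll_change_per_word(base, new)
    rel = new / base - 1.0  # relative change in perplexity
    print(f"{name:9s} ppl {rel:+.1%}  NLL/word {delta:+.3f} nats")
```

Under that assumption, the filtered CPT run lowers Romanian NLL by about 0.62 nats per word (perplexity down about 46%) and raises English NLL by about 0.19 nats per word (perplexity up about 20%).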
## References
- Dan Parii. RoLLaMA-3.2-1B: CPT of a Small Language Model for Romanian. Substack. https://dan1180627.substack.com/p/rollama32-1b-cpt-of-a-small-language