SinLlama Llama-3-8B Merged

This repository contains a continuously pre-trained (CPT) base model for the Sinhala language. It was created by merging the polyglots/SinLlama_v01 LoRA adapter into the official meta-llama/Meta-Llama-3-8B base model.

The adapter has been fully merged into standalone FP16 weights, so the model can be loaded directly with libraries such as transformers or vLLM without downloading or managing a separate PEFT adapter.
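For readers unfamiliar with what merging an adapter involves: a LoRA adapter stores two small matrices, A and B, plus a scaling factor for each adapted weight matrix W, and merging folds them back in as W' = W + (alpha / r) * B A. The pure-Python sketch below illustrates that arithmetic on a toy 2x2 matrix. It shows the general LoRA formula only, not the script used to build this repository; the function names and toy values are hypothetical.

```python
def matmul(x, y):
    """Multiply two matrices given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the standard LoRA merge.

    weight: d_out x d_in, lora_b: d_out x rank, lora_a: rank x d_in.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update, same shape as weight
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy values chosen for illustration only.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]      # rank x d_in  = 1 x 2
lora_b = [[0.5], [1.0]]    # d_out x rank = 2 x 1
merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
print(merged)  # [[2.0, 2.0], [2.0, 5.0]]
```

In practice this step is usually delegated to the PEFT library, whose `merge_and_unload()` method applies the same update to every adapted layer and returns a plain model with no adapter attached.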

🏆 Acknowledgments, Credits & Disclaimer

My contribution to this repository is strictly limited to merging the weights to provide a convenient standalone model.

All credit for the foundational machine learning research, tokenizer vocabulary expansion, dataset curation, and continuous pre-training (CPT) belongs entirely to the Polyglots team and the authors of the SinLlama paper.

Researchers/Authors: H. W. K. Aravinda, Rashad Sirajudeen, Samith Karunathilake, Nisansa de Silva, Surangika Ranathunga, Rishemjit Kaur
Original Adapter: polyglots/SinLlama_v01
Original Tokenizer: polyglots/Extended-Sinhala-LLaMA
Paper: SinLlama: A Large Language Model for Sinhala

If you use this model in your research or applications, please ensure you cite their original work:

@article{aravinda2025sinllama,
  title={SinLlama-A Large Language Model for Sinhala},
  author={Aravinda, H W K and Sirajudeen, Rashad and Karunathilake, Samith and de Silva, Nisansa and Ranathunga, Surangika and Kaur, Rishemjit},
  journal={arXiv preprint arXiv:2508.09115},
  year={2025}
}

⚙️ Model Details

Base Model: meta-llama/Meta-Llama-3-8B
Language: Sinhala (si), English (en)
Architecture: Llama 3 (8 Billion Parameters)
Format: Safetensors (Unquantized FP16)
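As a rough guide to what the unquantized FP16 format implies for hardware, the weight footprint can be estimated from the parameter count and the bits stored per parameter. The sketch below assumes a nominal 8 billion parameters (the extended tokenizer vocabulary makes the true count somewhat higher) and ignores activations, the KV cache, and quantization metadata, so treat the numbers as lower bounds. The helper name is hypothetical.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 8e9  # assumption: nominal size, not the exact parameter count

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)
print(f"FP16: {fp16_gib:.1f} GiB, 4-bit: {nf4_gib:.1f} GiB")
# FP16: 14.9 GiB, 4-bit: 3.7 GiB
```

Under these assumptions the FP16 weights alone nearly fill a 16 GB GPU, leaving little room for inference, which is why a 4-bit load is the practical option on smaller cards.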

⚠️ Important Limitations (Base Model vs. Chat Model)

This is a Base Model, not an Instruction-Tuned (Chat) Model. Because it has not undergone Supervised Fine-Tuning (SFT), it is optimized for text completion, not interactive conversation.

If you prompt it with a question, it may continue the document by writing more questions, as if it were an FAQ page. To use it for Q&A, keep your prompt format consistent and apply stopping criteria, such as a limit on new tokens, the EOS token, or a regex that cuts the output at the next "Question:" line, to prevent looping and run-on generations.

Example Prompt Format:

Question: කෘතිම බුද්ධිය (AI) යනු කුමක්දැයි සරලව පැහැදිලි කරන්න.
Answer:
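One way to apply the prompt format and the regex-based stopping mentioned above is to build the prompt with a small helper and then trim the raw completion at the first point where the model starts a new question. This is a minimal sketch: the helper names and the exact pattern are assumptions, and the sample completion is made up for illustration.

```python
import re

# Assumed cut-off pattern: a new line that begins another "Question:" or "Answer:" block.
_NEXT_TURN = re.compile(r"\n\s*(?:Question|Answer)\s*:")


def build_prompt(question: str) -> str:
    """Wrap a question in the Question/Answer completion format."""
    return f"Question: {question.strip()}\nAnswer:"


def truncate_answer(completion: str) -> str:
    """Keep only the text before the model starts a new Q&A turn."""
    match = _NEXT_TURN.search(completion)
    if match:
        completion = completion[: match.start()]
    return completion.strip()


# Made-up completion showing the run-on behaviour described above.
raw = " AI is software that learns from data.\nQuestion: What is ML?\nAnswer: ..."
print(truncate_answer(raw))  # AI is software that learns from data.
```

Trimming after generation is simple but still pays for the extra tokens, so it is worth pairing with a modest limit on the number of new tokens.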

💻 How to Load in 4-bit (Google Colab / T4 GPU)

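# Requires the transformers, accelerate, and bitsandbytes packages.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization with double quantization; compute runs in FP16,
# which suits the T4 (it has no native bfloat16 support).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)

# The merged repository ships the extended Sinhala tokenizer alongside the weights.
tokenizer = AutoTokenizer.from_pretrained("SAWithanage/SinLlama-Llama-3-8B-Merged")
model = AutoModelForCausalLM.from_pretrained(
    "SAWithanage/SinLlama-Llama-3-8B-Merged",
    quantization_config=bnb_config,
    device_map="auto"  # let accelerate place the layers on the available GPU
)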
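Once the model and tokenizer are loaded, a completion can be generated along the following lines. This is a hedged sketch rather than a recommended configuration: the function name, the greedy decoding choice, and the 128-token cap are assumptions, and it expects `model` and `tokenizer` objects like those created above.

```python
def generate_completion(model, tokenizer, prompt, max_new_tokens=128):
    """Greedy-decode a continuation of `prompt` and return only the new text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,         # hard cap against run-on output
        do_sample=False,                       # greedy decoding (assumed default)
        eos_token_id=tokenizer.eos_token_id,   # stop if the model emits EOS
        pad_token_id=tokenizer.eos_token_id,   # the tokenizer may not define a pad token
    )
    # Drop the prompt tokens so only the newly generated text is decoded.
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)


# Usage, with the model and tokenizer loaded as shown above:
# print(generate_completion(model, tokenizer, "Question: ...\nAnswer:"))
```

Because this is a base model, the returned text may still run on into further questions, so trimming the output at the next "Question:" line remains useful even with the token cap.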