Caracal_instruct

Model Description

Caracal_instruct is an instruction-tuned model, produced as part of the AfriLLMQuant pilot project. It was trained via full Quantization-Aware Training (QAT).

The small language model (SLM) backbone for this QAT process was Inkuba-0.4B, which was continued-pretrained on African-language data and then instruction-tuned to produce Caracal_instruct. The model is recommended as a starting point for fine-tuning on a specific task.

Base model: theophilusowiti/Caracal_GPT
Training method: Full Quantization-Aware Training (QAT)
Quantization: INT4
Memory footprint (INT8): ~1.17 GB
QAT training time: ~3 days on 1x NVIDIA A100 80GB
License: Apache 2.0
Training dataset: Muri Dataset
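To make the idea behind QAT more concrete, here is a minimal sketch of the quantize-then-dequantize ("fake quantization") step that QAT typically applies to weights in the forward pass, so the model learns to tolerate rounding error. It assumes symmetric, per-tensor signed INT4 with levels in [-8, 7]; this card does not state the actual scheme, granularity, or tooling used for Caracal_instruct, and the function name fake_quantize_int4 is hypothetical.

```python
def fake_quantize_int4(weights):
    """Round floats to signed INT4 codes, then map the codes back to floats.

    Assumption: symmetric per-tensor quantization with levels in [-8, 7].
    The scheme actually used for this model is not documented in the card.
    """
    qmin, qmax = -8, 7
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Quantize: divide by the scale, round to the nearest integer, clamp to range.
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    # Dequantize: these are the values the rest of the forward pass would see.
    dequantized = [c * scale for c in codes]
    return codes, dequantized, scale


weights = [0.7, -0.33, 0.12, 0.0]
codes, dequantized, scale = fake_quantize_int4(weights)
max_error = max(abs(w - d) for w, d in zip(weights, dequantized))
print(codes, max_error)
```

In real QAT the rounding step has no useful gradient, so frameworks usually pass gradients through it unchanged (a straight-through estimator); that part is omitted here to keep the sketch dependency-free.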

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "theophilusowiti/Caracal_instruct"

# Load the tokenizer and model; device_map="auto" requires the accelerate package
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Wrap the question in the <Input>/<Answer> template used during SFT
prompt = "<s><Input>\nWho is the president of Kenya?\n</Input>\n<Answer>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate up to 64 new tokens, then decode the full sequence (prompt plus answer)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Expected output

The model expects the <Input>...</Input> / <Answer>...</Answer> instruction format used during SFT, e.g.:

<Input>
Who is the president of Kenya?
</Input>
<Answer>
 William Ruto

William Ruto</Answer>
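Since the template has to be reproduced exactly, it can help to build prompts and pull out answers with small helpers. The sketch below is one way to do that with plain string handling; build_prompt and extract_answer are hypothetical names, not part of this model's repository, and the sample string is a hand-written illustration rather than real model output.

```python
def build_prompt(question, add_bos=True):
    """Wrap a question in the <Input>/<Answer> template shown in this card."""
    # The usage example in this card prefixes the prompt with "<s>".
    bos = "<s>" if add_bos else ""
    return f"{bos}<Input>\n{question.strip()}\n</Input>\n<Answer>\n"


def extract_answer(decoded):
    """Return the text after the last <Answer> tag, cut at </Answer> if present."""
    _, found, tail = decoded.rpartition("<Answer>")
    if not found:
        # No template tags in the text: return it unchanged apart from whitespace.
        return decoded.strip()
    return tail.split("</Answer>", 1)[0].strip()


prompt = build_prompt("Who is the president of Kenya?")

# Hand-written stand-in for a decoded generation (not real model output).
sample = "<Input>\nWho is the president of Kenya?\n</Input>\n<Answer>\n William Ruto</Answer>"
answer = extract_answer(sample)
print(answer)
```

The string returned by build_prompt can be passed to the tokenizer in place of the hard-coded prompt in the usage example.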

Top Performing Languages

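Judged by instruction-following behaviour and perplexity (PPL), the model follows instructions most reliably in the languages below, especially when the prompt supplies supporting context (for example, in a RAG setup):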

East Africa: Swahili (swa), Amharic (amh), Luganda (lug), Kinyarwanda (kin)

West Africa: Hausa (hau), Yoruba (yor), Igbo (ibo)

Central Africa: Lingala (lin)

Southern Africa: Xhosa (xho)

Training Procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 3
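The relationship between these values can be checked with a few lines of arithmetic. The sketch below recomputes the effective batch size and traces a cosine schedule with linear warmup. It assumes a single device (consistent with the 1x A100 noted above), takes the 18,108 steps from the last row of the training results as the total step count, and rounds the warmup length up; the function lr_at is a hypothetical illustration of the schedule's shape, not the trainer's own code.

```python
import math

per_device_batch = 16
grad_accum_steps = 2
# Effective batch size on one device: 16 * 2 = 32, matching total_train_batch_size.
effective_batch = per_device_batch * grad_accum_steps

base_lr = 5e-05
warmup_ratio = 0.03
total_steps = 18_108  # assumption: last step reported in the training results
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step):
    """Linear warmup to base_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at(step))
```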

Training results

Training Loss   Epoch    Step     Validation Loss
3.1297          0.9999   6,036    2.6244
2.7599          2.0      12,073   2.6072
2.7805          2.9998   18,108   2.6143
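Because this card refers to perplexity (PPL) elsewhere, it may be useful to convert the validation losses above into perplexities. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, in which case perplexity is exp(loss); the card does not state this explicitly, so treat the resulting figures as indicative only.

```python
import math

# (epoch, step, validation loss) copied from the training results above.
results = [
    (0.9999, 6036, 2.6244),
    (2.0, 12073, 2.6072),
    (2.9998, 18108, 2.6143),
]

# Assumption: loss is mean per-token cross-entropy in nats, so PPL = exp(loss).
perplexities = {step: math.exp(val_loss) for _, step, val_loss in results}

# The checkpoint with the lowest validation loss also has the lowest perplexity.
best_epoch, best_step, best_loss = min(results, key=lambda row: row[2])

for step, ppl in perplexities.items():
    print(f"step {step}: validation PPL ~ {ppl:.2f}")
print(f"lowest validation loss at step {best_step} (epoch {best_epoch})")
```

On these numbers the validation loss is lowest at the end of the second epoch and rises slightly in the third.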

Framework versions

  • Transformers 4.45.0
  • PyTorch 2.12.0+cu126
  • Datasets 4.8.5
  • Tokenizers 0.20.3
