Caracal_instruct
Model Description
Caracal_instruct is an instruction-tuned model produced as part of the AfriLLMQuant pilot project. It was trained with full Quantization-Aware Training (QAT).
The underlying small language model (SLM) backbone for this QAT process was Inkuba-0.4B, which was continued-pretrained on African-language data and then instruction-tuned to produce Caracal_instruct. The model is primarily intended as a starting point for fine-tuning on specific downstream tasks.
| Property | Value |
|---|---|
| Base model | theophilusowiti/Caracal_GPT |
| Training method | Full Quantization-Aware Training (QAT) |
| Quantization | INT4 |
| Memory footprint (INT8) | ~1.17 GB |
| QAT training time | ~3 days on 1x NVIDIA A100 80GB |
| License | Apache 2.0 |
| Training dataset | Muri Dataset |
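The card does not specify the exact quantizer used during QAT (per-tensor vs. per-channel or per-group scales, symmetric vs. asymmetric). As a generic illustration only, QAT usually inserts a quantize-dequantize ("fake quantization") step into the forward pass so the model learns to tolerate INT4 rounding error. The sketch below assumes symmetric per-tensor INT4 with integer range [-8, 7]; `fake_quantize_int4` is a hypothetical helper, not code from this model's repository.

```python
def fake_quantize_int4(values: list[float]) -> tuple[list[int], list[float], float]:
    """Symmetric per-tensor INT4 quantize-dequantize (illustrative assumption)."""
    qmin, qmax = -8, 7  # signed 4-bit integer range
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude onto qmax; fall back to 1.0 for an all-zero tensor
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp into the INT4 range
    codes = [max(qmin, min(qmax, round(v / scale))) for v in values]
    # Dequantize back to floats: this is what the next layer sees during QAT
    dequantized = [q * scale for q in codes]
    return codes, dequantized, scale


weights = [1.4, -0.65, 0.25, 0.0]
codes, deq, scale = fake_quantize_int4(weights)
errors = [abs(w - d) for w, d in zip(weights, deq)]
print("codes:", codes)
print("dequantized:", deq)
print("max rounding error:", max(errors))
```

In real QAT frameworks the rounding step is treated as the identity in the backward pass (the straight-through estimator), so gradients still update the full-precision weights; plain Python has no autograd, so only the forward pass is shown.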
Usage
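```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "theophilusowiti/Caracal_instruct"
# trust_remote_code=True matches the Hub's generated snippet for this repo (it runs code from the repo)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# device_map="auto" requires the `accelerate` package
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True)

# Prompt in the <Input>/<Answer> format used during instruction tuning
prompt = "<s><Input>\nWho is the president of Kenya?\n</Input>\n<Answer>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
# output[0] contains the prompt tokens followed by the generated answer
print(tokenizer.decode(output[0], skip_special_tokens=True))
```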
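Expected output
The model expects the `<Input>...</Input>` / `<Answer>...</Answer>` instruction format used during supervised fine-tuning (SFT). For the prompt above, the decoded output (the prompt followed by the generated answer) looks like:
```
<Input>
Who is the president of Kenya?
</Input>
<Answer>
William Ruto
William Ruto</Answer>
```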
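Because `generate` returns the prompt tokens followed by the continuation, the decoded string still contains the `<Input>` block. The template can be wrapped in two small helpers: one that builds the prompt string used in the Usage example, and one that pulls the answer text out of the decoded output. This is a hypothetical sketch; `build_prompt` and `extract_answer` are illustrative names, not part of the model's repository, and whether the tokenizer already prepends `<s>` itself is worth checking before reusing the `bos` default.

```python
def build_prompt(question: str, bos: str = "<s>") -> str:
    """Format a question in the <Input>/<Answer> template from the Usage example."""
    return f"{bos}<Input>\n{question}\n</Input>\n<Answer>\n"


def extract_answer(decoded: str) -> str:
    """Return the text after the last <Answer> tag, cut at </Answer> if present."""
    _, sep, tail = decoded.rpartition("<Answer>")
    if not sep:
        # No <Answer> tag found: return the whole string, trimmed
        return decoded.strip()
    # Generation may stop at max_new_tokens before emitting </Answer>
    answer, _, _ = tail.partition("</Answer>")
    return answer.strip()


prompt = build_prompt("Who is the president of Kenya?")
decoded = "<Input>\nWho is the president of Kenya?\n</Input>\n<Answer>\nWilliam Ruto</Answer>"
print(repr(prompt))
print(extract_answer(decoded))
```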
Top Performing Languages
Based on instruction-following quality and perplexity (PPL), the model follows instructions most reliably in the following languages, especially when relevant context is supplied in the prompt (e.g. via retrieval-augmented generation, RAG):
- East Africa: Swahili (swa), Amharic (amh), Luganda (lug), Kinyarwanda (kin)
- West Africa: Hausa (hau), Yoruba (yor), Igbo (ibo)
- Central Africa: Lingala (lin)
- Southern Africa: Xhosa (xho)
Training Procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 3
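The batch and scheduler settings above can be checked with a little arithmetic: 16 sequences per step times 2 accumulation steps gives the listed total of 32, and the learning rate follows a linear warmup followed by a cosine decay. The sketch below reproduces that shape in plain Python, modeled on the formula behind `transformers.get_cosine_schedule_with_warmup` with its default single half-cosine, and rounds the warmup step count up as recent `transformers` versions do for `warmup_ratio`. Assumptions: a single device (per the 1x A100 noted above) and 18,108 total optimizer steps, taken from the last logged step in the training-results table; the true total may differ slightly.

```python
import math

# Values from the hyperparameter list above
LEARNING_RATE = 5e-5
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 2
WARMUP_RATIO = 0.03
# Assumption: total steps taken from the last logged step in the results table
TOTAL_STEPS = 18_108


def effective_batch_size(per_device: int, accum: int, num_devices: int = 1) -> int:
    """Sequences contributing to each optimizer update."""
    return per_device * accum * num_devices


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Convert a warmup ratio into a whole number of steps, rounding up."""
    return math.ceil(total_steps * ratio)


def cosine_lr(step: int, base_lr: float, warmup: int, total: int) -> float:
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


warmup = warmup_steps(TOTAL_STEPS, WARMUP_RATIO)
print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
print("warmup steps:", warmup)
for step in (0, warmup, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, cosine_lr(step, LEARNING_RATE, warmup, TOTAL_STEPS))
```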
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 3.1297 | 0.9999 | 6,036 | 2.6244 |
| 2.7599 | 2.0 | 12,073 | 2.6072 |
| 2.7805 | 2.9998 | 18,108 | 2.6143 |
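Since the language ranking above refers to perplexity, the validation losses can be read as perplexities by exponentiating them. This sketch assumes the reported loss is the mean per-token cross-entropy in nats, which is what the Hugging Face Trainer logs for causal language models; the perplexities it prints are derived from the table, not separately reported figures. The logged epochs (0.9999, 2.0, 2.9998) are rounded to 1, 2, 3.

```python
import math

# (epoch, validation_loss) pairs copied from the training-results table above
VAL_LOSSES = [(1, 2.6244), (2, 2.6072), (3, 2.6143)]


def perplexity(mean_ce_loss: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_ce_loss)


# Pick the epoch with the lowest validation loss
best_epoch, best_loss = min(VAL_LOSSES, key=lambda pair: pair[1])
for epoch, loss in VAL_LOSSES:
    print(f"epoch {epoch}: val loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
print(f"lowest validation loss at epoch {best_epoch}")
```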
Framework versions
- Transformers 4.45.0
- PyTorch 2.12.0+cu126
- Datasets 4.8.5
- Tokenizers 0.20.3
Evaluation results
- Accuracy on IrokoBench AfriXNLI (self-reported): 33.32