TounsiLM-8b

TounsiLM-8b is a Tunisian Arabic supervised fine-tuning (SFT) adapter built on top of CohereLabs/aya-expanse-8b. It is trained to understand and answer in Tunisian Darija (دارجة). Answers are direct, on topic, and sized to the question: short when brevity is enough, detailed when the topic requires it.

The adapter was fine-tuned from an earlier continued pre-training (CPT) checkpoint, alabenayed/improved-aya-expanse-8b-cpt-tunisian, which itself extends the base model with continued pre-training on raw Tunisian dialect text.

Model details

| Property | Value |
| --- | --- |
| Base model | `CohereLabs/aya-expanse-8b` |
| CPT checkpoint | `alabenayed/improved-aya-expanse-8b-cpt-tunisian` |
| Fine-tuning method | PEFT / LoRA SFT adapter |
| Format | Adapter only (not a merged standalone model) |

Training details

| Property | Value |
| --- | --- |
| Dataset | `Syrinesmati/tunisian-question-response-dataset` |
| Train rows | 25,340 |
| Eval rows | 6,336 |
| Input fields | `instruction` → user turn, `response` → assistant turn |
| Trainer | TRL `SFTTrainer` |
| Epochs | 2 |
| Max sequence length | 1,024 |
| Learning rate | 1e-5 |
| Batch size (per device) | 8 |
| Gradient accumulation | 4 |
| Effective batch size | 32 |
| Precision | bf16 |
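
The "Input fields" row describes how each dataset row becomes a two-turn conversation. The card does not show the preprocessing script, so the following is only a minimal sketch of that mapping into the usual role/content message format. The function name `row_to_messages` and the example row are hypothetical and not taken from the dataset.

```python
def row_to_messages(row):
    """Map one dataset row to a two-turn chat (user question, assistant answer)."""
    return [
        {"role": "user", "content": row["instruction"]},
        {"role": "assistant", "content": row["response"]},
    ]


# Hypothetical row, for illustration only (not a real dataset entry).
example_row = {"instruction": "Question text", "response": "Answer text"}
messages = row_to_messages(example_row)
print(messages)
```

A list shaped like `messages` is what a chat template consumes, which is presumably how the patched template in this repository was applied during training.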

Training metrics

| Metric | Value |
| --- | --- |
| Training loss | 1.1876 |
| Mean token accuracy | 0.7578 |
| Training runtime | 50,353 seconds (~14 hours) |
| Total steps | 1,584 |
| Total tokens seen | 9,585,534 |
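
The reported numbers can be cross-checked against the training configuration. The sketch below assumes a single training device (the card does not state the device count, but 8 × 4 = 32 matches the reported effective batch size only in that case) and assumes every training row is seen once per epoch. The derived throughput figures are rough averages, not measured values.

```python
import math

# Values copied from the training tables in this card.
train_rows = 25_340
per_device_batch = 8
grad_accum = 4
epochs = 2
runtime_s = 50_353
tokens_seen = 9_585_534

num_devices = 1  # assumption: not stated in the card

effective_batch = per_device_batch * grad_accum * num_devices
steps_per_epoch = math.ceil(train_rows / effective_batch)
total_steps = steps_per_epoch * epochs

hours = runtime_s / 3600
tokens_per_second = tokens_seen / runtime_s
tokens_per_row = tokens_seen / (train_rows * epochs)

print(effective_batch, steps_per_epoch, total_steps)
print(round(hours, 2), round(tokens_per_second, 1), round(tokens_per_row, 1))
```

Under these assumptions the script reproduces the effective batch size of 32 and the 1,584 total steps (792 per epoch), and the runtime works out to just under 14 hours.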

How to use

Load the adapter on the base model

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model_name = "CohereLabs/aya-expanse-8b"
adapter_dir = "alabenayed/TounsiLM-8b"  # update with your HF repo path

tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_dir)

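messages = [
    # System prompt (approximate translation): "You are a Tunisian assistant; answer only in Tunisian dialect."
    {"role": "system", "content": "أنت مساعد تونسي تجاوب بالتونسي الدارج فقط."},
    # User prompt (approximate translation): "What should someone do if they feel tired or unwell?"
    {"role": "user", "content": "شنوة تعمل كان الواحد يحس روحو تعبان؟"},
]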

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=False,
    repetition_penalty=1.1,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
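
For a decoder-only model, `generate` returns the prompt tokens followed by the new tokens, so the `print` call above shows the prompt text as well as the reply. If only the reply is wanted, the prompt can be sliced off before decoding. The helper below is a minimal sketch using plain lists so that it runs without a model; `strip_prompt` is a hypothetical name, and the ids are toy values.

```python
def strip_prompt(output_ids, prompt_len):
    """Return only the ids generated after the first prompt_len prompt tokens."""
    return output_ids[prompt_len:]


# Toy ids standing in for a tokenized prompt and a generated continuation.
prompt_ids = [5, 17, 42]
generated = prompt_ids + [7, 8, 9]

new_ids = strip_prompt(generated, len(prompt_ids))
print(new_ids)

# With the snippet above, the equivalent call would be (assuming the same names):
# reply_ids = strip_prompt(output_ids[0], inputs["input_ids"].shape[1])
# print(tokenizer.decode(reply_ids, skip_special_tokens=True))
```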

Recommended inference settings

do_sample=False: greedy (deterministic) decoding, which gives more stable output and less hallucination
max_new_tokens=128: caps the reply length so answers stay short and on topic
repetition_penalty=1.1: reduces repetitive output
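
To give an intuition for what `repetition_penalty=1.1` does, here is a simplified pure-Python sketch of the commonly used rule: the score of every token that has already appeared is divided by the penalty if positive and multiplied by it if negative, so seen tokens become less likely either way. This illustrates the idea only and is not the library's implementation; `apply_repetition_penalty` is a hypothetical name and the logits are made-up values.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.1):
    """Return a copy of logits in which already-seen tokens are made less likely."""
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Positive scores shrink toward zero; negative scores move further below zero.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


# Made-up scores for a three-token vocabulary; tokens 0 and 1 were already generated.
logits = [2.2, -1.0, 0.5]
adjusted = apply_repetition_penalty(logits, seen_token_ids=[0, 1], penalty=1.1)
print(adjusted)
```

A penalty of 1.0 leaves the scores unchanged, and values only slightly above 1.0, such as the 1.1 recommended here, nudge the model away from loops without forbidding legitimate repeats.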

Intended use

Suitable for:

Tunisian Arabic question answering
Chat-style assistant replies in Tunisian دارجة
Daily-life conversational responses
Translation to and from Tunisian Arabic dialect
Responding in Tunisian Arabic to questions asked in other languages
Medical, legal, and religious questions
General knowledge about Tunisian food, places, history, proverbs, and similar topics

Files in this repository

adapter_model.safetensors — fine-tuned LoRA weights
adapter_config.json — LoRA configuration
chat_template.jinja — patched chat template used during training
Tokenizer files
training_metrics.json — full training log history

Framework versions

| Library | Version |
| --- | --- |
| PEFT | 0.19.1 |
| TRL | 1.3.0 |
| Transformers | 4.57.6 |
| PyTorch | 2.11.0 |
| Datasets | 4.8.5 |
| Tokenizers | 0.22.2 |
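
To compare a local environment against these versions, a small standard-library check can help. This is a sketch only: the pip distribution names in `PINNED` (for example `torch` for PyTorch) are assumptions, and `compare_versions` is a hypothetical helper. A mismatch does not necessarily mean the adapter will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the table above; the distribution names are assumptions.
PINNED = {
    "peft": "0.19.1",
    "trl": "1.3.0",
    "transformers": "4.57.6",
    "torch": "2.11.0",
    "datasets": "4.8.5",
    "tokenizers": "0.22.2",
}


def compare_versions(pinned):
    """Return {name: (expected, installed)}; installed is None if the package is missing."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report


for name, (expected, installed) in compare_versions(PINNED).items():
    # Ignore local build suffixes such as "+cu128" when comparing.
    matches = installed is not None and installed.split("+")[0] == expected
    status = "ok" if matches else "differs"
    print(f"{name}: expected {expected}, installed {installed or 'not installed'} ({status})")
```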

Citation

If you use this model, please cite the base model and the TRL training framework.

@software{vonwerra2020trl,
  title   = {{TRL: Transformers Reinforcement Learning}},
  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
  license = {Apache-2.0},
  url     = {https://github.com/huggingface/trl},
  year    = {2020}
}
