LFM2.5-8B Saudi Dialect
AyoubChLin/lfm2.5-8b-saudi-dialect is a Saudi Arabic conversational fine-tune of LiquidAI/LFM2.5-8B-A1B.
The model was fine-tuned to produce more natural Saudi dialect responses in chat-style conversations. It is intended for Arabic dialogue, informal Saudi phrasing, and assistant-style responses using a Saudi Arabic system prompt.
Model Details
| Field | Value |
|---|---|
| Base model | LiquidAI/LFM2.5-8B-A1B |
| Fine-tuned model | AyoubChLin/lfm2.5-8b-saudi-dialect |
| Dataset | HeshamHaroon/saudi-dialect-conversations |
| Dataset size | 3,545 examples |
| Train split | 3,474 examples |
| Evaluation split | 71 examples |
| Fine-tuning method | Supervised fine-tuning with LoRA |
| Final format | Merged model |
| Precision | bf16 |
| Quantization | None |
| Max sequence length | 10,244 tokens |
| Language | Arabic |
| Dialect focus | Saudi Arabic |
| License | Apache 2.0 |
Intended Use
This model is intended for Saudi Arabic conversational use cases, including:
- Saudi dialect chatbots
- Arabic assistant responses with Saudi phrasing
- Dialogue generation
- Informal Saudi Arabic conversation
- Domain-specific Saudi Arabic assistant prototypes
Example system prompt used during fine-tuning:
أنت مساعد مفيد يتحدث باللهجة السعودية.
(English: "You are a helpful assistant who speaks the Saudi dialect.")
Dataset
The model was fine-tuned on:
HeshamHaroon/saudi-dialect-conversations
The dataset contains multi-turn Saudi Arabic conversations with metadata such as scenario, topic, complexity, and English summary. During preprocessing, each conversation was rendered with the model chat template. A Saudi Arabic system message was injected when missing.
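The system-message injection described above can be sketched as follows. This is a minimal illustration, not the notebook's actual preprocessing code: the helper name `ensure_system_message` and the sample conversation are hypothetical, and the rendering step is only indicated in a comment because it needs the model tokenizer.

```python
# System prompt quoted in this model card.
SAUDI_SYSTEM_PROMPT = "أنت مساعد مفيد يتحدث باللهجة السعودية."


def ensure_system_message(messages, system_prompt=SAUDI_SYSTEM_PROMPT):
    """Return a copy of `messages` that starts with a system message.

    Hypothetical helper: conversations that already begin with a system
    turn are returned unchanged (as a new list).
    """
    if messages and messages[0].get("role") == "system":
        return list(messages)
    return [{"role": "system", "content": system_prompt}] + list(messages)


# Illustrative two-turn conversation without a system message.
conversation = [
    {"role": "user", "content": "هلا والله"},
    {"role": "assistant", "content": "الله يعطيك العافية"},
]

prepared = ensure_system_message(conversation)
print([m["role"] for m in prepared])

# The training text would then be rendered with the model chat template, e.g.:
# text = tokenizer.apply_chat_template(prepared, tokenize=False)
```

Returning a new list instead of mutating the input keeps the original dataset rows untouched, which makes the step safe to re-run.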
Example conversational style includes casual Saudi phrases such as:
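- هلا والله (a warm "hello, welcome")
- وش سالفتك؟ ("what's your story?", i.e. "what's going on with you?")
- ايه والله ("yes, really")
- الله يعطيك العافية ("may God give you strength", used to thank someone)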
Training Setup
The model was trained with supervised fine-tuning using LoRA adapters. The base model was loaded in bf16 without 4-bit quantization, and Flash Attention 2 was enabled.
LoRA Configuration
| Parameter | Value |
|---|---|
| LoRA rank | 128 |
| LoRA alpha | 254 |
| LoRA dropout | 0.05 |
| Bias | none |
| Task type | Causal LM |
| Trainable parameters | 38,535,168 |
| Total parameters | 8,506,391,296 |
| Trainable percentage | 0.4530% |
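As a quick consistency check, the trainable percentage in the table follows directly from the two parameter counts. The sketch below also computes the conventional LoRA scaling factor `alpha / rank`; that the run used this standard scaling (rather than a variant such as rank-stabilized LoRA) is an assumption, not something the card states.

```python
# Values copied from the LoRA configuration table.
trainable_params = 38_535_168
total_params = 8_506_391_296
lora_rank = 128
lora_alpha = 254

# Share of parameters updated during fine-tuning.
trainable_pct = 100 * trainable_params / total_params

# Conventional LoRA scaling applied to the adapter update (assumed).
scaling = lora_alpha / lora_rank

print(f"Trainable percentage: {trainable_pct:.4f}%")
print(f"LoRA scaling (alpha / rank): {scaling:.3f}")
```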
Target Modules
LoRA was applied to the following modules:
q_proj
k_proj
v_proj
out_proj
in_proj
conv.in_proj
conv.out_proj
gate_proj
up_proj
down_proj
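For readers who want to reproduce a similar setup, the configuration and target modules above map onto PEFT's `LoraConfig` roughly as shown below. This is a sketch assembled from the tables in this card, not the original training code; the variable names are illustrative, and the `peft` import is guarded so the snippet still runs where the library is not installed.

```python
# Keyword arguments assembled from the tables in this model card.
lora_kwargs = {
    "r": 128,
    "lora_alpha": 254,
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": [
        "q_proj",
        "k_proj",
        "v_proj",
        "out_proj",
        "in_proj",
        "conv.in_proj",
        "conv.out_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
}

try:
    from peft import LoraConfig

    lora_config = LoraConfig(**lora_kwargs)
except ImportError:
    # peft is optional here; the dictionary alone documents the setup.
    lora_config = None

print(len(lora_kwargs["target_modules"]), "target module patterns")
```

PEFT matches string entries in `target_modules` against module-name suffixes, so how many layers each entry actually hits depends on the base model's module names.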
Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 6 |
| Per-device train batch size | 8 |
| Per-device eval batch size | 8 |
| Gradient accumulation steps | 8 |
| Effective batch size | 64 |
| Learning rate | 2e-4 |
| LR scheduler | cosine |
| Warmup ratio | 0.05 |
| Optimizer | adamw_torch_fused |
| Precision | bf16 |
| FP16 | false |
| Max sequence length | 10,244 |
| Evaluation strategy | steps |
| Eval steps | 70 |
| Save steps | 70 |
| Save total limit | 2 |
| Logging steps | 10 |
| Dataset packing | false |
| Dataloader workers | 4 |
| Seed | 42 |
| Flash Attention 2 | enabled |
| Gradient checkpointing | disabled |
| Quantization | none |
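The effective batch size and the 330 optimization steps reported for this run can be reproduced from the hyperparameters above. The sketch assumes a single training device and that a partial final accumulation window in each epoch still counts as one optimizer step; the warmup-step figure additionally assumes the warmup ratio is rounded up. None of these rounding details are stated in the card.

```python
import math

# Values from this model card.
train_examples = 3_474
per_device_batch = 8
grad_accum = 8
epochs = 6
warmup_ratio = 0.05

# Assumption: the run used a single GPU.
num_devices = 1

effective_batch = per_device_batch * grad_accum * num_devices
batches_per_epoch = math.ceil(train_examples / (per_device_batch * num_devices))
# Assumption: a partial accumulation window still triggers an update.
updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
total_steps = updates_per_epoch * epochs
# Assumption: warmup steps are rounded up from ratio * total steps.
warmup_steps = math.ceil(warmup_ratio * total_steps)

print(f"Effective batch size: {effective_batch}")
print(f"Optimizer steps per epoch: {updates_per_epoch}")
print(f"Total optimizer steps: {total_steps}")
print(f"Warmup steps: {warmup_steps}")
```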
Training Environment
| Component | Value |
|---|---|
| GPU | NVIDIA H200 |
| VRAM | 150.1 GB |
| PyTorch | 2.8.0+cu129 |
| CUDA | 12.9 |
| Transformers | 5.12.1 |
| PEFT | 0.19.1 |
| Attention implementation | Flash Attention 2 |
| Training tracker | Weights & Biases |
| Runtime | 756 seconds (~12.6 minutes) |
| Throughput | 27.6 samples/sec |
Note: The notebook was prepared for an A100 target, but the recorded run was executed on an NVIDIA H200 with 150.1 GB VRAM.
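The runtime and throughput figures are consistent with each other, as the short check below shows. It assumes throughput is measured as training samples processed per second over the full run (train split size times epochs), which matches how the reported number works out.

```python
# Values from this model card.
train_examples = 3_474
epochs = 6
runtime_seconds = 756

# Assumption: every training example is seen once per epoch.
samples_processed = train_examples * epochs
runtime_minutes = runtime_seconds / 60
samples_per_second = samples_processed / runtime_seconds

print(f"Samples processed: {samples_processed}")
print(f"Runtime: {runtime_minutes:.1f} minutes")
print(f"Throughput: {samples_per_second:.1f} samples/sec")
```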
Training Results
Training completed successfully for 6 epochs and 330 optimization steps.
| Metric | Value |
|---|---|
| Final training loss, logged step 330 | 1.0633 |
| Overall train loss reported by trainer | 1.5250 |
| Final validation loss, step 330 | 1.7088 |
| Best validation loss | 1.6409 at step 140 |
| Final train mean token accuracy | 0.7597 |
| Final eval mean token accuracy | 0.6545 |
| Best eval mean token accuracy | 0.6584 at step 210 |
| Total training steps | 330 |
| Final epoch | 6 |
| Total tokens seen at final eval | 3,736,326 |
Evaluation Progress
| Step | Training Loss | Validation Loss | Eval Mean Token Accuracy | Tokens Seen |
|---|---|---|---|---|
| 70 | 1.6887 | 1.7474 | 0.6430 | 793,936 |
| 140 | 1.4219 | 1.6409 | 0.6540 | 1,588,892 |
| 210 | 1.2429 | 1.6442 | 0.6584 | 2,384,614 |
| 280 | 1.0833 | 1.6863 | 0.6562 | 3,171,273 |
| 330 | 1.0633 | 1.7088 | 0.6545 | 3,736,326 |
Notes on the Results
The training loss decreased consistently during the run, from 4.6616 at the first logged step to 1.0633 at step 330. Train mean token accuracy also improved steadily, reaching 0.7597 at the final logged step.
Validation performance improved early in training, with the best validation loss appearing at step 140 and the best evaluation mean token accuracy appearing at step 210. After that point, the training loss continued to decrease while validation loss increased slightly. This suggests that the final checkpoint is more strongly adapted to the training distribution, while an earlier checkpoint around steps 140–210 may generalize slightly better on the small held-out validation split.
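The checkpoint comparison described above can be made explicit with a few lines of Python over the evaluation log. The rows are copied from the Evaluation Progress table; the variable names are illustrative.

```python
# (step, training_loss, validation_loss, eval_mean_token_accuracy)
eval_log = [
    (70, 1.6887, 1.7474, 0.6430),
    (140, 1.4219, 1.6409, 0.6540),
    (210, 1.2429, 1.6442, 0.6584),
    (280, 1.0833, 1.6863, 0.6562),
    (330, 1.0633, 1.7088, 0.6545),
]

# Checkpoint with the lowest validation loss.
best_by_loss = min(eval_log, key=lambda row: row[2])
# Checkpoint with the highest eval mean token accuracy.
best_by_accuracy = max(eval_log, key=lambda row: row[3])

# Gap between validation and training loss at each evaluation step;
# a widening gap is the usual sign of overfitting.
gaps = [(step, round(val - train, 4)) for step, train, val, _ in eval_log]

print("Best validation loss at step", best_by_loss[0])
print("Best eval accuracy at step", best_by_accuracy[0])
print("Validation minus training loss:", gaps)
```

Note that the run kept at most two checkpoints (`Save total limit` is 2), so whether the step-140 or step-210 weights are still available is not stated in this card.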
Because the evaluation split contains only 71 examples, these metrics should be treated as training diagnostics rather than a full benchmark. A stronger evaluation should include human review by native Saudi Arabic speakers, dialect naturalness scoring, response helpfulness scoring, safety checks, and comparisons against the base model.
Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "AyoubChLin/lfm2.5-8b-saudi-dialect"

# Load the tokenizer and the merged bf16 model.
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Use the same Saudi Arabic system prompt that was used during fine-tuning.
messages = [
    {
        "role": "system",
        "content": "أنت مساعد مفيد يتحدث باللهجة السعودية."
    },
    {
        "role": "user",
        "content": "هلا، وش تنصحني أسوي إذا أبي أتعلم برمجة؟"
    }
]

# Render the conversation with the model chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.05,
)

# The decoded text contains the prompt followed by the generated reply.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Recommended Generation Settings
For natural Saudi conversational responses:
```python
generation_config = {
    "max_new_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.9,
    "do_sample": True,
    "repetition_penalty": 1.05,
}
```
For more deterministic assistant-style responses:
```python
generation_config = {
    "max_new_tokens": 256,
    "temperature": 0.3,
    "top_p": 0.8,
    "do_sample": True,
    "repetition_penalty": 1.05,
}
```
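One way to switch between the two settings is a small preset helper, sketched below. The preset names and the `generation_kwargs` function are hypothetical and not part of the model or of Transformers; the values are the ones recommended in this section.

```python
# Presets taken from the recommended settings in this section.
GENERATION_PRESETS = {
    "conversational": {
        "max_new_tokens": 256,
        "temperature": 0.7,
        "top_p": 0.9,
        "do_sample": True,
        "repetition_penalty": 1.05,
    },
    "deterministic": {
        "max_new_tokens": 256,
        "temperature": 0.3,
        "top_p": 0.8,
        "do_sample": True,
        "repetition_penalty": 1.05,
    },
}


def generation_kwargs(style="conversational", **overrides):
    """Return a copy of the chosen preset with optional overrides applied."""
    if style not in GENERATION_PRESETS:
        raise ValueError(f"Unknown style: {style!r}")
    return {**GENERATION_PRESETS[style], **overrides}


kwargs = generation_kwargs("deterministic", max_new_tokens=128)
print(kwargs)

# With a loaded model and tokenized inputs, the result would be passed as:
# outputs = model.generate(**inputs, **kwargs)
```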
Limitations
This model is a specialized Saudi dialect fine-tune and may not be optimal for:
- Non-Saudi Arabic dialects
- Formal Modern Standard Arabic tasks
- Safety-critical domains
- Legal, medical, or financial advice
- Factual questions requiring up-to-date information
- Long-context reasoning beyond the fine-tuning distribution
The model may also reflect biases, inaccuracies, or style artifacts present in the training dataset.
Evaluation
The reported evaluation used validation loss and mean token accuracy on a small held-out split of 71 examples.
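For readers unfamiliar with the metric, mean token accuracy is commonly computed as the share of scored token positions where the model's top prediction equals the reference token. The sketch below assumes that definition, assumes predictions are already aligned with their labels, and uses the conventional ignore index of -100 for masked positions; the card does not spell out the exact implementation used in training.

```python
# Conventional label value for positions excluded from the loss (assumed).
IGNORE_INDEX = -100


def mean_token_accuracy(predicted_ids, label_ids, ignore_index=IGNORE_INDEX):
    """Fraction of non-ignored positions where prediction equals label."""
    correct = 0
    counted = 0
    for predicted, label in zip(predicted_ids, label_ids):
        if label == ignore_index:
            continue  # masked position, e.g. prompt or padding tokens
        counted += 1
        correct += int(predicted == label)
    return correct / counted if counted else 0.0


# Toy example: the first position is masked, 3 of the remaining 4 match.
predicted = [5, 9, 2, 7, 7]
labels = [-100, 9, 2, 3, 7]
accuracy = mean_token_accuracy(predicted, labels)
print(accuracy)  # 0.75
```

Because the metric rewards matching the reference token by token, it says little about whether a different but equally natural Saudi phrasing would be acceptable, which is one reason human review is listed among the follow-up evaluations.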
For future releases, stronger evaluation should include:
- Human evaluation by native Saudi Arabic speakers
- Dialect naturalness scoring
- Response helpfulness scoring
- Safety evaluation
- Comparison against the base model
- Saudi dialect benchmark prompts
- Evaluation on prompts outside the training dataset distribution
Citation
Base model:
@misc{liquidai_lfm25_8b_a1b,
title = {LFM2.5-8B-A1B},
author = {Liquid AI},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/LiquidAI/LFM2.5-8B-A1B}}
}
Dataset:
@misc{saudi_dialect_conversations,
title = {Saudi Dialect Conversations},
author = {HeshamHaroon},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/datasets/HeshamHaroon/saudi-dialect-conversations}}
}
Disclaimer
This model is provided for research and development purposes. Outputs should be reviewed before use in production systems, especially in sensitive or high-stakes applications.