☕ Qehwa — Pashto's First LLM
The first Pakistani Pashto large language model, trained specifically on the Peshawari dialect.
Built by a solo developer as a free and open resource for 60+ million Pashto speakers worldwide.
⚠️ This model performs best on Pakistani/Peshawari Pashto. Performance may be lower on Afghan Pashto dialects.
🌟 Model Description
Qehwa is a fully instruction-tuned Pashto language model built on top of Qwen2.5-7B. It is the result of two-stage training:
- Continued Pre-Training (CPT) on 3.4 million clean Pakistani Pashto documents
- Supervised Fine-Tuning (SFT) on 126,519 high-quality Peshawari Pashto instruction-response pairs
This is the first dedicated Pakistani Pashto LLM — no comparable model exists publicly. It specifically targets the Peshawari/KPK dialect rather than generic or Afghan Pashto.
This repo contains the fully merged model — ready to use with standard transformers, no additional libraries required.
✨ Capabilities
- ✅ Answers questions in pure Peshawari Pashto
- ✅ Responds to English instructions in Pashto
- ✅ Responds to Urdu instructions in Pashto
- ✅ Natural Pashto conversation
- ✅ Pashto creative writing and poetry
- ✅ Islamic topics in Pashto
- ✅ KPK history, culture, and geography
- ✅ Pashtunwali traditions and ethics
- ✅ Pashto grammar correction
- ✅ English to Pashto translation
- ✅ Correct Pashto-specific characters: ښ ږ ټ ډ ړ ځ
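A quick way to spot-check the last capability is to count the Pashto-specific letters in a generated string. The sketch below is illustrative only: the helper name `pashto_letter_counts` is hypothetical, and the letter set is limited to the six characters listed above.

```python
from collections import Counter

# The six Pashto-specific letters named in the capabilities list.
PASHTO_SPECIFIC = "ښږټډړځ"

def pashto_letter_counts(text: str) -> dict[str, int]:
    """Count how often each Pashto-specific letter occurs in `text`."""
    counts = Counter(ch for ch in text if ch in PASHTO_SPECIFIC)
    # Always report all six letters, including those that never occur.
    return {ch: counts[ch] for ch in PASHTO_SPECIFIC}

sample = "پښتونولي تشریح کړه"  # a prompt taken from this card
print(pashto_letter_counts(sample))
```

A response with all counts at zero is not necessarily wrong, but on longer Pashto text it may indicate that the output drifted into Urdu or Arabic orthography.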
📊 Evaluation Results
Qehwa was evaluated on a custom benchmark of 150 tests across 15 categories — the first ever comprehensive Pashto LLM benchmark. Since no standard Pashto benchmark exists publicly, this evaluation was designed specifically for Pakistani Pashto.
Top Performing Categories
| Category | Score |
|---|---|
| English → Pashto | 90% 🔥🔥 |
| Urdu → Pashto | 84% 🔥🔥 |
| Health & Daily Life in Pashto | 90% 🔥🔥 |
| Culture & History | 90% 🔥 |
| Geography & Nature | 90% 🔥 |
Overall Average Accuracy across all 15 benchmark categories: 85.3%
Evaluation Methodology
- 150 custom Pashto prompts across 15 categories
- Evaluated on an A100 40GB GPU
- Human-reviewed outputs for fluency, accuracy, and dialect correctness
- No existing Pashto benchmark was available — this is the first Pashto LLM benchmark
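The card does not say how per-category scores are combined into the overall figure. As a minimal sketch, assuming an unweighted mean over categories, the aggregation could look like the following. The function name `macro_average` is hypothetical, and only the five category scores published above are used, so the printed value is not the overall 85.3%.

```python
from statistics import mean

# Per-category scores in percent. Only the five categories published in
# this card are included; the remaining ten are not listed.
published_scores = {
    "English → Pashto": 90,
    "Urdu → Pashto": 84,
    "Health & Daily Life in Pashto": 90,
    "Culture & History": 90,
    "Geography & Nature": 90,
}

def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean of per-category scores (an assumed aggregation)."""
    if not scores:
        raise ValueError("no category scores given")
    return mean(scores.values())

print(f"Mean of the five published categories: {macro_average(published_scores):.1f}%")
```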
💻 Installation
pip install transformers accelerate torch
For faster inference:
pip install unsloth
For running locally on a CPU or a small GPU:
pip install transformers accelerate bitsandbytes
🚀 How to Use
✅ Method 1 — Transformers (Recommended)
Best for: Research, production, standard usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "junaid008/qehwa-pashto-llm"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype = torch.bfloat16,
    device_map = "auto",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.
### Instruction:
{}
### Response:
{}"""

def generate(prompt):
    # Fill the instruction slot; the response slot stays empty for the model to complete.
    inputs = tokenizer(
        ALPACA_TEMPLATE.format(prompt, ""),
        return_tensors = "pt",
    ).to(model.device)  # follow device_map="auto" instead of hard-coding "cuda"
    outputs = model.generate(
        **inputs,
        max_new_tokens = 500,
        temperature = 0.7,
        do_sample = True,
        repetition_penalty = 1.1,
        pad_token_id = tokenizer.eos_token_id,
    )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Keep only the text after the final response marker.
    return response.split("### Response:")[-1].strip()

# Pashto input
print(generate("د پیښور تاریخ راته ووایه"))
# English input
print(generate("Tell me about Pashtunwali"))
# Urdu input
print(generate("پشاور کے بارے میں بتاؤ"))
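The prompt construction and response extraction inside `generate` can be factored into two pure helpers, which makes them easy to test without loading the model. This is a sketch, not part of the published code: `build_prompt`, `extract_response`, and `RESPONSE_MARKER` are hypothetical names, and the template is copied exactly as it appears in this card.

```python
ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.
### Instruction:
{}
### Response:
{}"""

RESPONSE_MARKER = "### Response:"

def build_prompt(instruction: str) -> str:
    """Fill the instruction slot and leave the response slot empty."""
    return ALPACA_TEMPLATE.format(instruction, "")

def extract_response(decoded: str) -> str:
    """Return the text after the last response marker in the decoded output."""
    return decoded.split(RESPONSE_MARKER)[-1].strip()

# Simulate a decoded generation: the prompt followed by placeholder model text.
prompt = build_prompt("Tell me about Pashtunwali")
decoded = prompt + "<model text>"
print(extract_response(decoded))
```

Splitting on the marker truncates the answer if the model itself emits `### Response:`. An alternative is to decode only the newly generated tokens, for example `outputs[0][inputs["input_ids"].shape[1]:]`, which avoids string matching entirely.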
✅ Method 2 — 4-bit Quantization (Low VRAM)
Best for: GPUs with limited VRAM (around 8GB)
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_name = "junaid008/qehwa-pashto-llm"

# NF4 4-bit weights with bfloat16 compute and double quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_compute_dtype = torch.bfloat16,
    bnb_4bit_use_double_quant = True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config = bnb_config,
    device_map = "auto",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.
### Instruction:
{}
### Response:
{}"""

def generate(prompt):
    inputs = tokenizer(
        ALPACA_TEMPLATE.format(prompt, ""),
        return_tensors = "pt",
    ).to(model.device)  # follow device_map="auto" instead of hard-coding "cuda"
    outputs = model.generate(
        **inputs,
        max_new_tokens = 500,
        temperature = 0.7,
        do_sample = True,
        repetition_penalty = 1.1,
        pad_token_id = tokenizer.eos_token_id,
    )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Keep only the text after the final response marker.
    return response.split("### Response:")[-1].strip()

print(generate("پښتونولي تشریح کړه"))
✅ Method 3 — Unsloth (2x Faster Inference)
Best for: Speed-optimized usage, Colab, A100/H100
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "junaid008/qehwa-pashto-llm",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = False,
)
# Switch the model to Unsloth's inference mode.
FastLanguageModel.for_inference(model)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.
### Instruction:
{}
### Response:
{}"""

inputs = tokenizer(
    ALPACA_TEMPLATE.format("د پیښور تاریخ راته ووایه", ""),
    return_tensors = "pt",
).to("cuda")
outputs = model.generate(
    **inputs,
    max_new_tokens = 500,
    temperature = 0.7,
    do_sample = True,
    repetition_penalty = 1.1,
    pad_token_id = tokenizer.pad_token_id,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Keep only the text after the final response marker.
print(response.split("### Response:")[-1].strip())
✅ Method 4 — CPU Only (No GPU)
Best for: Testing on a laptop with no GPU (slow, but it works)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "junaid008/qehwa-pashto-llm"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype = torch.float32,  # float32 for CPU
    device_map = "cpu",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.
### Instruction:
{}
### Response:
{}"""

inputs = tokenizer(
    ALPACA_TEMPLATE.format("پښتو ژبه د چا ده؟", ""),
    return_tensors = "pt",
)
outputs = model.generate(
    **inputs,
    max_new_tokens = 200,
    do_sample = False,  # greedy for CPU speed
    pad_token_id = tokenizer.eos_token_id,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Keep only the text after the final response marker.
print(response.split("### Response:")[-1].strip())
✅ Method 5 — Google Colab (Free)
Best for: Trying without any local setup
Open in Colab and run:
# Install
!pip install transformers accelerate -q

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("junaid008/qehwa-pashto-llm")
model = AutoModelForCausalLM.from_pretrained(
    "junaid008/qehwa-pashto-llm",
    torch_dtype = torch.bfloat16,
    device_map = "auto",
)

ALPACA_TEMPLATE = """Below is an instruction in Pashto or English. Write a detailed response in Pashto.
### Instruction:
{}
### Response:
{}"""

def generate(prompt):
    inputs = tokenizer(ALPACA_TEMPLATE.format(prompt, ""), return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=500, temperature=0.7,
                             do_sample=True, pad_token_id=tokenizer.eos_token_id)
    # Keep only the text after the final response marker.
    return tokenizer.decode(outputs[0], skip_special_tokens=True).split("### Response:")[-1].strip()

print(generate("Tell me about Peshawar"))
print(generate("پښتونولي تشریح کړه"))
print(generate("پشاور کا مشہور کھانا کیا ہے؟"))
⚙️ Hardware Requirements
| Method | VRAM | Speed |
|---|---|---|
| bfloat16 full | 16GB+ | ✅ Fast |
| 4-bit quantized | 8GB+ | ✅ Good |
| Unsloth | 16GB+ | 🔥 2x Faster |
| CPU only | No GPU | ⚠️ Slow |
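The VRAM figures above can be sanity-checked with a back-of-the-envelope estimate of weight memory alone. The sketch below assumes roughly 7.6 billion parameters, the size commonly cited for the Qwen2.5-7B base model; the card does not state a parameter count, and `weight_memory_gb` is a hypothetical helper. Activations, the KV cache, and quantization metadata are not included, which is why the table leaves headroom.

```python
# Assumption: ~7.6 billion parameters (commonly cited for Qwen2.5-7B);
# this card does not state the exact count.
ASSUMED_PARAMS = 7.6e9

# Approximate storage width per parameter for each loading mode.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "4-bit (nf4)": 0.5}

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9

for name, width in BYTES_PER_PARAM.items():
    print(f"{name:>12}: ~{weight_memory_gb(ASSUMED_PARAMS, width):.1f} GB of weights")
```

Under this assumption, float32 on CPU needs on the order of 30 GB of system RAM for the weights, which is worth checking before trying Method 4 on a laptop.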
📊 Training Details
Stage 1 — Continued Pre-Training (CPT)
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B |
| Hardware | NVIDIA A100-SXM4-40GB |
| Training steps | 5,000 |
| Final CPT loss | ~1.8 |
| Dataset size | 3,400,000 documents |
| Sequence length | 2,048 tokens |
| Precision | bfloat16 |
| LoRA rank | 64 |
| Learning rate | 5e-5 |
| Effective batch size | 32 |
Stage 2 — Supervised Fine-Tuning (SFT)
| Parameter | Value |
|---|---|
| Base model | junaid008/pashto-qwen2.5-7b-v3 (CPT) |
| Hardware | NVIDIA A100-SXM4-40GB |
| Training steps | 7,908 |
| Final SFT loss | 0.455 |
| Dataset size | 126,519 pairs |
| Epochs | 2 |
| Sequence length | 2,048 tokens |
| Precision | bfloat16 |
| LoRA rank | 64 |
| Learning rate | 5e-5 |
| Effective batch size | 32 |
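The SFT step count follows from the other rows of the table. The check below is a small sketch with a hypothetical helper, `optimizer_steps`; it assumes one optimizer step per effective batch, no sequence packing, and that each epoch's final partial batch still counts as a step.

```python
import math

def optimizer_steps(num_examples: int, effective_batch_size: int, epochs: int) -> int:
    """Optimizer steps when each epoch's last partial batch counts as one step."""
    return math.ceil(num_examples / effective_batch_size) * epochs

sft_steps = optimizer_steps(num_examples=126_519, effective_batch_size=32, epochs=2)
print(sft_steps)  # 7908, matching the "Training steps" row above
```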
📚 Dataset
CPT Dataset
- 3.4 million Pakistani Pashto documents
- Sources: news, books, religious texts, Wikipedia, web crawl
- Custom cleaned with Pashto-specific Unicode normalization
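The cleaning pipeline itself is not published, so the following is only a sketch of what Pashto-oriented Unicode normalization might involve. Every rule here is an assumption for illustration, and `normalize_pashto` is a hypothetical name. Arabic yeh (U+064A) is deliberately left untouched, because Pashto uses it as a distinct letter.

```python
import re
import unicodedata

# Illustrative mapping only; not the author's actual rules.
CHAR_MAP = str.maketrans({
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH (ک)
})

# Invisible characters often left behind by web scraping (ZWNJ is kept).
ZERO_WIDTH = re.compile("[\u200b\u200e\u200f\ufeff]")
WHITESPACE = re.compile(r"\s+")

def normalize_pashto(text: str) -> str:
    """Apply NFC, unify kaf variants, drop tatweel and invisibles, collapse spaces."""
    text = unicodedata.normalize("NFC", text)
    text = text.translate(CHAR_MAP)
    text = text.replace("\u0640", "")  # remove tatweel (kashida)
    text = ZERO_WIDTH.sub("", text)
    return WHITESPACE.sub(" ", text).strip()

# Arabic kaf + tatweel + trailing junk becomes a clean word with Pashto kaf.
print(normalize_pashto("\u0643\u0640\u062A\u0627\u0628  \u200b"))
```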
SFT Dataset — 126,519 pairs:
| Type | Description |
|---|---|
| Pashto → Pashto | Questions and answers in pure Peshawari Pashto |
| English → Pashto | English instructions with Pashto responses |
| Urdu → Pashto | Urdu instructions with Pashto responses |
| Conversation | Natural dialogue in Peshawari dialect |
| Islamic topics | Religious knowledge in Pashto |
| Creative writing | Pashto poetry, stories, descriptions |
| Grammar | Pashto language correction and explanation |
| Translation | English to Pashto translation pairs |
| Cultural | Pashtunwali, traditions, history, geography |
🗣️ Example Outputs
Pashto input:
Prompt : د پیښور تاریخ راته ووایه
Output : پیښور د نړۍ یو له تر ټولو پخوانیو ښارونو څخه دی چې تاریخ یې زرګونه
کاله پخوا ته رسیږي. د دې ښار پخوانی نوم پوشکلاوتي او بیا پروشاپور و.
د پیښور د بالاحصار کلا د مغلو او سکهانو د وخت یادګار دی...
English input:
Prompt : Tell me about Pashtunwali
Output : پښتونولي د پښتنو د ژوند هغه لرغونی او سپېڅلی ضابطه اخلاق ده چې د دوی
ټولنیز ژوند یې تنظیم کړی دی. په دې کې مېلمه پالنه، غیرت، او د کمزورو
ساتنه تر ټولو لوی ارزښتونه ګڼل کېږي...
Urdu input:
Prompt : پشتونوالی کیا ہے؟
Output : پښتونولي د پښتنو د ژوند هغه لرغونی او زرین ضابطه اخلاق ده...
👤 Author
Junaid Aslam
- HuggingFace: junaid008
- Built independently as a contribution to Pashto NLP
📜 License
Apache 2.0 — free to use, modify, and distribute with attribution.
🤝 Citation
@misc{qehwa-pashto-llm,
author = {Junaid Aslam},
title = {Qehwa — Pashto's First LLM},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/junaid008/qehwa-pashto-llm}
}