Instructions for using Nimra28/urdu-llama-pakistan with Transformers, vLLM, SGLang, and Docker Model Runner.

Libraries

How to use Nimra28/urdu-llama-pakistan with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Nimra28/urdu-llama-pakistan")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
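# AutoModelForCausalLM keeps the language-modeling head needed for text generation
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Nimra28/urdu-llama-pakistan", dtype="auto")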

vLLM

How to use Nimra28/urdu-llama-pakistan with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Nimra28/urdu-llama-pakistan"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Nimra28/urdu-llama-pakistan",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
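
The same request can also be sent from Python. The sketch below mirrors the curl call above using only the standard library; it assumes the vLLM server is already running on `localhost:8000`, and the helper names `build_chat_payload` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "Nimra28/urdu-llama-pakistan"


def build_chat_payload(user_message, model=MODEL_ID):
    """Return the JSON body for a single-turn chat completion request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }


def chat(user_message, base_url="http://localhost:8000"):
    """POST the request to a running server and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(user_message)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # OpenAI-compatible servers return the text under choices[0].message.content.
    return reply["choices"][0]["message"]["content"]
```

With the server up, `print(chat("What is the capital of France?"))` should print the model's answer. Passing `base_url="http://localhost:30000"` targets a server listening on port 30000 instead.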

SGLang

How to use Nimra28/urdu-llama-pakistan with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Nimra28/urdu-llama-pakistan" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Nimra28/urdu-llama-pakistan",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Nimra28/urdu-llama-pakistan" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Nimra28/urdu-llama-pakistan",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Nimra28/urdu-llama-pakistan with Docker Model Runner:
```
docker model run hf.co/Nimra28/urdu-llama-pakistan
```

🇵🇰 Urdu LLaMA — Pakistan's First Fine-tuned Urdu LLM

Llama 3.2-1B fine-tuned on a Pakistani corpus — Urdu news, literature, Islamic knowledge, history, culture and code-switching (Urdu+English)

Model Summary

Urdu LLaMA is Pakistan's first open-source instruction-following language model fine-tuned specifically on Pakistani and Urdu content. It is built on top of meta-llama/Llama-3.2-1B-Instruct (through the unsloth/Llama-3.2-1B-Instruct checkpoint listed under Training Details) and fine-tuned using QLoRA (4-bit quantization + LoRA) on a curated Pakistani corpus.

The model understands and generates text in:

🇵🇰 Urdu (primary language)
🌐 English (secondary)
💬 Code-switching (Urdu + English mixed — as spoken by Pakistanis daily)

🚀 Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Nimra28/urdu-llama-pakistan"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

def ask_urdu(question):
    prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
آپ ایک مددگار پاکستانی AI اسسٹنٹ ہیں جو اردو اور انگریزی میں جواب دے سکتے ہیں۔<|eot_id|><|start_header_id|>user<|end_header_id|>
{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
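    # The prompt already starts with <|begin_of_text|>, so skip the tokenizer's automatic BOS token.
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)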
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=200,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    response = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True
    )
    return response.strip()

# Try it
print(ask_urdu("پاکستان کی تاریخ بتائیں"))
print(ask_urdu("علامہ اقبال کون تھے؟"))
print(ask_urdu("Machine learning کیا ہے؟"))
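
As an alternative to writing the special tokens by hand, the prompt can be built with the tokenizer's own chat template. This is a minimal sketch, not the author's method: `build_messages` and `ask_with_chat_template` are hypothetical helper names, and it assumes `model` and `tokenizer` are loaded as in the Quick Start. The template shipped with the tokenizer may format the system turn differently from the hand-written prompt, so outputs can differ.

```python
SYSTEM_PROMPT = (
    "آپ ایک مددگار پاکستانی AI اسسٹنٹ ہیں جو اردو اور انگریزی میں جواب دے سکتے ہیں۔"
)


def build_messages(question, system_prompt=SYSTEM_PROMPT):
    """Return the system + user turns in the role/content format chat templates expect."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def ask_with_chat_template(model, tokenizer, question, max_new_tokens=200):
    """Generate a reply, letting the tokenizer insert all special tokens."""
    inputs = tokenizer.apply_chat_template(
        build_messages(question),
        add_generation_prompt=True,  # append the assistant header so the model answers
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the tokens generated after the prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Usage would look like `print(ask_with_chat_template(model, tokenizer, "علامہ اقبال کون تھے؟"))`.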

📚 Training Data

The model was fine-tuned on a small curated Pakistani corpus (about 30 examples, see Limitations) covering:

| Domain | Description |
| --- | --- |
| 📰 Urdu News | Pakistani current affairs, politics, sports |
| 📚 Urdu Literature | Poetry by Iqbal, Ghalib, Mir, Faiz Ahmed Faiz |
| 🕌 Islamic Knowledge | Namaz, Zakat, Hajj, Ramadan, Quran knowledge |
| 🏛️ Pakistani History | Independence, Quaid-e-Azam, national leaders |
| 🎓 Education | CSS exam, MDCAT, academic topics |
| 🍛 Culture & Food | Traditions, weddings, cuisine, festivals |
| 💬 Code-switching | Natural Urdu+English mixed conversations |
| 🌍 Geography | Provinces, cities, K2, national landmarks |

⚙️ Training Details

| Parameter | Value |
| --- | --- |
| Base Model | `unsloth/Llama-3.2-1B-Instruct` |
| Method | QLoRA (4-bit quantization + LoRA) |
| LoRA Rank / Alpha | r=16, alpha=32 |
| LoRA Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| LoRA Dropout | 0.05 |
| Training Epochs | 3 |
| Learning Rate | 2e-4 |
| Batch Size | 2 (effective 8 with gradient accumulation) |
| Max Sequence Length | 512 tokens |
| Optimizer | paged_adamw_8bit |
| Hardware | Google Colab T4 GPU (free tier) |
| Training Time | ~35 minutes |
| Quantization | 4-bit NF4 with double quantization |
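
The settings above can be written down as configuration values, which also makes two derived numbers explicit: the gradient accumulation steps implied by the batch sizes, and the approximate number of trainable LoRA parameters. This is a sketch under stated assumptions, not the author's training script. The layer dimensions are taken from the published Llama-3.2-1B configuration rather than from this card, `task_type` is not stated in the table, and `build_configs` is a hypothetical helper that needs `peft` and `transformers` installed.

```python
# Values copied from the Training Details table.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the table
}
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
}

PER_DEVICE_BATCH_SIZE = 2
EFFECTIVE_BATCH_SIZE = 8
# Effective batch = per-device batch * accumulation steps (single GPU).
GRAD_ACCUM_STEPS = EFFECTIVE_BATCH_SIZE // PER_DEVICE_BATCH_SIZE

# Assumed Llama-3.2-1B dimensions (not given in this card).
HIDDEN, INTERMEDIATE, LAYERS, KV_DIM = 2048, 8192, 16, 512
MODULE_SHAPES = {  # (input features, output features) per target module
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank=LORA_KWARGS["r"]):
    """Each adapted module adds rank * (d_in + d_out) weights (the A and B matrices)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in MODULE_SHAPES.values())
    return per_layer * LAYERS


def build_configs():
    """Turn the dictionaries into library config objects (imports are lazy)."""
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    return LoraConfig(**LORA_KWARGS), BitsAndBytesConfig(**QUANT_KWARGS)
```

Under these assumptions, the adapter trains roughly 11 million parameters, about 1% of the base model, with a LoRA scaling factor of alpha / r = 2.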

📊 Evaluation

The model was evaluated on held-out Urdu test examples:

Perplexity: Lower than the base model's on held-out Urdu text (exact values are not reported)
Qualitative evaluation: Model responds correctly in Urdu to Pakistani cultural, historical, and Islamic questions
Code-switching: Handles natural Urdu+English mixing as spoken by Pakistanis
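
The card does not say how perplexity was measured. A common definition is the exponential of the mean per-token negative log-likelihood, which the sketch below illustrates. The function names are hypothetical, and `model_perplexity` assumes a Hugging Face causal LM and tokenizer loaded as in the Quick Start.

```python
import math


def perplexity_from_token_losses(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))


def model_perplexity(model, tokenizer, text):
    """Score one text with a causal LM; lower means the text is less surprising."""
    import torch  # lazy import so the helper above works without torch installed

    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Passing labels makes the model return the mean next-token
        # cross-entropy as `.loss`.
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())
```

Comparing `model_perplexity` for the base and fine-tuned models on the same held-out Urdu sentences would give a concrete number for the improvement described above.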

💬 Example Outputs

Question: پاکستان کب آزاد ہوا؟

پاکستان 14 اگست 1947 کو برطانوی استعمار سے آزاد ہوا۔ یہ دن ہر سال یوم آزادی کے طور پر منایا جاتا ہے۔ قائداعظم محمد علی جناح پاکستان کے پہلے گورنر جنرل بنے۔

Question: Machine learning کیا ہے؟

Machine learning یعنی مشینی سیکھ ایک ایسی technology ہے جس میں computer خود بخود data سے سیکھتا ہے۔ جیسے ہم experience سے سیکھتے ہیں، ویسے ہی machine بھی examples سے patterns پہچاننا سیکھتی ہے۔

Question: علامہ اقبال کون تھے؟

علامہ اقبال پاکستان کے قومی شاعر اور مفکر تھے۔ انہوں نے 1930 میں مسلمانوں کے لیے ایک الگ وطن کا خواب پیش کیا۔ انہیں شاعر مشرق اور مفکر پاکستان کہا جاتا ہے۔

⚠️ Limitations

Fine-tuned on a small dataset (~30 curated examples); a larger dataset should improve quality
May occasionally mix Urdu and English even when pure Urdu is requested
Islamic knowledge is general — not a substitute for qualified scholars
Political opinions reflect training data — use with caution
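
One rough way to spot replies that drift into English when pure Urdu was requested is to measure what share of the letters are in the Arabic script block that Urdu uses. This is an illustrative heuristic, not part of the model; the function name and any threshold applied to its result are assumptions.

```python
def arabic_script_ratio(text):
    """Fraction of alphabetic characters that fall in the Unicode Arabic block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    # U+0600..U+06FF covers the Arabic block, which includes the Urdu letters.
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return arabic / len(letters)
```

A reply scoring well below 1.0 contains a noticeable amount of Latin-script text and could be regenerated or flagged, depending on the application.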

🔮 Future Work

Scale training data to 10,000+ Urdu examples
Add Urdu news datasets from Dawn, Geo, ARY
Fine-tune on Urdu Wikipedia dump
Add Punjabi, Sindhi, Pashto support
Release larger 7B parameter version
Benchmark on Urdu NLP tasks

📄 Citation

If you use this model, please cite:

@misc{urdu-llama-pakistan-2024,
  title={Urdu LLaMA: Pakistan's First Fine-tuned Urdu Language Model},
  author={Nimra Tariq},
  year={2024},
  publisher={HuggingFace},
  url={https://huggingface.co/Nimra28/urdu-llama-pakistan}
}

👩‍💻 Developed By

Nimra Tariq — AI Engineer & Assistant Professor, Superior University, Pakistan

🐙 GitHub: github.com/nimra-pixel
🤗 HuggingFace: huggingface.co/Nimra28

📜 License

This model is built on Llama 3.2 and follows the Llama 3.2 Community License.

Model tree for Nimra28/urdu-llama-pakistan

Base model: meta-llama/Llama-3.2-1B-Instruct
Finetuned: unsloth/Llama-3.2-1B-Instruct
Adapter: this model