KoshurAI v3 — Kashmiri ↔ English Translation (LoRA Adapter)

⚠️ Gated model. Request access to download weights.

A LoRA fine-tuned adapter for bidirectional Kashmiri ↔ English translation, built on top of Faizaniqbal/KoshurAI_Tarjuma_v2 — itself a Gemma 3 (4.5B) model continually pretrained on 2.8M tokens of Kashmiri text.

On the FLORES-200 devtest set (1,012 sentences), KoshurAI v3 scores higher than NLLB-200 distilled-600M on COMET in both translation directions, while NLLB-200 scores higher on BLEU.


Model Details

Author Faizan Iqbal (@Faizaniqbal)
Base model Faizaniqbal/KoshurAI_Tarjuma_v2
Adapter type LoRA (QLoRA training)
Architecture Gemma3ForCausalLM + PEFT LoRA
Languages Kashmiri (ks · kas_Arab), English (en)
License Apache-2.0
Training data 16,637 curated bidirectional EN↔KS sentence pairs
Training compute Google Colab GPU

Model Tree

google/gemma-3-4b-pt
    └─ google/gemma-3-4b-it
           └─ sarvamai/sarvam-translate
                  └─ Faizaniqbal/KoshurAI_Tarjuma_v2   ← 2.8M Kashmiri pretraining
                         └─ Faizaniqbal/KoshurAI_Tarjuma_v3           ← this adapter (SFT)

Quickstart

Install

pip install transformers peft accelerate bitsandbytes sentencepiece

Load & Translate

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "Faizaniqbal/KoshurAI_Tarjuma_v2"
ADAPTER    = "Faizaniqbal/KoshurAI_Tarjuma_v3"

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tok = AutoTokenizer.from_pretrained(BASE_MODEL)
tok.pad_token = tok.eos_token

base  = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, quantization_config=bnb_cfg, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()

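def translate(text, direction="en2ks"):
    # The instruction prefix selects the target language.
    prefix = "Translate to Kashmiri: " if direction == "en2ks" else "Translate to English: "
    # Gemma chat-turn format; generation continues from the open model turn.
    prompt = f"<start_of_turn>user\n{prefix}{text}<end_of_turn>\n<start_of_turn>model\n"
    # Move inputs to the model's device rather than hard-coding "cuda" (device_map="auto" chooses it).
    inputs = tok(prompt, return_tensors="pt", truncation=True, max_length=512).to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs,
            # Token budgets follow the Inference Settings table: 150 for EN→KS, 200 for KS→EN.
            max_new_tokens=150 if direction == "en2ks" else 200,
            min_new_tokens=5,
            do_sample=False,
            repetition_penalty=1.1,
        )
    # Decode only the newly generated tokens, skipping the prompt.
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()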

print(translate("The dog is sleeping.", "en2ks"))
print(translate("ہونٛد چھُ شُنٛگِتھ", "ks2en"))

Training

Stage 1 — Kashmiri Pretraining (base model)

The base model (KoshurAI_Tarjuma_v2) was continually pretrained on 2.8 million tokens of Kashmiri text from publicly available sources (literature, journalism, academic texts, religious scholarship). This stage was intended to strengthen the model's Kashmiri language knowledge before translation fine-tuning.

Stage 2 — SFT for Translation (this adapter)

This LoRA adapter was trained on 16,637 curated sentence pairs covering both directions (EN→KS and KS→EN) to teach the model explicit translation.

Split Records
Base SFT corpus (v2) 15,527
New pairs (v3) 1,110
Total 16,637
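
The bidirectional setup can be sketched as follows: each parallel pair yields one example per direction, formatted with the same Gemma turn template and instruction prefixes used in the Quickstart. This is a minimal illustration under stated assumptions, not the author's data pipeline. The helper names format_example and make_bidirectional, the "text" and "direction" fields, the closing <end_of_turn> after the target, and the placeholder pair are all hypothetical. The card also does not say whether 16,637 counts pairs or per-direction records.

```python
# Hypothetical helpers; the prompt template and prefixes mirror the Quickstart.
PREFIXES = {
    "en2ks": "Translate to Kashmiri: ",
    "ks2en": "Translate to English: ",
}

def format_example(source, target, direction):
    """Render one supervised example as a single Gemma-style chat string."""
    prompt = (
        f"<start_of_turn>user\n{PREFIXES[direction]}{source}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )
    # Assumption: the target is followed by <end_of_turn> so the model learns to stop.
    return {"text": prompt + target + "<end_of_turn>", "direction": direction}

def make_bidirectional(pairs):
    """Expand (english, kashmiri) pairs into one example per direction."""
    examples = []
    for en, ks in pairs:
        examples.append(format_example(en, ks, "en2ks"))
        examples.append(format_example(ks, en, "ks2en"))
    return examples

# Placeholder pair for illustration only.
pairs = [("The dog is sleeping.", "<Kashmiri translation>")]
for ex in make_bidirectional(pairs):
    print(ex["direction"], repr(ex["text"]))
```

Keeping the training template identical to the inference prompt means the adapter sees the same prefix and turn markers at both stages.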

Training Configuration

Hyperparameter Value
Base model Faizaniqbal/KoshurAI_Tarjuma_v2
LoRA rank (r) 16
LoRA alpha 16
LoRA dropout 0.05
LoRA target modules q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Quantization 4-bit NF4 (BitsAndBytes)
Compute dtype bfloat16
Epochs 2
Learning rate 1e-4
Effective batch size 8 (2 × grad_accum 4)
Max sequence length 512 tokens
Optimizer paged_adamw_8bit
LR scheduler Cosine
Warmup steps 100
Weight decay 0.01
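
The step counts implied by this table can be worked out directly. The sketch below assumes a single GPU, that all 16,637 records are used for training with no held-out split, and a linear warmup followed by cosine decay to zero. None of these details is confirmed by the card, and the exact step count depends on how the trainer rounds partial batches.

```python
import math

NUM_RECORDS = 16_637      # total from the data table (assumed all used for training)
EFFECTIVE_BATCH = 8       # 2 per device x 4 gradient-accumulation steps
EPOCHS = 2
PEAK_LR = 1e-4
WARMUP_STEPS = 100

steps_per_epoch = math.ceil(NUM_RECORDS / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS

def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero at total_steps."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (total_steps - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(steps_per_epoch, total_steps)
for step in (0, 50, 100, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the run has about 2,080 optimizer steps per epoch and 4,160 in total, so the 100 warmup steps cover roughly the first 2.4% of training.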

Evaluation — FLORES-200 Devtest (1,012 sentences)

Direction  Model                     BLEU     COMET
KS→EN      KoshurAI v3 (ours)        15.74    0.6982
KS→EN      NLLB-200 distilled-600M   16.28    0.6741
EN→KS      KoshurAI v3 (ours)        30.37¹   0.6604
EN→KS      NLLB-200 distilled-600M   39.65¹   0.6431

¹ EN→KS BLEU is character-level (tokenize='char'), standard for Arabic-script output. COMET = Unbabel/wmt22-comet-da system score.
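
A scoring setup along these lines could reproduce the two metrics. It is a sketch, not the author's evaluation script. The function names, batch_size, and gpus values are assumptions, and the card does not state which tokenizer was used for KS→EN BLEU (sacreBLEU's default is assumed here). The sacrebleu and unbabel-comet packages are imported lazily, so the input-building helper runs without them.

```python
def build_comet_inputs(sources, hypotheses, references):
    """Zip parallel lists into the src/mt/ref dicts that COMET expects."""
    if not (len(sources) == len(hypotheses) == len(references)):
        raise ValueError("sources, hypotheses and references must be the same length")
    return [
        {"src": s, "mt": h, "ref": r}
        for s, h, r in zip(sources, hypotheses, references)
    ]

def corpus_bleu_score(hypotheses, references, char_level=False):
    """Corpus BLEU via sacreBLEU; char_level=True passes tokenize='char'."""
    import sacrebleu  # pip install sacrebleu
    kwargs = {"tokenize": "char"} if char_level else {}
    # sacreBLEU takes a list of reference streams; a single stream is used here.
    return sacrebleu.corpus_bleu(hypotheses, [references], **kwargs).score

def comet_system_score(samples, batch_size=8, gpus=1):
    """System-level COMET with the checkpoint named in the footnote."""
    from comet import download_model, load_from_checkpoint  # pip install unbabel-comet
    model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
    return model.predict(samples, batch_size=batch_size, gpus=gpus).system_score
```

For EN→KS, corpus_bleu_score(hyps, refs, char_level=True) matches the character-level setting in the footnote, and comet_system_score(build_comet_inputs(srcs, hyps, refs)) returns the system score for either direction.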

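KoshurAI v3 scores higher than NLLB-200 distilled-600M on COMET in both directions (+0.0241 KS→EN, +0.0173 EN→KS), while NLLB-200 scores higher on BLEU in both directions.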

Sample Translations (EN→KS)

English KoshurAI v3
They include the Netherlands, with Anna Jochemsen finishing ninth. تِیَم چھُ نیدرلینڈس شامِل کَران اَینا جوکیمسن فِنِشِنگ نائنتھ سیتھ
Hershey and Chase used phages, or viruses, to implant their own DNA. ۂرشے تہٕ چیسن کٔرۍ فیگ تہٕ جَراثیم منٛز پنُن ڈی این اے اَزناوُنہِ خٲطر
They usually have special food, drink and entertainment offers. تِیَمَن چھُ اکثر خاص کھٮ۪ن، چیٖز تہٕ تفریح پیش کَرنہِ یِوان

Inference Settings

Parameter Value
do_sample False (greedy)
max_new_tokens 150 (EN→KS) / 200 (KS→EN)
min_new_tokens 5
repetition_penalty 1.1

Hardware Requirements

Setting VRAM
4-bit inference (recommended) ~6–8 GB
Colab free tier (T4) ✅ with 4-bit
Colab L4 / A100 ✅ comfortable

Limitations

  • Trained on sentence-level pairs (≤ 512 tokens); long-form translation unsupported.
  • Performance on technical, legal, or dialectal Kashmiri is unverified.
  • No human evaluation conducted; COMET and BLEU are automatic metrics only.
  • 4-bit quantization used for inference; full-precision may yield higher scores.
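
One possible workaround for the sentence-level limitation is to split longer input into sentences and translate each one separately. The sketch below is untested with this model and loses cross-sentence context. The helpers split_sentences and translate_long, and the stand-in fake_translate, are hypothetical names; in practice the Quickstart's translate function would be passed as translate_fn.

```python
import re

# Split on whitespace that follows a Latin or Arabic-script sentence-final mark.
_SENTENCE_END = re.compile(r"(?<=[.!?۔؟])\s+")

def split_sentences(text):
    """Naive splitter; abbreviations such as 'Dr. Smith' will be split incorrectly."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]

def translate_long(text, translate_fn, direction="en2ks"):
    """Translate sentence by sentence and rejoin the results with spaces."""
    return " ".join(translate_fn(s, direction) for s in split_sentences(text))

# Stand-in translator so the sketch runs without loading the model.
def fake_translate(sentence, direction):
    return f"[{direction}] {sentence}"

print(translate_long("First sentence. Second one? Third!", fake_translate))
```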

Citation

If you use this model, please cite:

@misc{iqbal2026koshurai,
  title        = {KoshurAI v3: A Fine-Tuned Neural Machine Translation System
                  for Kashmiri--English Bidirectional Translation},
  author       = {Iqbal, Faizan},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/Faizaniqbal/KoshurAI_Tarjuma_v3}},
  note         = {LoRA adapter fine-tuned from Faizaniqbal/KoshurAI_Tarjuma_v2}
}

This work fine-tunes the model by Malik & Nissar — also cite:

@misc{malik2026koshurkouter,
  title        = {Koshur Kouter KS-EN v1: A Merged QLoRA Kashmiri--English Translation Model},
  author       = {Malik, Haq Nawaz and Nissar, Nahfid},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/Omarrran/koshur-kouter-ks-en_v1}},
  note         = {Fine-tuned from sarvamai/sarvam-translate}
}

And the original base model:

@misc{sarvam2025translate,
  title        = {Sarvam-Translate},
  author       = {{Sarvam AI}},
  howpublished = {\url{https://huggingface.co/sarvamai/sarvam-translate}}
}

Acknowledgements

This model builds on Omarrran/koshur-kouter-ks-en_v1, which was fine-tuned by Haq Nawaz Malik & Nahfid Nissar (2026), itself built on sarvamai/sarvam-translate (Gemma 3, 4.5B) by Sarvam AI. Evaluated on FLORES-200 devtest. COMET scored using Unbabel/wmt22-comet-da.
