Koshur Kouter KS-EN v1
A Kashmiri ↔ English translation model fine-tuned from sarvamai/sarvam-translate (Gemma 3, 4.5B parameters). The QLoRA adapter has been merged into the base weights, and the result is published as a self-contained, transformers-compatible checkpoint.
Model Details
| Field | Value |
|---|---|
| Authors | Haq Nawaz Malik (@Omarrran) · Nahfid Nissar (@nafiboi) |
| Base model | sarvamai/sarvam-translate |
| Architecture | Gemma3ForCausalLM |
| Parameters | ~4.55 B (bf16) |
| Languages | Kashmiri (ks), English (en) |
| License | GPL-3.0 |
| Precision | bfloat16 |
| Context length | 131,072 tokens |
| Vocabulary size | 262,208 |
| Layers / hidden / heads / KV heads | 34 / 2560 / 8 / 4 |
| Checkpoint size | 8.48 GiB (5 shards) |
Intended Use
Primary use cases
- Kashmiri ↔ English machine translation, in either direction.
- Downstream evaluation, comparison, and benchmarking of Kashmiri NLP systems.
- Manual review workflows on mixed-direction prompts.
- Direct loading via `transformers` without PEFT adapter merging.
Out-of-scope use
- Production-grade translation without human review.
- Long-form document translation (model is tuned on short, sentence-level pairs).
- Open-ended generation, dialogue, or any task other than translation.
- Safety-critical applications (medical, legal, financial advice).
Quickstart
Installation
pip install -U transformers accelerate sentencepiece safetensors torch
Load the model
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
repo_id = "Omarrran/koshur-kouter-ks-en_v1"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
).eval()
Translate
def first_nonempty_line(text: str) -> str:
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return text.strip()
def translate(source: str, direction: str, max_new_tokens: int = 48) -> str:
    instructions = {
        "ks2en": "Translate the text below to English. Return only the translation.",
        "en2ks": "Translate the text below to Kashmiri. Return only the translation.",
    }
    messages = [
        {"role": "system", "content": instructions[direction]},
        {"role": "user", "content": source},
    ]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(
        prompt, return_tensors="pt", truncation=True, max_length=1024
    ).to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            repetition_penalty=1.15,
            no_repeat_ngram_size=3,
            pad_token_id=tokenizer.eos_token_id,
        )
    suffix = outputs[0][inputs.input_ids.shape[1]:]
    return first_nonempty_line(tokenizer.decode(suffix, skip_special_tokens=True))
print(translate("Hello how are you doing today?.", "en2ks"))
print(translate("کٔشیر چھُ اکھ خوبصورت جٲی", "ks2en"))
A ready-to-run notebook is included at notebooks/colab_load_stage1_model.ipynb.
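Because the model is tuned on short, sentence-level pairs, one possible way to handle a slightly longer passage is to split it into sentences and translate each one separately. The sketch below is an assumption and is not taken from the released notebook: `split_sentences` and `translate_passage` are hypothetical helpers, the terminator set (Latin `.`, `!`, `?` plus the Arabic-script full stop `۔` and question mark `؟`) is a guess at what Kashmiri text typically contains, and `translate_fn` stands for any callable with the same `(source, direction)` arguments as `translate` above.

```python
import re
from typing import Callable, List

# Assumed sentence terminators: Latin . ! ? plus Arabic-script full stop and question mark.
_SENTENCE_END = re.compile(r"(?<=[.!?۔؟])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text at whitespace that follows sentence-final punctuation."""
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]


def translate_passage(
    text: str, direction: str, translate_fn: Callable[[str, str], str]
) -> str:
    """Translate each sentence independently and join the results with spaces."""
    return " ".join(translate_fn(s, direction) for s in split_sentences(text))


# Intended use, assuming `translate` from the Quickstart is defined:
# print(translate_passage("First sentence. Second sentence.", "en2ks", translate))
```

This does not make long-form translation a supported use case: context across sentence boundaries is lost, and a naive splitter like this one will break on abbreviations such as "Dr.".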
Recommended Decoding
The model is sensitive to decoding settings. The defaults shipped in generation_config.json are tuned for stable translation rather than open-ended generation:
| Parameter | Value |
|---|---|
| `do_sample` | `False` |
| `max_new_tokens` | 48 |
| `repetition_penalty` | 1.15 |
| `no_repeat_ngram_size` | 3 |
Post-processing: decode only the generated suffix and take the first non-empty line. Looser decoding (sampling, longer windows, no repetition penalty) produces continuation artifacts and quote-tail noise.
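To keep call sites consistent with the recommended settings, the values can be held in one dictionary and unpacked into `model.generate`. This is a minimal sketch: the names `RECOMMENDED_DECODING` and `decoding_kwargs` are illustrative, and rejecting unknown keys is a design choice of this example rather than anything the repository enforces.

```python
# Values copied from the recommended decoding table; the names are illustrative.
RECOMMENDED_DECODING = {
    "do_sample": False,
    "max_new_tokens": 48,
    "repetition_penalty": 1.15,
    "no_repeat_ngram_size": 3,
}


def decoding_kwargs(**overrides):
    """Return the recommended settings with explicit overrides applied.

    Unknown keys are rejected so that a typo cannot silently fall back
    to a library default.
    """
    unknown = set(overrides) - set(RECOMMENDED_DECODING)
    if unknown:
        raise KeyError(f"unsupported decoding option(s): {sorted(unknown)}")
    return {**RECOMMENDED_DECODING, **overrides}


# Intended use, assuming `model` and `inputs` from the Quickstart:
# outputs = model.generate(**inputs, **decoding_kwargs(max_new_tokens=64))
```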
Training
Procedure
The model was trained with a QLoRA supervised fine-tuning setup on a bidirectional Kashmiri ↔ English parallel corpus, on top of sarvamai/sarvam-translate. The resulting LoRA adapter was merged into the base weights; this release ships the merged checkpoint.
Hyperparameters
| Hyperparameter | Value |
|---|---|
| Max sequence length | 512 |
| Effective batch size | 32 |
| Gradient accumulation | 1 |
| Precision | bf16 |
| Optimizer | paged_adamw_8bit |
| Learning rate | 2 × 10⁻⁴ |
| LR scheduler | cosine |
| Warmup ratio | 0.03 |
| Weight decay | 0.01 |
| Max grad norm | 1.0 |
| Epochs | 2 |
| LoRA rank / alpha / dropout | 32 / 64 / 0.05 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Modules saved (full) | lm_head, embed_tokens |
| Logging / eval / save steps | 10 / 250 / 250 |
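For intuition about the schedule, the sketch below evaluates a linear warmup followed by a half-cosine decay to zero, using the peak learning rate and warmup ratio from the table. Several details are assumptions the card does not confirm: that warmup is linear, that the decay ends at zero after a single half cycle, that warmup steps are rounded up, and that the schedule spans the 20,060 total steps reported in the training metrics. `cosine_lr` is a hypothetical helper, not the training code.

```python
import math


def cosine_lr(
    step: int,
    total_steps: int = 20_060,
    peak_lr: float = 2e-4,
    warmup_ratio: float = 0.03,
) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Print the rate at the start, mid-warmup, peak, mid-decay, and final step.
for s in (0, 301, 602, 10_331, 20_060):
    print(s, f"{cosine_lr(s):.2e}")
```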
Observed training metrics
| Metric | Value |
|---|---|
| Total steps | 20,060 |
| Wall-clock time | 5.19 h |
| Peak VRAM | 12.88 GB |
| Train loss — first logged | 8.0698 |
| Train loss — best logged | 0.5632 |
| Train loss — last step (step 20,060) | 0.5819 |
| Final train loss (averaged across run) | 0.6242 |
| Token accuracy — last / best | 0.8548 / 0.8557 |
| Eval loss — first / best | 1.4610 / 1.2011 |
| Final eval loss (step 20,000) | 1.2294 |
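A few derived figures follow from the reported numbers by simple arithmetic. The sketch below assumes that every step processed a full effective batch of 32, that the 5.19 h wall-clock time covers the whole run including evaluation, and that the reported losses are mean per-token cross-entropy in nats, so that perplexity is `exp(loss)`. None of these assumptions is stated in the card.

```python
import math

total_steps = 20_060
effective_batch = 32
epochs = 2
wall_clock_h = 5.19

# Upper bound on examples processed (the last batch of an epoch may be partial).
examples_seen = total_steps * effective_batch
examples_per_epoch = examples_seen / epochs
steps_per_second = total_steps / (wall_clock_h * 3600)


def perplexity(loss: float) -> float:
    """Perplexity implied by a mean per-token cross-entropy loss in nats."""
    return math.exp(loss)


print(f"examples seen           ~ {examples_seen:,}")
print(f"examples per epoch      ~ {examples_per_epoch:,.0f}")
print(f"steps per second        ~ {steps_per_second:.2f}")
print(f"eval perplexity (final) ~ {perplexity(1.2294):.2f}")
print(f"train perplexity (last) ~ {perplexity(0.5819):.2f}")
```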
Training curves
Raw training logs are available under artifacts/:
- `stage1_training_log.jsonl`
- `stage1_summary.json`
- `stage1_eval.json`
- `stage1_metrics_summary.json`
Evaluation
Internal eval (200 samples)
| Metric | Value |
|---|---|
| BLEU | 100.0 |
| chrF | 100.0 |
⚠️ Caveat. This score is implausibly perfect and almost certainly reflects an easy or in-distribution slice. It is reported here for completeness and must not be interpreted as a generalization estimate.
benchmark_simple_ks2en_200 (ks → en)
| Metric | Value |
|---|---|
| BLEU | 13.07 |
| chrF | 46.99 |
| Exact match | 0.00 |
The benchmark contains some noisy or mismatched references; scores should be interpreted as a lower-bound indicator rather than a clean evaluation. Files: benchmarks/benchmark_simple_ks2en_200.{jsonl,stage1_eval.json}.
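The card does not say how exact match was computed. One common definition, shown here purely as an assumption, compares prediction and reference as case-sensitive strings after collapsing whitespace. `exact_match_rate` is a hypothetical helper and may differ from the script that produced the figure above.

```python
from typing import Sequence


def _normalize(text: str) -> str:
    """Collapse runs of whitespace and trim the ends; case and punctuation are kept."""
    return " ".join(text.split())


def exact_match_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(_normalize(p) == _normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)
```

Under a definition like this, an exact match of 0.00 alongside non-zero BLEU and chrF is not contradictory: the overlap metrics reward partial n-gram agreement, while exact match requires the whole string to agree.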
500-sample mixed-direction review set (source-only)
A 500-sample source-only set is provided for human review. Automatic BLEU/chrF is not applicable.
| File | Purpose |
|---|---|
| `kashmiri_benchmark_500_source_only.jsonl` | Source prompts |
| `kashmiri_benchmark_500_stage1_safe_pred.jsonl` | Model predictions |
| `kashmiri_benchmark_500_stage1_safe_pred.{summary.json, xlsx, summary.xlsx}` | Aggregates and review-friendly spreadsheets |
Hardware Requirements
Inference
| Setting | VRAM | Notes |
|---|---|---|
| Short bf16 inference (observed) | ~8.5 GB | Validated on Modal |
| Recommended for short prompts | ≥ 12 GB | Comfortable headroom |
| Long prompts / batching | 16–24 GB+ | Recommended |
| CPU loading | ≥ 24 GB RAM | Full merged shards held in memory |
| Colab | L4 / A100 | Use 4/8-bit quantization on smaller GPUs |
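As a rough guide when choosing a precision, the memory taken by the weights alone can be estimated from the parameter count. The sketch below uses the approximate 4.55 B figure from the Model Details table and ignores activations, the KV cache, and any per-block overhead that a quantization scheme adds, so real usage will be higher. `weight_memory_gib` is an illustrative helper.

```python
GIB = 1024 ** 3


def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GiB; activations and KV cache are excluded."""
    return num_params * bits_per_param / 8 / GIB


params = 4.55e9  # approximate count from the Model Details table
for label, bits in (("bf16", 16), ("8-bit", 8), ("4-bit", 4)):
    print(f"{label:>5}: {weight_memory_gib(params, bits):.2f} GiB")
```

The bf16 estimate lands close to the 8.48 GiB checkpoint size listed in Model Details, which is a useful sanity check on the parameter count.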
Training (reference)
- Peak VRAM: 12.88 GB
- Wall-clock: 5.19 h for 20,060 steps
Repository Layout
.
├── README.md
├── config.json
├── generation_config.json
├── model.safetensors.index.json
├── model-0000{1..5}-of-00005.safetensors
├── tokenizer.json
├── tokenizer.model
├── artifacts/
│ ├── config.json
│ ├── stage1_eval.json
│ ├── stage1_metrics_summary.json
│ ├── stage1_summary.json
│ └── stage1_training_log.jsonl
├── assets/
│ ├── stage1_training_loss.svg
│ ├── stage1_token_accuracy.svg
│ ├── stage1_eval_loss.svg
│ └── stage1_gpu_alloc_gb.svg
├── benchmarks/
│ ├── benchmark_simple_ks2en_200.jsonl
│ ├── benchmark_simple_ks2en_200.stage1_eval.json
│ ├── kashmiri_benchmark_500_source_only.jsonl
│ ├── kashmiri_benchmark_500_stage1_safe_pred.jsonl
│ ├── kashmiri_benchmark_500_stage1_safe_pred.summary.json
│ ├── kashmiri_benchmark_500_stage1_safe_pred.xlsx
│ └── kashmiri_benchmark_500_stage1_safe_pred.summary.xlsx
└── notebooks/
    └── colab_load_stage1_model.ipynb
Limitations and Risks
- Decoding sensitivity. Output quality degrades sharply under sampling or loose repetition controls. Use the recommended deterministic settings.
- Length bias. The model was trained on sentence-level pairs (≤ 512 tokens). Long-form translation is not supported.
- Internal eval is not generalization. The 100/100 BLEU/chrF figure is a fixture, not a quality claim.
- Benchmark noise. Reported BLEU/chrF on `benchmark_simple_ks2en_200` is depressed by reference noise; treat as indicative only.
- Domain coverage. The training distribution skews toward general / literary Kashmiri; performance on technical, legal, or dialectal inputs is unverified.
- Artifacts. Earlier decoding configurations occasionally produced quote-tail or continuation artifacts; the shipped `generation_config.json` mitigates but does not eliminate these.
- Low-resource caveat. Kashmiri remains a low-resource language; reference quality, orthographic normalization, and dialectal coverage are open problems that bound any model trained on currently-available data.
Citation
If you use this model, please cite:
@misc{malik2026koshurkouter,
title = {Koshur Kouter KS-EN v1: A Merged QLoRA Kashmiri--English Translation Model},
author = {Malik, Haq Nawaz and Nissar, Nahfid},
year = {2026},
howpublished = {\url{https://huggingface.co/Omarrran/koshur-kouter-ks-en_v1}},
note = {Fine-tuned from sarvamai/sarvam-translate}
}
Please also cite the base model:
@misc{sarvam2025translate,
title = {Sarvam-Translate},
author = {{Sarvam AI}},
howpublished = {\url{https://huggingface.co/sarvamai/sarvam-translate}}
}
Acknowledgements
Built on top of sarvamai/sarvam-translate. Training and evaluation ran on Colab and Modal. Thanks to the broader Kashmiri NLP community, whose data and tooling made this work possible.
Contact