🇮🇳 Gemma-4-E4B-Hindi-Instruct (16-bit)

A Hindi instruction-tuned fine-tune of Gemma 4 E4B. This is the merged 16-bit model for use with 🤗 Transformers / vLLM / further fine-tuning.

For local CPU/edge use, see the GGUF build.

Part of my Hindi LLM Series — small, openly-documented Indic models that actually follow instructions in Hindi and run on your own machine.


Usage (Transformers)

import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "pankajpandey-dev/gemma-4-e4b-hindi-instruct"
# Load the merged weights in bf16; device_map="auto" relies on Accelerate for device placement.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
proc = AutoProcessor.from_pretrained(model_id)

# Single-turn prompt, in English: "Explain machine learning in simple words."
msgs = [{"role": "user", "content": [{"type": "text", "text": "मशीन लर्निंग को आसान शब्दों में समझाओ।"}]}]
inputs = proc.apply_chat_template(msgs, add_generation_prompt=True, tokenize=True,
                                  return_dict=True, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, use_cache=True)
# Decode only the newly generated tokens so the prompt is not echoed back.
new_tokens = out[0][inputs["input_ids"].shape[-1]:]
print(proc.decode(new_tokens, skip_special_tokens=True))

Example outputs

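Prompt: भारत के बारे में एक रोचक तथ्य बताओ।
(English: Tell me an interesting fact about India.)

भारत दुनिया में सबसे अधिक भाषाओं वाले देशों में से एक है — 22 आधिकारिक भाषाएँ और 1,000 से अधिक बोलियाँ। हिंदी एक इंडो-आर्यन भाषा है, जबकि तमिल एक द्रविड़ भाषा है।
(English: India is one of the countries with the most languages in the world: 22 official languages and more than 1,000 dialects. Hindi is an Indo-Aryan language, while Tamil is a Dravidian language.)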


Training details

Base model: unsloth/gemma-4-E4B-it
Method: LoRA (r=16, α=16), response-only loss
Framework: Unsloth
Data: ~10k Hindi instruction pairs (AI4Bharat indic-instruct: anudesh + dolly, hi splits)
Epochs: 2
LR / schedule: 1e-4, cosine
Precision: bf16 (4-bit QLoRA base)
Hardware: Single NVIDIA L4 (24 GB)
Final train loss: ~0.29
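
The "response-only loss" in the training details means that only the assistant's reply contributes to the loss, while prompt tokens are ignored. The snippet below is a minimal, framework-free sketch of that idea and not the actual Unsloth training code. The helper name `mask_prompt_labels`, the `response_start` index, and the toy token ids are all hypothetical. The value -100 is the default index that PyTorch's cross-entropy loss skips.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_prompt_labels(input_ids, response_start):
    """Return labels in which every token before response_start is ignored.

    input_ids: token ids for the full prompt + response sequence.
    response_start: index of the first response token (hypothetical; in a real
    pipeline it would be located from the chat template's turn markers).
    """
    if not 0 <= response_start <= len(input_ids):
        raise ValueError("response_start must lie within the sequence")
    return [IGNORE_INDEX] * response_start + list(input_ids[response_start:])


# Toy example with made-up token ids: 4 prompt tokens, then 3 response tokens.
toy_ids = [2, 105, 2364, 107, 9001, 9002, 106]
toy_labels = mask_prompt_labels(toy_ids, response_start=4)
print(toy_labels)  # [-100, -100, -100, -100, 9001, 9002, 106]
```

With labels built this way, the gradient comes only from the answer tokens, so the model is not trained to reproduce the user's instructions.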

Training was text-only, with the vision layers frozen. The chat template emits a single BOS token to avoid double-BOS corruption.
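
A double BOS typically appears when the chat template already writes the beginning-of-sequence token and the tokenizer then prepends another one. The sketch below shows one way to check a tokenized prompt for this and collapse it to a single BOS. It is illustrative only: the helper names are hypothetical, and `BOS_ID = 2` is an assumption for the toy data (in practice, read the value from `tokenizer.bos_token_id`).

```python
BOS_ID = 2  # assumed for illustration; read tokenizer.bos_token_id in real code


def count_leading_bos(token_ids, bos_id=BOS_ID):
    """Count how many consecutive BOS tokens open the sequence."""
    count = 0
    for tok in token_ids:
        if tok != bos_id:
            break
        count += 1
    return count


def ensure_single_bos(token_ids, bos_id=BOS_ID):
    """Return a copy of token_ids that starts with exactly one BOS token."""
    n = count_leading_bos(token_ids, bos_id)
    return [bos_id] + list(token_ids[n:])


# Toy example with made-up token ids and a duplicated BOS at the front.
doubled = [2, 2, 105, 2364, 107]
print(count_leading_bos(doubled))   # 2
print(ensure_single_bos(doubled))   # [2, 105, 2364, 107]
```

Running `count_leading_bos` on a few tokenized training or inference prompts is a quick sanity check before fine-tuning further or serving the model.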


Related repos


Provenance & license (please read)

Mixed-license lineage — review all before redistribution or commercial use:

Raw training data is not redistributed here. You are responsible for complying with the Gemma, Llama 2, and CC-BY-SA terms.


Limitations

  • ~8B-class model: strong Hindi fluency, but can hallucinate facts and occasionally repeat phrasing on long open-ended generation.
  • Tuned for single-turn Hindi instructions; long multi-turn chat is not the focus.
  • Not safety-aligned for production.
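
For the repeated phrasing noted above, one common mitigation is to adjust decoding settings. The sketch below collects a few standard `generate` arguments from 🤗 Transformers into a dictionary. The specific values are untuned assumptions, not settings validated for this model, so treat them as a starting point only.

```python
# Illustrative decoding settings; the values are assumptions, not tuned for this model.
GEN_KWARGS = {
    "max_new_tokens": 256,
    "do_sample": True,            # sample instead of greedy decoding
    "temperature": 0.7,           # soften the next-token distribution
    "top_p": 0.9,                 # nucleus sampling cutoff
    "repetition_penalty": 1.1,    # values above 1.0 penalize already-generated tokens
    "no_repeat_ngram_size": 4,    # forbid repeating any exact 4-gram
}

# Usage, given a loaded model and tokenized inputs:
#   out = model.generate(**inputs, **GEN_KWARGS)
```

An n-gram ban that is too small can block legitimate repeated phrases in Hindi, so it is worth checking outputs by hand after changing these values.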

Acknowledgements

Base model by Google (Gemma 4). Data by AI4Bharat. Fine-tuning with Unsloth. 🙏
