Instructions for using pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct with libraries and local apps.

Libraries

How to use pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct")
model = AutoModelForCausalLM.from_pretrained("pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
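
The same request can be sent from Python instead of curl. The following is a minimal sketch using only the standard library, assuming the server from the previous step is reachable at http://localhost:8000. The helper names `build_chat_request` and `chat` and the `max_tokens` default are illustrative choices, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct"


def build_chat_request(prompt, model=MODEL_ID, max_tokens=128):
    """Build the JSON body for an OpenAI-compatible /v1/chat/completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt, base_url="http://localhost:8000"):
    """POST the request to a running server and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    # The OpenAI-compatible response carries the text in choices[0].message.content.
    return reply["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` returns the reply text. Passing `base_url="http://localhost:30000"` should work the same way against any other OpenAI-compatible server on that port.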

Use Docker

docker model run hf.co/pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct

SGLang

How to use pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct to start chatting

Load model with FastModel

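# Install Unsloth first (run in a shell): pip install unsloth
from unsloth import FastModel

# Load the model and tokenizer with a 2048-token maximum sequence length
model, tokenizer = FastModel.from_pretrained(
    model_name="pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct",
    max_seq_length=2048,
)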

Docker Model Runner
How to use pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct with Docker Model Runner:
```
docker model run hf.co/pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

MiniCPM5-1B-Hindi-Instruct

A Hindi instruction-tuned variant of openbmb/MiniCPM5-1B, fine-tuned for Hindi (हिंदी) conversational and instruction-following tasks.

Part of the 🇮🇳 Hindi LLM Series by @pankajpandey-dev.

Model Details

Base model: openbmb/MiniCPM5-1B (1.1B parameters)
Language: Hindi (हिंदी), with English understanding retained from the base
Fine-tuning method: LoRA (r=32, alpha=64) merged into base weights
Training framework: Unsloth + TRL
License: Apache 2.0
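
"Merged into base weights" means the low-rank update is folded into each target weight matrix, so inference needs no separate adapter. Below is a toy sketch in pure Python, assuming the standard LoRA scaling of alpha / r (the card does not say whether a variant such as rsLoRA was used). The matrices are made-up illustration values, and `matmul` and `merge_lora` are hypothetical helper names.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_b, lora_a, alpha, r):
    """Return W + (alpha / r) * (B @ A), the merged weight."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 weight with a rank-1 update; real layers are far larger.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [0.0]]   # shape (2, 1)
A = [[0.5, 0.25]]    # shape (1, 2)

# alpha=64 and r=32 are the values from the card, so the scale is 2.0.
# Here r only sets the scale; the toy matrices themselves have rank 1.
merged = merge_lora(W, B, A, alpha=64, r=32)
print(merged)
```

With alpha twice the rank, the learned update is applied at double strength before being added to the frozen weight.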

Training Data

Fine-tuned on 4,000 high-quality Hindi instruction examples sampled from:

ai4bharat/indic-instruct-data-v0.1 — anudesh (Hindi split): native crowd-sourced Hindi instructions
ai4bharat/indic-instruct-data-v0.1 — dolly (Hindi split, filtered to chrF ≥ 60): broad instruction variety

All examples are at most 2048 tokens long and are formatted with the MiniCPM5 ChatML template.
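
The selection described above can be sketched as a filter followed by a random sample. This is only an illustration: the card does not describe the actual pipeline, so the record fields (`source`, `chrf`, `n_tokens`), the helper `select_examples`, and the seed are all hypothetical.

```python
import random


def select_examples(records, n=4000, min_chrf=60.0, max_tokens=2048, seed=0):
    """Filter records by quality and length, then sample up to n of them."""
    kept = []
    for rec in records:
        # The chrF threshold is stated only for the dolly split.
        if rec["source"] == "dolly" and rec["chrf"] < min_chrf:
            continue
        if rec["n_tokens"] > max_tokens:
            continue
        kept.append(rec)
    rng = random.Random(seed)
    return rng.sample(kept, min(n, len(kept)))


# Tiny made-up dataset to exercise the filter.
records = [
    {"source": "anudesh", "chrf": None, "n_tokens": 120},
    {"source": "dolly", "chrf": 72.5, "n_tokens": 300},
    {"source": "dolly", "chrf": 41.0, "n_tokens": 300},     # dropped: chrF < 60
    {"source": "anudesh", "chrf": None, "n_tokens": 4096},  # dropped: too long
]
selected = select_examples(records, n=4000)
print(len(selected))
```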

Training Configuration

Hyperparameter	Value
LoRA rank	32
LoRA alpha	64
LoRA dropout	0.0
Target modules	q, k, v, o, gate, up, down
Batch size (effective)	16
Learning rate	2e-4
LR scheduler	cosine
Warmup steps	15
Epochs	2
Total steps	500
Precision	fp16 (4-bit base)
Hardware	NVIDIA Tesla T4 (Colab)
Training time	~60 minutes
Final training loss	1.108
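
The step count in the table follows from the dataset size, the effective batch size, and the number of epochs. The sketch below checks that arithmetic and traces the learning rate, assuming linear warmup followed by cosine decay to zero; the card only states "cosine" with 15 warmup steps, so the exact schedule shape is an assumption.

```python
import math

NUM_EXAMPLES = 4000   # from the Training Data section
EFFECTIVE_BATCH = 16
EPOCHS = 2
BASE_LR = 2e-4
WARMUP_STEPS = 15

steps_per_epoch = math.ceil(NUM_EXAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS


def lr_at(step):
    """Learning rate after `step` optimizer steps (warmup, then cosine decay)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (total_steps - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_steps)
for step in (0, 15, 250, 500):
    print(step, f"{lr_at(step):.2e}")
```

4,000 examples at an effective batch size of 16 give 250 steps per epoch, which matches the 500 total steps reported for 2 epochs.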

Usage

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "नमस्ते! बारिश के दिन पर एक छोटी कविता लिखो।"}
]

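# Build the prompt with the chat template; return_dict=True also returns the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1,
)
# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))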

Recommended Generation Parameters

temperature: 0.7 (lower = more focused, higher = more creative)
top_p: 0.9
repetition_penalty: 1.1
max_new_tokens: 256–512 depending on task
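
To make the effect of these settings concrete, here is a simplified pure-Python sketch of how a repetition penalty, temperature, and top-p reshape a next-token distribution. It follows the common textbook definitions and is not the Transformers implementation; the function name `next_token_probs` and the logits are made up for illustration.

```python
import math


def next_token_probs(logits, temperature=0.7, top_p=0.9,
                     repetition_penalty=1.1, previous_ids=()):
    """Return {token_id: probability} after penalty, temperature and top-p."""
    logits = list(logits)
    # Repetition penalty: make already-generated tokens less likely.
    for tid in set(previous_ids):
        if logits[tid] > 0:
            logits[tid] /= repetition_penalty
        else:
            logits[tid] *= repetition_penalty
    # Temperature: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-p: keep the smallest set of tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


# Made-up logits for a 4-token vocabulary; token 0 was already generated.
probs = next_token_probs([2.0, 1.0, 0.5, -3.0], previous_ids=[0])
print(probs)
```

In this toy case the least likely token falls outside the top-p nucleus and can never be sampled, while the remaining three are renormalized.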

LoRA Adapter Only

If you prefer to load the LoRA adapter on top of the base model (~85 MB vs 2.2 GB), it's available in the lora_adapter/ folder of this repo:

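from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer from this repo and the unmodified base model
tokenizer = AutoTokenizer.from_pretrained("pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct", trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained("openbmb/MiniCPM5-1B", trust_remote_code=True)
# Attach the LoRA adapter stored in the lora_adapter/ subfolder
model = PeftModel.from_pretrained(base, "pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct", subfolder="lora_adapter")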

Example Outputs

Prompt: बारिश के दिन पर एक छोटी कविता लिखो। (Write a short poem about a rainy day.)
Response: (creative Hindi poetry generation)

Prompt: मशीन लर्निंग क्या है? सरल हिंदी में समझाइए। (What is machine learning? Explain in simple Hindi.)
Response: (simplified Hindi explanation of ML)

Prompt: नमस्ते! अपना परिचय दीजिए। (Hello! Please introduce yourself.)
Response: (conversational Hindi self-introduction)

Quantized Versions (GGUF)

GGUF quantizations are meant for running the model locally with llama.cpp, Ollama, LM Studio, or other GGUF-compatible inference engines.

Acknowledgements

OpenBMB for the MiniCPM5-1B base model
AI4Bharat (IIT Madras) for the indic-instruct-data dataset
Unsloth for the training framework

Citation

If you use this model in your work, please cite:

@misc{pandey2026minicpm5hindi,
  title  = {MiniCPM5-1B-Hindi-Instruct},
  author = {Pankaj Pandey},
  year   = {2026},
  url    = {https://huggingface.co/pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct}
}

This model is part of an ongoing effort to bring strong open-source LLMs to Indian languages. Feedback and contributions are welcome via the community tab.