Instructions for using neo-saket/vidya-4b with libraries and local apps.

Libraries

How to use neo-saket/vidya-4b with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="neo-saket/vidya-4b",
	filename="vidya-Q4_K_M.gguf",
)

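response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# Print only the assistant's reply text
print(response["choices"][0]["message"]["content"])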

Local Apps

llama.cpp

How to use neo-saket/vidya-4b with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf neo-saket/vidya-4b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf neo-saket/vidya-4b:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf neo-saket/vidya-4b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf neo-saket/vidya-4b:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf neo-saket/vidya-4b:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf neo-saket/vidya-4b:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf neo-saket/vidya-4b:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf neo-saket/vidya-4b:Q4_K_M

Use Docker

docker model run hf.co/neo-saket/vidya-4b:Q4_K_M

vLLM

How to use neo-saket/vidya-4b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "neo-saket/vidya-4b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "neo-saket/vidya-4b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
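
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are illustrative, not part of vLLM. It assumes the server started above is listening on `http://localhost:8000/v1`. For a llama.cpp `llama-server`, the base URL would instead be `http://localhost:8080/v1`, as used in the Pi configuration on this page.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url, model, user_message, timeout=60):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server):
# print(ask("http://localhost:8000/v1", "neo-saket/vidya-4b",
#           "What is the capital of France?"))
```

Keeping request construction separate from sending makes it easy to inspect the payload before any network call is made.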

Use Docker

docker model run hf.co/neo-saket/vidya-4b:Q4_K_M

Ollama
How to use neo-saket/vidya-4b with Ollama:
```
ollama run hf.co/neo-saket/vidya-4b:Q4_K_M
```

Unsloth Studio

How to use neo-saket/vidya-4b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for neo-saket/vidya-4b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for neo-saket/vidya-4b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for neo-saket/vidya-4b to start chatting

Pi

How to use neo-saket/vidya-4b with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf neo-saket/vidya-4b:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "neo-saket/vidya-4b:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use neo-saket/vidya-4b with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf neo-saket/vidya-4b:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default neo-saket/vidya-4b:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use neo-saket/vidya-4b with Docker Model Runner:
```
docker model run hf.co/neo-saket/vidya-4b:Q4_K_M
```

Lemonade

How to use neo-saket/vidya-4b with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull neo-saket/vidya-4b:Q4_K_M

Run and chat with the model

lemonade run user.vidya-4b-Q4_K_M

List all available models

lemonade list

Vidya 4B — NCERT Tutoring Assistant

Vidya is a fine-tuned Qwen3.5-4B model trained to be an expert tutoring assistant for Indian students preparing for NCERT Classes 6–12, IIT-JEE, and NEET. It explains concepts step-by-step with examples from the Indian curriculum, corrects misconceptions, and adapts to the student's level.

Tutoring score: 4.1 / 5.0 (Gemini 2.0 Flash as judge, 30-sample eval across accuracy, clarity, step-by-step reasoning, appropriate level, encouragement, and misconception handling)

A larger Vidya 9B variant is also available (tutoring score 4.6/5.0).

Quick Start

Ollama (recommended)

ollama run neosaket/vidya:4b

Or with a Modelfile for custom parameters:

FROM neosaket/vidya:4b
PARAMETER temperature 0.3
PARAMETER repeat_penalty 1.5
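
The same parameters can also be passed per request instead of being baked into a Modelfile. The sketch below assumes a local Ollama instance on its default address (`http://localhost:11434`) and uses the `options` field of Ollama's `/api/chat` endpoint; the helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # assumed default local address


def build_ollama_chat_payload(question, model="neosaket/vidya:4b",
                              temperature=0.3, repeat_penalty=1.5):
    """Return a non-streaming chat payload carrying the Modelfile's parameters."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": False,
        "options": {
            "temperature": temperature,
            "repeat_penalty": repeat_penalty,
        },
    }


def chat(question):
    """Send one question to the local Ollama server and return the reply text."""
    data = json.dumps(build_ollama_chat_payload(question)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)["message"]["content"]


# Example call (requires a running Ollama server with the model pulled):
# print(chat("Explain photosynthesis for a Class 7 student."))
```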

llama.cpp

llama-cli -m vidya-Q4_K_M.gguf \
  --system-prompt "You are Vidya, an expert tutor for Indian students preparing for NCERT Classes 6-12, IIT-JEE, and NEET. Explain concepts step by step with examples from the Indian curriculum." \
  -cnv

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("neo-saket/vidya-4b", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("neo-saket/vidya-4b")

messages = [
    {"role": "system", "content": "You are Vidya, an expert tutor for Indian students preparing for NCERT Classes 6-12, IIT-JEE, and NEET. Explain concepts step by step with examples from the Indian curriculum."},
    {"role": "user", "content": "Explain Newton's second law of motion with a Class 9 example."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is set explicitly so that the temperature setting takes effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Suppressing Chain-of-Thought

Qwen3.5 models produce a <think>...</think> reasoning block before the answer. To skip it and get a direct response, pre-fill the assistant turn:

# Add this to your chat template call or inject directly:
# <|im_start|>assistant\n<think>\n\n</think>\n\n
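
As a rough illustration of the pre-fill, the sketch below appends an empty reasoning block to a prompt string that already ends with the assistant header, and also strips any `<think>` block from generated text. The helper names are hypothetical. It assumes the chat template ends the generation prompt with `<|im_start|>assistant\n` and does not already open a `<think>` tag, so check the rendered prompt for your tokenizer version.

```python
import re

# Empty reasoning block, matching the pre-fill string shown above
EMPTY_THINK_PREFILL = "<think>\n\n</think>\n\n"


def prefill_empty_think(prompt_text):
    """Append an empty <think> block so generation starts at the answer."""
    if prompt_text.endswith(EMPTY_THINK_PREFILL):
        return prompt_text  # already pre-filled; avoid adding it twice
    return prompt_text + EMPTY_THINK_PREFILL


def strip_think(generated_text):
    """Remove any <think>...</think> block (and trailing whitespace) from output."""
    return re.sub(r"<think>.*?</think>\s*", "", generated_text, flags=re.DOTALL)
```

With the Transformers snippet above, this would be applied to `text` after `apply_chat_template(...)` and before tokenizing.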

Training Details

Pipeline: SFT → DPO (no CPT due to data constraints)

| Stage | Details |
| --- | --- |
| Base model | Qwen/Qwen3.5-4B |
| SFT | NCERT synthetic Q&A (GPT-4o-mini generated), HuggingFace NCERT datasets, SATHEE exam papers |
| DPO | Preference pairs on NCERT content; beta=0.1 |
| LoRA | r=16, alpha=16, DoRA + rsLoRA |
| Hardware | RTX 5090 (32 GB VRAM) |
| SFT LR | 3e-5 |
| DPO LR | 5e-6 |
| Max seq length | 4096 |
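
To make the LoRA row concrete: rank-stabilized LoRA (rsLoRA) scales the adapter update by alpha / sqrt(r) rather than the standard alpha / r. The sketch below only does that arithmetic for the listed r=16, alpha=16; it is not the author's training code, and the function name is illustrative.

```python
import math


def lora_scaling(alpha, r, rank_stabilized=False):
    """Scaling factor applied to the low-rank update B @ A."""
    if rank_stabilized:
        return alpha / math.sqrt(r)  # rsLoRA
    return alpha / r                 # standard LoRA


standard_scale = lora_scaling(alpha=16, r=16)
rslora_scale = lora_scaling(alpha=16, r=16, rank_stabilized=True)
print(standard_scale, rslora_scale)
```

With these values the adapter update is scaled by 4.0 under rsLoRA, compared with 1.0 under standard LoRA.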

GRPO was also explored for JEE/NEET MCQ reasoning, but it caused a catastrophic regression in tutoring quality, with the model producing short completions. The deployed model is therefore the DPO checkpoint.
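
For reference, the standard DPO objective for a single preference pair, with the beta=0.1 listed in the training table, can be sketched as follows. This is the textbook formulation in plain Python, not the training code used for Vidya; inputs are summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one pair: -log(sigmoid(beta * (chosen_ratio - rejected_ratio)))."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    return softplus(-margin)  # equals -log(sigmoid(margin))
```

The loss is log(2) when the policy matches the reference and decreases as the policy assigns relatively more probability to the chosen response.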

Evaluation

30-sample tutoring evaluation, judged by Gemini 2.0 Flash:

| Dimension | Score |
| --- | --- |
| Overall | 4.1 / 5.0 |
| Accuracy | 3.8 |
| Clarity | 4.4 |
| Step-by-step reasoning | 4.5 |
| Appropriate level | 4.5 |
| Encouragement | 4.5 |
| Misconception handling | 4.8 |
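
The card does not state how the overall score is derived from the six dimensions. As a quick check, the sketch below computes their unweighted mean, which comes to about 4.42 rather than the reported 4.1. The overall figure therefore presumably comes from a different aggregation, such as a separate holistic rating by the judge; that reading is an assumption, not something the source confirms.

```python
from statistics import mean

# Per-dimension scores copied from the table above
dimension_scores = {
    "Accuracy": 3.8,
    "Clarity": 4.4,
    "Step-by-step reasoning": 4.5,
    "Appropriate level": 4.5,
    "Encouragement": 4.5,
    "Misconception handling": 4.8,
}

unweighted_mean = mean(list(dimension_scores.values()))
print(f"Unweighted mean of dimensions: {unweighted_mean:.2f}")
```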

The accuracy score (3.8) likely reflects the 4B model's smaller factual recall capacity; the 9B variant reaches 4.6/5.0 overall, with stronger accuracy.

Sample Interactions

Physics (Class 9):

User: Explain Newton's second law of motion with an example.

Vidya: Great question! Newton's Second Law is one of the most important concepts in Class 9 Physics.

**The Law:** The force acting on an object equals its mass times its acceleration.
  F = ma

**Example:** Imagine pushing a cricket ball and a shot put with the same force.
- Cricket ball (mass ≈ 0.16 kg) → accelerates quickly
- Shot put (mass ≈ 7 kg) → barely moves

Same force (F), different masses → different accelerations. The heavier object has less acceleration.

Key takeaway: More mass = more force needed to produce the same acceleration. ✓

Chemistry (Class 10):

User: What is a displacement reaction? Give an example.

Vidya: A displacement reaction occurs when a more reactive element pushes a less reactive element out of its compound.

**Classic NCERT example — Zinc displacing Copper:**
  Zn + CuSO₄ → ZnSO₄ + Cu

Step by step:
1. Zinc (Zn) is higher in the reactivity series than Copper (Cu)
2. Zinc atoms lose electrons → become Zn²⁺ ions in solution
3. Cu²⁺ ions gain those electrons → deposit as solid copper
4. The blue CuSO₄ solution turns colourless; copper metal appears

⚠️ Common misconception: Less reactive metals cannot displace more reactive ones.
  Iron cannot displace zinc from ZnSO₄ — the reaction won't proceed.

Files

| File | Description |
| --- | --- |
| `*.safetensors` | Merged BF16 model weights (use for fine-tuning or GGUF conversion) |
| `vidya-Q4_K_M.gguf` | Quantized GGUF, 4-bit; recommended for Ollama / llama.cpp |
| `Modelfile` | Ollama Modelfile with recommended settings |
| `config.json`, `tokenizer*` | Model configuration and tokenizer |

Limitations

- Accuracy is weaker than the 9B variant; the model may occasionally hallucinate specific NCERT facts
- Primary training language is English; Hindi support is limited
- Optimised for the Indian curriculum (NCERT, JEE, NEET); general-purpose use may be suboptimal
- Not suitable for medical advice, legal guidance, or safety-critical applications

Citation

@misc{vidya4b2026,
  title={Vidya 4B: A Fine-tuned NCERT Tutoring Assistant},
  author={Saket Nayak},
  year={2026},
  url={https://huggingface.co/neo-saket/vidya-4b}
}

License

Apache 2.0 — inherits from the Qwen3.5 base model license.
