Nandi-Mini-V1.1-600M-Early-Checkpoint-250GT
Introduction
Nandi-Mini-V1.1-600M-Early-Checkpoint-250GT is an early-stage checkpoint from the upcoming Nandi-Mini-V1.1-600M model family, saved after 250 billion tokens of training. This is not the final converged model.
Nandi-Mini is a compact multilingual language model designed for strong efficiency, deployment flexibility, and improved Indic language support.
The model incorporates several architectural optimizations, including:
- Optimized Shared-KV, reducing KV cache usage by nearly 50%
- Factorized embeddings, enabling strong parameter efficiency and helping achieve fertility scores competitive with models over 500× larger
- Q-Norm for improved training stability and performance
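The list above names Q-Norm without defining it. As a rough illustration only, the sketch below assumes it means RMS normalization applied to each query vector before the attention dot product, which keeps query magnitudes (and therefore attention logits) from growing unchecked. The function name `rms_norm` and the `gain` and `eps` parameters are hypothetical and are not taken from the model's code.

```python
import math

def rms_norm(vec, gain=None, eps=1e-6):
    """Scale `vec` so its root-mean-square is ~1, then apply an optional gain."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    gain = gain or [1.0] * len(vec)  # assumption: a learned per-dimension gain
    return [g * x / rms for g, x in zip(gain, vec)]

# Two queries pointing the same way but differing 100x in magnitude
# come out almost identical after normalization.
small_query = rms_norm([3.0, 4.0])
large_query = rms_norm([300.0, 400.0])
print(small_query, large_query)
```

Under this assumption, the scale of the query no longer influences the attention scores, which is one plausible source of the stability benefit mentioned in the list.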
The model is being trained completely from scratch and is designed to deliver strong performance under low compute and memory constraints.
This checkpoint is being shared to provide an early look into the model's scaling behavior and training progress.
This release is an intermediate checkpoint, not the final model. Performance is expected to improve significantly with continued training and scaling.
We are releasing intermediate checkpoints to maintain transparency with the community and demonstrate that our results are achieved without benchmark-specific optimizations or shortcut techniques.
We will share a technical blog soon. Stay tuned!
Architectural Highlights
Nandi-Mini-600M introduces several efficiency-focused architectural optimizations designed for compact yet capable language models.
Shared KV (Shared Key-Value Vectors)
Shared KV is one of the core architectural ideas explored in Nandi-Mini. Instead of computing separate Key and Value projections, both reuse a shared latent representation.
This design reduces KV-cache memory usage by ~50% during inference with only a small increase in compute overhead, since RoPE is applied dynamically during attention computation.
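One possible reading of this description, sketched below for a single attention head, is that the cache stores one unrotated latent vector per token, keys are obtained by applying RoPE to that latent at attention time, and the latent itself is used as the value. This is an assumption made for illustration; the function names `rope` and `shared_kv_attend` are hypothetical and the model's actual implementation may differ.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles (RoPE)."""
    out = []
    for i in range(len(vec) // 2):
        theta = pos / (base ** (2 * i / len(vec)))
        x, y = vec[2 * i], vec[2 * i + 1]
        out += [x * math.cos(theta) - y * math.sin(theta),
                x * math.sin(theta) + y * math.cos(theta)]
    return out

def shared_kv_attend(query, latent_cache):
    """Attend from the newest position over a cache of shared latents.

    The cache holds ONE unrotated vector per token. Keys are derived on the
    fly with RoPE; values are the cached latents themselves (assumption).
    """
    q = rope(query, len(latent_cache) - 1)
    scale = 1.0 / math.sqrt(len(query))
    scores = []
    for pos, latent in enumerate(latent_cache):
        key = rope(latent, pos)  # recomputed each step: the extra compute
        scores.append(scale * sum(a * b for a, b in zip(q, key)))
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(latent_cache[0])
    return [sum(w * latent[j] for w, latent in zip(weights, latent_cache))
            for j in range(dim)]

# Toy cache with three tokens of dimension 4.
cache = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
output = shared_kv_attend([0.5, 0.1, -0.2, 0.3], cache)
print(output)
```

In this sketch the cache holds `len(cache) * dim` floats, whereas separate key and value caches would hold twice that, and the per-step RoPE on cached latents is the added compute.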
KV-Cache Memory Comparison
- Vanilla KV: standard KV-cache memory usage
- Shared KV: ~50% lower KV-cache footprint
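The comparison above can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a standard cache stores two tensors (K and V) per layer and a shared cache stores one; the layer count, KV head count, and head dimension are hypothetical placeholders, not the model's published configuration, and the idealized result is exactly half, whereas the card reports roughly 50%.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, shared_kv=False):
    """Estimate KV-cache size in bytes for a single sequence.

    bytes_per_value=2 corresponds to bfloat16/float16 storage.
    """
    tensors_per_layer = 1 if shared_kv else 2  # assumption: one shared tensor
    return (tensors_per_layer * num_layers * num_kv_heads
            * head_dim * seq_len * bytes_per_value)

# Hypothetical shape values, NOT the model's published configuration.
vanilla = kv_cache_bytes(num_layers=24, num_kv_heads=4, head_dim=64, seq_len=2048)
shared = kv_cache_bytes(num_layers=24, num_kv_heads=4, head_dim=64, seq_len=2048,
                        shared_kv=True)
print(f"vanilla: {vanilla / 2**20:.1f} MiB, shared: {shared / 2**20:.1f} MiB")
```

Because the cache grows linearly with `seq_len`, the absolute saving becomes larger at longer context lengths.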
Shared KV is part of our broader focus on deployable foundation models optimized for:
- On-premise AI systems
- Memory-constrained deployments
- Edge devices
- Long-context inference workloads
This remains an active research area within the Nandi model family, and we plan to share deeper technical details in upcoming engineering blogs.
Model Details
- Type: Causal Language Model
- Training Stage: Early Pretraining Checkpoint (250 billion tokens)
- Parameters: ~600M
- Architecture: Transformer decoder
- Positional Encoding: RoPE
- Normalization: RMSNorm
- Activation: SwiGLU
- Attention: GQA + Shared KV
- Embeddings: Tied embeddings & Factorized Embeddings
- Context length: 2,048 tokens (planned to be extended to 32,000 tokens)
- Vocabulary Size: 131,072
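To see why factorized embeddings matter with a 131,072-entry vocabulary, the sketch below counts parameters for a full embedding table versus a low-rank factorization (a `vocab × rank` table followed by a `rank × hidden` projection). The form of the factorization is an assumption, and `HIDDEN_SIZE` and `FACTOR_RANK` are hypothetical values chosen only for the arithmetic; the card does not state either.

```python
def embedding_params(vocab_size, hidden_size, factor_rank=None):
    """Count embedding parameters, optionally with a low-rank factorization.

    Full table:  vocab_size * hidden_size
    Factorized:  vocab_size * factor_rank + factor_rank * hidden_size
    """
    if factor_rank is None:
        return vocab_size * hidden_size
    return vocab_size * factor_rank + factor_rank * hidden_size

VOCAB_SIZE = 131_072   # from the model details above
HIDDEN_SIZE = 1024     # hypothetical
FACTOR_RANK = 256      # hypothetical

full = embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
factorized = embedding_params(VOCAB_SIZE, HIDDEN_SIZE, FACTOR_RANK)
print(f"full: {full:,} params, factorized: {factorized:,} params")
```

With these placeholder sizes the full table alone would take a large share of a ~600M parameter budget, which is the kind of pressure a factorization relieves.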
Tokenization Fertility Score Across Languages
| Language | SmolLM3-3B | Qwen3-0.6B-Base | Sarvam-1 | Nandi-Mini-V1.1 |
|---|---|---|---|---|
| English | 1.17 | 1.16 | 1.32 | 1.18 |
| Bengali | 8.66 | 7.51 | 1.55 | 1.44 |
| Gujarati | 10.47 | 9.37 | 1.55 | 1.53 |
| Hindi | 2.71 | 5.14 | 1.25 | 1.32 |
| Kannada | 16.43 | 12.96 | 2.10 | 1.90 |
| Malayalam | 17.77 | 14.56 | 2.49 | 2.05 |
| Marathi | 3.73 | 6.70 | 1.55 | 1.55 |
| Odia | 19.07 | 15.75 | 2.18 | 2.68 |
| Punjabi | 9.23 | 8.66 | 1.47 | 1.42 |
| Tamil | 13.56 | 10.93 | 2.06 | 2.05 |
| Telugu | 15.40 | 13.38 | 2.09 | 1.77 |
| Assamese | 9.26 | 8.13 | 4.31 | 1.51 |
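The card does not say how these scores were measured. Fertility is commonly defined as the average number of tokens a tokenizer produces per word, so lower values mean shorter sequences for the same text. The sketch below assumes that definition with whitespace word splitting; `fertility` and `toy_tokenize` are hypothetical helpers, and the toy tokenizer only stands in for a real one.

```python
def fertility(texts, tokenize):
    """Average number of tokens produced per whitespace-separated word."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_tokens += len(tokenize(text))
        total_words += len(text.split())
    if total_words == 0:
        raise ValueError("no words to measure")
    return total_tokens / total_words

def toy_tokenize(text, chunk=4):
    """Stand-in tokenizer: split every word into pieces of up to `chunk` chars."""
    return [word[i:i + chunk]
            for word in text.split()
            for i in range(0, len(word), chunk)]

sample = ["the night was quiet", "streets were empty"]
score = fertility(sample, toy_tokenize)
print(f"fertility: {score:.2f} tokens per word")
```

To measure a real tokenizer, pass its `tokenize` method (for example from a Transformers `AutoTokenizer`) in place of `toy_tokenize`.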
Supported Languages
The model is trained on English and a diverse set of Indic languages, including:
Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia
Usage
```bash
pip install "transformers==5.4.0"
```

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "FrontiersMind/Nandi-Mini-V1.1-600M-Early-Checkpoint-250GT"

# trust_remote_code=True lets Transformers run the model code shipped in the repository.
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)

# Use a GPU when one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    dtype=torch.bfloat16,
).to(device).eval()

prompt = "The night was quiet and the streets were empty"

# Tokenize the prompt and move the tensors to the model's device.
model_inputs = tokenizer(
    [prompt],
    return_tensors="pt",
).to(model.device)

# Sample a short continuation from the base model.
outputs = model.generate(
    **model_inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.3,
    top_k=20,
    top_p=0.95,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.eos_token_id,
    use_cache=True,
)

# Decode the prompt plus continuation back to text.
response = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True,
)
print(response)
```
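Since this checkpoint lists a 2,048-token context length, the prompt and the generated tokens together need to fit in that window. The helper below is a small sketch of that bookkeeping; `max_new_tokens_budget` is a hypothetical name and is not part of Transformers.

```python
CONTEXT_LENGTH = 2048  # context length listed in the model details

def max_new_tokens_budget(prompt_tokens, requested, context_length=CONTEXT_LENGTH):
    """Clamp a generation request so prompt + output fits the context window."""
    if prompt_tokens >= context_length:
        raise ValueError("prompt alone fills the context window; shorten it")
    return min(requested, context_length - prompt_tokens)

print(max_new_tokens_budget(prompt_tokens=2000, requested=50))
print(max_new_tokens_budget(prompt_tokens=10, requested=50))
```

In the usage example above, the prompt length would be `model_inputs["input_ids"].shape[1]`, and the result could be passed as `max_new_tokens`.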