Aryabhata 2
Aryabhata 2 is a reasoning-focused language model developed by PhysicsWallah for competitive STEM examinations (JEE, NEET). It is obtained by post-training GPT-OSS-20B with reinforcement learning on a curated curriculum of Physics, Chemistry, Mathematics, and General Reasoning questions. It averages 88.95% Pass@1 on in-distribution exam benchmarks, on par with much larger models such as GPT-OSS-120B (88.28%), while using substantially fewer output tokens per response.
Model Summary
| Property | Value |
|---|---|
| Base model | openai/gpt-oss-20b |
| Training method | Reinforcement Learning (GRPO) + LoRA |
| Training data | Curated STEM questions (PhysicsWallah internal) |
| Training compute | 2× NVIDIA H100 NVL GPUs |
Performance
In-Distribution Benchmarks (Pass@1, 4-sample mean %)
| Model | JEE Adv. 2025 | NEET 2025 | JEE Main 2025 | JEE Main 2026 | Avg. |
|---|---|---|---|---|---|
| Gemini 2.5 Flash | 96.81 | 90.00 | 87.26 | 96.22 | 90.23 |
| GPT-5 Mini | 93.65 | 87.33 | 87.07 | 95.83 | 89.71 |
| Aryabhata 2 (ours) | 86.51 | 84.66 | 87.80 | 92.99 | 88.95 |
| Qwen3-30B-A3B (Thinking) | 90.48 | 86.00 | 84.89 | 97.26 | 88.55 |
| GPT-OSS-120B | 84.13 | 85.33 | 85.61 | 95.42 | 88.28 |
| Nemotron 3 Nano 30B A3B | 90.87 | 84.00 | 82.89 | 94.84 | 86.51 |
| GPT-OSS-20B | 77.38 | 81.33 | 79.27 | 92.46 | 83.00 |
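The "Pass@1, 4-sample mean" metric can be read as follows: each question is answered four times independently, per-question correctness is averaged over the four samples, and the result is averaged over questions. The sketch below illustrates that reading only; the grading results are hypothetical, and the exact evaluation harness is not described in this card.

```python
from statistics import mean

def pass_at_1_mean(correct_by_question):
    """Mean Pass@1 (%) from per-question lists of per-sample correctness flags."""
    per_question = [
        mean(1.0 if ok else 0.0 for ok in samples)
        for samples in correct_by_question
    ]
    return 100.0 * mean(per_question)

# Hypothetical grading results: 3 questions, 4 sampled answers each.
results = [
    [True, True, True, True],      # always solved
    [True, False, True, True],     # solved in 3 of 4 samples
    [False, False, False, False],  # never solved
]
score = pass_at_1_mean(results)
print(f"{score:.2f}")
```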
Out-of-Distribution Benchmarks (Pass@1, 4-sample mean %)
| Model | AIME | HMMT | GPQA | MMLU-Pro | MMLU-Redux 2.0 | Avg. |
|---|---|---|---|---|---|---|
| GPT-OSS-120B | 90.00 | 80.01 | 77.06 | 90.11 | 95.94 | 89.50 |
| Qwen3-30B-A3B (Thinking) | 84.58 | 51.88 | 73.31 | 90.80 | 97.77 | 89.42 |
| Gemini 2.5 Flash | 66.61 | 59.13 | 75.09 | 90.44 | 96.85 | 89.13 |
| GPT-5 Mini | 83.33 | 70.97 | 75.46 | 89.64 | 96.40 | 88.85 |
| Aryabhata 2 (ours) | 86.67 | 78.96 | 74.86 | 88.49 | 92.92 | 87.64 |
| GPT-OSS-20B | 86.67 | 77.42 | 70.51 | 85.42 | 93.32 | 84.95 |
| Nemotron 3 Nano 30B A3B | 77.08 | 65.86 | 65.38 | 84.33 | 94.10 | 83.48 |
Token Efficiency (Acc./1K tokens)
Aryabhata 2 achieves the highest accuracy per 1K output tokens among the models compared below, using up to 64% fewer output tokens than GPT-OSS-20B.
| Model | In-Dist. Pass@1 | In-Dist. Tokens | In-Dist. Acc./1K↑ | OOD Pass@1 | OOD Tokens | OOD Acc./1K↑ |
|---|---|---|---|---|---|---|
| Aryabhata 2 (ours) | 88.95 | 2,102 | 42.31 | 87.64 | 2,214 | 39.58 |
| GPT-OSS-120B | 88.28 | 3,312 | 26.66 | 89.50 | 3,661 | 24.44 |
| Qwen3-30B-A3B (Thinking) | 88.55 | 4,556 | 19.44 | 89.42 | 4,299 | 20.80 |
| GPT-OSS-20B | 83.00 | 5,293 | 15.68 | 84.95 | 4,860 | 17.48 |
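The Acc./1K columns appear to be Pass@1 divided by the token count in thousands. The sketch below recomputes them from the in-distribution columns, assuming the token columns are mean output tokens per response; the helper names are illustrative.

```python
# (Pass@1 %, mean output tokens) copied from the in-distribution columns above.
IN_DIST = {
    "Aryabhata 2": (88.95, 2102),
    "GPT-OSS-120B": (88.28, 3312),
    "Qwen3-30B-A3B (Thinking)": (88.55, 4556),
    "GPT-OSS-20B": (83.00, 5293),
}

def acc_per_1k(pass_at_1, tokens):
    """Accuracy points per 1,000 output tokens."""
    return pass_at_1 / (tokens / 1000.0)

def token_reduction(tokens, baseline_tokens):
    """Percentage of tokens saved relative to a baseline."""
    return 100.0 * (1.0 - tokens / baseline_tokens)

for name, (acc, tokens) in IN_DIST.items():
    print(f"{name}: {acc_per_1k(acc, tokens):.2f} acc/1K")

saved = token_reduction(IN_DIST["Aryabhata 2"][1], IN_DIST["GPT-OSS-20B"][1])
print(f"tokens saved vs GPT-OSS-20B: {saved:.1f}%")
```

On the averaged in-distribution token counts this works out to roughly 60% fewer tokens than GPT-OSS-20B; the "up to 64%" figure presumably refers to an individual benchmark, which the averaged table does not break out.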
Training Details
Data
The training corpus is derived from PhysicsWallah's internal question banks and processed through a multi-stage pipeline:
- Cleaning pipeline: HTML/image removal → LaTeX validation → LLM-based completeness check → domain filtering (~24% of data removed).
- Answer verification: Multi-pass sampling, with GPT-OSS-120B generating candidate solutions (policy model) and Qwen3-30B-A3B-Thinking judging them. The passes cover 80% (1 sample), 8% (4 samples), and 4% (16 samples) of the dataset.
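As an illustration of the LaTeX validation stage only, the following sketch filters out questions whose braces or `$` delimiters do not balance. It is a simple heuristic written for this example, not the pipeline's actual validator; the function name and sample questions are hypothetical, and the other stages (HTML/image removal, LLM completeness check, domain filtering) are not shown.

```python
def latex_is_balanced(text):
    """Heuristic: braces nest correctly and unescaped $ delimiters pair up."""
    depth = 0
    dollars = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \$
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False  # closing brace with no opener
        elif ch == "$":
            dollars += 1
        i += 1
    return depth == 0 and dollars % 2 == 0

# Hypothetical sample questions.
questions = [
    r"Find $x$ if $\frac{x}{2} = 3$.",
    r"Evaluate $\sqrt{2",            # truncated: unbalanced brace and $
    r"Cost is \$5 for set \{a\}",    # escaped characters are ignored
]
kept = [q for q in questions if latex_is_balanced(q)]
print(f"kept {len(kept)} of {len(questions)}")
```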
Methodology
Aryabhata 2 uses Group Relative Policy Optimization (GRPO) with LoRA adapters (rank 64, α=128), applied to attention projection and token embedding layers. Only 0.15% of parameters are trainable.
Reward function: R = R_accuracy × R_format, where accuracy uses a cascade of string, numeric, and symbolic matchers, and the format reward encourages well-structured, appropriately detailed responses.
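A minimal sketch of the multiplicative reward, under several assumptions: the accuracy cascade tries a string matcher, then a numeric matcher, then an optional caller-supplied symbolic matcher; accuracy is binary; and the format reward is a stand-in that only checks for exactly one `\boxed{}`. The actual matchers and format criteria are not specified in this card, and all function names here are hypothetical.

```python
from fractions import Fraction

def _normalize(s):
    return "".join(s.split()).lower()

def string_match(pred, gold):
    return _normalize(pred) == _normalize(gold)

def _to_number(s):
    try:
        return Fraction("".join(s.split()))  # accepts "3", "0.5", "1/2", "1e-3"
    except (ValueError, ZeroDivisionError):
        return None

def numeric_match(pred, gold, rel_tol=Fraction(1, 10**6)):
    a, b = _to_number(pred), _to_number(gold)
    if a is None or b is None:
        return False
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))

def accuracy_reward(pred, gold, symbolic_match=None):
    """Cascade: the first matcher that succeeds decides the outcome."""
    matchers = [string_match, numeric_match]
    if symbolic_match is not None:  # e.g. a CAS-based equivalence check
        matchers.append(symbolic_match)
    return 1.0 if any(m(pred, gold) for m in matchers) else 0.0

def format_reward(response):
    """Stand-in format check (assumption): exactly one boxed answer."""
    return 1.0 if response.count("\\boxed{") == 1 else 0.0

def reward(pred, gold, response):
    # R = R_accuracy x R_format: a wrong answer or a bad format zeroes the reward.
    return accuracy_reward(pred, gold) * format_reward(response)

print(reward("0.50", "1/2", r"Hence the answer is \boxed{0.50}"))
```

Because the two terms are multiplied rather than added, a correct answer in a malformed response earns nothing, which keeps the format constraint from being traded off against accuracy.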
Three-phase training:
| Phase | Steps | Group Size | Data | Focus |
|---|---|---|---|---|
| 1 – Format Alignment | 300 | 8 | ~5K (trivial) | Output format |
| 2 – Prolonged RL (ProRL) | ~5,000 | 8 → 16 | ~80K (learnable) | Reasoning accuracy |
| 3 – Broadened RL (BroRL) | ~700 | 64 → 128 | ~15K (challenging) | Exploration & generalization |
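The group size in the table is the number of responses sampled per prompt. In GRPO, each response's advantage is its reward normalized against the other responses in the same group. The sketch below shows that normalization in its commonly described form; the use of the population standard deviation and the `eps` value are assumptions, and the rewards are made up.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward by the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for one prompt with group size 8.
mixed = group_relative_advantages([1, 0, 0, 1, 1, 0, 0, 0])
print([round(a, 3) for a in mixed])

# If every response gets the same reward, all advantages are zero
# and the prompt contributes no learning signal.
flat = group_relative_advantages([1.0] * 8)
print(flat)
```

This is one way to see why the phases pair harder data with larger groups: on challenging questions most samples fail, and a larger group raises the chance that at least one succeeds, so the group yields non-zero advantages.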
LoRA Configuration
| Hyperparameter | Value |
|---|---|
| Rank (r) | 64 |
| Scaling factor (α) | 128 |
| Dropout | 0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, embed_tokens |
| Total parameters | 20,959,661,632 |
| Trainable parameters | 31,850,496 (0.15%) |
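The trainable-parameter count can be cross-checked by hand. A LoRA adapter of rank r on a linear layer with input size d_in and output size d_out adds r × (d_in + d_out) parameters. The sketch below applies this to the four attention projections, using layer dimensions assumed from the public openai/gpt-oss-20b configuration (hidden size 2880, 24 layers, 64 query heads, 8 key/value heads, head dimension 64); these dimensions are not stated in this card.

```python
# Assumed GPT-OSS-20B dimensions (from the public model config, not this card).
HIDDEN = 2880
LAYERS = 24
Q_OUT = 64 * 64   # query heads x head_dim
KV_OUT = 8 * 64   # key/value heads x head_dim

RANK = 64
ALPHA = 128
SCALE = ALPHA / RANK  # standard LoRA scaling applied to the adapter output

def lora_params(d_in, d_out, r=RANK):
    """A is (r x d_in) and B is (d_out x r)."""
    return r * (d_in + d_out)

per_layer = (
    lora_params(HIDDEN, Q_OUT)         # q_proj
    + 2 * lora_params(HIDDEN, KV_OUT)  # k_proj and v_proj
    + lora_params(Q_OUT, HIDDEN)       # o_proj
)
attention_total = LAYERS * per_layer

TOTAL_PARAMS = 20_959_661_632  # from the table above
fraction = 100.0 * attention_total / TOTAL_PARAMS
print(f"{attention_total:,} trainable ({fraction:.2f}%), scale={SCALE}")
```

Under these assumed dimensions the attention adapters come to 31,850,496 parameters, matching the table. The embed_tokens adapter is left out of this tally.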
Usage
System Prompt
SYSTEM_PROMPT = """
The user will provide a problem. Solve the problem. Explain step by step and put the final answer inside \\boxed{}
# Instructions
- The solution you provide in the final channel should be complete. The user should be able to follow your output step by step in order to get to the final answer.
- In case of Multiple Choice Questions, provide the option identifier as the final answer. (Example: \\boxed{B})
- In case multiple options are correct, provide the correct option identifiers, separated by semicolon (;). (Example: \\boxed{A;C})
- Put any units in \\text{} within in \\boxed{}. (Example: \\boxed{9.8\\ \\text{m/s}^2})
- The final answer should be in a single \\boxed{}
""".strip()
Transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "PhysicsWallahAI/Aryabhata-2.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# YOUR_QUERY is a placeholder for the problem text.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": YOUR_QUERY},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=4096,
    do_sample=True,  # temperature only takes effect when sampling is enabled
    temperature=1.0,
)
# Decode only the newly generated tokens, not the prompt.
response = tokenizer.decode(
    output[0][input_ids.shape[-1]:],
)
print(response)
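Since the system prompt asks for the final answer in a single `\boxed{}`, with multiple options separated by semicolons, the answer can be pulled out of the generated text with a small brace-matching helper. This is a sketch, not part of the released code; `extract_boxed` and `split_options` are hypothetical names, and escaped braces inside the box are not handled.

```python
def extract_boxed(text):
    """Return the contents of the last boxed answer in text, or None."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # already inside the opening brace of the box
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
        i += 1
    return None  # braces never closed

def split_options(answer):
    """Split a multi-option answer such as 'A;C' into its identifiers."""
    return [part.strip() for part in answer.split(";") if part.strip()]

sample = r"Both statements hold, so the answer is \boxed{A;C}"
print(split_options(extract_boxed(sample)))
```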
vLLM
from vllm import LLM, SamplingParams

model_id = "PhysicsWallahAI/Aryabhata-2.0"
llm = LLM(
    model=model_id,
    dtype="bfloat16",
    tensor_parallel_size=1,  # increase for multi-GPU
    max_model_len=16384,
)
sampling_params = SamplingParams(
    temperature=1.0,
    max_tokens=4096,
    skip_special_tokens=False,
)
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": YOUR_QUERY},
]
outputs = llm.chat([messages], sampling_params)
print(outputs[0].outputs[0].text)
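With `skip_special_tokens=False`, the raw text keeps the channel markers emitted by GPT-OSS-style models, so the reasoning and the final answer arrive in one string. The sketch below separates out the final channel, assuming the markers follow the OpenAI harmony response format (`<|channel|>final<|message|>` to open, `<|return|>` or `<|end|>` to close). Those marker strings are an assumption about this model's output and should be checked against real generations; `extract_final` is a hypothetical helper.

```python
# Assumed harmony-format markers; verify against actual model output.
FINAL_MARKER = "<|channel|>final<|message|>"
END_MARKERS = ("<|return|>", "<|end|>")

def extract_final(text):
    """Return the text of the last final-channel message, or None if absent."""
    idx = text.rfind(FINAL_MARKER)
    if idx == -1:
        return None
    body = text[idx + len(FINAL_MARKER):]
    for marker in END_MARKERS:
        cut = body.find(marker)
        if cut != -1:
            body = body[:cut]
    return body.strip()

# Synthetic example string, not real model output.
raw = (
    "<|channel|>analysis<|message|>Compare the options...<|end|>"
    "<|start|>assistant<|channel|>final<|message|>"
    "The answer is \\boxed{B}<|return|>"
)
print(extract_final(raw))
```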
Intended Use
Primary use cases:
- Competitive exam preparation (JEE Main, JEE Advanced, NEET)
- STEM tutoring and student doubt resolution at scale
- Multi-step symbolic and numerical reasoning
Citation
@misc{aryabhata2,
  author = {Rastogi, Ritvik and Singh, Vishal and Chaudhari, Tejas and Varma, Sandeep},
  title = {Aryabhata 2},
  year = {2025},
  publisher = {PhysicsWallah},
  howpublished = {\url{https://huggingface.co/PhysicsWallahAI/Aryabhata-2.0}},
}
Contact
For questions, please contact ritvik.rastogi@pw.live (PhysicsWallah).