Aryabhata 2

Aryabhata 2 is a reasoning-focused language model developed by PhysicsWallah for competitive STEM examinations (JEE, NEET). It is obtained by post-training GPT-OSS-20B with reinforcement learning on a curated curriculum of Physics, Chemistry, Mathematics, and General Reasoning questions. On the benchmarks below it scores within a few points of larger models such as GPT-OSS-120B while generating substantially fewer output tokens per response.


Model Summary

| Property | Value |
|---|---|
| Base model | openai/gpt-oss-20b |
| Training method | Reinforcement Learning (GRPO) + LoRA |
| Training data | Curated STEM questions (PhysicsWallah internal) |
| Training compute | 2× NVIDIA H100 NVL GPUs |

Performance

In-Distribution Benchmarks (Pass@1, 4-sample mean %)

| Model | JEE Adv. 2025 | NEET 2025 | JEE Main 2025 | JEE Main 2026 | Avg. |
|---|---|---|---|---|---|
| Gemini 2.5 Flash | 96.81 | 90.00 | 87.26 | 96.22 | 90.23 |
| GPT-5 Mini | 93.65 | 87.33 | 87.07 | 95.83 | 89.71 |
| Qwen3-30B-A3B (Thinking) | 90.48 | 86.00 | 84.89 | 97.26 | 88.55 |
| GPT-OSS-120B | 84.13 | 85.33 | 85.61 | 95.42 | 88.28 |
| Aryabhata 2 (ours) | 86.51 | 84.66 | 87.80 | 92.99 | 88.95 |
| Nemotron 3 Nano 30B A3B | 90.87 | 84.00 | 82.89 | 94.84 | 86.51 |
| GPT-OSS-20B | 77.38 | 81.33 | 79.27 | 92.46 | 83.00 |
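
The "Pass@1, 4-sample mean" metric can be read as: sample four responses per question, take the fraction that are correct, and average over questions. The sketch below assumes that reading. The Avg. column is not the unweighted mean of the per-benchmark columns, so it is presumably pooled over questions in the same way; the card does not say so explicitly. The function name and the grading results are illustrative, not taken from the evaluation code.

```python
def pass_at_1(correct_by_question):
    """Pooled Pass@1 in percent.

    correct_by_question: one list per question, holding one boolean per
    sampled response (four per question in this card's setup).
    """
    per_question = [sum(samples) / len(samples) for samples in correct_by_question]
    return 100.0 * sum(per_question) / len(per_question)


# Hypothetical grading results: three questions, four samples each.
results = [
    [True, True, True, True],      # always solved
    [True, False, True, True],     # solved 3 times out of 4
    [False, False, False, False],  # never solved
]
print(f"{pass_at_1(results):.2f}")  # 58.33
```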

Out-of-Distribution Benchmarks (Pass@1, 4-sample mean %)

| Model | AIME | HMMT | GPQA | MMLU-Pro | MMLU-Redux 2.0 | Avg. |
|---|---|---|---|---|---|---|
| GPT-OSS-120B | 90.00 | 80.01 | 77.06 | 90.11 | 95.94 | 89.50 |
| Qwen3-30B-A3B (Thinking) | 84.58 | 51.88 | 73.31 | 90.80 | 97.77 | 89.42 |
| Gemini 2.5 Flash | 66.61 | 59.13 | 75.09 | 90.44 | 96.85 | 89.13 |
| GPT-5 Mini | 83.33 | 70.97 | 75.46 | 89.64 | 96.40 | 88.85 |
| Aryabhata 2 (ours) | 86.67 | 78.96 | 74.86 | 88.49 | 92.92 | 87.64 |
| GPT-OSS-20B | 86.67 | 77.42 | 70.51 | 85.42 | 93.32 | 84.95 |
| Nemotron 3 Nano 30B A3B | 77.08 | 65.86 | 65.38 | 84.33 | 94.10 | 83.48 |

Token Efficiency (Acc./1K tokens)

Aryabhata 2 achieves the best accuracy-per-token ratio of all evaluated models, using up to 64% fewer output tokens than GPT-OSS-20B.

| Model | In-Dist. Pass@1 | In-Dist. Tokens | In-Dist. Acc./1K↑ | OOD Pass@1 | OOD Tokens | OOD Acc./1K↑ |
|---|---|---|---|---|---|---|
| Aryabhata 2 (ours) | 88.95 | 2,102 | 42.31 | 87.64 | 2,214 | 39.58 |
| GPT-OSS-120B | 88.28 | 3,312 | 26.66 | 89.50 | 3,661 | 24.44 |
| Qwen3-30B-A3B (Thinking) | 88.55 | 4,556 | 19.44 | 89.42 | 4,299 | 20.80 |
| GPT-OSS-20B | 83.00 | 5,293 | 15.68 | 84.95 | 4,860 | 17.48 |
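
The Acc./1K columns appear to be Pass@1 divided by mean output tokens in thousands. The sketch below recomputes them from the rounded table values under that assumption; the results agree with the table to within rounding. It also computes the token reduction relative to GPT-OSS-20B from the same averages, which comes out at roughly 60% in-distribution and 54% out-of-distribution. The "up to 64%" figure therefore presumably refers to an individual benchmark, which the table does not break out. Function names are illustrative.

```python
def acc_per_1k(pass_at_1, mean_tokens):
    """Accuracy points per 1,000 output tokens."""
    return pass_at_1 / (mean_tokens / 1000.0)


def token_reduction(tokens, baseline_tokens):
    """Percentage of output tokens saved relative to a baseline."""
    return 100.0 * (1.0 - tokens / baseline_tokens)


# (Pass@1 %, mean output tokens) copied from the table above.
in_dist = {"Aryabhata 2": (88.95, 2102), "GPT-OSS-20B": (83.00, 5293)}
ood = {"Aryabhata 2": (87.64, 2214), "GPT-OSS-20B": (84.95, 4860)}

for label, split in (("In-Dist.", in_dist), ("OOD", ood)):
    for model, (acc, tokens) in split.items():
        print(f"{label:9s}{model:12s} Acc./1K = {acc_per_1k(acc, tokens):.2f}")
    saved = token_reduction(split["Aryabhata 2"][1], split["GPT-OSS-20B"][1])
    print(f"{label:9s}token reduction vs GPT-OSS-20B: {saved:.1f}%")
```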

Training Details

Data

The training corpus is derived from PhysicsWallah's internal question banks and processed through a multi-stage pipeline:

  • Cleaning pipeline: HTML/image removal → LaTeX validation → LLM-based completeness check → domain filtering (~24% of data removed).
  • Answer verification: multi-pass sampling with GPT-OSS-120B as the policy model and Qwen3-30B-A3B-Thinking as the judge. The passes covered 80% of the dataset with 1 sample, 8% with 4 samples, and 4% with 16 samples.
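
One plausible reading of the multi-pass verification is an escalating budget: a question whose reference answer is confirmed by a single sampled response stops there, and the rest are retried with 4 and then 16 samples. The sketch below illustrates that reading only. The card does not describe the exact procedure, and `verify_with_escalation`, `sample_answer`, and `judge` are hypothetical names standing in for calls to the policy and judge models.

```python
def verify_with_escalation(question, reference, sample_answer, judge,
                           budgets=(1, 4, 16)):
    """Return the first sampling budget at which the judge accepts a sampled
    answer as matching the reference, or None if every budget fails."""
    for budget in budgets:
        for _ in range(budget):
            if judge(question, sample_answer(question), reference):
                return budget
    return None


# Stubs in place of real model calls: the policy answers "A" once, then "B".
canned = iter(["A", "B", "B", "B", "B"])
sample = lambda q: next(canned)
exact_judge = lambda q, answer, ref: answer == ref

print(verify_with_escalation("Q1", "B", sample, exact_judge))  # 4
```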

Methodology

Aryabhata 2 uses Group Relative Policy Optimization (GRPO) with LoRA adapters (rank 64, α=128), applied to attention projection and token embedding layers. Only 0.15% of parameters are trainable.

Reward function: R = R_accuracy × R_format, where accuracy uses a cascade of string, numeric, and symbolic matchers, and the format reward encourages well-structured, appropriately detailed responses.
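
A minimal sketch of how such a multiplicative reward could be assembled, assuming the final answer is read from the last `\boxed{}` in the response. The matcher implementations, the tolerance, and especially `format_reward` are illustrative assumptions; the card does not specify the actual rules. The symbolic stage uses SymPy if it is installed and is skipped otherwise.

```python
import math
import re
from fractions import Fraction


def extract_boxed(text):
    """Return the content of the last \\boxed{...} in text, or None."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    depth, chars = 1, []
    for ch in text[start + len("\\boxed{"):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # unbalanced braces


def string_match(pred, gold):
    squash = lambda s: re.sub(r"\s+", "", s).lower()
    return squash(pred) == squash(gold)


def numeric_match(pred, gold, rel_tol=1e-6):
    try:
        a, b = float(Fraction(pred.strip())), float(Fraction(gold.strip()))
    except (ValueError, ZeroDivisionError):
        return False
    return math.isclose(a, b, rel_tol=rel_tol)


def symbolic_match(pred, gold):
    # Optional stage: returns False if SymPy is missing or parsing fails.
    # sympify evaluates its input, so use it only on trusted strings.
    try:
        import sympy
        return sympy.simplify(sympy.sympify(pred) - sympy.sympify(gold)) == 0
    except Exception:
        return False


def accuracy_reward(response, gold):
    pred = extract_boxed(response)
    if pred is None:
        return 0.0
    # Cascade: cheapest matcher first; any() stops at the first success.
    matchers = (string_match, numeric_match, symbolic_match)
    return 1.0 if any(m(pred, gold) for m in matchers) else 0.0


def format_reward(response):
    # HYPOTHETICAL rule: full credit for exactly one \boxed{}, half otherwise.
    return 1.0 if response.count("\\boxed{") == 1 else 0.5


def reward(response, gold):
    return accuracy_reward(response, gold) * format_reward(response)


print(reward("So the answer is \\boxed{0.50}", "1/2"))  # 1.0
```

Because the two terms are multiplied, a wrong answer earns nothing regardless of formatting, while a correct answer is scaled down if the format term is below 1.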

Three-phase training:

| Phase | Steps | Group Size | Data | Focus |
|---|---|---|---|---|
| 1 – Format Alignment | 300 | 8 | ~5K (trivial) | Output format |
| 2 – Prolonged RL (ProRL) | ~5,000 | 8 → 16 | ~80K (learnable) | Reasoning accuracy |
| 3 – Broadened RL (BroRL) | ~700 | 64 → 128 | ~15K (challenging) | Exploration & generalization |
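
The group size is the number of responses sampled per prompt. GRPO scores each response relative to the others in its group instead of using a learned value function. The sketch below shows the commonly used mean and standard deviation normalization; the card does not state which GRPO variant or epsilon was used, so treat both as assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, a group of 8 sampled responses, binary rewards.
group = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
print([round(a, 2) for a in group_relative_advantages(group)])

# A group where every response gets the same reward carries no signal.
print(group_relative_advantages([1.0] * 8))
```

When every response in a group is right or every one is wrong, all advantages are zero and the prompt contributes no gradient. Larger groups make it more likely that a hard prompt yields at least one correct sample, which fits the larger group sizes used on the challenging subset in phase 3.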

LoRA Configuration

| Hyperparameter | Value |
|---|---|
| Rank (r) | 64 |
| Scaling factor (α) | 128 |
| Dropout | 0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, embed_tokens |
| Total parameters | 20,959,661,632 |
| Trainable parameters | 31,850,496 (0.15%) |
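
The trainable-parameter count can be sanity-checked by hand: a rank-r LoRA adapter on a linear layer with input width d_in and output width d_out adds r × (d_in + d_out) parameters. The sketch below applies this to the four attention projections. The layer dimensions are assumptions taken from the base model's public configuration (hidden size 2880, 24 layers, 64 query heads, 8 key/value heads, head dimension 64), not from this card. Under those assumptions the attention projections alone reproduce the reported figure; the embed_tokens adapter is left out of this arithmetic because the card does not say how it is counted.

```python
def lora_params(d_in, d_out, rank):
    """Parameters added by LoRA: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# ASSUMED GPT-OSS-20B dimensions (from the base model's config, not this card).
HIDDEN = 2880
LAYERS = 24
Q_DIM = 64 * 64   # 64 query heads x head dimension 64
KV_DIM = 8 * 64   # 8 key/value heads x head dimension 64
RANK = 64         # from the table above

per_layer = (
    lora_params(HIDDEN, Q_DIM, RANK)     # q_proj
    + lora_params(HIDDEN, KV_DIM, RANK)  # k_proj
    + lora_params(HIDDEN, KV_DIM, RANK)  # v_proj
    + lora_params(Q_DIM, HIDDEN, RANK)   # o_proj
)
attention_total = per_layer * LAYERS
print(attention_total)  # 31850496

share = 100 * attention_total / 20_959_661_632
print(f"{share:.2f}%")  # 0.15%
```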

Usage

System Prompt

SYSTEM_PROMPT = """
The user will provide a problem. Solve the problem. Explain step by step and put the final answer inside \\boxed{}

# Instructions
- The solution you provide in the final channel should be complete. The user should be able to follow your output step by step in order to get to the final answer.
- In case of Multiple Choice Questions, provide the option identifier as the final answer. (Example: \\boxed{B})
- In case multiple options are correct, provide the correct option identifiers, separated by semicolon (;). (Example: \\boxed{A;C})
- Put any units in \\text{} within in \\boxed{}. (Example: \\boxed{9.8\\ \\text{m/s}^2})
- The final answer should be in a single \\boxed{}
""".strip()

Transformers

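from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "PhysicsWallahAI/Aryabhata-2.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# SYSTEM_PROMPT is the string defined in the System Prompt section above.
# YOUR_QUERY is a placeholder; replace it with the problem to solve.
YOUR_QUERY = "A ball is thrown vertically upward at 20 m/s. Taking g = 10 m/s^2, find its maximum height."

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user",   "content": YOUR_QUERY},
]

# return_dict=True yields input_ids and attention_mask, ready to unpack into generate().
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=4096,
    do_sample=True,   # temperature only takes effect when sampling is enabled
    temperature=1.0,
)

# Decode only the newly generated tokens, not the prompt.
response = tokenizer.decode(
    output[0][inputs["input_ids"].shape[-1]:],
)
print(response)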

vLLM

from vllm import LLM, SamplingParams

model_id = "PhysicsWallahAI/Aryabhata-2.0"

llm = LLM(
    model=model_id,
    dtype="bfloat16",
    tensor_parallel_size=1,   # increase for multi-GPU
    max_model_len=16384,
)

sampling_params = SamplingParams(
    temperature=1.0,
    max_tokens=4096,
    skip_special_tokens=False,
)

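# SYSTEM_PROMPT is the string defined in the System Prompt section above;
# YOUR_QUERY is a placeholder for the problem text.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user",   "content": YOUR_QUERY},
]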

outputs = llm.chat([messages], sampling_params)
print(outputs[0].outputs[0].text)

Intended Use

Primary use cases:

  • Competitive exam preparation (JEE Main, JEE Advanced, NEET)
  • STEM tutoring and student doubt resolution at scale
  • Multi-step symbolic and numerical reasoning

Citation

@misc{aryabhata2,
  author       = {Rastogi, Ritvik and Singh, Vishal and Chaudhari, Tejas and Varma, Sandeep},
  title        = {Aryabhata 2},
  year         = {2025},
  publisher    = {PhysicsWallah},
  howpublished = {\url{https://huggingface.co/PhysicsWallahAI/Aryabhata-2.0}},
}

Contact

For questions, please contact ritvik.rastogi@pw.live (PhysicsWallah).
