Aryabhata 2

Aryabhata 2 is a reasoning-focused language model developed by PhysicsWallah for competitive STEM examinations (JEE, NEET). It is obtained by post-training GPT-OSS-20B with reinforcement learning on a curated curriculum of Physics, Chemistry, Mathematics, and General Reasoning questions. It averages 88.95% Pass@1 on the in-distribution exam benchmarks (83.00% for the base model) while generating fewer than half as many output tokens as the base model.

Model Summary

Property	Value
Base model	openai/gpt-oss-20b
Training method	Reinforcement Learning (GRPO) + LoRA
Training data	Curated STEM questions (PhysicsWallah internal)
Training compute	2× NVIDIA H100 NVL GPUs

Performance

In-Distribution Benchmarks (Pass@1, 4-sample mean %)

Model	JEE Adv. 2025	NEET 2025	JEE Main 2025	JEE Main 2026	Avg.
Gemini 2.5 Flash	96.81	90.00	87.26	96.22	90.23
GPT-5 Mini	93.65	87.33	87.07	95.83	89.71
Qwen3-30B-A3B (Thinking)	90.48	86.00	84.89	97.26	88.55
GPT-OSS-120B	84.13	85.33	85.61	95.42	88.28
Aryabhata 2 (ours)	86.51	84.66	87.80	92.99	88.95
Nemotron 3 Nano 30B A3B	90.87	84.00	82.89	94.84	86.51
GPT-OSS-20B	77.38	81.33	79.27	92.46	83.00

Out-of-Distribution Benchmarks (Pass@1, 4-sample mean %)

Model	AIME	HMMT	GPQA	MMLU-Pro	MMLU-Redux 2.0	Avg.
GPT-OSS-120B	90.00	80.01	77.06	90.11	95.94	89.50
Qwen3-30B-A3B (Thinking)	84.58	51.88	73.31	90.80	97.77	89.42
Gemini 2.5 Flash	66.61	59.13	75.09	90.44	96.85	89.13
GPT-5 Mini	83.33	70.97	75.46	89.64	96.40	88.85
Aryabhata 2 (ours)	86.67	78.96	74.86	88.49	92.92	87.64
GPT-OSS-20B	86.67	77.42	70.51	85.42	93.32	84.95
Nemotron 3 Nano 30B A3B	77.08	65.86	65.38	84.33	94.10	83.48

Token Efficiency (Acc./1K tokens)

Aryabhata 2 achieves the best accuracy-per-token ratio of all evaluated models, using up to 64% fewer output tokens than GPT-OSS-20B.

Model	In-Dist. Pass@1	In-Dist. Tokens	In-Dist. Acc./1K↑	OOD Pass@1	OOD Tokens	OOD Acc./1K↑
Aryabhata 2 (ours)	88.95	2,102	42.31	87.64	2,214	39.58
GPT-OSS-120B	88.28	3,312	26.66	89.50	3,661	24.44
Qwen3-30B-A3B (Thinking)	88.55	4,556	19.44	89.42	4,299	20.80
GPT-OSS-20B	83.00	5,293	15.68	84.95	4,860	17.48
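The Acc./1K columns can be reproduced from the other two columns. The sketch below assumes the metric is Pass@1 divided by output tokens in thousands, and that the token columns are mean output tokens; the card does not define either explicitly. Results can differ from the table in the last digit, presumably because the table was computed from unrounded values.

```python
# (Pass@1 %, output tokens) copied from the in-distribution columns above.
IN_DIST = {
    "Aryabhata 2": (88.95, 2102),
    "GPT-OSS-120B": (88.28, 3312),
    "Qwen3-30B-A3B (Thinking)": (88.55, 4556),
    "GPT-OSS-20B": (83.00, 5293),
}


def acc_per_1k(pass_at_1, tokens):
    """Accuracy points per 1,000 output tokens (assumed definition)."""
    return pass_at_1 / (tokens / 1000)


def token_reduction_pct(tokens, baseline_tokens):
    """Percentage of output tokens saved relative to a baseline model."""
    return 100 * (1 - tokens / baseline_tokens)


for name, (acc, tok) in IN_DIST.items():
    print(f"{name}: {acc_per_1k(acc, tok):.2f} acc./1K tokens")

# Savings of Aryabhata 2 relative to the base model, from the averaged counts.
print(f"in-dist: {token_reduction_pct(2102, 5293):.1f}% fewer tokens")
print(f"OOD:     {token_reduction_pct(2214, 4860):.1f}% fewer tokens")
```

On the averaged token counts in the table, the reduction relative to GPT-OSS-20B works out to roughly 60% in-distribution and 54% out-of-distribution, so the "up to 64%" figure presumably refers to an individual benchmark.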

Training Details

Data

The training corpus is derived from PhysicsWallah's internal question banks and processed through a multi-stage pipeline:

Cleaning pipeline: HTML/image removal → LaTeX validation → LLM-based completeness check → domain filtering (~24% of data removed).
Answer verification: Multi-pass sampling with GPT-OSS-120B as policy model and Qwen3-30B-A3B-Thinking as judge, covering 80% (1-sample), 8% (4-sample), and 4% (16-sample) of the dataset.
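One plausible reading of the multi-pass verification step is an escalating sampling budget: a question is accepted as soon as the judge matches any sampled answer to the reference, and harder questions get more samples. The sketch below illustrates that reading only. `sample_fn` and `judge_fn` are hypothetical stand-ins for calls to the policy and judge models, and the escalation logic is an assumption rather than the documented pipeline.

```python
def verify_answer(question, reference, sample_fn, judge_fn, budgets=(1, 4, 16)):
    """Return the sampling budget at which the reference answer was
    confirmed, or None if no budget confirmed it.

    sample_fn(question, n) -> list of n candidate answers (hypothetical).
    judge_fn(question, reference, candidate) -> bool (hypothetical).
    """
    for n in budgets:
        candidates = sample_fn(question, n)
        if any(judge_fn(question, reference, c) for c in candidates):
            return n
    return None  # left unverified after the largest budget


# Toy stand-ins so the control flow can be exercised without any model.
def toy_sampler(question, n):
    # The correct answer "5" only shows up once at least 4 samples are drawn.
    return ["4"] * n if n < 4 else ["4"] * (n - 1) + ["5"]


def toy_judge(question, reference, candidate):
    return reference == candidate


print(verify_answer("2 + 3", "5", toy_sampler, toy_judge))
```

Under this reading, the stated coverage figures would add up to 80% + 8% + 4% = 92% of the dataset being verified at some budget.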

Methodology

Aryabhata 2 uses Group Relative Policy Optimization (GRPO) with LoRA adapters (rank 64, α=128), applied to attention projection and token embedding layers. Only 0.15% of parameters are trainable.

Reward function: R = R_accuracy × R_format, where accuracy uses a cascade of string, numeric, and symbolic matchers, and the format reward encourages well-structured, appropriately detailed responses.
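A multiplicative reward of this shape can be sketched as follows. This is an illustration, not the training code: the function names, the normalization rules, the numeric tolerance, and in particular the format rule (exactly one \boxed{}) are all assumptions, since the card does not spell out how the format reward is computed. The symbolic stage uses SymPy if it is installed and is skipped otherwise.

```python
import math
import re


def extract_boxed(text):
    """Return the contents of the last \\boxed{...}, or None if absent."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth, out = 1, []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out).strip()
        out.append(ch)
        i += 1
    return None  # unbalanced braces


def string_match(pred, gold):
    norm = lambda s: re.sub(r"\s+", "", s).lower()
    return norm(pred) == norm(gold)


def numeric_match(pred, gold, rel_tol=1e-6):
    try:
        return math.isclose(float(pred), float(gold), rel_tol=rel_tol)
    except ValueError:
        return False


def symbolic_match(pred, gold):
    # Optional stage: sympify evaluates its input, so use trusted strings only.
    try:
        import sympy
        return sympy.simplify(sympy.sympify(pred) - sympy.sympify(gold)) == 0
    except Exception:
        return False


def accuracy_reward(pred, gold):
    # Cascade: cheapest matcher first; any() stops at the first success.
    matchers = (string_match, numeric_match, symbolic_match)
    return 1.0 if any(m(pred, gold) for m in matchers) else 0.0


def format_reward(response):
    # Hypothetical rule: exactly one \boxed{} in the response.
    return 1.0 if response.count("\\boxed{") == 1 else 0.0


def reward(response, gold):
    pred = extract_boxed(response)
    if pred is None:
        return 0.0
    return accuracy_reward(pred, gold) * format_reward(response)
```

Because the two factors are multiplied, a correct answer in a badly formatted response earns nothing, and so does a well-formatted wrong answer.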

Three-phase training:

Phase	Steps	Group Size	Data	Focus
1 – Format Alignment	300	8	~5K (trivial)	Output format
2 – Prolonged RL (ProRL)	~5,000	8 → 16	~80K (learnable)	Reasoning accuracy
3 – Broadened RL (BroRL)	~700	64 → 128	~15K (challenging)	Exploration & generalization
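In GRPO, each prompt is sampled several times (the group size in the table above) and every response is scored relative to the rest of its group, with no separate value network. A minimal sketch of the commonly used group-normalized advantage follows. The card does not state which GRPO variant or normalization was used, so the formula, the population standard deviation, and the `eps` constant are assumptions.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    rewards: scalar rewards for all responses to one prompt.
    Returns one advantage per response: (r - mean) / (std + eps).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# A group of 8 responses in which 2 were rewarded.
adv = group_relative_advantages([1, 0, 0, 0, 1, 0, 0, 0])
print([round(a, 3) for a in adv])
```

A group in which every response receives the same reward yields all-zero advantages and therefore no learning signal, which may be one reason the phases are matched to data of different difficulty.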

LoRA Configuration

Hyperparameter	Value
Rank (r)	64
Scaling factor (α)	128
Dropout	0
Target modules	q_proj, k_proj, v_proj, o_proj, embed_tokens
Total parameters	20,959,661,632
Trainable parameters	31,850,496 (0.15%)
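The trainable-parameter count can be sanity-checked by hand: a rank-r LoRA adapter on a d_in × d_out projection adds r × (d_in + d_out) weights. The sketch below does that arithmetic. The layer dimensions are assumptions taken from the public gpt-oss-20b configuration, not from this card.

```python
# Assumed gpt-oss-20b dimensions (public base-model config, not stated here).
HIDDEN = 2880      # hidden size
N_LAYERS = 24      # transformer layers
Q_DIM = 64 * 64    # 64 query heads x head_dim 64
KV_DIM = 8 * 64    # 8 key/value heads x head_dim 64
RANK = 64          # LoRA rank from the table above


def lora_params(d_in, d_out, r=RANK):
    """A LoRA adapter adds A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


per_layer = (
    lora_params(HIDDEN, Q_DIM)         # q_proj
    + 2 * lora_params(HIDDEN, KV_DIM)  # k_proj and v_proj
    + lora_params(Q_DIM, HIDDEN)       # o_proj
)
attention_total = N_LAYERS * per_layer

TOTAL_PARAMS = 20_959_661_632          # from the table above
trainable_pct = 100 * attention_total / TOTAL_PARAMS

print(attention_total, f"{trainable_pct:.2f}%")
```

Under these assumed dimensions the attention projections alone account for the reported 31,850,496 trainable parameters; an adapter on embed_tokens would add further weights, so the reported figure may not include it. With α = 128 and r = 64, the adapter update is scaled by α / r = 2.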

Usage

System Prompt

SYSTEM_PROMPT = """
The user will provide a problem. Solve the problem. Explain step by step and put the final answer inside \\boxed{}

# Instructions
- The solution you provide in the final channel should be complete. The user should be able to follow your output step by step in order to get to the final answer.
- In case of Multiple Choice Questions, provide the option identifier as the final answer. (Example: \\boxed{B})
- In case multiple options are correct, provide the correct option identifiers, separated by semicolon (;). (Example: \\boxed{A;C})
- Put any units in \\text{} within in \\boxed{}. (Example: \\boxed{9.8\\ \\text{m/s}^2})
- The final answer should be in a single \\boxed{}
""".strip()

Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "PhysicsWallahAI/Aryabhata-2.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)


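# SYSTEM_PROMPT is the string defined under "System Prompt" above;
# YOUR_QUERY is a placeholder for the problem text.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user",   "content": YOUR_QUERY},
]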

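# return_dict=True yields input_ids and attention_mask together
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=4096,
    do_sample=True,      # temperature only takes effect when sampling
    temperature=1.0,
)

# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(
    output[0][inputs["input_ids"].shape[-1]:],
)
print(response)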

vLLM

from vllm import LLM, SamplingParams

model_id = "PhysicsWallahAI/Aryabhata-2.0"

llm = LLM(
    model=model_id,
    dtype="bfloat16",
    tensor_parallel_size=1,   # increase for multi-GPU
    max_model_len=16384,
)

sampling_params = SamplingParams(
    temperature=1.0,
    max_tokens=4096,
    skip_special_tokens=False,
)

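# SYSTEM_PROMPT is the string defined under "System Prompt" above;
# YOUR_QUERY is a placeholder for the problem text.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user",   "content": YOUR_QUERY},
]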

outputs = llm.chat([messages], sampling_params)
print(outputs[0].outputs[0].text)
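Because special tokens are kept in the decoded text, the raw output may contain the model's reasoning followed by the final-channel answer that the system prompt refers to. The helper below is a rough sketch for pulling out that final message. The marker strings are assumptions based on the base model's chat format and should be checked against actual output before relying on them.

```python
# Assumed markers from the base model's chat format (not confirmed by this card).
FINAL_MARKER = "<|channel|>final<|message|>"
END_TOKENS = ("<|return|>", "<|end|>")


def final_channel(raw):
    """Return the text of the last final-channel message.

    Falls back to the stripped raw text when no marker is present.
    """
    _, sep, tail = raw.rpartition(FINAL_MARKER)
    if not sep:
        return raw.strip()
    for tok in END_TOKENS:
        tail = tail.split(tok, 1)[0]
    return tail.strip()


example = (
    "<|channel|>analysis<|message|>Compare the options.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>The answer is \\boxed{B}<|return|>"
)
print(final_channel(example))
```

The same helper can be applied to the decoded `response` string from the Transformers example, which also keeps special tokens.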

Intended Use

Primary use cases:

Competitive exam preparation (JEE Main, JEE Advanced, NEET)
STEM tutoring and student doubt resolution at scale
Multi-step symbolic and numerical reasoning

Citation

@misc{aryabhata2,
  author       = {Rastogi, Ritvik and Singh, Vishal and Chaudhari, Tejas and Varma, Sandeep},
  title        = {Aryabhata 2},
  year         = {2025},
  publisher    = {PhysicsWallah},
  howpublished = {\url{https://huggingface.co/PhysicsWallahAI/Aryabhata-2.0}},
}

Contact

For questions, please contact ritvik.rastogi@pw.live (PhysicsWallah).
