anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_bce_r2_critic
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_5k_r2_critic
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_5k_bce_r2_critic
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_r2_critic
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_bce_r3_critic
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_5k_bce_r3_critic
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_5k_r3_critic
Updated • 16
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_r3_critic
Updated • 19
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_bce_context_4k_r3_critic
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_1k_bce_context_4k_r3_critic
anirudhb11/critic_800_ppo-run-math-training-prompt-len-800-response-len-4096-r3-71947f583d
Text Classification • 2B • Updated • 2
anirudhb11/critic_600_ppo-run-math-training-prompt-len-800-response-len-4096-r3-0b5154ab3e
Text Classification • 2B • Updated • 2
anirudhb11/critic_400_ppo-run-math-training-prompt-len-800-response-len-4096-r3-9446afcb19
Text Classification • 2B • Updated • 2
anirudhb11/critic_200_ppo-run-math-training-prompt-len-800-response-len-4096-r3-d17cb5def8
Text Classification • 2B • Updated • 2
anirudhb11/critic_800_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-1000-36a3a03718
Text Classification • 2B • Updated • 2
anirudhb11/critic_600_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-1000-f553c1779b
Text Classification • 2B • Updated • 2
anirudhb11/critic_400_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-1000-2d56bd1e02
Text Classification • 2B • Updated • 1
anirudhb11/critic_200_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-1000-c08ed8b533
Text Classification • 2B • Updated • 2
anirudhb11/critic_600_deepseek-r1-distil-1.5b-ppo-run-math-training-prompt-len-800-response-len-07fa1b4078
Text Classification • 2B • Updated • 3
anirudhb11/critic_400_deepseek-r1-distil-1.5b-ppo-run-math-training-prompt-len-800-response-len-c49633e26e
Text Classification • 2B • Updated • 3
anirudhb11/critic_200_deepseek-r1-distil-1.5b-ppo-run-math-training-prompt-len-800-response-len-0e5f8c09dc
Text Classification • 2B • Updated • 3
anirudhb11/critic_800_ppo-run-math-training-prompt-len-800-response-len-4096-r3-actor-low-lr-0-c889153a3b
Text Classification • 2B • Updated • 3
anirudhb11/critic_600_ppo-run-math-training-prompt-len-800-response-len-4096-r3-actor-low-lr-0-c0842d8e93
Text Classification • 2B • Updated • 3
anirudhb11/critic_400_ppo-run-math-training-prompt-len-800-response-len-4096-r3-actor-low-lr-0-4bac9e133e
Text Classification • 2B • Updated • 3
anirudhb11/critic_200_ppo-run-math-training-prompt-len-800-response-len-4096-r3-actor-low-lr-0-ae0cd033d2
Text Classification • 2B • Updated • 3
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_hendrycks_math_DeepSeek-R1-Distill-Qwen-1.5B_critic
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_dapo_DeepSeek-R1-Distill-Qwen-1.5B_subset_2000_r3_critic
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_dapo_DeepSeek-R1-Distill-Qwen-1.5B_critic
anirudhb11/r1d-1.5b_deepscaler_repeated_fin_critic
anirudhb11/r1d-1.5b_deepscaler_fin_critic