anirudhb11/critic_200_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperatur-95d37aee1a
Text Classification
• 2B • Updated • 3
anirudhb11/critic_450_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperatur-9fe16df365
Text Classification
• 2B • Updated • 3
anirudhb11/critic_1200_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperatu-6aa1e360d1
Text Classification
• 2B • Updated • 3
anirudhb11/critic_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-a7c942368f
Text Classification
• 2B • Updated • 3
anirudhb11/critic_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-093f62d880
Text Classification
• 2B • Updated • 2
anirudhb11/critic_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-858e2b46a2
Text Classification
• 2B • Updated • 3
anirudhb11/critic_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-eaac83002c
Text Classification
• 2B • Updated • 2
anirudhb11/critic_800_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-8de2febea2
Text Classification
• 2B • Updated • 2
anirudhb11/critic_600_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-9929bdd81f
Text Classification
• 2B • Updated • 3
anirudhb11/critic_400_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-d5bd2dbecc
Text Classification
• 2B • Updated • 3
anirudhb11/critic_200_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-40bddeea62
Text Classification
• 2B • Updated • 3
anirudhb11/critic_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-c789b03075
2B • Updated • 3
anirudhb11/critic_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-47c741dc57
Text Classification
• 2B • Updated • 3
anirudhb11/critic_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-1877926eb9
2B • Updated • 3
anirudhb11/critic_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-ccb6db349d
2B • Updated • 3
anirudhb11/critic_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-0a51487a3c
Text Classification
• 2B • Updated • 2
anirudhb11/critic_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-c29b38b408
Text Classification
• 2B • Updated • 2
anirudhb11/critic_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-c5605d88a0
Text Classification
• 2B • Updated • 2
anirudhb11/critic_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-64d2f83fd4
Text Classification
• 2B • Updated • 2
anirudhb11/critic_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-b-653962b457
Text Classification
• 2B • Updated • 2
anirudhb11/critic_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-37f5c11603
Text Classification
• 2B • Updated • 2
anirudhb11/critic_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-f5ff91747a
Text Classification
• 2B • Updated • 2
anirudhb11/critic_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-b-4abf6a944f
Text Classification
• 2B • Updated • 2
anirudhb11/critic_800_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-991ca4d859
Text Classification
• 2B • Updated • 2
anirudhb11/critic_600_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-bf94641057
Text Classification
• 2B • Updated • 3
anirudhb11/critic_400_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-f2534c60d3
Text Classification
• 2B • Updated • 3
anirudhb11/critic_200_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-91a081ef96
Text Classification
• 2B • Updated • 3
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_actor_frozen_critic
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_critic
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_subset_1k_critic