bradmin/reward-bert-multi-answer
Updated
cduoduo/TCMConverse-4B-SFT-PPO-MultiReward-Alignment
Updated
Tahahah/pacman_policy_net_gamengen_1_rainbow_negative_pellet_reward_multienv
Updated
Chi666/multiple_scores_reward_model_v7
Text Classification
• 0.1B • Updated • 1
rayonlabs/Qwen2_5-7B-Instruct-multilingual-reward-bench-cb8829bf-0e4c-4904-995a-3e14b40486a4
Updated
WPRM/qwenvl_reward_multimodal_llamafactory
4B • Updated
WPRM/qwen2_5vl-3b_ar_reward_cot_multimodal
4B • Updated
WPRM/qwen2_5vl-3b_ar_reward_cot_wo_checklist_multimodal
4B • Updated
WPRM/qwen2_5vl-3b_ar_reward_cot_multimodal_final_new
4B • Updated • 2
WPRM/qwen2_5vl-3b_ar_reward_cot_multimodal_mtl
4B • Updated • 2
nvidia/Llama-3.3-Nemotron-70B-Reward-Multilingual
Text Generation
• 71B • Updated • 47
• 11
ysc0034/grpo_pure_code_spatial457_wo_multireward_90
rayonlabs/Qwen1_5-0_5B-Chat-multilingual-reward-bench-4fd2a9c8-ee0a-493d-ae0f-a110381f0506
0.5B • Updated • 8
hoooooooooori/multi_reward
Updated
Yuhan123/multipref-reward-model-qwen-single
Updated
Yuhan123/multipref-reward-model-qwen
Text Classification
• 2B • Updated
Yuhan123/olmo-multipref-reward-model
1B • Updated • 1
worstcoder/SD3.5M-DiffusionNFT-MultiReward
Text-to-Image
• Updated • 245
• 7
AmirMohseni/skywork-reward-v2-llama-3.1-8b-rank512-eduarena-multiturn-lmarena-all-data
Updated
AmirMohseni/skywork-reward-v2-llama-3.1-8b-rank128-eduarena-multiturn-lmarena-all-data
Updated
AmirMohseni/skywork-reward-v2-llama-3.1-8b-rank128-eduarena-multiturn-lmarena-multiturn
Updated
AmirMohseni/skywork-reward-v2-llama-3.1-8b-rank128-eduarena-multiturn-lmarena-multiturn-v2
Updated
AmirMohseni/skywork-reward-v2-llama-3.1-8b-rank128-test-full-multiturn
Updated
Brtwm/reward_model_multilingual
Text Classification
• 0.1B • Updated • 1
phuongntc/vit5_large_ppo_rewardverify_multievalsumviet2_lorapenalty300
Updated
phuongntc/vit5_large_ppo_rewardverify_multievalsumviet2_lorapenalty7000
Updated
phuongntc/vit5_large_grpo_rewardverify_multievalsumviet2_lora
Updated
phuongntc/vit5_large_grpo_rewardverify_multievalsumviet2
Updated
phuongntc/vit5-large-grpo_rewardverify_multievalsumviet2_nopenalty
Updated
GazeEzio/mol_optim_property_grpo_multi_turn_sum_intermediate_h20_reward_v0
Updated