MetaAnomie/rl_course_vizdoom_health_gathering_supreme
Reinforcement Learning
• Updated
MetaAligner/MetaAligner-HH-RLHF-1.1B
Text Generation
• Updated • 221
MetaAligner/MetaAligner-HH-RLHF-7B
Text Generation
• Updated • 9
• 1
MetaAligner/MetaAligner-HH-RLHF-13B
Text Generation
• Updated • 7
• 2
angelaupc/Meta-Llama-3-8B-Instruct-rm-Anthropic-hh-rlhf-concateye
Updated
alexander-hm/Meta-Llama-3-8B_hh-rlhf_l0.0002_32-8-8-8-8
Updated
sandeepaffine/meta-llama-Llama-2-7b-chat-hf-base-cpt-domain-cpt-1L-ift-irdro-dpo-rlhf-v2
Updated
mradermacher/MetaAligner-HH-RLHF-7B-GGUF
7B • Updated • 28
mradermacher/MetaAligner-HH-RLHF-7B-i1-GGUF
7B • Updated • 50
robust-rlhf/Meta-Llama-3.1-8B-Instruct_ftjob-06146a4a1364
Text Generation
• 8B • Updated • 2
robust-rlhf/Meta-Llama-3.1-8B-Instruct-bnb-4bit_ftjob-714276cc5ace
Updated
robust-rlhf/Meta-Llama-3.1-8B-Instruct_ftjob-4ec303c92c04
Updated
robust-rlhf/Meta-Llama-3.1-8B-Instruct-bnb-4bit_ftjob-5962a63e3f8b
Text Generation
• 8B • Updated • 4
robust-rlhf/Meta-Llama-3.1-8B-Instruct-bnb-4bit_ftjob-e8deee45b279
Updated
robust-rlhf/Meta-Llama-3.1-8B-Instruct_ftjob-8a0cd9d58380
Updated
robust-rlhf/Meta-Llama-3.1-8B-Instruct-bnb-4bit_ftjob-test
Updated
mradermacher/MetaAligner-HH-RLHF-1.1B-GGUF
1B • Updated • 39
mradermacher/MetaAligner-HH-RLHF-1.1B-i1-GGUF
1B • Updated • 41
robust-rlhf/Meta-Llama-3.1-8B-Instruct-bnb-4bit_ftjob-80cbcd764546
Updated
robust-rlhf/Meta-Llama-3.1-8B-Instruct-bnb-4bit_ftjob-0a86ae66c09e
Updated
hisham246/CS885-MetaRL-AssistiveRobotics
Updated
mradermacher/7b-Domain-RL-Meta-GGUF
8B • Updated • 19
• 2
Skewness-RL-KE/Qwen2-Math-1.5B-MetaMathQA_50k
Text Generation
• 2B • Updated • 3
Skewness-RL-KE/Qwen2-Math-1.5B-MetaMathQA
Text Generation
• 2B • Updated • 1
Reinforcement Learning
• Updated • 27
Reinforcement Learning
• Updated
ASethi04/meta-llama-Llama-3.1-8B-big-math-rl-full-dataset-1000-lora-2-0.0001
Updated
Skewness-RL-KE/MetaMathQA_50k_distill_Qwen2-Math
Updated
metavind/Qwen3-0.6B-MiniSudoku-RL