LLM-OS-Models/LFM2.5-8B-A1B-Raw-ECHO-RLVR-GRPO-Adapters
Updated • 1
LLM-OS-Models/LFM2.5-8B-A1B-SFT1-Online-ECHO-RLVR-GRPO-Adapters
Updated • 1
Text Generation
• Updated
taku-yoshioka/rlhf_llm_custom_rm
Reinforcement Learning
• Updated • 1
llm-jp/llm-jp-13b-dpo-lora-hh_rlhf_ja-v1.1
Text Generation
• Updated • 1
umarigan/Trendyol-LLM-7b-chat-v1.0-RLHF
Question Answering
• 7B • Updated
taku-yoshioka/rlhf-llm-custom-rm-0828
Reinforcement Learning
• Updated • 6
IWAIYuma/llm-jp-3-13b-it_RLHFv3
rl-llm-agent/Llama-3.1-8B-Instruct-sft-alfworld-iter0
Text Generation
• 8B • Updated • 5
rl-llm-agent/Llama-3.2-3B-Instruct-sft-alfworld-iter0
Text Generation
• 3B • Updated • 10
rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter0
rl-llm-coders/mbpp_1e-6_DBS2
Text Generation
• 8B • Updated • 3
rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter1
Text Generation
• 3B • Updated • 4
rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter2
rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iqlearn-iter0
rl-llm-agent/Llama-3.2-3B-Instruct-value-alfworld-8b-sft
rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iqlearn-iter0
rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-shaped-iter0
rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iqlearn-iter1
rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iter2-70k
rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-exploration-aflworld-iter0-checkpoint-50
rl-llm-coders/iSFT_1b_v1_mbpp_5e-7_DBS1_ep2_iter1
Text Generation
• 1B • Updated • 2
rl-llm-coders/iSFT_8b_v1_mbpp_5e-7_DBS1_ep4_iter1
Text Generation
• 8B • Updated • 2
Text Generation
• 1B • Updated • 4
Text Generation
• 1B • Updated • 4
rl-llm-coders/RS_1B_SFT_iter1
Text Generation
• 1B • Updated • 4
rl-llm-coders/RS_1B_SFT_iter2
Text Generation
• 1B • Updated • 4
rl-llm-coders/RS_1B_SFT_iter3
Text Generation
• 1B • Updated • 3
rl-llm-coders/RS_1B_RM_iter2
Text Generation
• 1B • Updated • 5