Models

97

LLM-OS-Models/LFM2.5-8B-A1B-Raw-ECHO-RLVR-GRPO-Adapters

Updated 29 minutes ago • 1

LLM-OS-Models/LFM2.5-8B-A1B-SFT1-Online-ECHO-RLVR-GRPO-Adapters

Updated about 17 hours ago • 1

jojo0217/llm_rlhf

Text Generation • Updated Aug 19, 2023

taku-yoshioka/rlhf_llm_custom_rm

Reinforcement Learning • Updated Mar 3, 2024 • 1

llm-jp/llm-jp-13b-dpo-lora-hh_rlhf_ja-v1.1

Text Generation • Updated Mar 12, 2024 • 1

umarigan/Trendyol-LLM-7b-chat-v1.0-RLHF

Question Answering • 7B • Updated Mar 16, 2024

taku-yoshioka/rlhf-llm-custom-rm-0828

Reinforcement Learning • Updated Aug 31, 2024 • 6

paulheiniger/rlmodel_llm

Updated Nov 15, 2024

IWAIYuma/llm-jp-3-13b-it_RLHFv3

Updated Dec 15, 2024 • 1

rl-llm-agent/Llama-3.1-8B-Instruct-sft-alfworld-iter0

Text Generation • 8B • Updated Jan 3, 2025 • 5

rl-llm-agent/Llama-3.2-3B-Instruct-sft-alfworld-iter0

Text Generation • 3B • Updated Jan 4, 2025 • 10

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter0

Updated Jan 8, 2025 • 1

rl-llm-coders/mbpp_1e-6_DBS2

Text Generation • 8B • Updated Jan 9, 2025 • 3

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter1

Text Generation • 3B • Updated Jan 10, 2025 • 4

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter2

Updated Jan 11, 2025 • 2

rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iqlearn-iter0

Updated Jan 13, 2025 • 2

rl-llm-agent/Llama-3.2-3B-Instruct-value-alfworld-8b-sft

Updated Jan 13, 2025 • 2

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iqlearn-iter0

Updated Jan 13, 2025 • 3

rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-shaped-iter0

Updated Jan 14, 2025 • 1

rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iqlearn-iter1

Updated Jan 20, 2025 • 2

rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iter2-70k

Updated Jan 16, 2025 • 5

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-exploration-aflworld-iter0-checkpoint-50

Updated Jan 16, 2025 • 1

rl-llm-coders/iSFT_1b_v1_mbpp_5e-7_DBS1_ep2_iter1

Text Generation • 1B • Updated Jan 26, 2025 • 2

rl-llm-coders/iSFT_8b_v1_mbpp_5e-7_DBS1_ep4_iter1

Text Generation • 8B • Updated Jan 26, 2025 • 2

rl-llm-coders/RM_1B_MBPP

Text Generation • 1B • Updated Jan 27, 2025 • 4

rl-llm-coders/ST_SFT_1B

Text Generation • 1B • Updated Jan 28, 2025 • 4

rl-llm-coders/RS_1B_SFT_iter1

Text Generation • 1B • Updated Jan 29, 2025 • 4

rl-llm-coders/RS_1B_SFT_iter2

Text Generation • 1B • Updated Jan 29, 2025 • 4

rl-llm-coders/RS_1B_SFT_iter3

Text Generation • 1B • Updated Jan 29, 2025 • 3

rl-llm-coders/RS_1B_RM_iter2

Text Generation • 1B • Updated Jan 29, 2025 • 5