Models

482

Active filters: ppo, trl

baek26/wiki_asp-software_3100_bart-base

Reinforcement Learning • Updated Apr 3, 2024 • 4

baek26/wiki_asp-written_work_4057_bart-base

Reinforcement Learning • Updated Apr 3, 2024 • 5

baek26/wiki_asp-software_7902_bart-base

Reinforcement Learning • Updated Apr 4, 2024 • 8

baek26/wiki_asp-written_work_667_bart-base

Reinforcement Learning • Updated Apr 4, 2024 • 4

baek26/wiki_asp-animal_3469_bart-base

Reinforcement Learning • Updated Apr 4, 2024 • 5

baek26/wiki_asp-soccer_player_9782_bart-base

Reinforcement Learning • Updated Apr 4, 2024 • 4

PranavBP525/phi-2-storygen-v1

Reinforcement Learning • Updated Apr 13, 2024 • 12

PranavBP525/phi-2-storygen-v2

Reinforcement Learning • Updated Apr 19, 2024 • 18

baek26/dialogsum_4088_bart-dialogsum

Reinforcement Learning • Updated Apr 17, 2024 • 4

baek26/billsum_4768_bart-dialogsum

Reinforcement Learning • Updated Apr 17, 2024 • 4

baek26/dialogsum_9789_bart-dialogsum

Reinforcement Learning • Updated Apr 17, 2024 • 5

baek26/billsum_6121_bart-billsum

Reinforcement Learning • Updated Apr 17, 2024 • 4

baek26/bart-dialogsum-oracle

Reinforcement Learning • Updated Apr 17, 2024 • 8

baek26/billsum_1703_bart-billsum

Reinforcement Learning • Updated Apr 17, 2024 • 8

baek26/bart-billsum-oracle

Reinforcement Learning • Updated Apr 17, 2024 • 4

baek26/cnn_dailymail_6849_bart-dialogsum

Reinforcement Learning • Updated Apr 18, 2024 • 8

baek26/cnn_dailymail_886_bart-dialogsum

Reinforcement Learning • Updated Apr 18, 2024 • 5

baek26/cnn_dailymail_7952_bart-dialogsum

Reinforcement Learning • Updated Apr 18, 2024 • 5

baek26/cnn_dailymail_4520_bart-cnndm

Reinforcement Learning • Updated Apr 19, 2024 • 4

baek26/cnn_dailymail_3418_bart-cnndm

Reinforcement Learning • Updated Apr 19, 2024 • 4

damienbenveniste/mistral-ppo

Reinforcement Learning • Updated Aug 23, 2024 • 49

pkbiswas/Phi-1_5-Detoxified-PPO-LoRa

Reinforcement Learning • Updated Apr 20, 2024 • 11

PranavBP525/phi-2-storygen-rlGPTf

Reinforcement Learning • Updated Apr 21, 2024 • 16

baek26/all_5483_all_8657_bart-base_rl

Reinforcement Learning • Updated Apr 21, 2024 • 4

baek26/all_9991_all_8657_bart-base_rl

Reinforcement Learning • Updated Apr 21, 2024 • 4

baek26/all_9006_all_8657_bart-base_rl

Reinforcement Learning • Updated Apr 21, 2024 • 4

baek26/all_6417_bart-base_rl

Reinforcement Learning • Updated Apr 22, 2024 • 6

IrwinD/log_sage_ppo_model

Summarization • Updated Apr 26, 2024 • 20

PranavBP525/phi-2-storygen-rlhf

Reinforcement Learning • Updated Apr 24, 2024 • 14

baek26/all_5286_all_6417_bart-base_rl

Reinforcement Learning • Updated Apr 29, 2024 • 50