Active filters: trl
dshin/flan-t5-ppo-user-e-batch-size-8-epoch-3-use-violation • Reinforcement Learning • 5
dshin/flan-t5-ppo-user-h-batch-size-8-epoch-4-use-violation • Reinforcement Learning • 5
dshin/flan-t5-ppo-user-f-batch-size-8-epoch-4 • Reinforcement Learning • 5
dshin/flan-t5-ppo-user-e-batch-size-8-epoch-4-use-violation • Reinforcement Learning • 5
SummerSigh/T5-Base-Rule-Of-Thumb-RM2 • Reinforcement Learning • 14
dshin/flan-t5-ppo-user-h-batch-size-64 • Reinforcement Learning • 6
dshin/flan-t5-ppo-user-f-batch-size-64 • Reinforcement Learning • 5
dshin/flan-t5-ppo-user-f-batch-size-64-use-violation • Reinforcement Learning • 12
dshin/flan-t5-ppo-user-h-batch-size-64-use-violation • Reinforcement Learning • 6
dshin/flan-t5-ppo-user-e-batch-size-64-use-violation • Reinforcement Learning • 8
dshin/flan-t5-ppo-user-e-batch-size-64 • Reinforcement Learning • 6
trl-lib/llama-7b-se-peft
Bearnardd/gpt2-imdb • Reinforcement Learning • 9
trl-lib/llama-7b-se-rl-peft • 103
Bearnardd/test_bearnard • Reinforcement Learning • 7
Bearnardd/test_beard • Reinforcement Learning • 11
trl-lib/llama-7b-se-rm-peft
vincentmin/opt-125m-eli5-rl-finetune-128-8-8-1.4e-5_ada • Reinforcement Learning
dshin/flan-t5-ppo-user-a-allenai-prosocial-dialog-testing-upload • Reinforcement Learning • 6
dshin/flan-t5-ppo-user-a-allenai-prosocial-dialog • Reinforcement Learning • 4
dshin/flan-t5-ppo-user-f-allenai-prosocial-dialog • Reinforcement Learning • 3
dshin/flan-t5-ppo-user-h-allenai-prosocial-dialog • Reinforcement Learning • 3
dshin/flan-t5-ppo-user-e-allenai-prosocial-dialog • Reinforcement Learning • 4
wengnews/tuning_llama_rl_checkpointsstep_9 • Reinforcement Learning
eurus7/working • Reinforcement Learning
eurus7/ppo_trainer • Reinforcement Learning
eurus7/gpt2-imdb-pos-v2 • Reinforcement Learning
zou00080/llama_PPO_pos_formal • Reinforcement Learning • 6
zou00080/llama_PPO_pos_informal • Reinforcement Learning • 6
zou00080/llama_PPO_neg_formal • Reinforcement Learning • 6