lm-human-preference-details
vwxyzjn/train_policy_accelerate__sentiment_offline_5k.json__seed1__1696447674 Text Generation • Updated Oct 4, 2023 • 1
lm-human-preference-details/train_policy_accelerate__sentiment_offline_5k.json__seed1 Text Generation • Updated Oct 4, 2023 • 1
RLOO / PPOv2 TL;DR summarize checkpoints
vwxyzjn/ppo_tldr Text Generation • Updated 23 days ago • 7
vwxyzjn/ppo_tldr_6.9b Text Generation • Updated 9 days ago • 3
vwxyzjn/rloo_tldr Text Generation • Updated 6 days ago • 4
vwxyzjn/rloo_tldr_6.9b Text Generation • Updated 9 days ago • 2
vwxyzjn/ppo_zephyr_vllm_2e-6_kl_0.03_num_mini_batches_4 Text Generation • Updated 4 days ago • 16
vwxyzjn/ppo_zephyr_vllm_1e-6_kl_0.02_num_mini_batches_4 Text Generation • Updated 4 days ago • 10
vwxyzjn/ppo_zephyr_vllm_1e-6_kl_0.03_num_mini_batches_1 Text Generation • Updated 4 days ago • 16
vwxyzjn/ppo_zephyr_vllm_1e-6_kl_0.03_num_mini_batches_4 Text Generation • Updated 4 days ago • 13
vwxyzjn/ppo_zephyr_vllm_1e-6_kl_0.03_num_mini_batches_2 Text Generation • Updated 4 days ago • 16
vwxyzjn/ppo_zephyr_vllm_2e-6_kl_0.03_num_mini_batches_1 Text Generation • Updated 4 days ago • 13
vwxyzjn/summarize_from_feedback_tldr_3_filtered_oai_preprocessing_1711138793 Viewer • Updated Mar 22
vwxyzjn/summarize_from_feedback_tldr_3_filtered_oai_preprocessing_1711138084 Viewer • Updated Mar 22