Zhaolin Gao
GitBag
AI & ML interests
Reinforcement Learning from Human Feedback
Organizations
Collections (1)
Models (254)
GitBag/reasoning_rebel_iter_5_1731714556_eta_1e3_lr_3e-7_1731931011 • Text Generation • 7
GitBag/reasoning_rebel_iter_5_1731714556_eta_1e2_lr_3e-7_1731926025 • Text Generation • 7
GitBag/reasoning_rebel_iter_5_1731714556_eta_1e1_lr_3e-7_1731903957 • Text Generation • 10
GitBag/reasoning_rebel_iter_5_1731714556_eta_1e4_lr_3e-7_1731935968 • Text Generation • 8
GitBag/reasoning_rebel_iter_4_1731513485_eta_1e4_lr_3e-7_1731719519 • Text Generation • 11
GitBag/reasoning_rebel_iter_4_1731513485_eta_1e3_lr_3e-7_1731714556 • Text Generation • 43
GitBag/reasoning_rebel_iter_4_1731513485_eta_1e2_lr_3e-7_1731709582 • Text Generation • 9
GitBag/reasoning_rebel_iter_4_1731513485_eta_1e1_lr_3e-7_1731686912 • Text Generation • 9
GitBag/reasoning_rebel_iter_3_1731243878_eta_1e5_lr_3e-7_1731523653 • Text Generation • 11
GitBag/reasoning_rebel_iter_3_1731243878_eta_1e6_lr_3e-7_1731528705 • Text Generation • 5
Datasets (252)
GitBag/llama3-ultrafeedback-reasoning-ReRe-armo-tokenized • Viewer • 229k
GitBag/llama3-ultrafeedback-reasoning-iter_5-1731714556-armo-tokenized_harvard • Viewer • 54.6k • 10
GitBag/llama3-ultrafeedback-reasoning-iter_5-1731714556-armo-tokenized • Viewer • 54.6k • 3
GitBag/llama3-ultrafeedback-reasoning-iter_5-1731714556-armo • Viewer • 60.8k • 7
GitBag/llama3-ultrafeedback-reasoning-iter_5-1731714556 • Viewer • 60.8k • 8
GitBag/llama3-ultrafeedback-reasoning-iter_4-1731513485-armo-tokenized_harvard • Viewer • 56.3k • 16
GitBag/llama3-ultrafeedback-reasoning-iter_4-1731513485-armo-tokenized • Viewer • 56.3k • 12
GitBag/llama3-ultrafeedback-reasoning-iter_4-1731513485-armo • Viewer • 60.8k • 12
GitBag/llama3-ultrafeedback-reasoning-iter_4-1731513485 • Viewer • 60.8k • 12
GitBag/llama3-ultrafeedback-reasoning-iter_3-1731243878-armo-tokenized_harvard • Viewer • 57.2k • 18