gupta-tanish/Ultrafeedback-llama3-8b-instruct-1vs3-selection-swepo-on-policy-iteration2 • Updated 12 days ago • 63.1k • 23
gupta-tanish/Ultrafeedback-llama3-8b-Instruct-optimal-selection-1vs7_total_responses_24 • Updated 12 days ago • 60.8k • 21
gupta-tanish/Ultrafeedback-llama3-8b-Instruct-optimal-selection-1vs7_total_responses_16 • Updated 12 days ago • 60.8k • 22
gupta-tanish/Ultrafeedback-mistral-7b-instruct-v0.2-1vs3-optimal-selection • Updated 13 days ago • 62.2k • 25
gupta-tanish/Ultrafeedback-mistral-7b-instruct-1vs3-kmeans-selection • Updated 13 days ago • 62.2k • 25
gupta-tanish/Ultrafeedback-llama3-8b-instruct-1vs3-optimal-selection • Updated 14 days ago • 62.2k • 30
gupta-tanish/Ultrafeedback-llama3-8b-instruct-1vs3-kmeans-selection • Updated 14 days ago • 62.2k • 41
gupta-tanish/Ultrafeedback-mistral-7b-instruct-v0.2-1vs3-simpo-selection • Updated 14 days ago • 62.7k • 39