SFT & Reward Models used in the experiments of the ICML 2024 paper "Towards Efficient Exact Optimization of Language Model Alignment"
Haozhe Ji (ehzoah)
AI & ML interests: language modeling, text generation
Collections: 1
Papers: 1
Models: 7
ehzoah/RM-Llama-3.2-1B_UltraFeedback-ArmoRM
ehzoah/Llama-3.2-1B-sft-full (Text Generation)
ehzoah/pythia-1.4b-sft-full
ehzoah/exo-hh-reward-model
ehzoah/exo-imdb-sft-model (Text Generation)
ehzoah/exo-imdb-reward-model (Text Generation)
ehzoah/exo-hh-sft-model