Was this released checkpoint fine-tuned following the three steps outlined in the InstructGPT paper?

by Eamymao - opened

The README says that this model is fine-tuned on webgpt and prompt_dialogue (version v2), but it doesn't explain the details of the fine-tuning. It is therefore unclear whether this model has gone through the RLHF steps described in InstructGPT, and what the fine-tuning process actually was. Does anyone know more about this?
