Hello, can you elaborate on conditional behavior cloning and weighted behavior cloning?
#1, opened by teknium
What are they? I looked the terms up on Google and found nothing.
If it's RLHF, what differentiates the two methods? Thanks
Thanks for your interest. In short, we simply use different prompts like "Assistant GPT3.5" and "Assistant GPT4". We are preparing a paper to elaborate on our technical report.
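For readers who want something concrete, here is a minimal sketch of what conditioning on the data source through the prompt could look like. Only the two role strings "Assistant GPT3.5" and "Assistant GPT4" come from the reply above. The `User:` turn label, the `build_example` helper, the source keys, and the overall template are assumptions made for illustration, not the authors' actual training format.

```python
# Role prefixes quoted in the reply above. The dictionary keys and the
# surrounding template are hypothetical, chosen only for this sketch.
SOURCE_PREFIX = {
    "gpt-3.5": "Assistant GPT3.5",
    "gpt-4": "Assistant GPT4",
}


def build_example(question, answer, source, conditional=True):
    """Format one training sample as a single string.

    With conditional=True the assistant role names the data source,
    so the model can tell GPT-3.5 answers from GPT-4 answers.
    With conditional=False every sample gets the same "Assistant" role.
    """
    role = SOURCE_PREFIX[source] if conditional else "Assistant"
    return f"User: {question}\n{role}: {answer}"


if __name__ == "__main__":
    print(build_example("What is 2 + 2?", "4", "gpt-4"))
    print(build_example("What is 2 + 2?", "4", "gpt-4", conditional=False))
```

At inference time one would presumably prompt with the higher-quality role (here "Assistant GPT4"), but that is also an assumption until the paper is out.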
Will there be a significant performance drop without conditional behavior cloning, i.e., if all 80K samples are trained with a uniform "Assistant:" prompt?
Yes. With a uniform prompt, the model may perform about the same as Vicuna.