Fine Tuning

#13
by Jenish-23 - opened

Thank you for releasing this model. It gives surprisingly coherent responses for its size. I would really like to learn more about it.

How did you train it?
How can one fine-tune it? What parameters should be adjusted?
What prompt format to use?
Are you going to release more iterations of it?

Hi, thanks for your interest.

We trained it using the conventional Transformers Trainer on the downstream task for one epoch.

I am not sure which task you want to fine-tune on; the setup depends on the task. But if you want naive fine-tuning (i.e., not collective-boost tuning), you can stick with the Trainer, which exposes the usual hyperparameters for you to adjust. A sketch is below.
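For reference, a minimal naive fine-tuning sketch with the Hugging Face Trainer might look like this. The checkpoint name `your-org/your-model`, the dataset, and all hyperparameter values are placeholders for illustration, not the settings used for this model:

```python
# Minimal causal-LM fine-tuning sketch with the Hugging Face Trainer.
# Checkpoint name, dataset, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "your-org/your-model"  # replace with the actual checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # many causal LMs lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Any plain-text dataset works for naive fine-tuning.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="finetuned",
    num_train_epochs=1,              # the model itself was trained for one epoch
    per_device_train_batch_size=8,   # tune for your hardware and task
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The learning rate, batch size, and sequence length are the main knobs to adapt per task.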

For this model we only did pretraining, so we basically use the raw text format of C4 and WikiText; there is no special prompt template.
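Concretely, "raw format" means the model consumes plain tokenized document text with no chat or instruction template. A small sketch, assuming the public C4 and WikiText datasets on the Hub (the exact configs used here are my guess):

```python
# Raw-format pretraining input: just tokenized plain text, no template.
# Dataset configs below are assumptions, not the exact ones used.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-org/your-model")  # placeholder

c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
wikitext = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

# Each example is raw document text, fed to the tokenizer as-is.
sample = next(iter(c4))
input_ids = tokenizer(sample["text"], truncation=True, max_length=1024)["input_ids"]
```

So for inference or further tuning you can prompt with plain text; the model was never trained on a chat format.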

We don't have plans to release a better version for now, but there is a chance we will release another speculative architecture along with the tuned model weights.
