Finetune without using run_clm.py

#16
by TianlaiChen - opened
This comment has been hidden

Hi Leo,

I haven’t tried fine-tuning without the helper script, but it is definitely possible. Once you have defined the tokenizer and model (which you have done correctly), I would follow the tutorial. You will have to create a txt file with your sequences, replacing each FASTA header with the endoftext tag. I’m not sure how the tutorial handles it, but you will then have to tokenize that dataset and define the splits (90/10?).
Hope this helps,
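
The data preparation described above could be sketched roughly as follows. This is only an illustration using the Python standard library, not the helper script's own logic: the function names `fasta_to_records` and `split_records` are hypothetical, the tag string `<|endoftext|>` is an assumption (check it against your tokenizer's `eos_token`), and keeping the sequence line breaks exactly as they appear in the FASTA file is also an assumption.

```python
import random

# Assumption: the end-of-text tag used by the tokenizer; verify with tokenizer.eos_token.
EOT = "<|endoftext|>"


def fasta_to_records(fasta_text, eot=EOT):
    """Return one string per sequence, with each FASTA header replaced by the tag."""
    records, current = [], None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header starts a new record; store the previous one first.
            if current is not None:
                records.append(current)
            current = [eot]
        elif current is not None:
            current.append(line)  # sequence line, kept as-is
    if current is not None:
        records.append(current)
    return ["\n".join(r) for r in records]


def split_records(records, val_fraction=0.1, seed=0):
    """Shuffle with a fixed seed and return (train, validation) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one validation record when there is more than one record.
    n_val = max(1, round(len(shuffled) * val_fraction)) if len(shuffled) > 1 else 0
    return shuffled[n_val:], shuffled[:n_val]
```

Joining each returned list with newlines and writing it to a txt file would give one training file and one validation file, which can then be tokenized with the tokenizer you already defined.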
Noelia

TianlaiChen changed discussion status to closed
