Could you please share more details about fine-tuning LLaMA-2-7B into LLaMA-2-7B-32K, such as the number of fine-tuning steps and the batch size? Thanks!

#32
by Mooler

Hi! I've read the original PI (Position Interpolation) paper. It says they fine-tune for only about 1,000 steps to extend the context window. Did you fine-tune for the same number of steps (i.e., 1,000) as the original paper? Thanks!
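
For reference, my current understanding of a PI-style setup (linear RoPE scaling by 32k / 4k = 8) with Hugging Face transformers looks roughly like the sketch below. The training hyperparameters (batch size, accumulation, learning rate) are placeholders I'm assuming, not your confirmed values, so please correct anything that differs from your actual recipe:

```python
# Minimal sketch (not the confirmed LLaMA-2-7B-32K recipe) of PI-style
# context extension via linear RoPE scaling in Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

base_model = "meta-llama/Llama-2-7b-hf"  # 4k-context base model

tokenizer = AutoTokenizer.from_pretrained(base_model)

# Position Interpolation: scale RoPE positions linearly by 32768 / 4096 = 8
# so that 32k positions are interpolated back into the original 4k range.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    rope_scaling={"type": "linear", "factor": 8.0},
)

# Placeholder hyperparameters: the ~1,000 steps mirrors the PI paper; the
# batch size and learning rate below are assumptions, not confirmed values.
args = TrainingArguments(
    output_dir="llama2-7b-32k-pi",
    max_steps=1000,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=64,  # tune to reach the desired global batch size
    learning_rate=2e-5,
)
# A Trainer with a long-context (32k-token) dataset would follow here.
```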
