Context Length?

#1 by brucethemoose - opened

I'm very excited to try this model, especially once the DPO version comes out!

Just out of curiosity, what context length was it trained at?

I used 4096, mainly because nearly all of the instructions fell within that (and the vast majority were well under it). I may do one more pass on the full scripts of Cinematika to add some coherence out to tens of thousands of tokens, but it's costly.
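To put a rough number on that, here's a sketch of how you could measure what fraction of an instruction set fits within 4096 tokens (the dataset name is a placeholder and the Yi tokenizer is assumed):

```python
# Sketch: measure how many samples of an instruction dataset fit in 4096 tokens.
# The dataset name is a placeholder and a single "text" field is assumed.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-34B-200K")
dataset = load_dataset("your/instruction-dataset", split="train")

max_len = 4096
lengths = [len(tokenizer(sample["text"]).input_ids) for sample in dataset]
within = sum(1 for n in lengths if n <= max_len)
print(f"{within / len(lengths):.1%} of samples fit in {max_len} tokens")
```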

DPO version is ready BTW.

Yeah, I just saw that! It's hard to find stuff on HF. For what it's worth, other 200K finetunes seem to preserve some long-context performance even when trained at 4K, but I would be extremely interested in a bagel finetuned out to just 40K-75K.

I'm not sure what you use to train, but you might find this graph of VRAM usage and perplexity versus context length from a paper interesting: https://github.com/huggingface/peft/issues/958

As well as unsloth, which does reduce VRAM usage significantly: https://github.com/unslothai/unsloth
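For reference, a minimal unsloth sketch, assuming its FastLanguageModel API and that the base model is supported: load the model 4-bit quantized with a longer training sequence length, then attach LoRA adapters.

```python
# Minimal unsloth sketch (assumed API/support): 4-bit load with a longer
# training sequence length, plus standard LoRA adapters on top.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="01-ai/Yi-34B-200K",  # illustrative base; swap in your own
    max_seq_length=16384,            # illustrative long-context training length
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=True,
)
```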

Technically, unsloth and axolotl don't integrate LongLoRA into LoRA training, but it's probably fine?
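To be concrete about the LongLoRA part: its shifted sparse attention isn't wired into those trainers, but the other half of the recipe, making the embedding and norm layers trainable alongside the adapters, can be approximated with plain peft, roughly like this (module names assume a Llama-style model):

```python
# Rough peft approximation of LongLoRA's "also train embeddings and norms"
# trick; the shifted sparse attention part is not covered here.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=["embed_tokens", "norm"],  # trained in full, per LongLoRA
    task_type="CAUSAL_LM",
)
```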

I used a mix of QLoRA and some full-weight tuning for this. Thanks for the link and the info, very interesting!
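For anyone curious, a generic QLoRA setup sketch with transformers + peft (not the exact configuration used for bagel; hyperparameters are illustrative):

```python
# Generic QLoRA sketch (illustrative hyperparameters, not bagel's actual config):
# 4-bit NF4 quantization via bitsandbytes, then LoRA adapters on top.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "01-ai/Yi-34B-200K", quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
))
model.print_trainable_parameters()
```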

I'd probably do full-weight training if/when I try extending the trained context length, but I was hoping it would just inherit the longer-context capabilities from the base.

I hope you do! But of course I appreciate the DPO finetune as is!

> but I was hoping it would just inherit the longer-context capabilities from the base.

Other models do; I'm still quantizing bagel to test it myself. But I bet long-context data would really help. This would make Bagel 34B particularly unique, as no one else (AFAIK) is really finetuning Yi 200K at long context.
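One quick sanity check for whether the 200K settings carried over from the base is just to diff the rope-related config values (the repo ids below are my assumption):

```python
# Sketch: compare long-context-relevant config values between the base model
# and the finetune (the bagel repo id here is an assumption).
from transformers import AutoConfig

base = AutoConfig.from_pretrained("01-ai/Yi-34B-200K")
tune = AutoConfig.from_pretrained("jondurbin/bagel-34b-v0.2")

for key in ("max_position_embeddings", "rope_theta", "rope_scaling"):
    print(key, getattr(base, key, None), getattr(tune, key, None))
```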

And yeah, I would recommend unsloth in particular; it's just a huge drop-in VRAM saving and speed boost with no downside, at least in my own testing.

brucethemoose changed discussion status to closed
