Finetune on longer sequences

by joelniklaus

Hi guys,
Great model suite, thanks a lot!
I am interested in finetuning this model on longer sequences (4096 to 8192 tokens). What would you say is the easiest way to do this?
Cheers,
Joel
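
One minimal approach, sketched below under assumptions not stated in the thread: Pythia uses rotary position embeddings, so no learned weight matrix depends on the context length, and you can raise `max_position_embeddings` in the config before finetuning. The checkpoint name here is just an assumed example, and since the models were pretrained at 2048 tokens, quality at 4096–8192 will depend on the finetuning itself.

```python
from transformers import AutoConfig, AutoTokenizer, GPTNeoXForCausalLM

model_name = "EleutherAI/pythia-410m"  # assumed example; any Pythia size should work

# Rotary embeddings have no position-embedding weights to resize, so raising
# max_position_embeddings only extends how far positions are rotated.
config = AutoConfig.from_pretrained(model_name)
config.max_position_embeddings = 8192

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = GPTNeoXForCausalLM.from_pretrained(model_name, config=config)

# From here, finetune as usual (e.g. with Trainer) on examples tokenized
# up to the new length:
inputs = tokenizer(
    "your long document ...",
    return_tensors="pt",
    truncation=True,
    max_length=8192,
)
outputs = model(**inputs, labels=inputs["input_ids"])
loss = outputs.loss  # backprop this in your training loop
```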

I have the same question

It's odd, but https://moon-ci-docs.huggingface.co/docs/transformers/pr_22810/en/model_doc/gpt_neox_alibi (and only that page) documents a GPT-NeoX ALiBi model; it looks like a pull request that never got merged.
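
For context, ALiBi drops position embeddings entirely and instead adds a fixed per-head linear bias to the attention logits, which is why it extrapolates to longer sequences. A minimal sketch of just the bias computation (the technique from Press et al., 2022, not the code from that pull request):

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Causal ALiBi bias of shape (num_heads, seq_len, seq_len).

    Assumes num_heads is a power of two, as in the paper's slope schedule.
    """
    # Geometric slope schedule per head: 2^(-8/n), 2^(-16/n), ..., 2^(-8)
    start = 2.0 ** (-8.0 / num_heads)
    slopes = torch.tensor([start ** (i + 1) for i in range(num_heads)])

    # Relative position j - i: 0 on the diagonal, increasingly negative for
    # keys further in the past (future positions are handled by the causal mask).
    positions = torch.arange(seq_len)
    distances = (positions[None, :] - positions[:, None]).clamp(max=0)

    return slopes[:, None, None] * distances[None, :, :]

# Added to the attention logits before softmax, e.g.:
# scores = q @ k.transpose(-2, -1) / head_dim ** 0.5 + alibi_bias(h, t)
```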

Pythia should be in the same model family as GPT-NeoX. I'm happy to look at other solutions, but I'm basically looking for the same thing.
