Training setup

#1
by Jorgeutd - opened

Hi Peter,

Thanks for sharing this. Do you have any example Colab notebooks showing how to set up this type of model for training with the Trainer API? I tried to follow the same process that I used for Longformer, but the training metrics were 0 almost all of the time.

Thank you so much.

Hi Jorge,

Unfortunately, I don't have a notebook that I can share immediately (the one I use serves several different purposes and has API tokens etc. in it). Once I get it cleaned up, which might take a while, I am happy to share it.

That said, Patrick von Platen's LED notebook should work okay. The main difference between that and what I use is the addition of DeepSpeed, which you may want to try out! Btw, it is worth noting that for this large-size model, I can only train with 16384 input tokens on an A100 GPU runtime.
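
For anyone wondering what adding DeepSpeed to a Trainer-based setup might look like, here is a minimal sketch. It assumes ZeRO stage 2 with fp16 and CPU optimizer offload; the file name `ds_config.json`, the helper `write_ds_config`, and every value below are illustrative assumptions, not the configuration actually used for this model.

```python
import json

# Hypothetical DeepSpeed config (assumed values, not the author's settings).
# "auto" asks the Hugging Face Trainer integration to fill the value in
# from the training arguments so the two stay consistent.
ds_config = {
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,  # shard optimizer state and gradients
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}


def write_ds_config(path="ds_config.json"):
    """Write the config to disk and return the path (hypothetical helper)."""
    with open(path, "w") as f:
        json.dump(ds_config, f, indent=2)
    return path


config_path = write_ds_config()
```

The resulting path can then be passed through the `deepspeed` argument of the training arguments, e.g. `Seq2SeqTrainingArguments(..., deepspeed=config_path)`, with the rest of the setup following the LED notebook.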

Hope that helps!

This is definitely helpful. Thank you Peter.
