pszemraj/long-t5-tglobal-large-pubmed-3k-booksum-16384-WIP

Jul 1, 2022

Hi Peter,

Thanks for sharing this. Do you have an example Colab or notebook showing how to set up this type of model for training with the Trainer API? I tried to follow the same process I used for Longformer, but the training metrics were 0 almost all of the time.

Thank you so much.

pszemraj

Owner Jul 4, 2022

Hi Jorge,

Unfortunately, I don't have a notebook that I can share right away (I use one notebook for several different things, with API tokens etc. in there). Once I get that cleaned up, which might take a while, I am happy to share it.

That said, Patrick von Platen's LED notebook should work okay. The main difference between that and what I use is the addition of DeepSpeed, which you may want to try out! Btw, it is worth noting that with this large-size model, I can only train with 16384 input tokens on an A100 GPU runtime.
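
For reference, here is a minimal sketch of what adding DeepSpeed to a Trainer-based setup can look like. It only builds a DeepSpeed config file. The specific choices (ZeRO stage 2, CPU optimizer offload, bf16) and the names `build_ds_config`, `write_ds_config`, and `ds_config.json` are illustrative assumptions, not the exact configuration used for this checkpoint.

```python
import json
from pathlib import Path


def build_ds_config():
    """Return a DeepSpeed config dict (assumed settings, for illustration only)."""
    return {
        # "auto" lets the Hugging Face Trainer integration fill these values
        # in from the training arguments it is given.
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
        # Assumption: bf16 on an A100; T5-style models often overflow in fp16.
        "bf16": {"enabled": "auto"},
        # Assumption: ZeRO stage 2 with optimizer state offloaded to CPU
        # to leave more GPU memory for long input sequences.
        "zero_optimization": {
            "stage": 2,
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
            "overlap_comm": True,
            "contiguous_gradients": True,
        },
    }


def write_ds_config(path="ds_config.json"):
    """Write the config as JSON and return the path it was written to."""
    path = Path(path)
    path.write_text(json.dumps(build_ds_config(), indent=2))
    return path
```

After calling `write_ds_config()`, the resulting file path can be passed to the training arguments, e.g. `Seq2SeqTrainingArguments(..., deepspeed="ds_config.json", bf16=True)`, and the script launched with the `deepspeed` launcher. Whether stage 2 with offload is enough for 16384-token inputs will depend on your hardware.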

Hope that helps!

Jorgeutd

Jul 4, 2022

This is definitely helpful. Thank you, Peter.

Training set up