Cost of pretraining?

#1
by KnutJaegersberg - opened

How much did it cost to pretrain these models at each model size, given the corpus you used?
I'm interested in learning to pretrain models myself, and I think yours might be among the most efficient models out there.
My guess is that, given a reasonably sized corpus, designing the corpus to match the later tasks matters far more for performance than corpus size.
It might even be more effective than fine-tuning.

The fact that T5 pretrained on RealNews (40 GB) performed almost as well as T5 pretrained on the full corpus (almost 800 GB) is what sparked my interest in this.
