Training

#3
by NePe - opened

You can train models on Kaggle with a TPU. I'm currently experimenting with neox-125m, and it takes about 7 hours for 2 epochs on a 1.7 GB dataset (~715M tokens) with a batch size of 80*2048 tokens.
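For anyone wanting to compare against their own setup, here is a rough back-of-the-envelope sketch of what those figures imply. It assumes that ~715M tokens is the size of one epoch, that the 7 hours covers both epochs, and that 80*2048 means 80 sequences of 2048 tokens per optimizer step. The function name `training_estimate` and its parameters are made up for illustration and not part of any library.

```python
# Rough throughput estimate from the figures quoted in the post.
# Assumptions (not confirmed by the post): ~715M tokens is one epoch,
# 7 hours covers both epochs, and a batch is 80 sequences x 2048 tokens.

def training_estimate(dataset_tokens, sequences_per_batch, sequence_length,
                      epochs, hours):
    """Return derived step counts and throughput as a dict."""
    tokens_per_step = sequences_per_batch * sequence_length
    steps_per_epoch = dataset_tokens / tokens_per_step
    total_steps = steps_per_epoch * epochs
    total_seconds = hours * 3600
    return {
        "tokens_per_step": tokens_per_step,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "tokens_per_second": dataset_tokens * epochs / total_seconds,
        "seconds_per_step": total_seconds / total_steps,
    }

estimate = training_estimate(
    dataset_tokens=715_000_000,  # "~715M tokens", approximate
    sequences_per_batch=80,
    sequence_length=2048,
    epochs=2,
    hours=7,                     # "about 7h", approximate
)

for name, value in estimate.items():
    print(f"{name}: {value:,.2f}")
```

Under those assumptions this works out to roughly 4,364 steps per epoch and on the order of 57k tokens per second, but since both the token count and the wall-clock time are approximate, treat the result as an order-of-magnitude figure only.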
