Continue pretraining of some larger models?

#2
by KnutJaegersberg - opened

A Twitter contact suggested to me that it could make sense to continue pretraining of one of the larger models, e.g. mpt-30b or falcon-40b, on German data.
What do you think about this?
Do you have ideas on how to realize that? Continuing pretraining for some 50B tokens would perhaps cost around 100k euros.
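As a rough sanity check on that figure, here is a minimal back-of-envelope sketch using the common ~6 · params · tokens FLOPs rule of thumb. The GPU throughput, utilization, and price per GPU-hour below are assumptions for illustration, not numbers from this thread.

```python
# Back-of-envelope cost estimate for continued pretraining.
# Assumes the common approximation: training compute ~ 6 * n_params * n_tokens FLOPs.

def training_cost_estimate(
    n_params: float,         # model parameters, e.g. 40e9 for a 40B model
    n_tokens: float,         # tokens of continued pretraining, e.g. 50e9
    gpu_tflops: float,       # assumed sustained per-GPU throughput in TFLOP/s
    mfu: float,              # assumed model FLOPs utilization (0..1)
    eur_per_gpu_hour: float, # assumed price per GPU-hour
) -> tuple[float, float]:
    total_flops = 6.0 * n_params * n_tokens
    effective_flops_per_s = gpu_tflops * 1e12 * mfu
    gpu_hours = total_flops / effective_flops_per_s / 3600.0
    return gpu_hours, gpu_hours * eur_per_gpu_hour

if __name__ == "__main__":
    # Example: 40B model, 50B tokens, A100-class GPUs (~312 TFLOP/s BF16 peak),
    # 40% utilization, ~2 EUR per GPU-hour -- all of these are assumptions.
    hours, cost = training_cost_estimate(40e9, 50e9, gpu_tflops=312, mfu=0.4, eur_per_gpu_hour=2.0)
    print(f"~{hours:,.0f} GPU-hours, ~{cost:,.0f} EUR")
```

With these assumed numbers the estimate lands in the tens of thousands of euros, i.e. the same order of magnitude as the 100k figure above; actual cost depends heavily on hardware, utilization, and pricing.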

I am already preparing training of a 70B model on large-scale German data :-)
Stay tuned

If you want to speed up the process, we could talk about ways to help finance it; my employer is the de facto main sponsor of the training and inference hardware in their data centers.
So one way would be to mention my name when buying new hardware, or to contact them directly to arrange financial support.
Pinging @jphme, who was also interested in a German LLM research group.
Maybe we could open a Slack or something.

https://join.slack.com/t/slack-dtc7771/shared_invite/zt-219keplqu-hLwjm0xcFAOX7enERfBz0Q

Just created a Slack for German LLMs; I would be happy to plan more training runs there.

KnutJaegersberg changed discussion status to closed
