I was wondering, have you tried training Cerebras-GPT-1.3B until convergence rather than for just 1 epoch?

#2 opened by ypxie0130

Right now, almost all of the big companies are, oddly and ubiquitously, releasing only models that are severely undertrained. They seem very reluctant to train the smaller models for LONGER so that they actually work well.

If we had a fully trained smaller model, that would be super helpful and would truly democratize LLM research and application.
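
For reference, here is a minimal sketch of what continued pretraining "until convergence" could look like with Hugging Face Transformers, using early stopping on validation loss instead of a fixed single epoch. The dataset (wikitext-103), hyperparameters, and stopping criterion are my own illustrative assumptions, not the recipe Cerebras actually used.

```python
# Sketch: continue pretraining Cerebras-GPT-1.3B past 1 epoch, stopping when
# validation loss plateaus. Dataset and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

model_name = "cerebras/Cerebras-GPT-1.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizer has no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Illustrative corpus; a real run would use a much larger pretraining mix.
raw = load_dataset("wikitext", "wikitext-103-raw-v1")
raw = raw.filter(lambda x: len(x["text"].strip()) > 0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="cerebras-gpt-1.3b-continued",
    num_train_epochs=10,                  # allow many passes, not just 1
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    evaluation_strategy="steps",
    eval_steps=1000,
    save_steps=1000,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=collator,
    # Stop once eval loss stops improving, i.e. train "until convergence".
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)

trainer.train()
```

Of course, the open question for the authors is whether they tried something along these lines on the original pretraining data and how much it helped.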
