I was wondering, have you tried training Cerebras-GPT-1.3B until convergence rather than for just 1 epoch?

#2 opened by ypxie0130

Right now, almost all of the big companies are, oddly and ubiquitously, releasing only models that are severely undertrained. They seem very reluctant to train the smaller models for LONGER so that they actually work well.

If we had a fully trained smaller model, that would be super helpful and would truly democratize LLM research and application.
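
For reference, here is a minimal sketch of what continued pretraining "until convergence" could look like with Hugging Face Transformers, using early stopping on validation loss instead of a fixed single epoch. The dataset (wikitext-103), hyperparameters, and stopping criterion are my own illustrative assumptions, not the recipe Cerebras actually used.

```python
# Sketch: continue pretraining Cerebras-GPT-1.3B past 1 epoch, stopping when
# validation loss plateaus. Dataset and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

model_name = "cerebras/Cerebras-GPT-1.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizer has no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Illustrative corpus; a real run would use a much larger pretraining mix.
raw = load_dataset("wikitext", "wikitext-103-raw-v1")
raw = raw.filter(lambda x: len(x["text"].strip()) > 0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="cerebras-gpt-1.3b-continued",
    num_train_epochs=10,                  # allow many passes, not just 1
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    evaluation_strategy="steps",
    eval_steps=1000,
    save_steps=1000,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=collator,
    # Stop once eval loss stops improving, i.e. train "until convergence".
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)

trainer.train()
```

Of course, the open question for the authors is whether they tried something along these lines on the original pretraining data and how much it helped.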
