thank u

#24
by StarpowerTechnology - opened

I knew this day would come, that someone would prove all it takes is the right build. I wanna cry... this is beautiful, bro.

WeiboAI org

Appreciate it! Let’s keep pushing to make large models cheaper and accessible to everyone.

I have been doing some research, bro. I am convinced that a model's training token count doesn't matter as long as the data covers most communication. I know some Qwen3 models were trained on about 36 trillion tokens, yet the family ranges from 0.6B to 235B parameters, which means a model can be smaller overall by using a smaller token count. The Chinchilla model was 70B parameters trained on 1.4T tokens, which gives a better ratio of connectivity (parameters) to token count.
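
To put numbers on the tokens-to-parameters ratio, here is a rough calculation using publicly reported figures: Chinchilla at 70B parameters and 1.4T tokens, and Qwen3 at roughly 36T pretraining tokens. It assumes every Qwen3 size saw the full ~36T-token corpus, and the labels are just for illustration. Qwen3-235B is a mixture-of-experts model, so only about 22B of its parameters are active per token.

```python
# Rough tokens-per-parameter ratios from publicly reported figures.
# Assumption: every Qwen3 size was pretrained on the full ~36T-token corpus.
# Qwen3-235B is mixture-of-experts; its total (not active) parameter count is used here.

def tokens_per_param(tokens: float, params: float) -> float:
    """Return how many training tokens each parameter saw on average."""
    return tokens / params

models = {
    "Chinchilla (70B params, 1.4T tokens)": (1.4e12, 70e9),
    "Qwen3-0.6B (~36T tokens)": (36e12, 0.6e9),
    "Qwen3-235B total params (~36T tokens)": (36e12, 235e9),
}

for name, (tokens, params) in models.items():
    print(f"{name}: {tokens_per_param(tokens, params):,.0f} tokens/param")
```

Chinchilla's ~20 tokens per parameter was the compute-optimal point for a fixed training budget. Small models are often trained far past that point anyway; a common rationale is that inference cost, not training cost, dominates once a model is deployed.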

With that being said, I think if you try this on a model architecture that has more connectivity but fewer tokens, you can get better performance. This is only my own speculation, though; I haven't proven it in my own experiments.

Yes, this is a great project.

i am convinced that a models trained token count doesnt matter as long as it covers most communication

Actually, the literature consistently reports that training on a larger number of diverse tokens tends to produce a better model. So a small model trained on more data than another model of the same size may perform better. But that holds only if both data mixtures are of high quality; otherwise, a smaller, cleaner dataset can outperform a larger, noisy one.
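
One way to see why more data can help a fixed-size model is the parametric loss fit from the Chinchilla paper (Hoffmann et al., 2022), L(N, D) = E + A/N^α + B/D^β, where N is the parameter count and D is the number of training tokens. The constants below are the paper's reported fit (E = 1.69, A = 406.4, B = 410.7, α = 0.34, β = 0.28). Treat the outputs as illustrative only: the fit assumes one fixed data distribution and says nothing about data quality, which is exactly the caveat about noisy data.

```python
# Sketch of the Chinchilla parametric loss fit, L(N, D) = E + A/N^alpha + B/D^beta.
# Assumption: fitted constants as reported by Hoffmann et al. (2022); the fit
# presumes a fixed data distribution and does not model data quality.

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for a model with n_params trained on n_tokens."""
    E, A, B = 1.69, 406.4, 410.7
    alpha, beta = 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Same model size (1B parameters), increasing amounts of training data.
for tokens in (20e9, 200e9, 2e12):
    loss = chinchilla_loss(1e9, tokens)
    print(f"1B params, {tokens:.0e} tokens -> predicted loss {loss:.3f}")
```

Under this fit, the data term B/D^β keeps shrinking as tokens grow, so a fixed-size model keeps improving with more (same-quality) data, while the floor E is never crossed.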

But the important part is that being smaller (fewer parameters) doesn't always mean performing worse, and that's what VibeThinker helps prove.

You may be interested in this blog post on how the tiny Falcon reasoning models were made "mighty": https://huggingface.co/spaces/tiiuae/tiny-h1-blogpost

The key points over the last few years of research:

  • A little good data outperforms lots of bad data, but only shrink your dataset if you're actually filtering out the bad stuff.
  • SFT (supervised fine-tuning) is not the principal training mode; CPT (continued pretraining) plus RL are extremely important for real learning.
  • Small models are awesome when they're restricted to a single task or a small set of tasks with verifiable answers they can learn from.
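
As an illustration of what "verifiable" can mean for RL on a narrow task, here is a minimal sketch of a rule-based reward for math-style answers: 1.0 if the model's final answer matches the reference, else 0.0. The function names, the `\boxed{...}` answer convention, and the normalization rules are hypothetical choices for illustration, not how VibeThinker or the Falcon models actually compute rewards.

```python
import re
from typing import Optional

# Hypothetical convention: the model writes its final answer as \boxed{...}.
# Nested braces (e.g. \boxed{\frac{1}{2}}) are not handled by this simple pattern.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")

def extract_answer(completion: str) -> Optional[str]:
    """Return the last \\boxed{...} answer in a completion, or None if absent."""
    matches = BOXED.findall(completion)
    return matches[-1] if matches else None

def normalize(answer: str) -> str:
    """Light normalization so '1,024' and ' 1024 ' compare equal."""
    return answer.strip().replace(",", "").replace(" ", "").lower()

def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the reference, else 0.0."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0  # no parseable answer earns nothing
    return 1.0 if normalize(answer) == normalize(reference) else 0.0

if __name__ == "__main__":
    print(verifiable_reward(r"... so the total is \boxed{1,024}", "1024"))  # 1.0
    print(verifiable_reward(r"I think it's \boxed{1000}", "1024"))         # 0.0
    print(verifiable_reward("no boxed answer here", "1024"))               # 0.0
```

Because the check is a deterministic comparison rather than a learned judge, the reward is cheap and hard to game, which is part of why narrow tasks with checkable answers suit small models trained with RL.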
