Over the past few days, SupraLabs has been mentioned in a public discussion regarding small language models, scaling laws, and training methodology. We'd like to clarify our position.
Before anything else, we want to make one thing absolutely clear: we have great respect for Lane and the work being done at Glint Research. At no point was our intention to disrespect Lane, Glint Research, or their research. What began as a technical discussion about model scaling and training methodology unfortunately became much more personal than we ever intended. From our perspective, it was simply an exchange of technical opinions, and we sincerely hope it remains that way.
We'd also like to acknowledge that one of our own comments during the discussion was poorly worded. Referring to a benchmark as "fake" was imprecise. What we intended to criticize was the comparison methodology, not the integrity of the evaluation itself. Comparing a merged checkpoint against a single checkpoint is, in our view, not an apples-to-apples comparison.
That said, this was never the core of the discussion.
Our disagreement was not about SLERP, model merging, or whether training a small model on massive amounts of data is an interesting research direction. We support experimentation and unconventional ideas.
The actual point of disagreement was much simpler.
The statement that a 1M-parameter model trained on 1 trillion tokens will become a "100M killer" is, today, a prediction, not an experimental result. Could it happen? Perhaps. Would it be exciting if it did? Absolutely.
But until benchmark results, reproducible evaluations, and independent validation exist, we believe such statements should be presented as hypotheses rather than established conclusions. Research advances by testing ideas, not by assuming their outcomes.
We sincerely wish Lane and everyone at Glint Research success in their experiments.
Is anybody else willing to put a second mortgage on their house just to spend 40k USD on compute credits? Just me? k...
I got dreams, man. The datasets I could build with 40k would be insane. Somebody called me a genius the other day; they'd be shocked to find out that I would put my house on the line for 30 days of RunPod usage.
What would you do with it? I would turn arXiv into a dataset. Turn each arXiv paper into a Q&A. Or... maybe if I got 40k USD in credits I'd end up like those 16 lost scientists.
Food for thought. Anyways, I think I'm going to make a post once a week. In the meantime you can find me building small LLMs on Discord here: https://discord.gg/4DdwS9D8x9
Edit: So to be clear, I will not actually do this. But think of it this way: if you could pivot an entire industry with 10-30k, would you do it?
reacted to Crownelius's post with 🔥 about 2 months ago
[DAY TWO] PROJECT CROWFEATHER - 5/1/2026 Que sera, what will he be?
Step 47,500 of 100,000. Loss hovering around 2.76 on 6.2B tokens. Throughput steady at 87k tokens per second on the A100. Not a GH200, but she gets it done.
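For a rough sense of what those numbers imply, here is a back-of-the-envelope sketch. It assumes the 87k throughput figure is in tokens per second, that the 6.2B tokens were consumed over the 47,500 steps so far at a constant batch size, and that throughput stays flat for the rest of the run. The function and key names are made up for illustration, not taken from the actual training code.

```python
def run_estimate(steps_done, total_steps, tokens_seen, tokens_per_sec):
    """Extrapolate a run, assuming constant tokens per step and constant throughput."""
    tokens_per_step = tokens_seen / steps_done
    projected_tokens = tokens_per_step * total_steps
    remaining_tokens = projected_tokens - tokens_seen
    return {
        "tokens_per_step": tokens_per_step,
        "projected_tokens": projected_tokens,
        "hours_elapsed": tokens_seen / tokens_per_sec / 3600,
        "hours_remaining": remaining_tokens / tokens_per_sec / 3600,
    }


# Figures from the post; the tokens-per-second reading is an assumption.
est = run_estimate(
    steps_done=47_500,
    total_steps=100_000,
    tokens_seen=6.2e9,
    tokens_per_sec=87_000,
)

for name, value in est.items():
    print(f"{name}: {value:,.1f}")
```

Under those assumptions it comes out to roughly 130k tokens per step, about 13B tokens by step 100,000, and on the order of 22 more hours of compute. Any batch-size ramp or throughput dip would shift those numbers.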
Still haven't named him. Scamp has a rascally charm. Quentin sounds like he'd wear a bow tie and think hard before speaking. Taking votes.
Phase two is what's keeping me up. Datasets everywhere and I can't pick. I'm fusing Google and DeepSeek's ideas: Gemma 4's alternating sliding and global attention, DeepSeek V4's Muon optimizer and WSD scheduler, Gemma 2's logit soft cap, and PaLM's z-loss. Sounds like peanut butter on a hamburger, but the loss curve says it works.
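For anyone who hasn't met three of those ingredients, here is a minimal pure-Python sketch of the usual textbook forms: a tanh logit soft cap, a z-loss penalty added to cross-entropy, and a warmup-stable-decay (WSD) learning-rate schedule. It leaves out the attention pattern and Muon entirely. The cap of 30.0, the 1e-4 coefficient, the linear decay shape, and every function name are placeholder assumptions for illustration, not the settings used in this run.

```python
import math


def soft_cap(logits, cap=30.0):
    """Squash each logit smoothly into (-cap, cap) via cap * tanh(x / cap)."""
    return [cap * math.tanh(x / cap) for x in logits]


def log_sum_exp(logits):
    """Numerically stable log of the softmax normalizer Z."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def cross_entropy_with_z_loss(logits, target, z_coef=1e-4):
    """Cross-entropy plus z_coef * log(Z)^2, which nudges log(Z) toward zero."""
    log_z = log_sum_exp(logits)
    cross_entropy = log_z - logits[target]
    return cross_entropy + z_coef * log_z ** 2


def wsd_lr(step, total_steps, peak_lr, warmup_steps, decay_steps, min_lr=0.0):
    """Warmup-stable-decay: linear warmup, flat plateau, linear decay to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    decay_start = total_steps - decay_steps
    if step < decay_start:
        return peak_lr
    frac = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * frac


# Hypothetical values, only to show the pieces fitting together.
capped = soft_cap([2.0, -1.0, 80.0])
loss = cross_entropy_with_z_loss(capped, target=2)
lr_now = wsd_lr(step=47_500, total_steps=100_000, peak_lr=3e-4,
                warmup_steps=2_000, decay_steps=10_000)
print(capped, loss, lr_now)
```

With these placeholder settings, step 47,500 sits on the flat plateau, so the schedule returns the peak rate unchanged.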
Tribe_v2 has real potential but needs more scaffolding than a barn raising before I throw it in. One thing's certain though. This model's gonna be a thinker. Not a Wikipedia parrot. Something that chews before it answers.
Finally got a use for my less popular datasets too. Some Opus-4.5-Writing-Style for polish. A few rows of Human-Archtypes-25k to see what personality bubbles up. Could be a poet, could be a grump. Either beats a flimsy fine-tune.
The bank's after my credit card. Until then, full steam.
Next model gets graphs. I swear.
-Shane
reacted to Crownelius's post with 🔥 about 2 months ago
My Hugging Face journey has been a trip! I wanted to take the time to thank each and every one of you for using my dataset and getting it to go as far as it did. Believe it or not, some neanderthal was and maybe still is trending on Hugging Face.
Not only did my dataset reach number one, my fine-tuned Qwen3.5 model did as well. Top 10. Honestly, ain't much left to do here.
Y'all have given me the desire, no... the craving for more. I am absolutely obsessed with AI now. I want to tweak it... I want to take it apart, just to see what makes everything tick. I want to put it together like Frankenstein and his monster.
The only thing that's stopping this guy is compute. I don't mind spending every penny I have on this. I desperately want to drive AI forward, even just a little bit.
I never knew the clanker hater from a year ago would be saying this.
Thank you all from the bottom of my heart.
Looking forward to showing you what I'm cooking up next. @CompactAI is your only hint!
reacted to Enderchef's post with ❤️ about 2 months ago