Update README.md

At its original quality, it runs at ~250 tokens/second on a single Friendli H100 Nvidia GPU.

Trained similarly to DeepSeek-R1, we used Phi-3.5-mini as the base model (so it uses the same chat template as Phi-3), then SFT fine-tuned it on reasoning using our own private superthoughts instruct dataset, which includes a mix of code, website generation, day-to-day chats, math and counting problems, and summarization. After the SFT fine-tuning, we used GRPO to further amplify its mathematics and problem-solving abilities.

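Since the model inherits Phi-3's chat template, prompts can be rendered along the lines of the sketch below. This is a hand-rolled approximation for illustration; in practice you should call `tokenizer.apply_chat_template`, and treat the exact special-token names here as an assumption to verify against the tokenizer config.

```python
# Rough sketch of a Phi-3-style chat template (assumed token names:
# <|system|>, <|user|>, <|assistant|>, <|end|> -- verify against the
# actual tokenizer_config.json before relying on this).

def apply_phi3_template(messages, add_generation_prompt=True):
    """Render a list of {role, content} dicts into a Phi-3-style prompt string."""
    prompt = ""
    for m in messages:
        prompt += f"<|{m['role']}|>\n{m['content']}<|end|>\n"
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        prompt += "<|assistant|>\n"
    return prompt

prompt = apply_phi3_template([
    {"role": "system", "content": "You are a helpful reasoning assistant."},
    {"role": "user", "content": "What is 12 * 7?"},
])
print(prompt)
```

With Transformers, the equivalent is `tokenizer.apply_chat_template(messages, add_generation_prompt=True)`, which uses the template shipped with the model.
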
Unlike the LITE version of superthoughts, we fixed a few issues in our instruction/SFT dataset and changed the GRPO code so the model is significantly more conversational (e.g., if you ask it "What's the weather like?", the LITE version would think and probably just answer "good", whereas superthoughts mini starts an actual conversation). This model has very strong reasoning abilities yet does not over-think. In our testing, at least on GSM8K, over-thinking did not help much; it just confused the model and wasted a lot of tokens. Because of this, superthoughts mini does not usually over-think.

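Assuming the model emits R1-style reasoning delimiters (e.g. `<think>…</think>` — this tag name is an assumption; check the Format section for the delimiters this model actually uses), a small helper can separate the reasoning trace from the visible answer:

```python
import re

def split_reasoning(text):
    """Split a completion into (reasoning, answer).

    Assumes R1-style <think>...</think> delimiters; substitute whatever
    tags this model actually emits.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        # No reasoning block found: treat the whole completion as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning("<think>2+2 is 4.</think>The answer is 4.")
```

This kind of post-processing is useful when you want to show users only the final answer while logging the reasoning trace separately.
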
# Format & Examples