~236M-parameter decoder built only from SelectionMixing (order-statistic routing) layers, with no attention. Trained with the Muon optimizer in bf16 on MBPP, Alpaca, and Evol-Instruct. Checkpoint at step 6000.