Can you make a mamba version

#4 · opened by LaferriereJC

I'd love to see a 1.3B. Honestly, a Mixtral-style 1.3B x8 mixture would be even better.

Cognitive Computations org

Mamba is not yet fully trained.

I'll wait until Mamba 2 comes out.

In the meantime, I could do Striped Hyena.

All the ones released by state-spaces are pretrained:
'Pretrained models are uploaded to Hugging Face: mamba-130m, mamba-370m, mamba-790m, mamba-1.4b, mamba-2.8b, trained on 300B tokens on the Pile, as well as mamba-2.8b-slimpj (trained on 600B tokens on the SlimPajama dataset).'
Source: https://github.com/state-spaces/mamba

https://huggingface.co/state-spaces/mamba-2.8b-slimpj
https://huggingface.co/clibrain/mamba-2.8b-instruct-openhermes
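
For reference, here is a minimal sketch of loading one of those checkpoints with the mamba_ssm package from the linked repo. The GPT-NeoX tokenizer, the prompt, and the generate arguments are assumptions drawn from that repo's examples, not anything stated in this thread:

```python
# Minimal sketch: load a state-spaces Mamba checkpoint with the mamba_ssm package.
# Assumptions: mamba_ssm is installed, a CUDA GPU is available, and the
# EleutherAI/gpt-neox-20b tokenizer is used, as in that repo's examples.
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = MambaLMHeadModel.from_pretrained(
    "state-spaces/mamba-2.8b-slimpj", device="cuda", dtype=torch.float16
)

prompt = "Instruction: explain state space models.\nResponse:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

# mamba_ssm's generate takes max_length plus sampling knobs (temperature, top_p).
out = model.generate(input_ids=input_ids, max_length=128, temperature=0.7, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```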

Cognitive Computations org

300B tokens is not much compared to a fully trained model of 3-5T tokens.

I'm waiting for a fully trained mamba.

ehartford changed discussion status to closed
