Could you please share the initial weights of one of the experts from jamba?

#4
by danielpark - opened

I'm unable to load Jamba's large weights: it's almost impossible to grab an A100 with 80 GB in Google Colab, or to grab multiple GPUs. That's how I came across your fantastic repo while looking for someone who could split off one of Jamba's experts and share it as initial weights.

Pretraining isn't necessary. Would you be able to share the initial weights of one of Jamba's experts? Thank you.

Hi @danielpark, it's not easy to split out an expert and turn it into a dense model the way it's been done for Mixtral. Jamba's MoE implementation is slightly different, and simply splitting off an expert will only produce garbled output.
I'll try to see if there are any other merging strategies.
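For context, a minimal sketch of the naive "split one expert" approach discussed above is given below. The state-dict key patterns (`experts.{i}.`, `router`, `gate`) are illustrative assumptions, not Jamba's actual parameter names, and as noted this straight copy is exactly the kind of approach that produces garbled output because Jamba's routing differs from Mixtral's.

```python
# Hypothetical sketch: copy a single expert's weights into dense-style keys.
# Key patterns below are assumptions for illustration, not Jamba's real names.
import re
import torch


def extract_expert(state_dict: dict, expert_idx: int = 0) -> dict:
    """Return a new state dict keeping only one expert's FFN weights."""
    dense_sd = {}
    expert_pat = re.compile(rf"experts\.{expert_idx}\.")  # assumed key pattern
    for name, tensor in state_dict.items():
        if "experts." in name:
            if expert_pat.search(name):
                # Drop the expert index so the tensor maps onto a dense FFN slot.
                dense_sd[expert_pat.sub("", name)] = tensor.clone()
            # All other experts are discarded.
        elif "router" in name or "gate" in name:
            # Routing weights have no counterpart in a dense model.
            continue
        else:
            dense_sd[name] = tensor
    return dense_sd
```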

Thank you sincerely for the swift, impressive work and for providing an open script. Not many people are aware of this fantastic work yet, but it will undoubtedly be very useful. Thank you!
