future experts?

#3
by NickyNicky - opened

Same question as the title.

Assuming you're referring to a Mixture of Experts, it might be better if we used a few more diverse models. The Hercules series focuses on roleplay and function calling while still retaining decent performance in other domains. I do plan on releasing a new model lineup, as I don't think I'll expand Hercules any further. I'd like to hear your ideas for such a lineup.

It would be really interesting to expand the 7B Hercules models and convert them into a Mixture of Experts.

Another idea would be to use 1B Mistral-style models: segment the dataset by ability, train a model on each segment, and combine them into a Mixture of Experts with 32 or more experts of 1B parameters each.
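To make the "segment the dataset by ability" step concrete, here's a minimal sketch assuming each example carries some kind of ability tag; the field name and the tag set are hypothetical and would depend on how the Hercules data is actually labeled:

```python
from collections import defaultdict
import json

# Hypothetical ability tags; the real split depends on the dataset's labels.
ABILITIES = {"roleplay", "function_calling", "math", "code", "multilingual"}

def segment_by_ability(path):
    """Group JSONL examples into one training split per ability tag."""
    splits = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            example = json.loads(line)
            tag = example.get("ability", "general")  # assumed field name
            splits[tag if tag in ABILITIES else "general"].append(example)
    return splits

# Each split would then fine-tune one small (~1B) expert before the experts
# are wired together behind a router.
```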

A multilingual model.

@Locutusque the one thing I will say is that attempting sparsetral with this dataset could be quite interesting.


I would if I could, but I don't think I have enough compute for a large model like that one... It's 16x7, right?

It's 16x7, yes, but the name is misleading: it's actually a 7B model with 16 "experts" that are really adapters, so you're just training a bunch of small routers and the base model, rather than 16 copies of the base model.
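For anyone skimming later, here's a rough sketch of that layout: one shared base projection, a handful of small low-rank adapter "experts", and a tiny router that mixes the top-k adapter outputs. This is only an illustration of the adapter-expert idea, not the actual sparsetral code; the dimensions, rank, and top-k are made up:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdapterExpertLayer(nn.Module):
    """Shared base linear layer + N low-rank adapter experts + a small router."""

    def __init__(self, d_model=512, num_experts=16, rank=8, top_k=2):
        super().__init__()
        self.base = nn.Linear(d_model, d_model)        # shared pretrained weights
        self.router = nn.Linear(d_model, num_experts)  # tiny router
        self.down = nn.Parameter(torch.randn(num_experts, d_model, rank) * 0.02)
        self.up = nn.Parameter(torch.zeros(num_experts, rank, d_model))
        self.top_k = top_k

    def forward(self, x):                              # x: (batch, d_model)
        out = self.base(x)
        gate = F.softmax(self.router(x), dim=-1)       # (batch, num_experts)
        weights, idx = gate.topk(self.top_k, dim=-1)   # route to top-k adapters
        for k in range(self.top_k):
            e = idx[:, k]                              # chosen expert per example
            delta = torch.bmm(
                torch.bmm(x.unsqueeze(1), self.down[e]),  # (batch, 1, rank)
                self.up[e],                               # (batch, rank, d_model)
            ).squeeze(1)
            out = out + weights[:, k:k + 1] * delta
        return out

layer = AdapterExpertLayer()
print(layer(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```

The adapters and router are tiny next to the shared base weights, which is why the total parameter count stays close to the base model rather than 16x its size.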

Hmm, I see. It's only around 10 billion parameters, which I certainly have enough computational resources for. However, my quota's up, and I have plans for other models this week. This one might have to wait about two weeks.
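A back-of-envelope count (with made-up layer counts and adapter sizes, just to show why this lands near ~10B rather than 16x7B):

```python
base = 7_000_000_000                      # shared 7B base model
layers, d_model, d_ff, rank, experts = 32, 4096, 14336, 128, 16

# hypothetical low-rank adapter per expert per layer, plus a tiny router
adapter = 2 * rank * (d_model + d_ff)     # down + up projections (rough)
router = d_model * experts
extra = layers * (experts * adapter + router)

print(f"extra: {extra / 1e9:.2f}B, total: {(base + extra) / 1e9:.2f}B")
# extra: 2.42B, total: 9.42B  -- same ballpark as the ~10B mentioned above
```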

What's the question, @NickyNicky?

Hercules with this new, updated dataset.
