isemmanuelolowe/Jamba-8xMoE_Slerp

Mar 30, 2024

Awesome work! Thanks for making these! I was wondering if you'd had a chance to test any of the SLERPs yet and see what the outputs are like?

isemmanuelolowe

Owner Mar 30, 2024

edited Apr 7, 2024

Thanks! These models, for which I used an accumulative slerp method, show extremely degraded outputs. They can generate valid English sentences but exhibit infinite-loop behavior. I am currently training the expert layers to see if some performance can be recovered. I will put up the relevant code and stats soon.
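
For readers unfamiliar with the term, here is a minimal sketch of what an accumulative slerp could look like. It is not the code used for these models (that has not been posted yet). It assumes each expert's weights are flattened into a plain list of floats, and that "accumulative" means folding the experts in one at a time with an interpolation factor of 1/i, similar to a running average. The names `slerp` and `accumulative_slerp` and the 1/i schedule are illustrative assumptions.

```python
import math

def slerp(v0, v1, t, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    lerp = [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Fall back to linear interpolation if either vector is (near) zero.
    if norm0 < eps or norm1 < eps:
        return lerp
    # Angle between the two vectors, with the cosine clamped for safety.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    # Parallel or antiparallel vectors: the slerp formula is ill-conditioned.
    if abs(sin_omega) < eps:
        return lerp
    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

def accumulative_slerp(vectors):
    """Fold several vectors into one by repeated slerp (assumed scheme)."""
    merged = vectors[0]
    for i, v in enumerate(vectors[1:], start=2):
        # Assumed schedule: the i-th vector gets weight 1/i.
        merged = slerp(merged, v, 1.0 / i)
    return merged
```

For example, `accumulative_slerp([[1.0, 0.0], [0.0, 1.0]])` interpolates halfway along the arc between the two unit vectors. In a real merge the same operation would be applied per tensor across the expert layers.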

Edit:
Performance is not as degraded as I thought once a repetition_penalty is added. Finetuning shows good results and decreases the repetition_penalty required. A chat adapter is available for Jamba-4xMoe_slerp. There is still no evaluation on any benchmarks as of yet.
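
As a rough illustration of why repetition_penalty helps with the looping, here is a minimal pure-Python sketch of the usual penalty rule (the CTRL-style rule that, as far as I know, the `repetition_penalty` generation setting in Hugging Face `transformers` follows). The function name `apply_repetition_penalty` and the toy logits are hypothetical, and this is not the generation code used for these models.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Make already-generated tokens less likely to be sampled again.

    logits: list of raw next-token scores, indexed by token id.
    generated_ids: token ids that have already been produced.
    penalty: values > 1.0 discourage repetition; 1.0 is a no-op.
    """
    penalized = list(logits)
    for token_id in set(generated_ids):
        score = penalized[token_id]
        # Positive scores are divided and negative scores multiplied,
        # so the score moves down in both cases.
        penalized[token_id] = score * penalty if score < 0 else score / penalty
    return penalized

# Tokens 0 and 1 were already generated, so both get pushed down.
print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1], 2.0))
```

A model that loops keeps assigning high scores to tokens it has just produced, so a penalty above 1.0 gives other tokens a chance to win. A finetuned model that repeats less needs a smaller penalty.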

Severian

Apr 7, 2024

edited Apr 7, 2024

This is so awesome! Apologies for missing your first reply.

Thanks for experimenting and crafting these with some test outputs. I'm going to download the 4x version, run it through some training, and see what happens.

I'll share the results and hopefully subsequent models! You are a legend for making these 💪

Test outputs?