hi everyone!
i wanted to share an experiment i did with upcycling phi-3 mini into an moe recently.
while the benchmark differences are definitely within the margin of error (the moe and the original phi-3 mini performed similarly), i think it's an interesting base for trying to improve phi's performance! (maybe looking into HuggingFaceFW/fineweb-edu could be interesting; i also left some other notes in case anyone with more compute access wants to try it themselves)
check it out! Fizzarolli/phi3-4x4b-v1
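
for anyone unfamiliar with the term, here's a rough toy sketch of what "upcycling" a dense model into an moe usually means: each expert starts as a copy of the dense feed-forward block, and a new router picks which experts see each token. this is not the actual recipe used for phi3-4x4b-v1. the relu block, the top-2 routing, the tiny sizes, and every function name below are assumptions made up for illustration (only the 4 experts comes from the model name).

```python
import copy
import math
import random


def dense_ffn(x, w_in, w_out):
    """toy dense feed-forward block: relu(w_in @ x), then w_out @ hidden."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def upcycle(w_in, w_out, num_experts, rng):
    """copy the dense weights into every expert and add a freshly initialized router."""
    d_model = len(w_in[0])
    experts = [(copy.deepcopy(w_in), copy.deepcopy(w_out)) for _ in range(num_experts)]
    # hypothetical router init: one small random weight row per expert
    router = [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(num_experts)]
    return experts, router


def moe_ffn(x, experts, router, top_k=2):
    """route x to the top_k experts and mix their outputs with softmax gates."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # softmax over the chosen experts only, so the gates sum to 1
    peak = max(logits[i] for i in chosen)
    exps = [math.exp(logits[i] - peak) for i in chosen]
    gates = [e / sum(exps) for e in exps]
    out = [0.0] * len(experts[0][1])
    for gate, i in zip(gates, chosen):
        expert_out = dense_ffn(x, *experts[i])
        out = [o + gate * y for o, y in zip(out, expert_out)]
    return out


rng = random.Random(0)
d_model, d_hidden = 4, 8  # toy sizes, nothing like phi-3's real dimensions
w_in = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(d_hidden)]
w_out = [[rng.gauss(0.0, 1.0) for _ in range(d_hidden)] for _ in range(d_model)]
x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]

experts, router = upcycle(w_in, w_out, num_experts=4, rng=rng)
dense_out = dense_ffn(x, w_in, w_out)
moe_out = moe_ffn(x, experts, router)
print(max(abs(a - b) for a, b in zip(dense_out, moe_out)))  # gap between dense and moe outputs
```

in this toy setup the experts start out identical and the gates sum to 1, so right after upcycling the moe block gives the same output as the dense block (up to float rounding). the experts only start to differ once you train further.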