Merged a 103B version

#3 opened by FluffyKaeloky

Hi!
I've been using your Midnight Miqu 103B v1.0 and found it really good! I took the liberty of creating a 103B version of your v1.5 70B model, using the same merge parameters as your own v1.0 103B model.

https://huggingface.co/FluffyKaeloky/Midnight-Miqu-103B-v1.5

This is the first merge I've done, other than quantisations, so hopefully I didn't do anything wrong. I've tested it quite a lot at 32K context with no issues, and I'm currently uploading EXL2 quants as well.
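For anyone curious how 120 layers come out of an 80-layer model: these 103B merges are passthrough-style stacks of overlapping layer ranges from the 70B. Here is a minimal sketch of the counting, with hypothetical ranges rather than the exact ones from the Midnight-Miqu-103B config:

```python
# Illustrative sketch of how a 120-layer frankenmerge is assembled from an
# 80-layer 70B model by stacking overlapping layer ranges (passthrough merge).
# The ranges below are hypothetical; the real slices are in the mergekit
# config on the model card.
slices = [(0, 40), (20, 60), (40, 80)]  # overlapping windows over the 80 source layers

merged = [layer for start, end in slices for layer in range(start, end)]
print(len(merged))  # 120 layers in the merged model
```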

May I also ask: why 103B, when the model has 120 layers? That has me a bit confused.

Thanks for your amazing work! :D

Your merge config looks good. You beat me to it! I've added your model to my Midnight Miqu collection, so hopefully people will see it.

The parameter count doesn't map directly onto the layer count. For example, a 70B-parameter model has 80 layers and a 120B-parameter model has 140 layers; 103B just happens to be the approximate parameter count for 120 layers. I think it's a nice in-between point.
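To make that arithmetic concrete, here's a rough back-of-the-envelope estimate assuming the published Llama-2-70B/Miqu shapes (8192 hidden size, 28672 FFN size, 64 attention heads with 8 KV heads, 32k vocab). It's an approximation, not an exact accounting:

```python
# Rough parameter-count estimate for a Miqu/Llama-2-70B-style architecture.
HIDDEN = 8192
FFN = 28672
N_HEADS = 64
N_KV_HEADS = 8
VOCAB = 32000

head_dim = HIDDEN // N_HEADS                    # 128
attn = HIDDEN * HIDDEN * 2                      # q_proj + o_proj
attn += HIDDEN * head_dim * N_KV_HEADS * 2      # k_proj + v_proj (GQA)
mlp = HIDDEN * FFN * 3                          # gate, up, down projections
per_layer = attn + mlp + 2 * HIDDEN             # plus the two RMSNorm weights

embeddings = VOCAB * HIDDEN * 2                 # input embeddings + lm_head

for layers in (80, 120, 140):
    total = per_layer * layers + embeddings
    print(f"{layers} layers -> ~{total / 1e9:.0f}B parameters")
# 80 layers  -> ~69B  (the 70B base)
# 120 layers -> ~103B (the 103B merges)
# 140 layers -> ~120B (the 120B merges)
```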

Ah, I see! Thanks for the clarification! I'm also glad you could add it to your collection!

Fun fact: I had the files sitting on my computer for about three days, complete with EXL2 quants, but my internet connection decided to die right when I was done. I'm glad to finally be able to upload it all! I'll send you a message next time :)
