I think this is actually just v0.1

by bartowski - opened

According to the git repo, which includes a link to this model's tar download, they added the following note earlier today:

  • Important:
    • mixtral-8x22B-Instruct-v0.3.tar is exactly the same as Mixtral-8x22B-Instruct-v0.1, only stored in .safetensors format
    • mixtral-8x22B-v0.3.tar is the same as Mixtral-8x22B-v0.1, but has an extended vocabulary of 32756 tokens.

https://github.com/mistralai/mistral-inference?tab=readme-ov-file#model-download

Mistral Community org

mixtral-8x22B-v0.3.tar is the same as Mixtral-8x22B-v0.1, but has an extended vocabulary of 32756 tokens.

Then it isn't the same, is it?

For the non-instruct, yes.

But for instruct, it's completely identical; the original v0.1 model on HF has 32756 tokens in the vocab.
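If someone wants to check the "exactly the same" claim directly, here is a minimal, stdlib-only sketch that hashes each tensor in a `.safetensors` file so two checkpoints can be compared byte-for-byte without loading them into a framework. The parsing follows the published safetensors layout (an 8-byte little-endian header length, then a JSON header whose `data_offsets` point into the trailing byte buffer); the function names are just illustrative.

```python
import hashlib
import json
import struct


def read_safetensors_header(blob: bytes) -> dict:
    """Parse the JSON header of a .safetensors blob.

    Layout: 8-byte little-endian header length, then that many
    bytes of JSON mapping tensor name -> {dtype, shape, data_offsets}.
    """
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8 : 8 + header_len].decode("utf-8"))


def tensor_digests(blob: bytes) -> dict:
    """Map tensor name -> sha256 of its raw bytes.

    If two checkpoints produce identical digests (and shapes) for
    every tensor, their weights are byte-identical.
    """
    header = read_safetensors_header(blob)
    (header_len,) = struct.unpack("<Q", blob[:8])
    data = blob[8 + header_len :]
    digests = {}
    for name, meta in header.items():
        if name == "__metadata__":  # optional metadata entry, not a tensor
            continue
        start, end = meta["data_offsets"]
        digests[name] = hashlib.sha256(data[start:end]).hexdigest()
    return digests
```

Running `tensor_digests` over the shards of the v0.1 and v0.3 instruct uploads and diffing the two dicts would settle whether they really match tensor-for-tensor.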

Mistral Community org

Yes, that's true. But Mistral gets to decide if they make a v0.3 that's exactly the same as v0.1. I think we are more like historians who record such things, not creators. This v0.3 matches theirs, which also happens to be the same as v0.1. I think it does no harm, and it does some good.
