I think this is actually just v0.1

by bartowski - opened

According to the git repo, which includes a link to this model's tar download, they added the following note earlier today:

  • Important:
    • mixtral-8x22B-Instruct-v0.3.tar is exactly the same as Mixtral-8x22B-Instruct-v0.1, only stored in .safetensors format
    • mixtral-8x22B-v0.3.tar is the same as Mixtral-8x22B-v0.1, but has an extended vocabulary of 32756 tokens.

https://github.com/mistralai/mistral-inference?tab=readme-ov-file#model-download

Mistral Community org

mixtral-8x22B-v0.3.tar is the same as Mixtral-8x22B-v0.1, but has an extended vocabulary of 32756 tokens.

Then it isn't the same, is it?

For the non-instruct, yes.

But for instruct, it's completely identical; the original v0.1 model on HF has 32756 tokens in the vocab.
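If someone wants to check the "exactly the same" claim directly, here is a minimal, stdlib-only sketch that hashes each tensor in a `.safetensors` file so two checkpoints can be compared byte-for-byte without loading them into a framework. The parsing follows the published safetensors layout (an 8-byte little-endian header length, then a JSON header whose `data_offsets` point into the trailing byte buffer); the function names are just illustrative.

```python
import hashlib
import json
import struct


def read_safetensors_header(blob: bytes) -> dict:
    """Parse the JSON header of a .safetensors blob.

    Layout: 8-byte little-endian header length, then that many
    bytes of JSON mapping tensor name -> {dtype, shape, data_offsets}.
    """
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8 : 8 + header_len].decode("utf-8"))


def tensor_digests(blob: bytes) -> dict:
    """Map tensor name -> sha256 of its raw bytes.

    If two checkpoints produce identical digests (and shapes) for
    every tensor, their weights are byte-identical.
    """
    header = read_safetensors_header(blob)
    (header_len,) = struct.unpack("<Q", blob[:8])
    data = blob[8 + header_len :]
    digests = {}
    for name, meta in header.items():
        if name == "__metadata__":  # optional metadata entry, not a tensor
            continue
        start, end = meta["data_offsets"]
        digests[name] = hashlib.sha256(data[start:end]).hexdigest()
    return digests
```

Running `tensor_digests` over the shards of the v0.1 and v0.3 instruct uploads and diffing the two dicts would settle whether they really match tensor-for-tensor.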

Mistral Community org

Yes, that's true. But Mistral gets to decide if they make a v0.3 that's exactly the same as v0.1. I think we are more like historians who record such things, not creators. This v0.3 matches theirs, which also happens to be the same as v0.1. I think it does no harm, and it does some good.
