Use V1 tokenizer instead

#10
opened by Rocketknight1
Rocketknight1 changed pull request title from "Upload tokenizer" to "Use V1 tokenizer instead"

There was an issue with the last PR - we used the V3 tokenizer, but this base model actually uses the V1 tokenizer. This should fix the issue!

@Rocketknight1 does it affect the vocab size? The model and tokenizer vocab sizes don't match, so the model is failing to load.

@lbathen can you give me some code to reproduce that issue? From here it looks like the tokenizer and the model both have a vocab size of 32000

@Rocketknight1 I confirmed that both now show the same vocab size of 32K. I had pulled the wrong revision :)

