Which tokenizer to use?

by EZForever - opened Mar 30, 2024

Mar 30, 2024

I've been trying to run this model with llama.cpp, but this repository seems to be missing the tokenizer model. Since Mistral-7B-v0.2 is apparently gone, I tried the tokenizer models from Mistral-7B-v0.1 and Mistral-7B-Instruct-v0.2, but neither worked properly: the output was littered with missing tokens.

Is the tokenizer yet to be uploaded, or should I use another one? I'm a complete newbie to this, so please bear with me if it's a stupid question.
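One plausible cause of the "missing tokens" symptom (an assumption, not confirmed in this thread) is a tokenizer that defines fewer token ids than the model can generate, so any id beyond the tokenizer's vocabulary cannot be decoded. A minimal sketch for checking this, assuming the model directory contains a `config.json` with `vocab_size` and a BPE-style `tokenizer.json` whose `model.vocab` is a token-to-id mapping (as Llama/Mistral-style repos usually ship). The helper names are hypothetical.

```python
import json
from pathlib import Path


def tokenizer_id_count(tokenizer: dict) -> int:
    """Number of token ids a tokenizer.json defines (base vocab plus added tokens).

    Assumes a BPE-style tokenizer.json where model.vocab maps token -> id.
    """
    ids = set(tokenizer["model"]["vocab"].values())
    ids.update(tok["id"] for tok in tokenizer.get("added_tokens", []))
    return max(ids) + 1


def uncovered_ids(config: dict, tokenizer: dict) -> int:
    """How many of the model's output ids the tokenizer cannot decode (0 = full coverage)."""
    return max(0, config["vocab_size"] - tokenizer_id_count(tokenizer))


def check_model_dir(model_dir: str) -> int:
    """Read config.json and tokenizer.json from a (hypothetical) local model directory."""
    d = Path(model_dir)
    config = json.loads((d / "config.json").read_text(encoding="utf-8"))
    tokenizer = json.loads((d / "tokenizer.json").read_text(encoding="utf-8"))
    return uncovered_ids(config, tokenizer)
```

A large nonzero result suggests the tokenizer belongs to a model with a smaller vocabulary. A small gap can be harmless, since some models pad `vocab_size` with rows that are never generated.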

terrencefm

OpenBuddy org Mar 30, 2024

Try downloading the tokenizer files from this model:
https://huggingface.co/OpenBuddy/openbuddy-mistral2-7b-v20.2-32k
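For anyone fetching these files by script rather than through the web UI, here is a minimal sketch using only the Python standard library and the Hugging Face `https://huggingface.co/{repo_id}/resolve/{revision}/{filename}` download URLs. The file list is an assumption (the usual files for a Llama/Mistral-style tokenizer); check the repo's "Files" tab for what it actually contains. The helper names are hypothetical.

```python
import os
import urllib.error
import urllib.request
from urllib.parse import quote

# Assumed tokenizer files; not every repo ships all of them.
TOKENIZER_FILES = (
    "tokenizer.model",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
)


def tokenizer_urls(repo_id: str, revision: str = "main",
                   filenames: tuple[str, ...] = TOKENIZER_FILES) -> list[str]:
    """Build direct-download ("resolve") URLs for tokenizer files in a model repo."""
    base = f"https://huggingface.co/{repo_id}/resolve/{quote(revision, safe='')}"
    return [f"{base}/{name}" for name in filenames]


def download_tokenizer(repo_id: str, dest_dir: str, revision: str = "main") -> list[str]:
    """Fetch each tokenizer file into dest_dir; files the repo lacks are skipped."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in tokenizer_urls(repo_id, revision):
        path = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        try:
            urllib.request.urlretrieve(url, path)
            saved.append(path)
        except urllib.error.HTTPError:
            pass  # e.g. 404 when the repo does not include this file
    return saved
```

Usage: `download_tokenizer("OpenBuddy/openbuddy-mistral2-7b-v20.2-32k", "path/to/model")` would place the files next to the model weights, which is presumably where llama.cpp's conversion script looks for them. If `huggingface_hub` is installed, `snapshot_download(repo_id=..., allow_patterns=["tokenizer*", "special_tokens_map.json"])` does the same job with caching.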

EZForever

Apr 1, 2024

Thanks, can confirm the tokenizer from v20.2 works perfectly.

EZForever changed discussion status to closed Apr 1, 2024
