Byte not found in vocab

by Shards86 - opened Nov 27, 2023

Nov 27, 2023

I've been trying to convert the model to gguf so I can run it on a Raspberry Pi 5.

I have tried two different approaches to the llama.cpp conversion:

1. Create vocab.json and merges.txt using the HF tokenizer > create an extended tokenizer.model > convert to gguf
2. Use the modified conversion script from this PR: https://github.com/ggerganov/llama.cpp/pull/3633 > create the gguf from tokenizer.json only
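For reference, the first step of the first approach can be sketched roughly like this. It assumes the repo ships a tokenizer.json with a BPE model, where the vocab and merges sit under the "model" key. The function name and output paths are only illustrative, not what was actually run here.

```python
import json
from pathlib import Path


def export_vocab_and_merges(tokenizer_json, out_dir):
    """Write vocab.json and merges.txt from a BPE tokenizer.json (hypothetical helper)."""
    data = json.loads(Path(tokenizer_json).read_text(encoding="utf-8"))
    model = data["model"]
    if model.get("type") != "BPE":
        raise ValueError(f"expected a BPE model, got {model.get('type')!r}")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # vocab.json maps each token string to its integer id.
    (out / "vocab.json").write_text(
        json.dumps(model["vocab"], ensure_ascii=False), encoding="utf-8"
    )

    # Merges are stored either as "left right" strings or as [left, right] pairs.
    lines = [m if isinstance(m, str) else " ".join(m) for m in model["merges"]]
    (out / "merges.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")

    # Return the counts so the caller can sanity-check the export.
    return len(model["vocab"]), len(lines)
```

Calling export_vocab_and_merges("tokenizer.json", "out") returns the number of vocab entries and merges written, which is a quick way to confirm nothing was dropped.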

I have tried different quantizations (Q5_0, Q4_K_M, Q4_0). For some reason all approaches end with the same result: when I try to load the model, I get the error 'Byte not found in vocab'. Do you have any idea what this could be related to? The original model kind of works with Transformers, but it is way too slow on the RPi.
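One thing that might be worth checking, and this is only a guess at the cause: llama.cpp's SentencePiece-style vocabularies represent raw bytes as tokens named <0x00> through <0xFF>, so a vocab that lacks them could plausibly produce an error like this. Below is a minimal sketch that lists which of those tokens are missing from a tokenizer.json. It assumes a dict-style vocab under "model" plus an optional "added_tokens" list, and the helper names are made up for illustration.

```python
import json
import sys
from pathlib import Path


def missing_byte_tokens(vocab):
    """Return the <0xXX> byte tokens (00 to FF) that are absent from vocab."""
    expected = [f"<0x{b:02X}>" for b in range(256)]
    return [tok for tok in expected if tok not in vocab]


def load_vocab(tokenizer_json):
    """Collect token strings from tokenizer.json: model vocab plus added tokens."""
    data = json.loads(Path(tokenizer_json).read_text(encoding="utf-8"))
    vocab = set(data["model"]["vocab"])  # assumes a dict mapping token -> id
    vocab.update(t["content"] for t in data.get("added_tokens", []))
    return vocab


if __name__ == "__main__" and len(sys.argv) > 1:
    missing = missing_byte_tokens(load_vocab(sys.argv[1]))
    print(f"{len(missing)} of 256 byte tokens missing")
    print(missing[:8])
```

If all 256 tokens turn out to be present, this particular guess can be ruled out and the problem is somewhere else.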

AiCreatornator

Nov 30, 2023 • edited Nov 30, 2023

Hi, have you created that tokenizer.model? Could you share it? I'm trying to make a gguf file too, but as far as I understand, it needs the tokenizer.model that has been removed from this repo. Or can you tell me where to find a guide for creating it?

EDIT: I got the quantization to work without tokenizer.model. I used the script from PR 3633 that you linked. The problem was that I was trying to load the model with the llama.cpp loader in oobabooga, but it worked with llamacpp_HF instead.

EDIT 2: Correction: it does need a tokenizer.model to run with llamacpp_HF. But it seems to work somehow even with a faulty tokenizer.model in the same folder. Another important step was:

# update the gguf filetype to current if older version is unsupported by another application

./quantize ./models/7B/ggml-model-q4_0.gguf ./models/7B/ggml-model-q4_0-v2.gguf COPY
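To see whether a file needs that upgrade at all, the version can be read straight from the GGUF header: the file starts with the 4-byte magic "GGUF" followed by a 32-bit version number. A minimal sketch, assuming a little-endian file (the function name is just for illustration):

```python
import struct


def read_gguf_version(path):
    """Return the GGUF format version stored in the file header."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError("not a GGUF file (missing 'GGUF' magic)")
    # The version is an unsigned 32-bit integer right after the magic;
    # little-endian byte order is assumed here.
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Running it on the file before and after the COPY step should show whether the version actually changed.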

Shards86

Nov 30, 2023

Cool, you actually got it working! I will have to try again. I am not sure if I did the version conversion you mentioned at the end.

FYI, the tokenizer.model conversion was done using the instructions from here: https://github.com/huggingface/tokenizers/issues/521

Shards86

Dec 1, 2023 • edited Dec 1, 2023

Yesterday I tested the conversions again on my laptop. I also installed oobabooga's webui on my PC, and it does indeed work on Windows using llamacpp_HF, with both GPU and CPU-only configs.

I don't get why the exact same model won't work on Linux/aarch64. It shouldn't be a memory issue, because other 3B models are working great.

Oh well, I will have to keep experimenting.
