Error during quantization

#1
by Gryphe - opened

Just an FYI (I'm aware you made a GGML available yourself).

Exception: Vocab size mismatch (model has 32032, but I:\HF\Storage\NousResearch_Nous-Hermes-Llama2-13b\tokenizer.model combined with I:\HF\Storage\NousResearch_Nous-Hermes-Llama2-13b\added_tokens.json has 32001).
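A quick way to see the same mismatch outside of convert.py, as a minimal sketch (repo id taken from the path above; adjust if you load from a local directory):

from transformers import AutoConfig, AutoTokenizer

repo = "NousResearch/Nous-Hermes-Llama2-13b"

# Embedding rows the checkpoint declares vs. tokens the tokenizer actually knows
config = AutoConfig.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)

print("config vocab_size:", config.vocab_size)  # 32032 per the error above
print("tokenizer size:", len(tokenizer))        # 32001 per the error above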

Gryphe changed discussion title from Errors during quantization to Error during quantization

Same finding here.

Also, when I attempted quantization from the provided GGML fp16, I was notified that certain tensors aren't k-quant compatible because their dimensions aren't a multiple of 256 - presumably also related to the vocab changes.
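For what it's worth, the arithmetic is consistent with that guess - a quick check against the 256 block size the k-quant warning mentions:

# 32000 (stock Llama-2 vocab) divides evenly into 256-sized blocks,
# the padded 32032 does not - matching the "not a multiple of 256" warning.
for vocab in (32000, 32032):
    print(vocab, "% 256 =", vocab % 256)
# 32000 % 256 = 0
# 32032 % 256 = 32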

Yup, it doesn't seem to work with the 4-bit or 8-bit quantization offered through bitsandbytes either.

BnB on newer transformers can be fixed by setting "pretraining_tp": 1 in the config file.
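A minimal sketch of applying that edit to a local copy of the model (the directory name is a placeholder for wherever you downloaded it):

import json
from pathlib import Path

# Placeholder path: point this at your local copy of the model repo
config_path = Path("Nous-Hermes-Llama2-13b") / "config.json"

config = json.loads(config_path.read_text())
config["pretraining_tp"] = 1  # the fix described above
config_path.write_text(json.dumps(config, indent=2))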

Same problem here.
config.json says "vocab_size": 32032, while the largest id in tokenizer.json is 32000.

Does anyone know how to solve this?


You can add 32 dummy tokens to added_tokens.json to make it match the tensor size. Not sure why it's set up like this.

NousResearch org

BnB on newer transformers can be fixed by setting "pretraining_tp": 1 in the config file.

This is the real fix. It's an issue on Hugging Face's side and it broke a lot of the Llama 2 finetunes that dropped that day.

The fix has been pushed to the model repo if you want to just download the new config.json.

If you're still having issues, you can do the dummy-token thing, but it's not recommended.

I upgraded transformers and bitsandbytes to the latest versions, but I am still getting the vocab size mismatch when trying to run convert.py in llama.cpp. What am I missing?

The only solution I could find was to add a bunch of dummy tokens to added_tokens.json, which works, but it seems like a dumb fix that could lead to issues. Better than nothing, I guess.


Please tell me, how do I add a bunch of dummy tokens?


This is my added_tokens.json file with dummy tokens to bring the total to 32032 tokens:

{"<pad>": 32000, "<pad1>": 32001, "<pad2>": 32002, "<pad3>": 32003, "<pad4>": 32004, "<pad5>": 32005, "<pad6>": 32006, "<pad7>": 32007, "<pad8>": 32008, "<pad9>": 32009, "<pad10>": 32010, "<pad11>": 32011, "<pad12>": 32012, "<pad13>": 32013, "<pad14>": 32014, "<pad15>": 32015, "<pad16>": 32016, "<pad17>": 32017, "<pad18>": 32018, "<pad19>": 32019, "<pad20>": 32020, "<pad21>": 32021, "<pad22>": 32022, "<pad23>": 32023, "<pad24>": 32024, "<pad25>": 32025, "<pad26>": 32026, "<pad27>": 32027, "<pad28>": 32028, "<pad29>": 32029, "<pad30>": 32030,"<pad31>": 32031}

NousResearch org


Seems the extra tokens came from the trainer we used, axolotl. It has been fixed in the trainer, but I still don't know how to fix it here.

teknium changed discussion status to closed

Python script to generate valid tokenizer.model:


from pathlib import Path
from transformers import AutoTokenizer

tokenizer_model_name = 'NousResearch/Llama-2-7b-hf'
model_path = 'output'
# 32 pad tokens: 32000 (base Llama-2 vocab) + 32 = 32032, matching config.json
new_tokens = [f"<pad{i}>" for i in range(32)]

tokenizer = AutoTokenizer.from_pretrained(tokenizer_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# register the dummy tokens so the tokenizer size matches the embedding matrix
tokenizer.add_tokens(new_tokens)

# write the tokenizer files (including tokenizer.model) into model_path
tokenizer.save_pretrained(Path(model_path))
tokenizer.save_vocabulary(model_path)
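Note that model_path above is just an output directory; copy the regenerated tokenizer files from it next to the model weights before re-running llama.cpp's convert.py, which should then find a vocab that matches the 32032 embedding rows.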
