Vocab size mismatch with ggml

Trying to convert the model to ggml results in the following exception:

  File "D:\Games\huggingface\llama.cpp\convert.py", line 1149, in <module>
    main()
  File "D:\Games\huggingface\llama.cpp\convert.py", line 1144, in main
    OutputFile.write_all(outfile, params, model, vocab)
  File "D:\Games\huggingface\llama.cpp\convert.py", line 942, in write_all
    check_vocab_size(params, vocab)
  File "D:\Games\huggingface\llama.cpp\convert.py", line 896, in check_vocab_size
    raise Exception(msg)
Exception: Vocab size mismatch (model has 32001, but D:\Games\huggingface\huggingface\models\vicuna-7b\tokenizer.model has 32000).  Most likely you are missing added_tokens.json (should be in D:\Games\huggingface\huggingface\models\vicuna-7b).```

I verified all checksums, and I'm using the latest commit of llama.cpp. Does anyone know how to resolve this issue?

Edit: I tried it on Google Colab in case something was wrong with my environment, but I get the same error.
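
Edit 2: Reading the error again, convert.py apparently expects an added_tokens.json next to tokenizer.model that maps the one extra token to id 32000, since the base SentencePiece vocab only covers ids 0–31999. A minimal sketch of what I plan to try, assuming the extra token is a pad token named "<pad>" (the token name is my guess, not something the error confirms; check the model's tokenizer config):

```python
import json
from pathlib import Path

# Directory that contains tokenizer.model for the model being converted.
model_dir = Path(r"D:\Games\huggingface\huggingface\models\vicuna-7b")

# The model reports 32001 tokens while tokenizer.model holds 32000,
# so the single extra token must occupy id 32000. The name "<pad>"
# is an assumption; verify it against the model's tokenizer_config.json.
added_tokens = {"<pad>": 32000}

with open(model_dir / "added_tokens.json", "w", encoding="utf-8") as f:
    json.dump(added_tokens, f, indent=2)
```

If this is right, rerunning convert.py should pick the file up automatically, since the error says it looks for added_tokens.json in that directory.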
