ct2 converter command raises vocabulary size error

#3
by nazimali - opened

When I run the ct2 converter command from the README, it fails with a vocabulary size error. Since you ran the converter command on 2023-06-06, I checked the original model's commit history, and there haven't been any new commits since then. Can anyone reproduce this, or does anyone have insight into what causes it? Thanks in advance.

ct2-transformers-converter --model OpenAssistant/falcon-7b-sft-top1-696 --output_dir ~/tmp-ct2fast-falcon-7b-sft-top1-696 --force --copy_files tokenizer.json README.md tokenizer_config.json generation_config.json special_tokens_map.json .gitattributes --quantization int8_float16 --trust_remote_code

Traceback:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/llm/bin/ct2-transformers-converter", line 8, in <module>
    sys.exit(main())
  File "/home/user/.pyenv/versions/3.10.11/envs/llm/lib/python3.10/site-packages/ctranslate2/converters/transformers.py", line 1577, in main
    converter.convert_from_args(args)
  File "/home/user/.pyenv/versions/3.10.11/envs/llm/lib/python3.10/site-packages/ctranslate2/converters/converter.py", line 50, in convert_from_args
    return self.convert(
  File "/home/user/.pyenv/versions/3.10.11/envs/llm/lib/python3.10/site-packages/ctranslate2/converters/converter.py", line 97, in convert
    model_spec.validate()
  File "/home/user/.pyenv/versions/3.10.11/envs/llm/lib/python3.10/site-packages/ctranslate2/specs/model_spec.py", line 561, in validate
    raise ValueError(
ValueError: Vocabulary has size 65029 but the model expected a vocabulary of size 65040

Library versions:

ctranslate2               3.16.0
hf-hub-ctranslate2        2.12.0
transformers              4.30.2
torch                     2.0.1
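
For anyone who wants to see the mismatch outside the converter, here is a minimal sketch using the two sizes from the ValueError above. The helper names `vocabulary_gap` and `pad_vocabulary` and the `<extra_id_N>` placeholder format are hypothetical, chosen for illustration only, and padding is just one possible way to reconcile the sizes, not necessarily what CTranslate2 does. With the real model, the token list would presumably come from `tokenizer.get_vocab()` and the expected size from the model config's `vocab_size`.

```python
def vocabulary_gap(tokens, expected_size):
    """Return how many entries the token list is short of the expected size."""
    return expected_size - len(tokens)


def pad_vocabulary(tokens, expected_size, template="<extra_id_{}>"):
    """Return a copy of tokens padded with placeholder entries (hypothetical names)."""
    gap = vocabulary_gap(tokens, expected_size)
    if gap < 0:
        raise ValueError(
            f"Token list has {len(tokens)} entries, more than the expected {expected_size}"
        )
    return list(tokens) + [template.format(i) for i in range(gap)]


# Stand-in for the tokenizer vocabulary; 65029 and 65040 are the sizes
# reported in the error message.
tokens = [f"tok_{i}" for i in range(65029)]
padded = pad_vocabulary(tokens, 65040)

print(vocabulary_gap(tokens, 65040))  # 11 missing entries
print(len(padded))                    # 65040
```

The gap of 11 suggests the model's embedding matrix has more rows than the tokenizer has entries, which is the condition `model_spec.validate()` rejects.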

You have to wait for the ctranslate2 3.17.0 release or build from source. See: https://github.com/OpenNMT/CTranslate2/blob/master/python/ctranslate2/converters/transformers.py#L1275
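
A quick way to check whether an environment already has a new enough release is sketched below. The helpers `parse_version`, `is_at_least`, and `installed_ctranslate2` are hypothetical names, and the comparison only handles simple dotted numeric versions such as `3.16.0`; a source build may report a different version string.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '3.16.0' into (3, 16, 0), ignoring any non-numeric suffix per part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def is_at_least(installed, required="3.17.0"):
    """Return True if the installed version is the required one or newer."""
    return parse_version(installed) >= parse_version(required)


def installed_ctranslate2():
    """Return the installed ctranslate2 version string, or None if it is absent."""
    try:
        return version("ctranslate2")
    except PackageNotFoundError:
        return None


print(is_at_least("3.16.0"))  # False: the version listed in this thread
print(is_at_least("3.17.0"))  # True

found = installed_ctranslate2()
if found is not None:
    print(found, is_at_least(found))
```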

michaelfeil changed discussion status to closed
