Infinite recursion unless we provide some special tokens

#15
by pevogam - opened

Hi, the model worked great before, but I think a recent version of the transformers library (possibly in combination with auto-gptq) has a problem that leads to infinite recursion:

  File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1229, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 297, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 304, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1229, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 297, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 304, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
           ^^^^^^^^^^^^^^^^^
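The cycle in the traceback is easy to reproduce in miniature: when `unk_token` resolves to a value the vocabulary does not contain (here, an empty entry), `unk_token_id` asks `convert_tokens_to_ids` to look it up, which falls back to `unk_token_id` again. This is a stripped-down sketch of the pattern, not the actual transformers code:

```python
class BrokenTokenizer:
    """Minimal illustration of the mutual recursion, assuming the
    unk token ends up empty (as with an empty special_tokens_map.json
    entry); not the real transformers implementation."""

    def __init__(self):
        self.vocab = {"<s>": 1, "</s>": 2}
        self.unk_token = ""  # empty value loaded from the tokenizer config

    def convert_tokens_to_ids(self, token):
        if token in self.vocab:
            return self.vocab[token]
        return self.unk_token_id  # unknown token -> fall back to unk id

    @property
    def unk_token_id(self):
        # ...which looks the unk token up again, closing the cycle
        return self.convert_tokens_to_ids(self.unk_token)


tok = BrokenTokenizer()
try:
    tok.convert_tokens_to_ids("hello")
except RecursionError:
    print("infinite recursion, as in the traceback above")
```

Any token outside the vocabulary triggers the loop, because the empty unk token is itself outside the vocabulary.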

It can be worked around if I pass the special tokens explicitly:

quantized_model_dir = "/app/models/TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ_gptq-4bit-32g-actorder_True"
-tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=True)
+tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=True,
+                                          unk_token="<unk>", bos_token="<s>", eos_token="</s>")

I am using the most recent version of all dependencies including auto-gptq:

root@dfe9ff3a5c15:/app# pip show auto-gptq
Name: auto-gptq
Version: 0.4.2
Summary: An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.
Home-page: https://github.com/PanQiWei/AutoGPTQ
Author: PanQiWei
Author-email: 
License: 
Location: /usr/local/lib/python3.11/site-packages
Requires: accelerate, datasets, numpy, peft, rouge, safetensors, torch, transformers
Required-by: 

so I doubt the problem is caused by a wrong choice of version.

Ah yeah, this is a result of a Transformers change - it no longer accepts empty values in special_tokens_map.json.

I just applied a fix, please re-download and test again.
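For anyone hitting the same error with another local checkpoint, a cleanup along these lines (the filename `special_tokens_map.json` is the real one; the helper name and demo path are hypothetical) drops the empty entries that newer transformers releases no longer tolerate:

```python
import json
import tempfile
from pathlib import Path


def clean_special_tokens_map(map_file: Path) -> dict:
    """Remove entries with empty values (e.g. "") from a
    special_tokens_map.json file and write the cleaned map back."""
    tokens = json.loads(map_file.read_text())
    cleaned = {k: v for k, v in tokens.items() if v}
    map_file.write_text(json.dumps(cleaned, indent=2))
    return cleaned


# Demo on a throwaway copy rather than a real checkpoint directory.
with tempfile.TemporaryDirectory() as d:
    f = Path(d) / "special_tokens_map.json"
    f.write_text(json.dumps(
        {"bos_token": "<s>", "eos_token": "</s>", "unk_token": ""}))
    print(clean_special_tokens_map(f))
```

After cleaning, `AutoTokenizer.from_pretrained` should load the checkpoint without the extra keyword arguments.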

Alright, I applied the same change as in the diff of your commit and it works now, thanks and closing this.

pevogam changed discussion status to closed
