Infinite recursion unless we provide some special tokens
#15, opened by pevogam
Hi, the model worked well before, but I think a recent version of the transformers
library has a problem (possibly in combination with auto-gptq) that leads to infinite recursion:
File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1229, in unk_token_id
return self.convert_tokens_to_ids(self.unk_token)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 297, in convert_tokens_to_ids
return self._convert_token_to_id_with_added_voc(tokens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 304, in _convert_token_to_id_with_added_voc
return self.unk_token_id
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1229, in unk_token_id
return self.convert_tokens_to_ids(self.unk_token)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 297, in convert_tokens_to_ids
return self._convert_token_to_id_with_added_voc(tokens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 304, in _convert_token_to_id_with_added_voc
return self.unk_token_id
^^^^^^^^^^^^^^^^^
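For anyone following along, the loop in the traceback can be reproduced with a toy class. This is only a minimal sketch of the two frames shown above, not the transformers implementation; `ToyTokenizer`, `lookup_recurses`, and the three-entry vocabulary are hypothetical names made up for illustration.

```python
class ToyTokenizer:
    """Hypothetical stand-in that mimics the two calls seen in the traceback."""

    def __init__(self, vocab, unk_token):
        self.vocab = vocab
        self.unk_token = unk_token

    @property
    def unk_token_id(self):
        # Same shape as the unk_token_id frame: look up the id of the unk token.
        return self.convert_tokens_to_ids(self.unk_token)

    def convert_tokens_to_ids(self, token):
        index = self.vocab.get(token)
        if index is None:
            # Unknown token: fall back to the unk id, which re-enters the property.
            return self.unk_token_id
        return index


# Assumed toy vocabulary; a real tokenizer has many more entries.
vocab = {"<unk>": 0, "<s>": 1, "</s>": 2}


def lookup_recurses(unk_token):
    """Return True if resolving the unk id never terminates for this unk token."""
    try:
        ToyTokenizer(vocab, unk_token).unk_token_id
    except RecursionError:
        return True
    return False
```

In this sketch the lookup terminates only when the unk token itself is in the vocabulary. An unk token that is missing from it (an empty string, for example) sends the two methods back and forth until Python raises `RecursionError`.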
It can be fixed if I add the following arguments:
quantized_model_dir = "/app/models/TheBloke_OpenAssistant-SFT-7-Llama-30B-GPTQ_gptq-4bit-32g-actorder_True"
-tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=True)
+tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=True,
+ unk_token="<unk>", bos_token="<s>", eos_token="</s>")
I am using the most recent version of all dependencies including auto-gptq:
root@dfe9ff3a5c15:/app# pip show auto-gptq
Name: auto-gptq
Version: 0.4.2
Summary: An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.
Home-page: https://github.com/PanQiWei/AutoGPTQ
Author: PanQiWei
Author-email:
License:
Location: /usr/local/lib/python3.11/site-packages
Requires: accelerate, datasets, numpy, peft, rouge, safetensors, torch, transformers
Required-by:
so I doubt this is caused by using the wrong version.
Ah yeah, this is a result of a Transformers change - it no longer accepts empty values in special_tokens_map.json.
I just applied a fix, please re-download and test again.
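If you keep a local copy and want to check it before re-downloading, something like the following could list the offending entries. It is a rough sketch using only the standard library. The helper name `find_empty_special_tokens` is hypothetical, and it assumes each entry in special_tokens_map.json is either a plain string or a dict with a "content" field.

```python
import json
from pathlib import Path


def find_empty_special_tokens(model_dir):
    """Return the keys in special_tokens_map.json whose value is empty."""
    path = Path(model_dir) / "special_tokens_map.json"
    mapping = json.loads(path.read_text(encoding="utf-8"))
    empty = []
    for name, value in mapping.items():
        # Assumption: entries are plain strings or dicts with a "content" field.
        content = value.get("content") if isinstance(value, dict) else value
        if content in ("", None):
            empty.append(name)
    return empty
```

Calling `find_empty_special_tokens(quantized_model_dir)` on the model directory from the first post should return an empty list once the fixed file is in place.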
Alright, I applied the same change as in the diff of your commit and it works now, thanks and closing this.
pevogam changed discussion status to closed