AttributeError: 'YayiTokenizer' object has no attribute 'sp_model'

#4
by Yhyu13 - opened

Hi,

I tried to load YAYI2 with transformers 4.36.2, and hit the following error in the tokenizer class:

```
β”‚ /home/hangyu5/Documents/Git-repoMy/text-generation-webui/server.py:241 in <module>
β”‚   240         # Load the model
β”‚ ❱ 241         shared.model, shared.tokenizer = load_model(model_name)
β”‚   242         if shared.args.lora:
β”‚
β”‚ /home/hangyu5/Documents/Git-repoMy/text-generation-webui/modules/models.py:98 in load_model
β”‚    97         else:
β”‚ ❱  98             tokenizer = load_tokenizer(model_name, model)
β”‚    99
β”‚
β”‚ /home/hangyu5/Documents/Git-repoMy/text-generation-webui/modules/models.py:126 in load_tokenizer
β”‚   125
β”‚ ❱ 126         tokenizer = AutoTokenizer.from_pretrained(
β”‚   127             path_to_model,
β”‚
β”‚ /home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py:774 in from_pretrained
β”‚   773                 tokenizer_class.register_for_auto_class()
β”‚ ❱ 774             return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *input
β”‚   775         elif config_tokenizer_class is not None:
β”‚
β”‚ /home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:2028 in from_pretrained
β”‚   2027
β”‚ ❱ 2028         return cls._from_pretrained(
β”‚   2029             resolved_vocab_files,
β”‚
β”‚                              ... 1 frames hidden ...
β”‚
β”‚ /home/hangyu5/.cache/huggingface/modules/transformers_modules/yayi2-30b/tokenization_yayi.py:74 in __init__
β”‚    73         pad_token = AddedToken(pad_token, lstrip=False, rstrip=False) if isinstance(pad_
β”‚ ❱  74         super().__init__(
β”‚    75             bos_token=bos_token,
β”‚
β”‚ /home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/transformers/tokenization_utils.py:367 in __init__
β”‚    366         # the order of addition is the same as self.SPECIAL_TOKENS_ATTRIBUTES following
β”‚ ❱  367         self._add_tokens(
β”‚    368             [token for token in self.all_special_tokens_extended if token not in self._a
β”‚
β”‚ /home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/transformers/tokenization_utils.py:467 in _add_tokens
β”‚    466         # TODO this is fairly slow to improve!
β”‚ ❱  467         current_vocab = self.get_vocab().copy()
β”‚    468         new_idx = len(current_vocab)  # only call this once, len gives the last index +
β”‚
β”‚ /home/hangyu5/.cache/huggingface/modules/transformers_modules/yayi2-30b/tokenization_yayi.py:111 in get_vocab
β”‚   110         """Returns vocab as a dict"""
β”‚ ❱ 111         vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
β”‚   112         vocab.update(self.added_tokens_encoder)
β”‚
β”‚ /home/hangyu5/.cache/huggingface/modules/transformers_modules/yayi2-30b/tokenization_yayi.py:107 in vocab_size
β”‚   106         """Returns vocab size"""
β”‚ ❱ 107         return self.sp_model.get_piece_size()
β”‚   108
╰──────────────────────────────────────────────────────────────────────────────
AttributeError: 'YayiTokenizer' object has no attribute 'sp_model'
```
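
For context on what the traceback shows: since the tokenizer refactor in recent transformers releases (4.34+), the base `PreTrainedTokenizer.__init__` calls `_add_tokens`, which calls `get_vocab()` and hence `vocab_size`, before the subclass has finished its own `__init__`. `YayiTokenizer` only assigns `self.sp_model` after `super().__init__()` returns, so the lookup fails. The usual workaround for SentencePiece-based custom tokenizers (the same reordering was applied to upstream tokenizers such as `LlamaTokenizer`) is to load the SentencePiece model first. A minimal sketch of the reordered `__init__`; the keyword arguments are illustrative, not the file's exact signature:

```python
import sentencepiece as spm
from transformers import PreTrainedTokenizer


class YayiTokenizer(PreTrainedTokenizer):
    def __init__(self, vocab_file, bos_token="<s>", eos_token="</s>",
                 pad_token="<pad>", **kwargs):
        # Load the SentencePiece model BEFORE calling super().__init__():
        # transformers >= 4.34 invokes get_vocab() (and thus vocab_size,
        # which reads self.sp_model) from inside the base constructor.
        self.vocab_file = vocab_file
        self.sp_model = spm.SentencePieceProcessor()
        self.sp_model.Load(vocab_file)
        super().__init__(
            bos_token=bos_token,
            eos_token=eos_token,
            pad_token=pad_token,
            **kwargs,
        )

    @property
    def vocab_size(self):
        """Returns vocab size"""
        return self.sp_model.get_piece_size()

    def get_vocab(self):
        """Returns vocab as a dict"""
        vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
        vocab.update(self.added_tokens_encoder)
        return vocab

    def _convert_id_to_token(self, index):
        # Map an id back to its SentencePiece piece; needed by get_vocab().
        return self.sp_model.IdToPiece(index)
```

The rest of the file can stay as-is; only the order of initialization changes.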
Yhyu13 changed discussion title from "Tokenizer does not have sp_model" to "AttributeError: 'YayiTokenizer' object has no attribute 'sp_model'"

Also, the Flash Attention 2 integration in transformers does not support this model yet. Would you consider upstreaming this model into the transformers library?
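
To make the request concrete, this is the kind of call that native support would enable. The snippet is illustrative only; with transformers 4.36 it is expected to fail for custom-code models that do not declare Flash Attention 2 support:

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative only: transformers 4.36 rejects
# attn_implementation="flash_attention_2" for model classes
# that do not advertise Flash Attention 2 support.
model = AutoModelForCausalLM.from_pretrained(
    "wenge-research/yayi2-30b",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
)
```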

Thank you for your attention to YAYI2! To avoid version conflicts, we recommend using the transformers version specified in config.json. You can also refer to the solution provided by mzbac in discussion #5.

We will adapt to the latest transformers version as soon as possible.
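
As a quick sanity check under the recommended setup (with the transformers version pinned in config.json installed; the repo id below assumes the official wenge-research/yayi2-30b checkpoint):

```python
from transformers import AutoTokenizer

# With the pinned transformers version, the custom tokenizer initializes
# without hitting the sp_model attribute error.
tokenizer = AutoTokenizer.from_pretrained(
    "wenge-research/yayi2-30b",
    trust_remote_code=True,
)
print(tokenizer("Hello, YAYI2!").input_ids)
```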

wenge-research changed discussion status to closed
