tokenizer.bos_token_id is None

#20
by yourui - opened

It looks like tokenizer.bos_token_id is None:

  File "ptuning/main.py", line 219, in preprocess_function_train
    context_length = input_ids.index(tokenizer.bos_token_id)
ValueError: None is not in list
yourui changed discussion title from `tokenizer.bos_token_id` is `None` to tokenizer.bos_token_id is None
Python 3.9.16 (main, Mar  8 2023, 04:29:44) 
[Clang 14.0.6 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
Downloading tokenizer.model: 100%|████████████████████████████████████████████| 1.02M/1.02M [00:02<00:00, 355kB/s]
>>> print(tokenizer.bos_token_id)
None
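
For context, the traceback follows directly from this: `input_ids.index(None)` raises `ValueError` because `None` never appears in a list of integer token ids. Below is a minimal sketch of failing earlier with a clearer message. The helper name `find_context_length` is hypothetical and is not part of `ptuning/main.py`; it only assumes the script needs the position of the BOS token in `input_ids`.

```python
from typing import List, Optional


def find_context_length(input_ids: List[int], bos_token_id: Optional[int]) -> int:
    """Hypothetical helper: return the index of the BOS token in input_ids."""
    if bos_token_id is None:
        # Fail with an explicit message instead of "None is not in list".
        raise ValueError(
            "tokenizer.bos_token_id is None; this preprocessing code expects "
            "a tokenizer that defines a BOS token"
        )
    return input_ids.index(bos_token_id)


# Toy ids (made up for illustration): 1 plays the role of BOS.
print(find_context_length([11, 12, 1, 21], bos_token_id=1))  # 2
```

A check like this does not fix the mismatch between the script and the tokenizer; it only makes the cause visible at the point of failure.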

I have the same issue

I ran into the same problem.

Changing bos_token_id to eos_token_id makes the error go away.


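That change is wrong. Didn't you notice? After changing it this way, all of the label ids become -100, so the model trained on them is broken. I don't know how to fix it properly yet, though.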
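
A rough sketch of why the swap breaks training, assuming the script masks everything before the located token with -100 and that each sequence is laid out as prompt, BOS, answer, EOS. The function `build_labels` and the token ids are made up for illustration and are not copied from `ptuning/main.py`.

```python
IGNORE_INDEX = -100  # label value that the loss ignores


def build_labels(input_ids, separator_id):
    """Mask every position before separator_id; keep the rest as targets."""
    context_length = input_ids.index(separator_id)
    return [IGNORE_INDEX] * context_length + input_ids[context_length:]


# Toy ids (assumed): prompt = [11, 12], BOS = 1, answer = [21, 22], EOS = 2.
BOS, EOS = 1, 2
input_ids = [11, 12, BOS, 21, 22, EOS]

print(build_labels(input_ids, BOS))  # [-100, -100, 1, 21, 22, 2]
print(build_labels(input_ids, EOS))  # [-100, -100, -100, -100, -100, 2]
```

With EOS as the separator, the whole answer is masked and at most the final EOS token is left as a training target, so the model gets almost no signal from the answers.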


Did you ever solve this? I ran into the same problem.

The real cause is that ChatGLM2 has its own fine-tuning code repository on GitHub, and you are using the one written for the first ChatGLM.


Got it. So I should just use the fine-tuning code repository for ChatGLM2, right? Is there a link? Thanks.

Found it, thanks.

yourui changed discussion status to closed
