Unable to load tokenizer

#1
by ZhangRC - opened

I got this error:

Traceback (most recent call last):
  File "<redacted path>\ChatRWKV-main\v2\chat.py", line 117, in <module>
    pipeline = PIPELINE(model, f"<redacted path>/ChatRWKV-main/tokenizer/rwkv_vocab_v20230424.txt")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Program Files\Python311\Lib\site-packages\rwkv\utils.py", line 29, in __init__
    self.tokenizer = Tokenizer.from_file(WORD_NAME)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: invalid type: integer `1`, expected struct Tokenizer at line 1 column 1

It seems that the tokenizer is not compatible.

Windows 10, Python 3.11, PyTorch 2.0.0, RWKV 0.7.3, Tokenizers 0.13.3, CUDA 11.8
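
The message reads like a JSON parse failure. Per the traceback, rwkv 0.7.3 passes the path straight to Tokenizer.from_file, which expects a serialized tokenizer in JSON, while rwkv_vocab_v20230424.txt is a plain-text vocabulary. A minimal sketch of a pre-flight check, using only the standard library. The helper name is_tokenizers_json is hypothetical, and the sample vocab line is an assumed illustration of the file's format, not copied from the real file:

```python
import json
import os
import tempfile


def is_tokenizers_json(path: str) -> bool:
    """Return True if the file parses as a JSON object.

    A JSON object is the top-level shape Tokenizer.from_file expects.
    This does not validate the tokenizer schema itself.
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):  # JSONDecodeError is a ValueError
        return False
    return isinstance(data, dict)


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()

    # Assumed shape of a plain-text vocab line (illustration only).
    vocab_path = os.path.join(tmp, "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("1 '\\x00' 1\n")

    # A stand-in JSON object, not a real tokenizer definition.
    json_path = os.path.join(tmp, "tokenizer.json")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump({"version": "1.0"}, f)

    print(is_tokenizers_json(vocab_path))
    print(is_tokenizers_json(json_path))
```

If the check fails for the vocab file, the file itself is probably fine and the installed package simply does not know how to read that format.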

Update the RWKV pip package to 0.7.4,

then use pipeline = PIPELINE(model, "rwkv_vocab_v20230424")
(exactly as written here: pass the name, not a file path. "rwkv_vocab_v20230424" is included in rwkv 0.7.4+)
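
A rough sketch of that fix, assuming the usual imports from rwkv.model and rwkv.utils. The helper names, model_path, and the strategy argument are placeholders for illustration; the rwkv imports sit inside the function so the version check can run even where the package is missing:

```python
from importlib import metadata

VOCAB_NAME = "rwkv_vocab_v20230424"  # a built-in name, not a file path


def version_tuple(v: str) -> tuple:
    """Parse 'X.Y.Z' into a tuple of ints.

    Simplistic: non-numeric parts such as '4rc1' are skipped.
    """
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())


def rwkv_supports_builtin_vocab() -> bool:
    """True if the installed rwkv package is 0.7.4 or newer."""
    try:
        return version_tuple(metadata.version("rwkv")) >= (0, 7, 4)
    except metadata.PackageNotFoundError:
        return False


def build_pipeline(model_path: str, strategy: str):
    """Load the model and attach the built-in tokenizer.

    model_path and strategy (for example 'cpu fp32') are supplied by
    the caller; nothing here is specific to chat.py.
    """
    if not rwkv_supports_builtin_vocab():
        raise RuntimeError("rwkv >= 0.7.4 is required: pip install -U rwkv")
    from rwkv.model import RWKV
    from rwkv.utils import PIPELINE

    model = RWKV(model=model_path, strategy=strategy)
    return PIPELINE(model, VOCAB_NAME)
```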

Bob: Hi
Alice:Traceback (most recent call last):
  File "\ChatRWKV-main\v2\chat.py", line 457, in <module>
    on_message(msg)
  File "\ChatRWKV-main\v2\chat.py", line 359, in on_message
    token = pipeline.sample_logits(
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "\ChatRWKV-main\v2/../rwkv_pip_package/src\rwkv\utils.py", line 82, in sample_logits
    out = torch.multinomial(probs, num_samples=1)[0]
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: probability tensor contains either inf, nan or element < 0

Changing device from 'cuda' to 'cpu' solves it. Might be a bug?

Okay, I forgot to mention that you need fp32 too, because k overflows in fp16 here (fixable in the future).
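
To see why an fp16 overflow can end in the multinomial error above, here is a small standard-library sketch. It does not reproduce the RWKV kernel: the value 1e5 is made up, and to_fp16 and softmax are hypothetical helpers. It only shows that a value past the fp16 maximum (65504) becomes inf, and that a softmax over logits containing inf produces nan probabilities:

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to half precision, returning +/-inf on overflow."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)


def softmax(logits):
    """Plain softmax with the usual max-subtraction for stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # inf - inf gives nan
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    big = 1e5  # made-up magnitude, just above the fp16 range
    print(to_fp16(FP16_MAX))            # still finite
    print(to_fp16(big))                 # overflows to inf
    print(softmax([to_fp16(big), 1.0])) # every entry is nan
    print(softmax([big, 1.0]))          # finite in wider precision
```

In wider precision the same logits stay finite, which fits the advice to run in fp32. With the rwkv package that would mean a strategy string such as 'cuda fp32' instead of 'cuda fp16'.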
