How do I try this out?

#6
by henke443 - opened

I tried to deploy it using Gradio, but it loads indefinitely and doesn't seem to work. Neither do the other Gradio endpoints that other people have made.

I'd prefer to host it on the Hugging Face Inference API, which I got working for other models, but I get an error when I try to run this one.

I think this is the most relevant part of the error:

tokenizer = LlamaTokenizerFast.from_pretrained(\n\n File "/opt/conda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1854, in from_pretrained\n return cls._from_pretrained(\n\n File "/opt/conda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1886, in _from_pretrained\n slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(\n\n File "/opt/conda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2073, in _from_pretrained\n raise ValueError(\n\nValueError: Non-consecutive added token '' found. Should have index 32000 but has index 0 in saved vocabulary.\n"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
2023/10/15 12:24:56 ~ Error: ShardCannotStart

It said "Non-consecutive added token '<unk>' found", but it seems HTML escaping removed the token from the log above.

Cognitive Computations org

I don't know
It works in oobabooga for me
@TheBloke do you recognize that error message?

Remove these lines from added_tokens.json:

  "</s>": 2,
  "<s>": 1,
  "<unk>": 0,

The link above says to delete the file, but the file is important for the ChatML format, so remove only those lines.
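
A minimal sketch of that edit in Python, assuming added_tokens.json is a flat JSON object mapping token strings to ids, as the three lines above suggest. The function name `strip_base_tokens` and the local path in the usage comment are hypothetical, not something from the model repo. It drops only the three entries and keeps the rest of the file instead of deleting it:

```python
import json
from pathlib import Path

# The three entries named above; they duplicate ids 0-2 of the base vocabulary.
BASE_TOKENS = ("</s>", "<s>", "<unk>")


def strip_base_tokens(path, tokens=BASE_TOKENS):
    """Remove the given tokens from an added_tokens.json file in place.

    Returns a dict of the entries that were actually removed.
    Assumes the file is a flat {"token": id} JSON object.
    """
    path = Path(path)
    added = json.loads(path.read_text(encoding="utf-8"))

    # Pop only the tokens that are present; every other entry is kept.
    removed = {tok: added.pop(tok) for tok in tokens if tok in added}

    path.write_text(
        json.dumps(added, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return removed


# Usage (hypothetical local path to a downloaded copy of the model):
# strip_base_tokens("./model/added_tokens.json")
```

Run it against a local copy of the model files, then point the deployment at that copy. Any other entries in the file are left untouched.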
