Query on `vocab_size` in `config.json` for Inference

#2 opened by MatrixC7

Greetings!

Really appreciate the outstanding performance of this model; thank you for your hard work! I have a minor query regarding the `vocab_size` specified in `config.json`. Should it be set to 45440 instead of the current 45416, to reflect the actual size of the vocabulary in the weights? Keeping 45416 leads to the following error during inference with an exllamav2 quantization:

```
ERROR: Traceback (most recent call last):
ERROR:   File "F:\tabbyAPI\main.py", line 460, in generator
ERROR:     for part, prompt_tokens, completion_tokens in new_generation:
ERROR:   File "F:\tabbyAPI\backends\exllamav2\model.py", line 741, in generate_gen
ERROR:     chunk, eos, tokens, _, _ = self.generator.stream()
ERROR:                                ^^^^^^^^^^^^^^^^^^^^^^^
ERROR:   File "C:\Users\i\scoop\apps\mambaforge\current\envs\tabbyapi-test\Lib\site-packages\exllamav2\generator\streaming.py", line 117, in stream
ERROR:     chunk, eos, chunk_token_ids, probs, logits = self._stream()
ERROR:                                                  ^^^^^^^^^^^^^^
ERROR:   File "C:\Users\i\scoop\apps\mambaforge\current\envs\tabbyapi-test\Lib\site-packages\exllamav2\generator\streaming.py", line 196, in _stream
ERROR:     self.held_logits = torch.cat([self.held_logits, next_logits], dim = 0)
ERROR:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR: RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 45416 but got size 45440 for tensor number 1 in the list.
```
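
For anyone who wants to confirm the mismatch themselves, here is a minimal sketch that compares the `vocab_size` in `config.json` against the actual row count of the embedding matrix in the checkpoint. It assumes a single-file, unsharded `model.safetensors` and the Llama-style tensor name `model.embed_tokens.weight`; both may differ for this model, so treat it as an illustration rather than a recipe:

```python
import json
from safetensors import safe_open

# Hypothetical local path to the downloaded model; adjust as needed.
model_dir = "path/to/model"

# vocab_size as advertised in config.json (45416 in this model).
with open(f"{model_dir}/config.json") as f:
    vocab_size = json.load(f)["vocab_size"]
print("config.json vocab_size:", vocab_size)

# Actual row count of the embedding matrix in the weights.
# "model.embed_tokens.weight" is the usual tensor name for
# Llama-style checkpoints; adjust if this architecture differs.
with safe_open(f"{model_dir}/model.safetensors", framework="pt") as f:
    rows, _ = f.get_slice("model.embed_tokens.weight").get_shape()
print("embedding rows:", rows)  # 45440 here, hence the size mismatch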

Kind regards,
Fangru Shao

Update: the problem comes from exllamav2, and @turboderp has fixed it! 🥳 No need to change 45416 in `config.json` to make the quants work!
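
If you still hit this `RuntimeError` on an older build, updating exllamav2 (for example with `pip install --upgrade exllamav2`, or installing from source if the fix has not yet reached a release) should resolve it without editing `config.json`.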

MatrixC7 changed discussion status to closed
