Gibberish with exllamav2 but working fine with exllamav2_HF

#1
by nuke3d

I tried loading this model from code with exllamav2, the same way I load other models, but it only generates gibberish. I then tried it in text-generation-webui with the exllamav2 loader and hit the same issue. When I use exllamav2_HF instead, it generates fine. How can I make this model work from my own code when using exllamav2?
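
For reference, this is roughly how I'm loading and generating (a minimal sketch of the exllamav2 0.0.x Python API; the model path and sampler settings are placeholders, not my exact values):

from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/this-model-exl2"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # load weights, splitting across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7  # placeholder sampling settings
settings.top_p = 0.9

prompt = "### Instruction\nsay Hello\n\n### Response\n"
print(generator.generate_simple(prompt, settings, 64))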

It's going to be down to the prompt being used. Turn on verbose mode in ooba's text-generation-webui and see exactly what is being sent to the model, then use the notebook tab in ooba to try to replicate that prompt. You will also need to add special tokens like <s> and </s>, plus any other tokens defined in the model's config. It's all a bit painful, unfortunately. ooba and some other tools can read the prompt format automatically when it's available, but not all tools support that.
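
For example, something along these lines will show whether BOS is actually being prepended (a sketch that assumes a tokenizer loaded as in the snippet above; add_bos is the flag name in recent exllamav2 releases):

prompt = "### Instruction\nsay Hello\n\n### Response\n"

ids_plain = tokenizer.encode(prompt)              # what the generator sees by default
ids_bos = tokenizer.encode(prompt, add_bos=True)  # explicitly prepend <s>

print(ids_plain[0, :5])
print(ids_bos[0, :5])  # should start with the BOS id, e.g. 1 for Llama-family models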

Even with the wrong prompt the exllamav2_HF version works fine:

### Instruction
say Hello

### Response
Hello, how can I assist you today?

But exllamav2 doesn't:

### Instruction
say Hello

### Response
turnperty
		@?=ngBтон
ilt sheunctionknww'=>S

I also tried the correct prompt, but it doesn't make any difference, at least in terms of returning any kind of sensible text. I tried exllamav2 0.0.10 and 0.0.11. It must be something related to the tokenizer for this model: other models work fine, and as far as I can see the only difference between exllamav2 and exllamav2_HF is the tokenizer.
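
One way to check that theory (a hedged sketch: it reuses config and tokenizer from my loading code above and assumes transformers is installed; any difference in the token IDs would explain the gibberish):

from transformers import AutoTokenizer

text = "### Instruction\nsay Hello\n\n### Response\n"

hf_tok = AutoTokenizer.from_pretrained(config.model_dir)
hf_ids = hf_tok.encode(text, add_special_tokens=False)

exl2_ids = tokenizer.encode(text).flatten().tolist()  # exllamav2's native tokenizer

print("HF  :", hf_ids)
print("EXL2:", exl2_ids)
print("match:", hf_ids == exl2_ids)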

Check the special tokens as well, like I mentioned previously, e.g. <s> and </s> and any other start/end tokens.

This is in the notebook, so there are no special tokens in either case. All other settings are exactly the same; the only change is using the _HF loader in the first case.

I also tried LoneStriker/Magicoder-S-CL-7B-8.0bpw-h8-exl2-2 just now, and it works fine with both exllamav2 and exllamav2_HF.
