Tokenizer of Yi Dare v5

#4

Required once again, as it was in v7.
And it works.

brucethemoose changed pull request status to merged

Thanks.

Note that the tokenizer from the v5 merge is just a copy of Yi's tokenizer, IIRC. This was before I was even aware of mergekit's union tokenizer merge.

It might not work quite right, as some tokens (like the ChatML tokens) are missing.
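For anyone who wants to check, here's a minimal sketch of how to look for those tokens in a merged tokenizer (using the v8 repo purely as an example; the comment describes typical transformers behaviour, not something verified against this exact tokenizer):

```python
# Sketch: check whether the ChatML markers exist as single tokens in the
# merged tokenizer. The repo name is just an example from this model family.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("brucethemoose/Yi-34B-200K-DARE-megamerge-v8")
for marker in ("<|im_start|>", "<|im_end|>"):
    token_id = tok.convert_tokens_to_ids(marker)
    pieces = tok.encode(marker, add_special_tokens=False)
    # If the marker is missing from the vocab, it typically maps to the <unk>
    # id (or None) and gets split into several pieces when encoded as text.
    print(marker, token_id, pieces)
```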

Well, you integrated Bagel, and it works with the same tokenizer despite all the prompt formats it offers.

Here are Jon's views on ChatML:

"ChatML (sort of)

I don't really understand the point of having special tokens for <|im_start|> and <|im_end|>, because in practice they just act as BOS and EOS tokens (but, please correct me if I'm wrong).

So, instead of:

{bos}<|im_start|>{role}
{text}
<|im_end|>{eos}

I just changed it to:

{bos}{role}
{text}
{eos}

If you really want to use <|im_start|> and <|im_end|>, just update your tokenizer_config.json to use <|im_start|> instead of <s> and <|im_end|> instead of </s> when tokenizing. And if you still don't like what I've done to this chat-ml-ish format, feel free to cry into your pillow or fork the code and do a new fine-tune."

https://huggingface.co/jondurbin/bagel-dpo-34b-v0.2#chatml-sort-of
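For illustration, here's a minimal sketch of what that "chat-ml-ish" layout amounts to in code (the helper name and the BOS/EOS strings are placeholders, not anything taken from Bagel's repo):

```python
# Sketch of Bagel's "chat-ml-ish" prompt layout: {bos}{role}\n{text}\n{eos}
# per turn, with no <|im_start|>/<|im_end|> special tokens.
def build_prompt(messages, bos="<s>", eos="</s>"):
    parts = []
    for m in messages:
        parts.append(f"{bos}{m['role']}\n{m['content']}\n{eos}")
    return "".join(parts)

print(build_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]))
```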

[Screenshot: Text generation web UI, 2024-01-16]

Also, a few tests I made. Could the calibration you use for your quants be affecting the wikitext and PTB perplexity this much?

Hmmm, you are not the first person to report something off with the quantization. I test all the merges myself in ooba with 4-bit bitsandbytes, and the perplexity of the raw weights is good.
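(For context, this is roughly what such a raw-weights perplexity check looks like; a minimal sketch with transformers + bitsandbytes, not the exact ooba setup, and the evaluation window size is an arbitrary pick.)

```python
# Sketch: wikitext-2 perplexity on the raw weights, loaded in 4-bit.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "brucethemoose/Yi-34B-200K-DARE-megamerge-v8"  # example repo
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids

ctx = 2048  # arbitrary evaluation window
nlls, counted = [], 0
for start in range(0, ids.size(1), ctx):
    chunk = ids[:, start:start + ctx].to(model.device)
    if chunk.size(1) < 2:
        break
    with torch.no_grad():
        out = model(chunk, labels=chunk)  # HF shifts labels internally
    n_tokens = chunk.size(1) - 1
    nlls.append(out.loss * n_tokens)
    counted += n_tokens

print("perplexity:", torch.exp(torch.stack(nlls).sum() / counted).item())
```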

I ran exllamav2's own perplexity test on wikitext just to rule out ooba:

v8-exl2-4bpw-fiction: 6.2723

v8-exl2-31bpw-fiction: 8203.8706

v8-exl2-26bpw-fiction: 77592.0066

v7-exl2-31bpw-fiction: 9097.3480

Oof. Yeah something is wrong with the lower quants, possibly all of them.

Here's perplexity on the actual .parquet file I quantized with:

v8-exl2-31bpw-fiction: 14.1167

v8-exl2-26bpw-fiction: 21.5868

Still catastrophic, albeit not hilariously catastrophic like wikitext.

I guess I will take the 3.1bpw and lower quants down? TBH I didn't really notice they were broken because I have only been testing 4bpw locally.

@turboderp Do you have any idea what's going on here? The quantization commands I used are:
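# First pass: measure the model and write the measurement file given by -om.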

python /home/alpha/AI/exllamav2/convert.py --in_dir /home/alpha/FastModels/v8/v8 -o /home/alpha/FastModels/scratch -om /home/alpha/FastModels/v8meas.json --cal_dataset /home/alpha/Documents/stories.parquet -ml 32768 -mr 8 -ss 4096 -b 4.0 -hb 6 -nr
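# Second pass: quantize to 4.0 bpw (-b) reusing that measurement (-m) and write the finished quant to the -cf folder.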
python /home/alpha/AI/exllamav2/convert.py --in_dir /home/alpha/FastModels/v8/v8 -o /home/alpha/FastModels/scratch -m /home/alpha/FastModels/v8meas.json --cal_dataset /home/alpha/Documents/stories.parquet -l 12288 -r 26 -ml 32768 -mr 8 -ss 4096 -b 4.0 -hb 6 -cf /home/alpha/FastModels/v8-exl2-4bpw-fiction -nr

The measurements file is here: https://huggingface.co/brucethemoose/Yi-34B-200K-DARE-megamerge-v8/blob/main/v8meas.json

Anecdotally, I noticed exllama seemed to be allocating 2.2bpw to almost everything except a few layers in the middle.

Glad my little tests could help. I wish ooba included HellaSwag like llama.cpp does (I guess it's not just a matter of including the text file, lol).

As for your defective quants, check LoneStriker's; his 3bpw exl2 quant works as intended.

And otherwise, great job on your merges!

@brucethemoose

Oof. Yeah something is wrong with the lower quants, possibly all of them.

You should try using a calibration dataset composed of random tokens. I'm not joking, it's probably the best solution to this:
https://github.com/ggerganov/llama.cpp/discussions/5006
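For what it's worth, here's a rough sketch of how such a random-token calibration parquet could be generated (this assumes exllamav2's --cal_dataset only needs a parquet with a "text" column, and the row count and length are arbitrary picks):

```python
# Sketch: build a calibration parquet of random tokens, per the idea in the
# llama.cpp discussion linked above.
import random

import pandas as pd
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("brucethemoose/Yi-34B-200K-DARE-megamerge-v8")
vocab_size = len(tok)

rows = []
for _ in range(32):  # arbitrary number of calibration rows
    ids = [random.randrange(vocab_size) for _ in range(2048)]  # arbitrary length
    rows.append(tok.decode(ids, skip_special_tokens=True))

pd.DataFrame({"text": rows}).to_parquet("random_tokens.parquet")
```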

Yeah, I am already in that thread; I'm the one who tested with exl2.

The jury is still out, but it's very interesting. I will do more testing later.
