Update chat template

#2
by CISCai - opened

I know it's a bit of a pain, but could you update the chat template metadata to the latest chat templates, now that llama.cpp supports them?

At least you won't have to requantize everything as I made a handy script that lets you create a new GGUF using the updated tokenizer_config.json file, see the details in the PR. :)


Can do, although I'm not sure I'm doing the right thing; please double-check:

I uploaded a test file at ./c4ai-command-r-v01-imat-IQ1_S.tmpl.gguf

I ran:

python3 scripts/gguf-new-metadata.py --chat-template default --chat-template-config ../c4ai-command-r-v01/tokenizer_config.json ./c4ai-command-r-v01-imat-IQ1_S.gguf c4ai-command-r-v01-imat-IQ1_S.tmpl.gguf

But gguf-dump.py only shows the added chat-template=default key; I was expecting new entries below that for the different templates. Is this expected, or am I doing something wrong?

python3 scripts/gguf-dump.py --no-tensors --json ./c4ai-command-r-v01-imat-IQ1_S.tmpl.gguf
{
  "filename": "./c4ai-command-r-v01-imat-IQ1_S.tmpl.gguf",
  "endian": "LITTLE",
  "metadata": {
    "GGUF.version": {"index": 0, "type": "UINT32", "offset": 4, "value": 3},
    "GGUF.tensor_count": {"index": 1, "type": "UINT64", "offset": 8, "value": 322},
    "GGUF.kv_count": {"index": 2, "type": "UINT64", "offset": 16, "value": 24},
    "general.architecture": {"index": 3, "type": "STRING", "offset": 24, "value": "command-r"},
    "general.name": {"index": 4, "type": "STRING", "offset": 73, "value": "c4ai-command-r-v01"},
    "command-r.block_count": {"index": 5, "type": "UINT32", "offset": 123, "value": 40},
    "command-r.context_length": {"index": 6, "type": "UINT32", "offset": 160, "value": 131072},
    "command-r.embedding_length": {"index": 7, "type": "UINT32", "offset": 200, "value": 8192},
    "command-r.feed_forward_length": {"index": 8, "type": "UINT32", "offset": 242, "value": 22528},
    "command-r.attention.head_count": {"index": 9, "type": "UINT32", "offset": 287, "value": 64},
    "command-r.attention.head_count_kv": {"index": 10, "type": "UINT32", "offset": 333, "value": 64},
    "command-r.rope.freq_base": {"index": 11, "type": "FLOAT32", "offset": 382, "value": 8000000.0},
    "command-r.attention.layer_norm_epsilon": {"index": 12, "type": "FLOAT32", "offset": 422, "value": 9.999999747378752e-06},
    "general.file_type": {"index": 13, "type": "UINT32", "offset": 476, "value": 24},
    "command-r.logit_scale": {"index": 14, "type": "FLOAT32", "offset": 509, "value": 0.0625},
    "command-r.rope.scaling.type": {"index": 15, "type": "STRING", "offset": 546, "value": "none"},
    "tokenizer.ggml.model": {"index": 16, "type": "STRING", "offset": 597, "value": "gpt2"},
    "tokenizer.ggml.tokens": {"index": 17, "type": "ARRAY", "offset": 641, "array_types": ["STRING"]},
    "tokenizer.ggml.token_type": {"index": 18, "type": "ARRAY", "offset": 4813469, "array_types": ["INT32"]},
    "tokenizer.ggml.merges": {"index": 19, "type": "ARRAY", "offset": 5837518, "array_types": ["STRING"]},
    "tokenizer.ggml.bos_token_id": {"index": 20, "type": "UINT32", "offset": 10865265, "value": 5},
    "tokenizer.ggml.eos_token_id": {"index": 21, "type": "UINT32", "offset": 10865308, "value": 255001},
    "tokenizer.ggml.padding_token_id": {"index": 22, "type": "UINT32", "offset": 10865351, "value": 0},
    "tokenizer.ggml.add_bos_token": {"index": 23, "type": "BOOL", "offset": 10865398, "value": true},
    "tokenizer.ggml.add_eos_token": {"index": 24, "type": "BOOL", "offset": 10865439, "value": false},
    "general.quantization_version": {"index": 25, "type": "UINT32", "offset": 10865480, "value": 2},
    "tokenizer.chat_template": {"index": 26, "type": "STRING", "offset": 10865524, "value": "default"}
  },
  "tensors": {}
}
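For checking many files at once, the --json output can be scripted; here is a minimal sketch of a helper that lists which chat-template keys are still missing from a dump. The expected-key list is an assumption based on what this model's updated config is supposed to produce:

```python
import json

# Assumption: the chat-template keys the updated tokenizer_config.json
# should produce for this model family.
EXPECTED_KEYS = [
    "tokenizer.chat_template",
    "tokenizer.chat_templates",
    "tokenizer.chat_template.tool_use",
    "tokenizer.chat_template.rag",
]

def missing_template_keys(dump_json: str) -> list[str]:
    """Return expected chat-template keys absent from a
    `gguf-dump.py --no-tensors --json` dump."""
    metadata = json.loads(dump_json)["metadata"]
    return [key for key in EXPECTED_KEYS if key not in metadata]

# Tiny synthetic dump mimicking the output above (only one key present):
dump = json.dumps({"metadata": {
    "tokenizer.chat_template": {"type": "STRING", "value": "default"},
}})
print(missing_template_keys(dump))
# -> ['tokenizer.chat_templates', 'tokenizer.chat_template.tool_use',
#     'tokenizer.chat_template.rag']
```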

If this is the expected result I can run it on all files that I still have downloaded. (My model folder is 15 TB, so I had to delete some older ones and would only update the ones I'm actively using.)

Also, do you know whether the default template exists for all models, or do I have to check the tokenizer_config.json for each model beforehand?

It would appear that you do not have the latest tokenizer_config.json. Also, you only have to provide the --chat-template-config option, not --chat-template (the latter is only needed if you don't have the JSON file).

With the correct file you should see three new metadata items (in addition to tokenizer.chat_template):

  • tokenizer.chat_templates
  • tokenizer.chat_template.tool_use
  • tokenizer.chat_template.rag
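Whether all of these exist depends on the model, so the per-model check above is worth scripting. As a sketch, following the Hugging Face convention that "chat_template" in tokenizer_config.json is either a single Jinja string (one template, implicitly the default) or a list of named templates; the sample dict below is hypothetical:

```python
import json

def list_chat_templates(config: dict) -> list[str]:
    """Return the chat-template names defined in a tokenizer_config dict.

    HF convention: "chat_template" is either a single Jinja string
    (one template, implicitly "default") or a list of
    {"name": ..., "template": ...} dicts for multiple named templates.
    """
    tmpl = config.get("chat_template")
    if tmpl is None:
        return []            # model ships no chat template at all
    if isinstance(tmpl, str):
        return ["default"]   # single template, no explicit name
    return [entry["name"] for entry in tmpl]

# Hypothetical sample mimicking a multi-template config like Command-R's:
sample = json.loads('''{"chat_template": [
    {"name": "default",  "template": "{{ bos_token }}..."},
    {"name": "tool_use", "template": "..."},
    {"name": "rag",      "template": "..."}
]}''')
print(list_chat_templates(sample))   # -> ['default', 'tool_use', 'rag']
```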

Hey, while requantizing for the new pre-tokenizer changes I finally got around to looking at this again; I did indeed have an outdated tokenizer_config.json.
Sadly I only noticed after quantizing again, but a script is now running gguf-new-metadata.py on all the GGUFs, and they should appear with fixed metadata one by one over the next few hours.
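Such a batch pass could look roughly like the following sketch, which assumes the quant files sit in one folder and writes each fixed file under a new *.tmpl.gguf name (as with the test file earlier); the paths, glob pattern, and naming scheme are all illustrative:

```python
import subprocess
from pathlib import Path

def fixed_name(gguf: Path) -> Path:
    """Map foo-IQ1_S.gguf -> foo-IQ1_S.tmpl.gguf (illustrative naming)."""
    return gguf.with_suffix(".tmpl.gguf")

def update_all(model_dir: Path, config: Path) -> None:
    """Run gguf-new-metadata.py over every quant in model_dir, copying
    the chat templates from the given tokenizer_config.json."""
    for gguf in sorted(model_dir.glob("*.gguf")):
        if gguf.suffixes[-2:] == [".tmpl", ".gguf"]:
            continue  # skip files a previous run already produced
        subprocess.run([
            "python3", "scripts/gguf-new-metadata.py",
            "--chat-template-config", str(config),
            str(gguf), str(fixed_name(gguf)),
        ], check=True)

# Example invocation (paths are illustrative):
# update_all(Path("."), Path("../c4ai-command-r-v01/tokenizer_config.json"))
```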

C4AI Command-R Plus will also be requantized and uploaded with fixed metadata, but that will take some time, as imatrix generation alone takes ~60 hours on my poor CPU-only server.

IQ1_S is already fixed; the others should appear over the next few hours.

qwp4w3hyb changed discussion status to closed
