Assistant error

#5
by gopi87 - opened

Hi, it's generating an assistant error in the web UI.

Please make sure you select the ChatML chat template, which is the correct template for this model:
https://huggingface.co/MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4
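For reference, ChatML wraps each turn in `<|im_start|>` / `<|im_end|>` markers. A minimal sketch of what the rendered prompt should look like, mirroring the Jinja template stored in the GGUF metadata below (the messages are just placeholders):

```python
# Minimal sketch of the ChatML layout this model expects.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

def render_chatml(messages, add_generation_prompt=True):
    """Mirror of the template: <|im_start|>role\ncontent<|im_end|>\n per turn."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        prompt += "<|im_start|>assistant\n"
    return prompt

print(render_chatml(messages))
```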

Hi, thanks for the reply. This is what I obtained from the model metadata, and it's still making the same error:

Model metadata: {'general.name': 'models--MaziyarPanahi--Llama-3-70B-Instruct-DPO-v0.4', 'general.architecture': 'llama', 'llama.block_count': '80', 'llama.context_length': '8192', 'split.tensors.count': '723', 'tokenizer.ggml.eos_token_id': '128256', 'general.file_type': '10', 'llama.attention.head_count_kv': '8', 'llama.embedding_length': '8192', 'llama.feed_forward_length': '28672', 'split.count': '6', 'llama.attention.head_count': '64', 'llama.rope.freq_base': '500000.000000', 'split.no': '0', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.vocab_size': '128257', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.model': 'gpt2', 'tokenizer.ggml.pre': 'llama-bpe', 'general.quantization_version': '2', 'tokenizer.ggml.bos_token_id': '128000', 'tokenizer.ggml.padding_token_id': '128001', 'tokenizer.chat_template': "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"}
Using gguf chat template: {% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}
Using chat eos_token: <|im_end|>
Using chat bos_token: <|begin_of_text|>
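If it helps to double-check what a given quant actually carries, here is a rough sketch of dumping the tokenizer-related key/value metadata with the `gguf` Python package (the field-extraction details are simplified and may differ between package versions; the model path is a placeholder):

```python
# Rough sketch: print the tokenizer/template metadata the web UI reads from a GGUF file.
# Assumes `pip install gguf`.
from gguf import GGUFReader, GGUFValueType

reader = GGUFReader("Llama-3-70B-Instruct-DPO-v0.4.Q2_K.gguf")

for name, field in reader.fields.items():
    if not name.startswith("tokenizer."):
        continue
    part = field.parts[field.data[0]]
    if field.types and field.types[0] == GGUFValueType.STRING:
        value = part.tobytes().decode("utf-8")  # strings are stored as byte arrays
    else:
        value = part[0]  # scalars are one-element arrays
    print(f"{name}: {value}")
```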

@MaziyarPanahi do you have any solution? I can't stop the assistant error, and unfortunately I can't run the whole DPO v0.4.

Hi @gopi87
Which Quant are you using? I'll test it locally and get back to you

@MaziyarPanahi thanks for the reply. I am using the Q2_K for testing purposes but plan to use Q4.

thanks @gopi87
Could you also share where and how you are using it? (my testing setup is pure Llama.cpp and LM Studio)

@MaziyarPanahi I am using it in the web UI and koboldcpp; both give the same error.

@gopi87 there are some libraries that use <|eot_id|> to stop, but I think I found a fix. I will redo this one, and this time I'll use imatrix since this is a very high-quality model. This way we get great quantized models out of it.

I'll ping you here once the new ones are re-uploaded so you can test them
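Until the fixed quants land, one possible workaround (if your frontend exposes it) is to add the ChatML end marker as an explicit stop string. A minimal sketch with llama-cpp-python, assuming the same Q2_K file mentioned above:

```python
# Minimal sketch: force generation to stop on the ChatML end marker
# (and <|eot_id|>, in case the frontend expects that token instead).
# Assumes `pip install llama-cpp-python`; the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="Llama-3-70B-Instruct-DPO-v0.4.Q2_K.gguf", n_ctx=4096)

prompt = (
    "<|im_start|>user\nWrite one sentence about llamas.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
out = llm(prompt, max_tokens=128, stop=["<|im_end|>", "<|eot_id|>"])
print(out["choices"][0]["text"])
```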

@MaziyarPanahi thanks for the reply.
I was very interested in imatrix. In fact, I have only one model that I use every day on my 3080 laptop with 16 GB VRAM and 32 GB RAM, and it does great, getting 1.5 tokens/sec on IQ4_XS.gguf.

Here is a link:

https://huggingface.co/mradermacher/Meta-Llama-3-70B-Instruct-DPO-i1-GGUF/tree/main

Maybe this will help you more.

@gopi87 you are welcome

I am done with all the conversions; the new quants are being uploaded as we speak.

PS: the imatrix must be calculated separately for each model. So I had to first convert the original model to a 16-bit GGUF, then create an imatrix.data file, which takes about 2 hours, and then, based on that, quantize the model to 2-bit, 3-bit, etc.

In about 15-20 minutes you should be able to see the PR that is uploading the new models: higher quality, and with a fix so they stop where they should, like the original model.
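For anyone following along, the workflow described in the PS maps roughly onto the standard llama.cpp tools. A hedged sketch (tool names and flags vary between llama.cpp versions; the calibration file and paths are placeholders):

```python
# Rough sketch of the convert -> imatrix -> quantize pipeline described above.
# Assumes a recent llama.cpp checkout; adjust tool names/flags for your version.
import subprocess

HF_DIR = "MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4"  # local snapshot of the HF repo
F16 = "model-f16.gguf"

# 1) Convert the original model to a 16-bit GGUF.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", HF_DIR, "--outtype", "f16", "--outfile", F16],
    check=True,
)

# 2) Compute the importance matrix from a calibration text (the ~2 hour step).
subprocess.run(
    ["./llama-imatrix", "-m", F16, "-f", "calibration.txt", "-o", "imatrix.data"],
    check=True,
)

# 3) Quantize using the imatrix, e.g. to Q2_K.
subprocess.run(
    ["./llama-quantize", "--imatrix", "imatrix.data", F16, "model-Q2_K.gguf", "Q2_K"],
    check=True,
)
```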

@MaziyarPanahi thanks, sir. I will check and report back on it.

gopi87 changed discussion status to closed
gopi87 changed discussion status to open
