GGUF adds `<0x0A>` during tokenization due to missing `tokenizer.model`

#2
by alvarobartt - opened

Hi @TheBloke, first of all, thanks a lot again for investing the time, effort and compute on quantizing our notus-7b-v1 models 🫢🏻

We just wanted to report that the GGUF variants built via llama.cpp don't seem to be encoding `\n` properly, most likely due to the missing `tokenizer.model` file. We've ported it from Zephyr (as we share the same tokenizer) and it's already available at https://huggingface.co/argilla/notus-7b-v1/blob/main/tokenizer.model, in case you'd like to re-run the GGUF quantization.
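For anyone wanting to do the same "porting" locally, here's a rough sketch of how the `tokenizer.model` can be pulled from the base repo with huggingface_hub. The Zephyr repo id and the local directory below are only assumptions for illustration, since the thread just says it was taken "from Zephyr":

```python
# Minimal sketch: copy tokenizer.model from the base repo into a local checkout
# that is missing it. BASE_REPO and LOCAL_MODEL_DIR are assumptions, not taken
# from the thread.
import shutil
from huggingface_hub import hf_hub_download

BASE_REPO = "HuggingFaceH4/zephyr-7b-beta"   # assumed base repo sharing the tokenizer
LOCAL_MODEL_DIR = "./notus-7b-v1"            # local checkout missing tokenizer.model

# Download tokenizer.model from the base repo (cached by huggingface_hub)...
path = hf_hub_download(repo_id=BASE_REPO, filename="tokenizer.model")

# ...and copy it next to the fine-tuned model's files before running the converter.
shutil.copy(path, f"{LOCAL_MODEL_DIR}/tokenizer.model")
print(f"Copied {path} -> {LOCAL_MODEL_DIR}/tokenizer.model")
```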

We've tried quantizing it to GGUF with Q4_K_M and it worked fine, see https://huggingface.co/alvarobartt/notus-7b-v1-GGUF. If you are not able to re-run it, we are happy to do so on our compute and then share the different GGUF files with you. Thanks in advance 🤗
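For reference, a minimal sketch of the usual llama.cpp GGUF + Q4_K_M flow, driven from Python. It assumes a llama.cpp checkout from around the time of this thread (when the converter was still `convert.py` and the quantizer binary `./quantize`; newer versions have renamed both), and the file names are illustrative rather than the exact commands used here:

```python
# Rough sketch of the GGUF + Q4_K_M flow, assuming an older llama.cpp checkout.
import subprocess

MODEL_DIR = "./notus-7b-v1"          # HF checkout, now including tokenizer.model
F16_OUT = "notus-7b-v1.f16.gguf"
Q4_OUT = "notus-7b-v1.Q4_K_M.gguf"

# 1) Convert the HF model to an unquantized (f16) GGUF.
subprocess.run(
    ["python3", "convert.py", MODEL_DIR, "--outtype", "f16", "--outfile", F16_OUT],
    check=True,
)

# 2) Quantize the f16 GGUF down to Q4_K_M.
subprocess.run(["./quantize", F16_OUT, Q4_OUT, "Q4_K_M"], check=True)
```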

I was going to report this issue.
Thanks @alvarobartt for reporting.

OK I will re-do it now

Thanks, and sorry for the inconvenience! I quantized some variants at https://huggingface.co/alvarobartt/notus-7b-v1-GGUF, in case you want to reuse any of them.

GGUFs have been re-made from the updated source repo and now appear fine:

 <|system|>
You are a story writing assistant
<|user|>
Write a story about llamas
<|assistant|>
Once upon a time in the high Andes Mountains of South America, there lived a herd of llamas. These majestic creatures were known for their strong and sturdy bodies, long necks, and fluffy brown coats. They spent their days grazing on the lush grassy fields and roaming around the rocky terrain.

Amongst this herd was a young female llama named Luna. She had just turned one year old and was eager to explore the world beyond her home. With the guidance of her mother, Luna learned how to navigate through the mountain paths, recognize different plants for food and water sources.

As Luna grew older, she became more independent and started venturing out further each day. One afternoon, she came across a group of llamas from a nearby village who were traveling back home after trading goods. The leader of the group was an experienced llama named Tariq, who warmly welcomed Luna into their travel party.

Together, they traversed through rugged mountain passes and crossed rushing rivers. Luna learned valuable lessons about trust, respect, and teamwork from her new friends. She also discovered that llamas were not just for transportation, but were skilled in carrying heavy loads and weaving woolen fabrics.

As the days turned into weeks, Luna became more attached to Tariq's herd. However, she knew deep down that it was time for her to return home to her mother and siblings. With a heavy heart, Luna said goodbye to her friends and set off on her journey back home.

On her way back, Luna encountered some unexpected challenges, such as strong winds, treacherous cliffs, and wild animals. But with the determination she had learned from Tariq's herd, Luna overcame these obstacles and found her way back home safely.

From that day on, Luna never forgot the valuable lessons she had learned about friendship, trust, and adventure. And as she grew older and became a mother herself, she passed down these stories and traditions to her own offspring, ensuring that the legacy of llamas would continue for generations to come. [end of text]

Apologies for not spotting this and thanks for updating your repo.

I've raised an issue with llama.cpp regarding this `<0x0A>` problem. It's caused by the llama.cpp PR that added support for making GGUFs from tokenizer.json when no tokenizer.model is provided.
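A quick way to check whether a given GGUF is affected is to round-trip a newline through its tokenizer. Here's a sketch using the llama-cpp-python bindings; the model path is just a placeholder:

```python
# Check for the <0x0A> problem: tokenize a newline and detokenize it again.
# On a healthy GGUF the round trip gives back a real "\n"; on an affected one
# the literal text "<0x0A>" shows up in the decoded output.
from llama_cpp import Llama

llm = Llama(model_path="notus-7b-v1.Q4_K_M.gguf", n_ctx=256, verbose=False)

tokens = llm.tokenize(b"Hello\nWorld", add_bos=False)
decoded = llm.detokenize(tokens)

print(tokens)
print(decoded)  # expected: b"Hello\nWorld" (or close to it), not b"Hello<0x0A>World"
assert b"<0x0A>" not in decoded, "byte tokens are being rendered literally"
```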

No worries at all @TheBloke, it was on us for not realising! Thanks a ton for fixing it straight away. It's also not super easy to get the tokenizer.model unless it's available as part of the base model, because otherwise the default is the fast, Rust-based tokenizer, and there's no snippet to go from fast to slow, while the other direction can easily be done. I think it has some issues attached to it, but maybe it's worth investigating in llama.cpp to try to use the Rust-based one from tokenizer_config.json instead. Anyway, thanks a ton 🎉
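As a small illustration of the point above, here's a sketch that checks whether a repo actually ships `tokenizer.model` and, if so, loads the slow (SentencePiece) tokenizer that llama.cpp's converter relies on. The repo ids are just examples, and the sentencepiece package needs to be installed for the slow tokenizer to load:

```python
# Check which repos ship the SentencePiece tokenizer.model and whether the
# slow (non-Rust) tokenizer can be loaded from them.
from huggingface_hub import list_repo_files
from transformers import AutoTokenizer

for repo in ["argilla/notus-7b-v1", "HuggingFaceH4/zephyr-7b-beta"]:
    files = list_repo_files(repo)
    has_spm = "tokenizer.model" in files
    print(f"{repo}: tokenizer.model present = {has_spm}")

    if has_spm:
        # use_fast=False forces the slow SentencePiece tokenizer, i.e. the one
        # backed by tokenizer.model rather than tokenizer.json.
        slow_tok = AutoTokenizer.from_pretrained(repo, use_fast=False)
        print(f"  slow tokenizer loads fine: {type(slow_tok).__name__}")
```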

alvarobartt changed discussion status to closed

Thanks for figuring this out @alvarobartt and for the quick fix @TheBloke. Just tested the updated notus GGUF and it works great.

Just in case you aren't aware, this issue is also impacting:

https://huggingface.co/TheBloke/OpenHermes-2.5-neural-chat-7B-v3-2-7B-GGUF
