This model outputs the literal token <0x0A> instead of a newline

#1 by YearZero - opened

I think this was happening to another model in the last few days; I forget which. Instead of a newline it just outputs the literal text <0x0A>

@YearZero are you using GPT4All? My guess is that the Bloke's GGUF files are fine; I'm getting the same from all recent models. I checked GPT4All's GitHub page, and there's a token issue with GGUF that is being fixed in the next update. That update likely includes a fix for this.

The latest llama.cpp does output <0x0A> instead of a regular LF.

It looks like it could be a tokenizer problem, like this model:
https://huggingface.co/TheBloke/Starling-LM-7B-alpha-GGUF/discussions/1#6566495193951c950b3b8c10

Yeah, it's also impacting Notus and is summed up succinctly by @alvarobartt:

https://huggingface.co/argilla/notus-7b-v1/discussions/3#656dc9d802a56b531ade7f73

Yes, I always get this <0x0A> token, and sometimes a series of tokens like this: <0x0A><0x0A><0xF0><0x9F><0x9A><0xB8>
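For anyone curious what that series actually encodes: each <0xNN> is a SentencePiece byte-fallback token, and the bytes decode as ordinary UTF-8. A minimal Python sketch, using the example string from the comment above:

```python
import re

# Literal byte-fallback tokens as they appear in the broken GGUF output
raw = "<0x0A><0x0A><0xF0><0x9F><0x9A><0xB8>"

# Each <0xNN> token stands for a single byte; collect them and decode as UTF-8
data = bytes(int(h, 16) for h in re.findall(r"<0x([0-9A-Fa-f]{2})>", raw))
print(repr(data.decode("utf-8")))  # '\n\n🚸' -> two newlines plus U+1F6B8
```

So the model is generating the right bytes; they're just being rendered as token text instead of being decoded.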

Any solution to this <0x0A> issue?

@zhiboz Yes, the Bloke just fixed this for Notus. Someone needs to bring this to @TheBloke's attention.

Indeed, it was due to the addition of the tokenizer.json file: GGUF expects the SentencePiece (slow) tokenizer rather than the fast (Rust-based) one, which is the transformers default. That's why, when running python convert.py ... from llama.cpp to convert the weights into GGUF, the tokenizer.model is not available, so the tokenization has to be inferred from the vocab file and its format (e.g. BPE for the Mistral-based models). Anyway, it seems that the tokenizer.model was uploaded 9 days ago to berkeley-nest/Starling-LM-7B-alpha (see https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha/blob/main/tokenizer.model), so re-running the GGUF conversion script would do the work 🤗
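If it helps, a minimal sketch of that re-conversion. The local directory name and the exact convert.py invocation are assumptions based on a standard llama.cpp checkout, not something stated in this thread:

```python
from huggingface_hub import hf_hub_download

# Fetch the SentencePiece tokenizer.model that was added to the repo
# (local_dir is a hypothetical path; point it at your model directory)
hf_hub_download(
    repo_id="berkeley-nest/Starling-LM-7B-alpha",
    filename="tokenizer.model",
    local_dir="models/starling-lm-7b-alpha",
)

# Then re-run llama.cpp's conversion script against that directory, e.g.:
#   python convert.py models/starling-lm-7b-alpha --outtype f16
```

With tokenizer.model present, convert.py picks up the SentencePiece vocab directly instead of inferring it, and the resulting GGUF should emit real newlines again.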
