General discussion.

#1 opened by Lewdiculous

WARNING: GGUF versions might be broken.

Welp, are they really? 👀

DreamGen org

I am not sure. The original versions were badly broken in many tools (I tried llama-cpp-python and ooba), where even the <|im_start|> and <|im_end|> tokens were not tokenized correctly.
I thought it might have been because tokenizer.json was missing from the original repo, so I added it and re-ran the process. I still have to do more testing to be sure.
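One quick sanity check is to tokenize the special tokens directly, for example with llama-cpp-python (a sketch only; the model path is a placeholder): in a healthy GGUF, <|im_start|> should come back as a single token id instead of being split into sub-word pieces.

from llama_cpp import Llama

llm = Llama(model_path="opus-v1.2-7b.Q8_0.gguf")  # placeholder path

# special=True lets the tokenizer match special tokens like <|im_start|>
# instead of tokenizing the literal characters
ids = llm.tokenize(b"<|im_start|>", add_bos=False, special=True)
print(ids)  # a single id (32000 for the 7B) means the special token is intact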

Would love to hear from someone who has tried both the fp16 and the GGUF whether it seems broken :D

I can't really run the full F16, but I'll try the Imatrix GGUF quants and see if anything seems badly broken.

I think it's still broken.

The output of the full model looks okay, but I am getting this kind of gibberish with the Q8_0 GGUF:
as she looked down at her with her arms still behind her head as she spoke again with her smile still present as she spoke again with her eyes still closed as she spoke again with her arms still behind her head as she spoke again with her eyes still closed as she spoke again with her arms still behind her head as she spoke again with her eyes still closed as she spoke again with her arms still behind her head as she spoke again with her eyes still closed as she spoke again with her arms still behind her head

@jirka642 Can you confirm this is also the case with my quants?

Somehow, I didn't seem to experience that.

https://huggingface.co/Lewdiculous/opus-v1.2-7b-GGUF-IQ-Imatrix

@Lewdiculous

Sorry, ignore that. After testing your quants, the quants from this repo suddenly started working too...
I didn't change any parameters or the prompt, so I don't know what the issue was.

I think for this model it's very important that you use the correct prompt format preset; it seems very sensitive to it.

DreamGen org

Hi @jirka642 -- that might also be down to sampling params. I use the following:

temperature: 0.8 (or less)
min_p: 0.05 (or a bit more)
frequency_penalty, presence_penalty: 0.1
repetition_penalty: 1.1
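For example, with llama-cpp-python (a sketch only; the model path and prompt are placeholders, and note that llama-cpp-python names the repetition penalty repeat_penalty):

from llama_cpp import Llama

llm = Llama(model_path="opus-v1.2-7b.Q8_0.gguf")  # placeholder path

out = llm.create_completion(
    prompt="<|im_start|>system\n...<|im_end|>\n<|im_start|>text\n",  # placeholder
    max_tokens=256,
    temperature=0.8,
    min_p=0.05,
    frequency_penalty=0.1,
    presence_penalty=0.1,
    repeat_penalty=1.1,
)
print(out["choices"][0]["text"])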

And as @Lewdiculous said, the model might be sensitive to the prompt template -- at the very least, use the correct ChatML+text format (where the assistant role is replaced with a text role -- more in the docs) and the correct first lines of the system message.
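Roughly, a prompt then looks like this (the lines below are placeholders; copy the exact system message from the docs, since its first lines matter):

<|im_start|>system
(system message from the docs)<|im_end|>
<|im_start|>user
(your instructions / story context)<|im_end|>
<|im_start|>text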

Some software might tokenize incorrectly, but this is more of an issue for the Yi-based 34B model (since Yi has nonstandard tokenizer settings).

For the 34B, the correct tokenization for "<|im_start|>system\nHello!" would be:

# Common software bugs here are a BOS at the start (token id 1) and "system" getting tokenized as `▁system` with token id 1328.
# Yi models should not have a BOS and should not have the `▁` in this case.
['<|im_start|>', 'system', '\n', 'Hello', '!']
[6, 10707, 144, 25102, 99]

For the 7B, the correct tokenization for "<|im_start|>system\nHello!" would be:

['<|im_start|>', '▁system', '<0x0A>', 'Hello', '!']
[32000, 1587, 13, 16230, 28808]
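If you want to reproduce these reference tokenizations, the Hugging Face tokenizer can serve as ground truth to compare your GGUF runtime against (the repo ids below are assumptions; adjust them to the repos you are actually using):

from transformers import AutoTokenizer

for repo in ("dreamgen/opus-v1.2-7b", "dreamgen/opus-v1-34b"):  # assumed repo ids
    tok = AutoTokenizer.from_pretrained(repo)
    # add_special_tokens=False avoids an extra BOS, matching the id lists above
    ids = tok.encode("<|im_start|>system\nHello!", add_special_tokens=False)
    print(repo, tok.convert_ids_to_tokens(ids), ids)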
