Like any other L2 model, produces illogical gibberish when asked for a story

#1
by boqsc - opened

LLAMA2 models are horrible at story logic. Nous Hermes1 with LLAMA1 was the best for story generation, and Nous GPT4xVicuna was the most satisfying to interact with.

In the latest response from MythoLogic, I even had a story generated where it starts with the father being dead, and a few sentences in the father is suddenly speaking and interacting.
This never happened with LLAMA1 models.

One thing to look out for: if you are using llama.cpp to run these models, there are certain default hyperparameter values that are incorrect for L2 models, specifically -eps, which should be 1e-5; leaving it at the default will damage generation quality unless it is overridden on the command line. It does appear to make a perceptible difference.

I think their next generation model format, GGUF, should handle it by default.
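
For reference, a minimal sketch of an invocation with that override (the model filename and prompt are placeholders, and the -eps spelling is taken from the comment above; the exact flag name may differ between llama.cpp versions):

# Hypothetical example: the .bin filename and prompt are placeholders.
./main -m mythologic-l2-13b.ggmlv3.q4_K_M.bin -c 4096 -eps 1e-5 -p "Write a short story about a lighthouse keeper."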

Oh that's interesting, I didn't know that. So I should add -eps 1e-5 to my llama.cpp example for all L2 models? Anything else I should change at the same time?

More details can be found in issue #2373 and the corresponding pull request in the llama.cpp repo. I don't have the technical background to understand why this has the effect it had, nor what finetuning would do with this, if anything. It might be interesting to test, though.

EDIT: llama.cpp PR #2384 should have made that unnecessary; I didn't see that earlier. No idea what went wrong with those finetuned llama2 models then. FWIW, StableBeluga-13B, for instance, certainly doesn't seem to me to do any worse than Vicuna 1.3 13B or any other finetuned L1 model I've tried, in terms of storytelling quality and chances of catastrophic error, but then I'm not trying anything complicated with them.

OK thanks for the details

Yes, I also noticed this and started double-checking all the parameters against the model config, like this one: https://huggingface.co/Gryphe/MythoLogic-L2-13b/blob/main/config.json
It is also very useful for 8k-16k models to check their original config.json, because they have settings that you need to add to llama.cpp manually (like rope_scaling). I hope in GGUF this will be handled automatically.
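
For example, a quick sketch for dumping the relevant fields from a downloaded config.json (this assumes jq is installed; the field names are the standard Hugging Face Llama config keys):

# Sketch: inspect the config.json fields that matter for llama.cpp flags.
jq '{rms_norm_eps, rope_scaling, max_position_embeddings}' config.json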

Can you elaborate on that? What settings are you adding?

I can look at adding those settings to the readme automatically if I know what I should be looking for in config.json. Thanks

At the moment, these:

rms_norm_eps
rope_scaling

If rope_scaling is null, that means compress_pos_emb = 1, aka --rope-freq-scale 1.0.
If it has "type": "linear", for example:

"rope_scaling": {
    "factor": 4.0,
    "type": "linear"
  },

that means compress_pos_emb = 4, aka --rope-freq-scale 0.25 for llama.cpp (1.0/4 = 0.25).
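
Putting that together, a hedged sketch of the corresponding llama.cpp call for such a 4x-scaled model (the filename and context length are only illustrative):

# Hypothetical example: linear RoPE scaling factor 4.0 -> --rope-freq-scale 0.25, context raised to 4x the 4k base.
./main -m some-l2-16k-model.ggmlv3.q4_K_M.bin -c 16384 --rope-freq-scale 0.25 -p "Your prompt here"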

I see that in all the readmes you add this:

Change -c 2048 to the desired sequence length for this model. For example, -c 4096 for a Llama 2 model. For models that use RoPE, add --rope-freq-base 10000 --rope-freq-scale 0.5 for doubled context, or --rope-freq-base 10000 --rope-freq-scale 0.25 for 4x context.

That is confusing, because Llama 2 has a 4k context by default and you shouldn't touch --rope-freq-scale; if you do set --rope-freq-scale 0.5, that means you double 4k to 8k.

UPD: compress_pos_emb is the name of this setting in text-generation-webui.
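
To make the difference concrete, a rough sketch of the two cases (model filenames are placeholders):

# Stock Llama 2: native 4k context, no RoPE scaling flags needed.
./main -m llama-2-13b.ggmlv3.q4_0.bin -c 4096 -p "Your prompt here"
# Finetune stretched to 8k via linear scaling (factor 2.0 in config.json):
./main -m some-l2-8k-model.ggmlv3.q4_0.bin -c 8192 --rope-freq-scale 0.5 -p "Your prompt here"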
