Couple questions + feedback

#3
by SerialKicked - opened

Hey! For some reason this model appears twice on the open leaderboard with wildly different evals. Is there a reason why? Different versions? I'm mostly wondering because of the massive difference in IFEval between the two entries.

Besides that, it's the first good Qwen 2.5 32B fine-tune I've found. I haven't tested it for long, but so far (in Q4_K_M / 16K context):

  • Writing style is better than a Mistral Small (not exactly a big ask, but still)
  • Reasoning seems very decent, though it's a bit early for me to tell.
  • The instruction refusals I got were all in Chinese, even in a discussion with 16K tokens' worth of English.
  • Those refusals are rare and very light, disappearing with a very basic system prompt (see the sketch after this list).
  • Does OK at instruction-following and self-prompting tasks (summarization, RAG, website navigation, API handling), but is still weaker than a Mistral Small with the correct instruct formatting.
  • In RP, it has a tendency to talk for both the user and itself. It's light and relatively easy to steer away from, but noticeable. It doesn't happen when the model takes a more "assistant" role.
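
For context, by "very basic system prompt" I just mean a one-line system turn in the model's ChatML format. A minimal sketch (the wording is my own, not from the model card):

```python
# Minimal ChatML prompt with a basic system turn.
# The system wording is illustrative, not an official recommendation.
system = "You are a helpful assistant. Always answer in English."
user = "Summarize the article below in three bullet points."

prompt = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\n{user}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)
print(prompt)
```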

All things considered, pretty good.

Owner

The only difference is the "use chat template" flag on the evaluation submission; it's the same model. I also find the IFEval results very interesting. In our experiments, IFEval gets impacted easily. It has to be something not entirely disclosed about how the foundation model was trained.

I assume that means one was tested with the ChatML instruct format and the other without (raw inputs)? If you got worse IFEval results with ChatML enabled, that would be a very funky result indeed.
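
If that's right, the two leaderboard entries would differ roughly like this. A sketch using transformers (the repo id is my assumption, and so are the eval harness internals):

```python
from transformers import AutoTokenizer

# Repo id is an assumption; adjust to the actual model repo.
tok = AutoTokenizer.from_pretrained("fblgit/TheBeagle-v2beta-MGS")

messages = [{"role": "user", "content": "Write exactly three sentences about dogs."}]

# With the "use chat template" flag: the prompt gets wrapped in the
# model's instruct format (ChatML here) before generation.
templated = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Without the flag: the raw text is fed to the model as-is.
raw = messages[0]["content"]

print(templated)
print(raw)
```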

In any case, there's likely something weird with the base model; at least that seems to be the opinion of several people I've talked to: it seems difficult to get decent or consistent results from tuning it.

Thanks for the clarification.

Owner

I just fixed the chat template, my friends. Pull it again, you'll see :)

I'm using a GGUF and my own private front-end (think SillyTavern, but a lot more powerful), so I don't apply the config's template automatically (yet).
From what I can read, it's standard ChatML with additional function-calling tags. I didn't know it had function calling; I do need that feature, so that's nice.
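
For reference, this is roughly what the function-calling tags look like in the ChatML stream. The `<tool_call>` format follows Qwen 2.5's published template, and I'm assuming this tune kept it (the `get_weather` tool is hypothetical):

```python
import json

# Hypothetical tool call, formatted Qwen 2.5-style inside a ChatML
# assistant turn; assuming this fine-tune kept the upstream tags.
call = {"name": "get_weather", "arguments": {"city": "Paris"}}

assistant_turn = (
    "<|im_start|>assistant\n"
    "<tool_call>\n"
    f"{json.dumps(call)}\n"
    "</tool_call><|im_end|>\n"
)
print(assistant_turn)
```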

Are you sure about replacing the <|im_end|> "eos_token" with <|endoftext|>? That seems wrong.
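
For anyone who wants to check what the tokenizer actually stops on, something like this should show it (repo id is an assumption):

```python
from transformers import AutoTokenizer

# Repo id is an assumption; point this at the actual model repo.
tok = AutoTokenizer.from_pretrained("fblgit/TheBeagle-v2beta-MGS")

# With ChatML, generation normally stops on <|im_end|>; if eos_token is
# <|endoftext|> instead, the model may keep going past its own turn.
print("eos_token:   ", tok.eos_token)
print("eos_token_id:", tok.eos_token_id)
```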

I ran the evals; it gets +1.85 on MATH and +26 on IFEval. The quants are still not updated yet.

Welp, you know best :D I'll check the fixed GGUFs whenever you have them ready.

Owner

Uploaded the new GGUFs, you can find them here:
https://huggingface.co/fblgit/TheBeagle-v2beta-MGS-GGUF
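
A minimal loading sketch with llama-cpp-python, in case it helps; the filename below is a guess at the quant naming, so check the repo for the actual files:

```python
from llama_cpp import Llama

# Filename is hypothetical; pick a real quant from the repo above.
llm = Llama(
    model_path="TheBeagle-v2beta-MGS.Q4_K_M.gguf",
    n_ctx=16384,           # 16K context, as tested earlier in the thread
    chat_format="chatml",  # the model's instruct format
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello in one sentence."},
    ],
)
print(out["choices"][0]["message"]["content"])
```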
