Is this based on the "Update (5/3)" version?

#1
by Propheticus - opened

Gradient uploaded a new version 18 hours ago claiming "Update (5/3): We further fine-tuned our model to strengthen its assistant-like chat ability as well. The NIAH result is updated." (also for their 1048k model btw)
Your quants were uploaded 5 hours ago and maybe you used the latest source, but it's so close to their update it could very well have been the previous version.

Yeah, this is why I dislike in-place model updates lmao; yes, this is using the version from 18 hours ago
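For anyone else bitten by in-place updates: a minimal sketch of pinning an exact commit, assuming the `huggingface_hub` Python client. The repo id is real; the commit SHA below is a hypothetical placeholder, take the real one from the repo's commit history on the Hub.

```python
# Build the arguments for a download pinned to one exact commit, so a later
# in-place push to `main` cannot silently change which files you get.
def pinned_download_args(repo_id: str, commit_sha: str) -> dict:
    """Arguments for huggingface_hub.snapshot_download, pinned to one revision."""
    return {"repo_id": repo_id, "revision": commit_sha}

args = pinned_download_args(
    "gradientai/Llama-3-8B-Instruct-262k",
    "0123456789abcdef0123456789abcdef01234567",  # hypothetical SHA
)

# To actually fetch it (needs network and huggingface_hub installed):
# from huggingface_hub import snapshot_download
# snapshot_download(**args)  # resolves exactly this revision, never `main`
```

Quants built from a pinned SHA are then unambiguous, whatever the upstream repo does afterwards.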

You were fast then! Which is good, but also...
I kind of secretly hoped it wasn't the latest and there was a chance the new version would be better in chat quality. This version makes formatting errors and is far less detailed/nuanced in its answers compared to the 8k base model. Then, as you already know, there's the odd </s> token.

I've made them aware of the issue, but beyond that I'm not sure what's needed

https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k/discussions/20#66372ca74b43ab85e5ba5dbb

Ah well, at least we know it's not in the way you/llama.cpp do the quantisation.
It is a bit strange that issues like the non-capitalisation and the odd additional stop token were not caught by Gradient in their testing of the model.

I suppose most of their testing was automated and targeted at context retrieval rather than the quality of the output
