Add missing quant_config.json for compatibility with vLLM backends out of the box.

#1
No description provided.
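For reference, the `quant_config.json` that AWQ-aware backends look for is a small JSON file describing the quantization settings. A typical example for an AutoAWQ 4-bit quant (the exact values here are illustrative, mirroring AutoAWQ's documented defaults) looks like:

```json
{
  "zero_point": true,
  "q_group_size": 128,
  "w_bit": 4,
  "version": "GEMM"
}
```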
SolidRusT Networks org

Thank-you.

Suparious changed pull request status to merged

Would you know how to AWQ Starling-LM-7B-beta? It seems that it could be an even better model.

SolidRusT Networks org
edited Mar 24

> Would you know how to AWQ Starling-LM-7B-beta? It seems that it could be an even better model.

I just tested it at full bfloat16 and it doesn't seem to respond well; it also has a tiny context window (8,192 tokens) compared to other Mistral fine-tunes.

Today I compared Nous Hermes 2 Pro 7B with Gorilla LLM 7B, Raven v2 13B and Starling 7B.

Did you try the alpha version: TheBloke/Starling-LM-7B-alpha-AWQ?

I can make a quant of the beta now if you like.

It is simple; I just use the example script from the CasperHansen AutoAWQ repo.

https://github.com/SolidRusT/srt-model-quantizing.git
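For anyone following along, a minimal sketch of the AutoAWQ example script mentioned above looks roughly like this. The model and output paths are placeholders, and the quant settings mirror AutoAWQ's documented defaults (4-bit weights, group size 128, GEMM kernels); this is an illustration, not the exact script from the repo linked above.

```python
# Sketch of an AutoAWQ quantization run. Settings follow AutoAWQ's
# documented defaults; paths are placeholders for the model you want.
quant_config = {
    "zero_point": True,   # use asymmetric (zero-point) quantization
    "q_group_size": 128,  # quantization group size
    "w_bit": 4,           # 4-bit weights
    "version": "GEMM",    # GEMM kernels, compatible with vLLM
}

def quantize(model_path: str, quant_path: str) -> None:
    # Imports are deferred so the config above can be inspected
    # without AutoAWQ installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    # Calibrate and quantize, then write the AWQ checkpoint to disk.
    model.quantize(tokenizer, quant_config=quant_config)
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)

# Example: quantize("Nexusflow/Starling-LM-7B-beta", "Starling-LM-7B-beta-AWQ")
```

Note that the actual quantization step needs a GPU and enough VRAM to hold the full-precision model while calibrating.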

SolidRusT Networks org
edited Mar 24

OK, the 'Nexusflow/Starling-LM-7B-beta' model is in the AWQ quant queue now.

> Would you know how to AWQ Starling-LM-7B-beta? It seems that it could be an even better model.

> I just tested it at full bfloat16 and it doesn't seem to respond well; it also has a tiny context window (8,192 tokens) compared to other Mistral fine-tunes.

"Nous Hermes 2 - Mistral 7B - DPO" is a fine-tune originally of Mistral-7B-v0.1, which has an 8k token context. Only the newer Mistral-7B-v0.2 has 32k context.

I tried the EagleX on CPU today. Incredibly slow.

SolidRusT Networks org

Just because the original Mistral model was limited to 16k context with a 4k sliding window does not mean fine-tuned variants share the same limitations. This Nous Hermes 2 Pro handles up to 32k context.

I have only been able to use it with 16k context, due to a VRAM limitation. Maybe check some examples of Llama with 128k context to learn more about how these authors are widening the default context window.

This Starling quant is on its way; uploading the AWQ now: https://huggingface.co/solidrust/Starling-LM-7B-beta-AWQ

Hermes-2-Pro-Mistral-7B is interesting, but I suspect that for chat without function calling, the DPO version will be better.

You were right, the Starling-LM-7B-beta-AWQ is not that good. It sounds very ChatGPT-like and does not follow instructions. I am testing the Hermes-2-Pro-Mistral-7B.
