Add missing quant_config.json for compatibility with vLLM backends out of the box.

#1
No description provided.
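For reference, the `quant_config.json` that AWQ-aware backends look for is a small JSON file describing the quantization settings. A typical example for an AutoAWQ 4-bit quant (the exact values here are illustrative, mirroring AutoAWQ's documented defaults) looks like:

```json
{
  "zero_point": true,
  "q_group_size": 128,
  "w_bit": 4,
  "version": "GEMM"
}
```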
SolidRusT Networks org

Thank-you.

Suparious changed pull request status to merged

Would you know how to AWQ Starling-LM-7B-beta? It seems that it could be an even better model.

SolidRusT Networks org
edited Mar 24

> Would you know how to AWQ Starling-LM-7B-beta? It seems that it could be an even better model.

I just tested it at full bfloat16 and it doesn't seem to respond well; it also has a tiny context window (8,192 tokens) compared to other Mistral fine-tunes.

Today I compared Nous Hermes 2 Pro 7B with Gorilla LLM 7B, Raven v2 13B and Starling 7B.

Did you try the alpha version: TheBloke/Starling-LM-7B-alpha-AWQ?

I can make a quant of the beta now if you like.

It is simple; I just use the example script from the CasperHansen AutoAWQ repo.

https://github.com/SolidRusT/srt-model-quantizing.git
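For anyone following along, a minimal sketch of the AutoAWQ example script mentioned above looks roughly like this. The model and output paths are placeholders, and the quant settings mirror AutoAWQ's documented defaults (4-bit weights, group size 128, GEMM kernels); this is an illustration, not the exact script from the repo linked above.

```python
# Sketch of an AutoAWQ quantization run. Settings follow AutoAWQ's
# documented defaults; paths are placeholders for the model you want.
quant_config = {
    "zero_point": True,   # use asymmetric (zero-point) quantization
    "q_group_size": 128,  # quantization group size
    "w_bit": 4,           # 4-bit weights
    "version": "GEMM",    # GEMM kernels, compatible with vLLM
}

def quantize(model_path: str, quant_path: str) -> None:
    # Imports are deferred so the config above can be inspected
    # without AutoAWQ installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    # Calibrate and quantize, then write the AWQ checkpoint to disk.
    model.quantize(tokenizer, quant_config=quant_config)
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)

# Example: quantize("Nexusflow/Starling-LM-7B-beta", "Starling-LM-7B-beta-AWQ")
```

Note that the actual quantization step needs a GPU and enough VRAM to hold the full-precision model while calibrating.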

SolidRusT Networks org
edited Mar 24

OK, the 'Nexusflow/Starling-LM-7B-beta' model is in the AWQ quant queue now.

> Would you know how to AWQ Starling-LM-7B-beta? It seems that it could be an even better model.

> I just tested it at full bfloat16 and it doesn't seem to respond well; it also has a tiny context window (8,192 tokens) compared to other Mistral fine-tunes.

"Nous Hermes 2 - Mistral 7B - DPO" is a fine-tune originally of Mistral-7B-v0.1, which has an 8k token context. Only the newer Mistral-7B-v0.2 has 32k context.

I tried the EagleX on CPU today. Incredibly slow.

SolidRusT Networks org

Just because the original Mistral model was limited to 16k context with a 4k sliding window does not mean fine-tuned variants share the same limitations. This Nous Hermes 2 Pro handles up to 32k context.

I have only been able to use it with 16k context, due to a VRAM limitation. Maybe check some examples of Llama with 128k context to learn more about how these authors are widening the default context window.

This Starling quant is on its way; uploading the AWQ now: https://huggingface.co/solidrust/Starling-LM-7B-beta-AWQ

Hermes-2-Pro-Mistral-7B is interesting, but I suspect that for chat without function calling, the DPO version will be better.

You were right, the Starling-LM-7B-beta-AWQ is not that good. It sounds very ChatGPT-like and does not follow instructions. I am testing the Hermes-2-Pro-Mistral-7B.
