AWQ quants

#2
by prudant - opened

Hi! I have tried your 2x7b AWQ version and it's really cool. Can you quant this version? I'm sure it's a great model :)

Sure thing! I’ll post it tomorrow :)

thanks!!!

@prudant here you go! I am testing them now, so I'm sorry if there are any issues: https://huggingface.co/macadeliccc/laser-dolphin-mixtral-4x7b-dpo-AWQ

Thanks, will try!!!!
I have never made an AWQ quant. I have done a couple of GPTQ quants, so I don't know the process or the steps it involves. If you can give me a little idea, I won't bother you for AWQ versions anymore hahaha

Best regards, and thanks again!

No worries. Everything you need is right here https://github.com/casper-hansen/AutoAWQ
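
For anyone following along, the basic AutoAWQ flow is short. A minimal sketch, assuming `pip install autoawq` and a CUDA GPU (the model paths are placeholders, not the author's exact setup):

```python
# Minimal AutoAWQ quantization sketch. The config values below are AutoAWQ's
# commonly used defaults; tune them per model.
quant_config = {
    "zero_point": True,   # asymmetric (zero-point) quantization
    "q_group_size": 128,  # weights share scales in groups of 128
    "w_bit": 4,           # 4-bit weights
    "version": "GEMM",    # GEMM kernel variant
}

def quantize_to_awq(model_path: str, quant_path: str) -> None:
    # Imports are kept local so the config above can be read without autoawq installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    model.quantize(tokenizer, quant_config=quant_config)
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)

# Example call (hypothetical output path):
# quantize_to_awq("macadeliccc/laser-dolphin-mixtral-4x7b-dpo", "laser-dolphin-4x7b-AWQ")
```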

Thanks! Next time I will try it by myself!

Here's the notebook I use for a lot of quants when my main GPU is in use: https://colab.research.google.com/drive/1o8PD8iw2eGkmdrUgPAbU56TliR7OA0_1?usp=sharing

thanks!!! =)

Hi again! I'm trying the model, and it works fast and accurately, but I'm seeing weird behavior at the start of each generated sentence: the first token/word is "\n" or ":". It did not happen with the 2x7b version, which is really strange; I don't know if it's my setup or the model. Do you see anything like that in your tests? The tested prompt template is ChatML, validated with a manual Jinja render to check whether the chat template was the problem, but it wasn't. I'm serving the model with the Aphrodite engine, same as the 2x7b.
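
For reference, ChatML wraps each turn in `<|im_start|>`/`<|im_end|>` markers. A minimal hand-rolled render (the tokenizer's Jinja chat template should produce the same string) might look like:

```python
def render_chatml(messages: list[dict]) -> str:
    """Render a list of {'role', 'content'} dicts in the ChatML format,
    ending with an open assistant turn for the model to complete."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
# prompt begins with "<|im_start|>system" and ends with an open "<|im_start|>assistant\n" turn
```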

I have noticed that behavior before, and in my use case it was not consistent. If you're using it for RAG, I would say check your prompt. If you're just trying to chat, I would say it's unexpected behavior from quantizing a clown MoE. The GGUF version of the 4x7b does not work correctly.

It could be leftover from the original merge, but the actual 4x7b works fine in fp16.

It's strange: the same app, doing RAG with the same prompts, shows nothing weird in the generation with the 2x7b, but this one sometimes pushes ":" at the start of the generated message. For now, a .replace at the end of the generation process will do the magic hahaha
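
That workaround can be as simple as stripping the stray leading characters after generation, e.g.:

```python
def clean_leading_tokens(text: str) -> str:
    """Drop the stray "\\n" / ":" (and surrounding spaces) that the quantized
    model sometimes emits at the very start of a generation."""
    return text.lstrip("\n: ")

clean_leading_tokens(": I understand your concern.")  # -> "I understand your concern."
clean_leading_tokens("\nHello")                        # -> "Hello"
```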

I'm glad that it's not hurting the performance too much. Which one do you prefer, the 2x7b or the 4x7b?

Can't test it in depth yet. I would like to test the product with the 4x7b this weekend, but I'm facing infrastructure problems (high priority right now). In a few days I will try it in depth and come back to leave you some feedback!

After rereading our past conversation (I feel like it's déjà vu) in this post https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-dpo/discussions/2 — we already had this issue. It's the same behavior of inserting tokens inside the generated text, but in this new case, in addition, the generation sometimes starts with \n or :

What was the problem on that occasion?

I do not know; it may be a corrupted instance of AWQ on my local machine. I mostly use it locally, but if there's an error I will redo it in a Colab instance so it's more isolated. It seems there may very well be an issue somewhere on my local machine.

I tried the model a little bit, and for RAG the 4x7b is a little more obedient to the instructions that guide the response tone. For example, the 4x7b follows instructions like "do not say things like: according to the context information, or according to the text," whereas the first thing the 2x7b does is say "according to the background information blah blah blah" even when ordered not to. So it appears to have better reasoning capabilities.

It's also better for role-playing than the 2x7b. In my case I ask the model to answer with empathy and psychology (because it's a RAG for cancer patients), and the 4x7b has a more human tone in that regard than the 2x7b.

An answer from the 4x7b:

I understand that facing a cancer diagnosis can cause fear and uncertainty. It is important to remember that each person and their situation is unique, and the course of cancer varies depending on several factors. Here, we believe in the importance of maintaining a positive attitude and nurturing hope. It is essential to communicate openly with your medical team, understand your treatment options, and make conscious decisions. Remember that the support of your loved ones and health professionals can be crucial during this process. Regarding the probability of dying from cancer, only oncology doctors, after analyzing all the data of your case, could give you an accurate estimate. We focus our attention on improving the quality of life of our patients, encompassing their physical, emotional and spiritual well-being.

Remember that we are here to support you on this path and that together we will work on your recovery and well-being.

I have definitely noticed this behavior. I ended up switching off of the 2x7b for RAG over this. Honestly, the 4x7b seems to answer much better.

A little hint for the 2x7b: putting "respond like a human" at the end of your prompt dramatically increases the quality of the response in many use cases, even if the rest of the prompt is complex.
