Model performance review

#2
by HR1777 - opened

I have downloaded and tested this model, and it performs well overall. However, I have observed that the model may not be as reliable for processing large inputs. Another small issue I have noticed is that it generates several <|im_end|> tokens at the end of the response. Additionally, I have given the model the same questions used for evaluating the Nous-Hermes-2-Mixtral-8x7B-SFT-Wikihow model, and here are the results:

Write a short story about Goku discovering kirby has teamed up with Majin Buu to destroy the world.
OUTPUT:
Goku was walking through his hometown when he suddenly heard an explosion in the distance. He quickly flew towards it and saw Kirby standing in front of him along with an unfamiliar figure - it was none other than Majin Buu! Kirby had somehow managed to convince him that they should join forces in order to take over Earth! Goku was shocked but quickly sprang into action - he knew he had no choice but to stop them before it was too late! He quickly engaged in battle against Kirby and his new ally but soon realized that this was no ordinary fight - Kirby had somehow gained access to powerful new abilities that made him even more dangerous than before! Despite this newfound power however, Goku was determined not to give in - he fought on until he was finally able to defeat Kirby and his new ally! The two villains were defeated but not before they had caused some serious damage - it was now time for Goku to start rebuilding his hometown!

If I lay 2 wet shirts out in the sun flat on the ground to dry and it takes 5 hours until they are dry, how long would it take to dry 4 shirts of the same type that way simultanously?
OUTPUT:
It will take twice as much time as drying two shirts since you're drying four times as many shirts simultaneously - so four times as much time as drying one shirt - so four times as much time as five hours - so twenty hours total for four shirts.

explain step by step 25-4*2+3=?
OUTPUT:
Step 1: Calculate the expression inside parentheses first - (4 * 2) = 8. The equation now becomes 25 - 8 + 3 = ?. Step 2: Subtract -8 from 25 - (25 - 8) = 17. The equation now becomes 17 + 3 = ? Step 3: Add +3 - (17 + 3) = 20. The equation now becomes ? = 20. Therefore, the answer is 20.
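As a side note on the repeated <|im_end|> issue mentioned above: a minimal post-processing workaround (assuming the extra tokens appear verbatim in the decoded text, and using a hypothetical helper name) could look like this:

```python
def strip_chatml_end_tokens(text: str, end_token: str = "<|im_end|>") -> str:
    """Remove any repeated <|im_end|> markers left at the end of a generation."""
    text = text.rstrip()
    # Peel off trailing end-of-turn tokens one at a time.
    while text.endswith(end_token):
        text = text[: -len(end_token)].rstrip()
    return text

print(strip_chatml_end_tokens("Hello there!<|im_end|><|im_end|><|im_end|>"))
```

This only cleans up the output; configuring the generation to stop on <|im_end|> in the first place would be the proper fix.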

HR1777 changed discussion title from Model performance to Model performance review

Thanks @HR1777 for reviewing this, appreciate it.

Would you mind sharing which quantized GGUF model you used to evaluate this model?

I've used the Q4_K_M model, which I acknowledge doesn't perform the best, but due to hardware limitations, it was my chosen option. For Nous-Hermes-2-Mixtral-8x7B-SFT-Wikihow, I plan to use the Q5_K_M variant. I'll download the model today and share the results tomorrow.
By the way, could you please share one full example of a prompt that should be used for these models? Please also include <s> as you mentioned.

I have added examples to the original model's card.

I will add this to the GGUF model's card as well

I like the https://huggingface.co/MaziyarPanahi/Nous-Hermes-2-Mixtral-8x7B-SFT-Wikihow model; it performs well. It is still sensitive to the ChatML template, and <s> must be added.
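For what it's worth, a minimal sketch of a ChatML prompt with the leading <s> (BOS) token might look like the following (the helper name and the default system message are placeholders, not from the model card):

```python
def build_chatml_prompt(user_message: str,
                        system_message: str = "You are a helpful assistant.") -> str:
    """Assemble a ChatML-style prompt with an explicit leading <s> token.

    The system message is only a placeholder; use whatever the model
    card recommends.
    """
    return (
        "<s>"
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(build_chatml_prompt("explain step by step 25-4*2+3=?"))
```

If you tokenize with add_special_tokens=True, the tokenizer may prepend <s> for you, in which case it should not also be written into the prompt string.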

I have seen those examples already, but I asked about <s>. Where should I add it?

Oh, sorry about that. It's actually a Hugging Face thing in their pipeline: add_special_tokens=True.

I've seen this discussion (not the same model, but similar) so I added that parameter into my HF pipeline and it really helped with repetition at the end of text generation: https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO/discussions/5

OK, that's fine :) Thank you so much again for your great efforts.

HR1777 changed discussion status to closed
