Higher PPL than Mixtral?

#11
by Thireus - opened

I ran a PPL eval on wikitext and noticed that the PPL is much higher than that of the original Mixtral model.

  • LoneStriker_dolphin-2.5-mixtral-8x7b-6.0bpw-h6-exl2-2 - 4.464363098144531
  • turboderp_Mixtral-8x7B-instruct-exl2_8.0bpw - 3.7087724208831774

I was wondering if this is expected.

For ref, 70b dolphin models give me PPLs just below 4:

  • LoneStriker_dolphin-2.2-70b-6.0bpw-h6-exl2-2 - 3.965563297271729
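
For anyone who wants to reproduce a comparable number, here is a minimal sliding-window perplexity sketch over wikitext-2 using transformers. The repo id, 2048-token window, and 512-token stride are assumptions, and the absolute values will not match exllamav2's own eval script exactly, but it is enough to compare checkpoints under identical settings.

```python
# Minimal sliding-window perplexity on wikitext-2 (transformers, fp16).
# Assumptions: repo id, 2048-token window, 512-token stride.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.5-mixtral-8x7b"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
).eval()

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids

window, stride = 2048, 512
nlls, prev_end = [], 0
for start in range(0, ids.size(1), stride):
    end = min(start + window, ids.size(1))
    target_len = end - prev_end                  # score only tokens not seen before
    chunk = ids[:, start:end].to(model.device)
    labels = chunk.clone()
    labels[:, :-target_len] = -100               # mask the overlapping context
    with torch.no_grad():
        nlls.append(model(chunk, labels=labels).loss * target_len)
    prev_end = end
    if end == ids.size(1):
        break

print(f"PPL: {torch.exp(torch.stack(nlls).sum() / prev_end).item():.4f}")
```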

You're comparing 6.0bpw to 8.0bpw, so yes, it's expected that the 6.0bpw quant will have somewhat higher perplexity. With exl2, quality can also vary depending on the calibration dataset used during quantization.

@HiroseKoichi, it's a 0.76 PPL jump.

If someone can share the PPL of the non-quantized version, I'd be interested to see how far it is from the original Mixtral model.

Taking a look at Turboderp's page, it looks like your test is the outlier here and dolphin is right in line with the expected numbers: https://huggingface.co/turboderp/Mixtral-8x7B-instruct-exl2
[Screenshot: perplexity table from turboderp/Mixtral-8x7B-instruct-exl2]
I'm by no means an expert on exl2 quantization, but wikitext is a popular calibration dataset for exl2 quants, which could explain why the perplexity is much lower for Mixtral-Instruct. Try running it through a different dataset.
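
If you want to rule out wikitext favouring the Instruct quant, one quick check is to rerun the same evaluation loop on a corpus that is unlikely to have been used for calibration. A possible swap, assuming the sketch above and a small slice of C4:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Assumption: reuse the sliding-window loop from the sketch above; only the
# test corpus changes. Stream a small slice of C4 so the run stays tractable.
tokenizer = AutoTokenizer.from_pretrained("cognitivecomputations/dolphin-2.5-mixtral-8x7b")
c4 = load_dataset("allenai/c4", "en", split="validation", streaming=True)
text = "\n\n".join(row["text"] for _, row in zip(range(2000), c4))
ids = tokenizer(text, return_tensors="pt").input_ids
```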
