Low perplexity!

#1 · opened by brucethemoose

I ran perplexity tests on a bunch of 3bpw quantizations against a novel-style dataset at context sizes of 20,000, 10,000, and 2,000 tokens. The table below lists each quantization's perplexity at those three context lengths (a rough sketch of this kind of eval follows the table):

| Quantization | PPL @ 20K ctx | PPL @ 10K ctx | PPL @ 2K ctx |
|---|---|---|---|
| v7-exl2-31bpw-fiction | 6.5185 | 6.7835 | 38.2186 |
| LoneStriker_Thespis-34b-DPO-v0.7-3.0bpw-h6-exl2 | 6.6764 | 6.9829 | 8.3438 |
| LoneStriker_Tess-34B-v1.5b-3.0bpw-h6-exl2 | 8.4026 | 8.7730 | 11.4197 |
| LoneStriker_deepmoney-34b-200k-base-3.0bpw-h6-exl2 | 6.4025 | 6.6599 | 7.7988 |
| LoneStriker_bagel-34b-v0.2-3.0bpw-h6-exl2 | 49.0813 | 45.8001 | 18.7098 |
| smaug-3.0bpw | 32.4581 | 27.7181 | 11.0142 |
| LoneStriker_Yi-34B-200K-DARE-megamerge-v8-3.0bpw-h6-exl2 | 6.7946 | 7.0572 | 8.1391 |
| LoneStriker_Tess-M-v1.0-3.0bpw-h6-exl2 | 6.1975 | 6.4421 | 7.5398 |
| LoneStriker_Pallas-0.5-3.0bpw-h6-exl2 | 8.9354 | 9.4341 | 17.5678 |
| LoneStriker_Yi-34B-200K-AEZAKMI-v2-3.0bpw-h6-exl2 | 6.8550 | 7.1553 | 8.4838 |
| LoneStriker_Yi-34B-200K-3.0bpw-h6-exl2 | 6.1354 | 6.4013 | 7.5029 |
| LoneStriker_Nous-Capybara-34B-3.0bpw-h6-exl2 | 7.4431 | 7.8713 | 10.0094 |
| airo-3.0bpw | 6.6336 | 6.9046 | 8.1336 |
| fastchat-3.0bpw | 6.3284 | 6.5742 | 9.6255 |
| RPmerge-31bpw | 6.6463 | 6.9070 | 13.3217 |
| DrNicefellow_ChatAllInOne-Yi-34B-200K-V1-3.0bpw | 6.4106 | 6.6542 | 7.6800 |
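
To make the numbers concrete: this is not the exact exllamav2 script used for the table above, just a minimal, hedged sketch of the same idea with Hugging Face `transformers`, showing what "perplexity at a 20K context" means. The model path and text file are placeholders, and a full 20K-token window through a 34B model is heavy on VRAM (the exl2 runs above use quantized weights).

```python
# Rough sketch of a context-length perplexity eval. NOT the exact exllamav2
# script used for the table above; a plain transformers equivalent for
# illustration only. MODEL_DIR and TEXT_FILE are placeholders.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "path/to/some-34b-quant"      # placeholder model path
TEXT_FILE = "novel_style_sample.txt"      # placeholder novel-style eval text
CTX = 20000                               # 20K-token windows, as in the table

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

text = open(TEXT_FILE, encoding="utf-8").read()
ids = tokenizer(text, return_tensors="pt").input_ids[0]

total_nll, total_tokens = 0.0, 0
with torch.no_grad():
    # Score non-overlapping CTX-token windows; with labels == inputs the model
    # returns the mean next-token cross-entropy over each window.
    for start in range(0, ids.numel() - CTX, CTX):
        window = ids[start : start + CTX].unsqueeze(0).to(model.device)
        loss = model(window, labels=window).loss
        total_nll += loss.item() * (window.numel() - 1)
        total_tokens += window.numel() - 1

# Perplexity is exp of the average per-token negative log-likelihood.
print(f" -- Evaluation perplexity: {math.exp(total_nll / total_tokens):.4f}")
```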

This model's perplexity is quite low, and it seems to have retained its long-context performance. @Nexesenex, you may be interested in this.

Anyway, I'm going to actually try this out later, but I'm dropping this here to say the model looks promising.

Nice catch, Bruce! I'll bench its Q4_K_M tonight.

Edit: Done!

ChatAllInOne-Yi-34B-200K-V1-unsloth.Q4_K_M.gguf (Yi 34B, 200K context, GGUF), benchmarked 2024-02-03:

| Benchmark | Score | Tasks / Context |
|---|---|---|
| Hellaswag | 82.5 | 400 tasks |
| Hellaswag | 81.8 | 1000 tasks |
| Hellaswag_Bin | 77 | 400 tasks |
| Hellaswag_Bin | 80.9 | 1000 tasks |
| Arc-Challenge | 56.18729097 | 299 tasks |
| Arc-Easy | 79.29824561 | 570 tasks |
| MMLU | 38.33865815 | 313 tasks |
| TruthfulQA | 31.08935129 | 817 tasks |
| Winogrande | 77.5848 | 1267 tasks |
| Wikitext perplexity | 5.0197 | 512-token context |
| Wikitext perplexity | 4.3070 | 4096-token context |
| Wikitext perplexity | 4.3596 | 8192-token context |
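
For context on the tooling: scores in this format are typically produced with llama.cpp's `perplexity` tool. Below is a minimal sketch of driving it from Python; the binary location and data files are placeholders, the flags reflect early-2024 llama.cpp builds, and this is not necessarily the exact command line used for the table above.

```python
# Hedged sketch: invoking llama.cpp's `perplexity` binary for two of the
# benchmarks above. Paths are placeholders; check `./perplexity --help` on
# your build, since flags change between llama.cpp versions.
import subprocess

MODEL = "ChatAllInOne-Yi-34B-200K-V1-unsloth.Q4_K_M.gguf"

# Wikitext perplexity at a 512-token context window.
subprocess.run(
    ["./perplexity", "-m", MODEL, "-f", "wiki.test.raw", "-c", "512"],
    check=True,
)

# Hellaswag score over the first 400 tasks.
subprocess.run(
    ["./perplexity", "-m", MODEL,
     "-f", "hellaswag_val_full.txt",
     "--hellaswag", "--hellaswag-tasks", "400"],
    check=True,
)
```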

Not bad indeed, but not at the level of a base like Tess M 1.0.
Nevertheless, if ChatAllInOne's dataset is unique, it's worth a try!

Thanks for sharing this, DrNicefellow!
