Low perplexity!

#1 · opened by brucethemoose

I ran perplexity tests on a bunch of 3bpw quantizations against a novel-style dataset at context sizes of 20,000, 10,000, and 2,000 tokens. The table below lists each quantization's perplexity at those three context lengths (a rough sketch of this kind of eval follows the table):

| Quantization | PPL @ 20K ctx | PPL @ 10K ctx | PPL @ 2K ctx |
|---|---|---|---|
| v7-exl2-31bpw-fiction | 6.5185 | 6.7835 | 38.2186 |
| LoneStriker_Thespis-34b-DPO-v0.7-3.0bpw-h6-exl2 | 6.6764 | 6.9829 | 8.3438 |
| LoneStriker_Tess-34B-v1.5b-3.0bpw-h6-exl2 | 8.4026 | 8.7730 | 11.4197 |
| LoneStriker_deepmoney-34b-200k-base-3.0bpw-h6-exl2 | 6.4025 | 6.6599 | 7.7988 |
| LoneStriker_bagel-34b-v0.2-3.0bpw-h6-exl2 | 49.0813 | 45.8001 | 18.7098 |
| smaug-3.0bpw | 32.4581 | 27.7181 | 11.0142 |
| LoneStriker_Yi-34B-200K-DARE-megamerge-v8-3.0bpw-h6-exl2 | 6.7946 | 7.0572 | 8.1391 |
| LoneStriker_Tess-M-v1.0-3.0bpw-h6-exl2 | 6.1975 | 6.4421 | 7.5398 |
| LoneStriker_Pallas-0.5-3.0bpw-h6-exl2 | 8.9354 | 9.4341 | 17.5678 |
| LoneStriker_Yi-34B-200K-AEZAKMI-v2-3.0bpw-h6-exl2 | 6.8550 | 7.1553 | 8.4838 |
| LoneStriker_Yi-34B-200K-3.0bpw-h6-exl2 | 6.1354 | 6.4013 | 7.5029 |
| LoneStriker_Nous-Capybara-34B-3.0bpw-h6-exl2 | 7.4431 | 7.8713 | 10.0094 |
| airo-3.0bpw | 6.6336 | 6.9046 | 8.1336 |
| fastchat-3.0bpw | 6.3284 | 6.5742 | 9.6255 |
| RPmerge-31bpw | 6.6463 | 6.9070 | 13.3217 |
| DrNicefellow_ChatAllInOne-Yi-34B-200K-V1-3.0bpw | 6.4106 | 6.6542 | 7.6800 |
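
To make the numbers concrete: this is not the exact exllamav2 script used for the table above, just a minimal, hedged sketch of the same idea with Hugging Face `transformers`, showing what "perplexity at a 20K context" means. The model path and text file are placeholders, and a full 20K-token window through a 34B model is heavy on VRAM (the exl2 runs above use quantized weights).

```python
# Rough sketch of a context-length perplexity eval. NOT the exact exllamav2
# script used for the table above; a plain transformers equivalent for
# illustration only. MODEL_DIR and TEXT_FILE are placeholders.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "path/to/some-34b-quant"      # placeholder model path
TEXT_FILE = "novel_style_sample.txt"      # placeholder novel-style eval text
CTX = 20000                               # 20K-token windows, as in the table

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

text = open(TEXT_FILE, encoding="utf-8").read()
ids = tokenizer(text, return_tensors="pt").input_ids[0]

total_nll, total_tokens = 0.0, 0
with torch.no_grad():
    # Score non-overlapping CTX-token windows; with labels == inputs the model
    # returns the mean next-token cross-entropy over each window.
    for start in range(0, ids.numel() - CTX, CTX):
        window = ids[start : start + CTX].unsqueeze(0).to(model.device)
        loss = model(window, labels=window).loss
        total_nll += loss.item() * (window.numel() - 1)
        total_tokens += window.numel() - 1

# Perplexity is exp of the average per-token negative log-likelihood.
print(f" -- Evaluation perplexity: {math.exp(total_nll / total_tokens):.4f}")
```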

This model's perplexity is quite low, and it seems to have retained its long-context performance. @Nexesenex, you may be interested in this.

Anyway, I'm going to actually try this out later, but I'm dropping this here to say the model looks promising.

Nice catch, Bruce! I'll bench its Q4_K_M tonight.

Edit: Done!

ChatAllInOne-Yi-34B-200K-V1-unsloth.Q4_K_M.gguf (Yi 34B, 200K context, GGUF), benchmarked 2024-02-03:

| Benchmark | Score | Tasks / Context |
|---|---|---|
| Hellaswag | 82.5 | 400 tasks |
| Hellaswag | 81.8 | 1000 tasks |
| Hellaswag_Bin | 77 | 400 tasks |
| Hellaswag_Bin | 80.9 | 1000 tasks |
| Arc-Challenge | 56.18729097 | 299 tasks |
| Arc-Easy | 79.29824561 | 570 tasks |
| MMLU | 38.33865815 | 313 tasks |
| TruthfulQA | 31.08935129 | 817 tasks |
| Winogrande | 77.5848 | 1267 tasks |
| Wikitext perplexity | 5.0197 | 512-token context |
| Wikitext perplexity | 4.3070 | 4096-token context |
| Wikitext perplexity | 4.3596 | 8192-token context |
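
For context on the tooling: scores in this format are typically produced with llama.cpp's `perplexity` tool. Below is a minimal sketch of driving it from Python; the binary location and data files are placeholders, the flags reflect early-2024 llama.cpp builds, and this is not necessarily the exact command line used for the table above.

```python
# Hedged sketch: invoking llama.cpp's `perplexity` binary for two of the
# benchmarks above. Paths are placeholders; check `./perplexity --help` on
# your build, since flags change between llama.cpp versions.
import subprocess

MODEL = "ChatAllInOne-Yi-34B-200K-V1-unsloth.Q4_K_M.gguf"

# Wikitext perplexity at a 512-token context window.
subprocess.run(
    ["./perplexity", "-m", MODEL, "-f", "wiki.test.raw", "-c", "512"],
    check=True,
)

# Hellaswag score over the first 400 tasks.
subprocess.run(
    ["./perplexity", "-m", MODEL,
     "-f", "hellaswag_val_full.txt",
     "--hellaswag", "--hellaswag-tasks", "400"],
    check=True,
)
```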

Not bad indeed, but not at the level of a base like Tess M 1.0.
Nevertheless, if ChatAllInOne's dataset is unique, it's worth a try!

Thanks for sharing this, DrNicefellow!
