i like -alt, what's the difference?

#3
by ProphetOfBostrom - opened

i checked the descriptions line by line but I can't promise i didn't just miss it anyway.
downloading 140 GB to test is too much. have you ever tried compressing these? zstd fares well (it's fast, hint hint) and zpaq absolutely condenses ten-figure-parameter fp16 models
wish HF would just do the .safetensors.tar.zst themselves

commentary (on -alt, confusingly)

i had a nice time with 2.4 exl2. smooth sampling (~0.3 for temps 1-1.5, 0.2 for temp=2) really just makes it seem like a better writer. probably because it can't just imitate my style :-)

was using SillyTavern and i may have compiled exllamav2 with fastmath for the quant and/or inference

So thanks. I do think the 8b is a bit more fun out of the box with normal settings. but it's never gonna be smarter.

NeverSleep org

The "Alt" variant was trained for one more epoch. Ikari and I couldn't agree on which one to release, so we released both.
