Which base model?

#1
by brucethemoose - opened

Is this trained on the regular Yi 34B base, or the 200K version?

Also, yeah, Yi is still bugged on gguf without workarounds, I think.

And it runs hot even when worked around. Low-temperature MinP or low temperature/tau Mirostat seem to work better.
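For anyone unfamiliar with MinP, here is a rough sketch of the idea in plain Python, not the code from any particular backend. The function names, the default values, and the choice to apply temperature before the cutoff are all assumptions for illustration; real implementations may order the steps differently.

```python
import math
import random


def min_p_filter(logits, min_p=0.1, temperature=0.7):
    """Return renormalized probabilities after temperature scaling and a min-p cutoff.

    Hypothetical helper: tokens whose probability is below
    min_p * (probability of the most likely token) are dropped.
    """
    # Temperature scaling (assumed here to happen before the cutoff).
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep only tokens at or above the min-p threshold.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    # Renormalize the survivors so they sum to 1.
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


def sample_token(logits, min_p=0.1, temperature=0.7, rng=random):
    """Sample one token index from the filtered distribution."""
    probs = min_p_filter(logits, min_p, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

The cutoff scales with how confident the model is: a peaked distribution prunes almost everything, a flat one keeps more candidates, which is why it pairs reasonably with a low temperature on a model that runs hot.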

Also, be sure to train on https://huggingface.co/chargoddard/Yi-34B-200K-Llama or any of chargoddard's uploads if you aren't already. The other llamafied versions are problematic.

https://huggingface.co/chargoddard/Yi-34B-Llama

Yeah I trained on this one. Results seemed fine on fp16 so I felt it was good enough to upload.

I would highly recommend the 200K version, if you ever come back to Yi. Even the base model is particularly good at referencing story details from its long context, and you can run 45-75K context on 24GB depending on how much you quantize it.
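As a rough back-of-the-envelope for the context cache alone, something like the sketch below can help. The architecture defaults (60 layers, 8 KV heads, head dimension 128) are my assumption of the Yi-34B config, so check the model's config.json before trusting them. The function is hypothetical and ignores the weights and any runtime overhead.

```python
def kv_cache_gib(context_tokens, n_layers=60, n_kv_heads=8, head_dim=128,
                 bytes_per_value=2.0):
    """Estimate KV cache size in GiB for a given context length.

    Defaults are assumed values for Yi-34B, not taken from this thread.
    bytes_per_value=2.0 models an fp16 cache; use 1.0 for an 8-bit cache.
    """
    # Factor of 2 covers keys and values, stored per layer and per KV head.
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return context_tokens * bytes_per_token / 1024**3


# Example: compare an fp16 cache with an 8-bit cache at the same context length.
fp16_cache = kv_cache_gib(45_000)
int8_cache = kv_cache_gib(45_000, bytes_per_value=1.0)
```

Whatever the cache takes has to fit alongside the quantized weights, so how far you can push the context depends on both the weight quantization and the cache precision.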
