Which base model?

by brucethemoose - opened Dec 3, 2023

Discussion

brucethemoose

Dec 3, 2023

Is this trained on the regular Yi 34B base, or the 200K version?

brucethemoose

Dec 3, 2023

•

edited Dec 3, 2023

Also, yeah, Yi is still bugged on GGUF without workarounds, I think.

And it runs hot even when worked around. Low-temperature MinP or low temperature/tau Mirostat seems to work better.

Also, be sure to train on https://huggingface.co/chargoddard/Yi-34B-200K-Llama or any of chargoddard's uploads if you aren't already. The other llamafied versions are problematic.
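
For anyone unfamiliar with the low-temperature MinP suggestion above, here is a rough sketch of the idea in plain Python. It is not any backend's actual implementation: the function names, the default `temperature=0.7` and `min_p=0.1`, and the choice to apply temperature before the min-p cutoff are all assumptions for illustration (backends differ on sampler order).

```python
import math
import random


def min_p_candidates(logits, temperature=0.7, min_p=0.1):
    """Return (index, probability) pairs that survive min-p filtering.

    Hypothetical helper: temperature is applied first, then every token whose
    probability is below min_p * (probability of the top token) is dropped.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 0 <= min_p <= 1:
        raise ValueError("min_p must be between 0 and 1")

    # Temperature scaling: lower temperature sharpens the distribution.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Min-p cutoff is relative to the most likely token.
    threshold = min_p * max(probs)
    return [(i, p) for i, p in enumerate(probs) if p >= threshold]


def min_p_sample(logits, temperature=0.7, min_p=0.1, rng=random):
    """Sample one token index from the min-p filtered distribution."""
    kept = min_p_candidates(logits, temperature, min_p)
    indices = [i for i, _ in kept]
    weights = [p for _, p in kept]  # random.choices renormalizes the weights
    return rng.choices(indices, weights=weights, k=1)[0]
```

With a model that runs hot, the point is that the cutoff scales with the top token's confidence: when the model is sure, almost everything else gets pruned, and when it is unsure, more candidates stay in play.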

Sao10K

Owner Dec 3, 2023

https://huggingface.co/chargoddard/Yi-34B-Llama

Yeah I trained on this one. Results seemed fine on fp16 so I felt it was good enough to upload.

brucethemoose

Dec 3, 2023

•

edited Dec 3, 2023

I would highly recommend the 200K version, if you ever come back to Yi. Even the base model is particularly good at referencing story details from its long context, and you can run 45-75K context on 24GB depending on how much you quantize it.
