GGUF? Finetune using different base model?

#4 opened by ThiloteE

This model is interesting :-)
I think many more people might try your models if you provided quantized GGUFs.
Also, have you considered using a different base model with a larger context length, such as Mistral 7B? It's under the Apache 2.0 license, so there are no problems on that front.

Owner

This was made back when TheBloke was doing quants, so here is the GGUF that he made: https://huggingface.co/TheBloke/Inkbot-13B-8k-0.2-GGUF

And yeah, I have been super busy with work these past few months, which stalled my progress on fine-tuning different base models. I just got back to fixing my data pipeline, which had bit-rotted, and will be looking at doing a Mistral 7B fine-tune soon.

Hope the model works well for your use case. I'll update here when I do get another base model trained.
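If you want a quick way to try the GGUF from Python, something like the sketch below should work with llama-cpp-python and huggingface_hub. The exact quant filename is a guess on my part, so check the file list in TheBloke's repo before running it:

```python
# Rough sketch: download one of TheBloke's quants and run it locally.
# Requires: pip install llama-cpp-python huggingface_hub
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="TheBloke/Inkbot-13B-8k-0.2-GGUF",
    filename="inkbot-13b-8k-0.2.Q4_K_M.gguf",  # assumed name; check the repo
)

# Inkbot was trained with an 8k context window, so request it explicitly.
llm = Llama(model_path=model_path, n_ctx=8192)

output = llm(
    "Summarize the following text:\n...",
    max_tokens=256,
)
print(output["choices"][0]["text"])
```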

Thank you so much! Something with a larger context should go really well with this model. There's lots of movement right now in that regard: Yi has updated their 34B to a 200k context window with great results on needle-in-a-haystack benchmarks, and Command-R seems to have a context of 128k as well! I just wonder how many parameters a model needs to reliably summarize text. I saw a paper from a year ago on the emergent abilities of LLMs. Not sure if 7B is enough, but it's well worth an attempt!

Owner

Yeah, I will definitely be looking at doing larger models too. I was in the process of doing a ~30B model when I got sidetracked.

I'm glad that the context issues have pretty much been solved at this point.

Yi-9B-200K just dropped!
