GGUF

#1
by KatyTheCutie - opened

Can you please quant this model into a GGUF format?

Hey, thanks for noticing the model. I'm not sure whether this model works well or not. Please try it and tell me how things can be improved.
The stopping criteria need to be defined in order to get a correct response.
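Until proper stopping criteria are wired into generation, one simple workaround is to truncate the raw completion at the first stop marker. This is just a minimal sketch; the marker strings below ("### Human:", "</s>") are assumptions, so swap in whatever your prompt format actually emits:

```python
# Minimal sketch: cut the generated text off at the first stop marker so the
# response doesn't run on into the next conversation turn.
# NOTE: the default stop strings here are assumptions about the prompt format.
def truncate_at_stop(text, stop_strings=("### Human:", "</s>")):
    cut = len(text)
    for marker in stop_strings:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)  # keep the earliest marker found
    return text[:cut].rstrip()
```

For example, `truncate_at_stop("Hi there!\n### Human: next question")` keeps only `"Hi there!"`.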

Here is the Colab notebook to try:

https://colab.research.google.com/drive/1vS5MF2WNXtXMKNDXFua0T43l7HJ51nOW?usp=sharing

Thank you! I'll have a look at it and give you feedback.

So far the model is pretty good when responding. It obviously makes things up sometimes, which is likely to happen with a smaller model, but so far I'd say this model is actually BETTER than TinyLlama 1.0.

I personally don't know how to quantize this.
Hopefully, someone will if more people find it interesting or useful.

I'm uploading a quant now! I figured it out.

KatyTheCutie changed discussion status to closed
TinyPixel changed discussion status to open

Hey, I have a new experimental fine-tune, check it out:
TinyPixel/qwen-1.8B-OrcaMini
It's only fine-tuned on 1k examples, but it's good.

I'll test it soon! Would you like me to make a GGUF of it?

Just test it for now. It also uses system prompts and is good at following them.

The OrcaMini one is quite good at following the system prompt as you said! Thank you for creating it! I'll upload a GGUF of it soon! πŸ’›

Have you tried running the GGUF? Please share the code if you can.

OrcaMini will be much better if tuned on the whole dataset.

I have! You can run the GGUF with llama.cpp server or KoboldCPP. I still need to upload the GGUF.

You can also use llamafile to easily run GGUFs on most operating systems:
https://github.com/Mozilla-Ocho/llamafile/releases

./llamafile-0.5 -m qwen-1.8b-guanaco-Q8_0.gguf

It should open a webUI server at 127.0.0.1:8080
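Besides the web UI, that server also exposes an HTTP completion API you can call from Python. A hedged sketch, assuming the llama.cpp-style `/completion` endpoint with `prompt` and `n_predict` fields (adjust if your server version differs):

```python
# Sketch: query the completion endpoint that llamafile / the llama.cpp server
# serves on 127.0.0.1:8080. Endpoint path and JSON fields assume the
# llama.cpp server API; check your version's docs if it doesn't match.
import json
import urllib.request

def build_request(prompt, n_predict=128, host="http://127.0.0.1:8080"):
    # Package the prompt as a JSON POST request for the /completion endpoint.
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode()
    return urllib.request.Request(
        host + "/completion",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def complete(prompt, **kwargs):
    # Send the request and return the generated text from the response.
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["content"]
```

Then `complete("Hello!")` returns the model's completion as a string, once the server is running.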

GGUF uploaded.

KatyTheCutie changed discussion status to closed
