Thank you for this nice model. Could you make a Q8 GGUF, please?
...
You can use the sample Colab notebooks that were shared to convert the models to GGUF. Unsloth uses llama.cpp for the conversion. The code below will do it; for whichever quantization you want, change the corresponding `False` to `True`.
```python
# Save to 8-bit Q8_0
if False: model.save_pretrained_gguf("model", tokenizer)
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16-bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")
```
The free tier of Colab (T4 GPU) takes about 20 minutes to build the GGUF file.
Thanks for helping out as always, ewre! ❤️
> The free tier of Colab (T4 GPU) takes about 20 minutes to build the GGUF file.
You can also try our Kaggle notebooks, which provide 30 free GPU hours per week: https://www.kaggle.com/code/danielhanchen/kaggle-llama-3-8b-unsloth-notebook
@NikolayKozloff Here it is, in case you or anyone else is still looking for it: https://huggingface.co/akumaburn/llama-3-8b-bnb-4bit-GGUF
Thanks. Your GGUF made it possible to merge it with a LoRA, which resulted in probably the first Albanian LLM with acceptable chat quality: https://huggingface.co/NikolayKozloff/bleta-8B-v0.5-Albanian-shqip-GGUF
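For anyone reproducing that step, here is a minimal sketch of one way to merge a LoRA adapter into a base model with PEFT before converting the result to GGUF; the paths and model name are hypothetical placeholders, not the exact commands used for Bleta:

```python
# Hypothetical sketch of a LoRA merge with PEFT; the paths and model name
# are placeholders, not the actual Bleta training artifacts.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # placeholder path
merged = model.merge_and_unload()  # bake the LoRA weights into the base model
merged.save_pretrained("merged-model")  # convert this folder to GGUF with llama.cpp
```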
Great job, thanks! How do I fine-tune it with my custom data?