Thank you for this nice model. Could you make a Q8 GGUF, please?

#2
by NikolayKozloff - opened

You can use the sample Colab notebooks that were shared to convert the models to GGUF. Unsloth uses llama.cpp under the hood for the conversion. The code below will do it.

For whichever quantization you want, change the corresponding False to True.

Save to 8bit Q8_0

if False: model.save_pretrained_gguf("model", tokenizer)  # q8_0 is the default quantization method
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")  # fill in a Hugging Face write token

Save to 16bit GGUF

if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

Save to q4_k_m GGUF

if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")
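
These save calls assume `model` and `tokenizer` came from an Unsloth load. A minimal sketch, assuming the notebooks' usual entry point (the model name and sequence length are just example values):

from unsloth import FastLanguageModel

# Load a model the way the Unsloth notebooks do; example values only.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)
# Now flip any of the "if False" lines above to True and run them.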

The free tier of Colab (T4 GPU) takes about 20 minutes to build the GGUF file.
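
Once the file is built, you can load it locally with llama-cpp-python; a hedged sketch (the output filename below is hypothetical, so check what the save call actually wrote into the model folder):

from llama_cpp import Llama

# Load the freshly built GGUF; adjust the path to the real output file.
llm = Llama(model_path = "model/unsloth.Q8_0.gguf", n_ctx = 2048)
out = llm("Q: What is GGUF? A:", max_tokens = 64)
print(out["choices"][0]["text"])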

Unsloth AI org


Thanks for helping out as always, ewre! ❤️


You can also try our Kaggle notebooks, which give you 30 hours of free GPU time per week: https://www.kaggle.com/code/danielhanchen/kaggle-llama-3-8b-unsloth-notebook

@NikolayKozloff Here it is, in case you or anyone else is still looking for it: https://huggingface.co/akumaburn/llama-3-8b-bnb-4bit-GGUF


Thanks. Your GGUF made it possible to merge it with a LoRA, and that resulted in probably the first Albanian LLM with acceptable chat quality: https://huggingface.co/NikolayKozloff/bleta-8B-v0.5-Albanian-shqip-GGUF
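
For anyone wanting to reproduce that kind of result, one possible route is to let Unsloth merge the adapter before conversion rather than patching the GGUF afterwards; a sketch, assuming a LoRA adapter folder saved from an Unsloth training run (the path is a placeholder):

from unsloth import FastLanguageModel

# Pointing at an adapter folder makes Unsloth pull in its base model automatically.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "path/to/your-lora-adapter",  # placeholder: a folder containing adapter_config.json
    max_seq_length = 2048,
    load_in_4bit = True,
)
# The GGUF export merges the LoRA weights into the base model first.
model.save_pretrained_gguf("merged-model", tokenizer, quantization_method = "q8_0")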

It's a great job, thanks. How do I fine-tune it with my custom data?
