Thank you for this nice model. Could you make a q8 gguf, please?

#2 opened by NikolayKozloff

You can use the sample Colab notebooks that were shared to convert the models to GGUF. Unsloth uses llama.cpp to do the conversion. The code below will handle it.

For whichever quantization you want, change the corresponding False to True.

Save to 8bit Q8_0

if False: model.save_pretrained_gguf("model", tokenizer,)
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

Save to 16bit GGUF

if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

Save to q4_k_m GGUF

if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")
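
Here is a minimal end-to-end sketch, assuming the standard Unsloth Colab setup; the model name and output directory below are placeholders rather than anything taken from the notebooks:

from unsloth import FastLanguageModel

# Load the fine-tuned (or base) model the same way the Unsloth notebooks do.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",  # placeholder: point this at your own checkpoint
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Export a Q8_0 GGUF locally; Unsloth invokes llama.cpp under the hood.
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q8_0")

# Or push the GGUF straight to the Hub (requires a write token).
# model.push_to_hub_gguf("your-username/your-model-GGUF", tokenizer, quantization_method = "q8_0", token = "")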

The free tier of Colab (a T4 GPU) takes about 20 minutes to build the GGUF file.

Unsloth AI org

Thanks for helping out as always, ewre! ❤️

You can also try our Kaggle notebooks, which give you 30 free hours per week: https://www.kaggle.com/code/danielhanchen/kaggle-llama-3-8b-unsloth-notebook

@NikolayKozloff Here it is, in case you or anyone else is still looking for it: https://huggingface.co/akumaburn/llama-3-8b-bnb-4bit-GGUF

Thanks. Your GGUF made it possible to merge it with a LoRA, and the result is probably the first Albanian LLM with acceptable chat quality: https://huggingface.co/NikolayKozloff/bleta-8B-v0.5-Albanian-shqip-GGUF

Great job, thanks! How can I fine-tune it with my own custom data?
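
The Unsloth Colab notebooks fine-tune through TRL's SFTTrainer, so custom data mainly needs to end up in a single text column that already contains the formatted prompt and response. Below is a minimal sketch, assuming model and tokenizer come from the notebook's FastLanguageModel.get_peft_model step; the file name my_data.jsonl and the hyperparameters are placeholders, and the exact SFTTrainer arguments vary between TRL versions:

from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Hypothetical JSONL file with one {"text": "..."} record per training example.
dataset = load_dataset("json", data_files = "my_data.jsonl", split = "train")

trainer = SFTTrainer(
    model = model,                # PEFT model returned by FastLanguageModel.get_peft_model
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",  # column holding the fully formatted prompt + answer
    max_seq_length = 2048,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        max_steps = 60,
        learning_rate = 2e-4,
        output_dir = "outputs",
    ),
)
trainer.train()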

"I'm encountering the following problem:
When I fine-tune an LLM using one of your Colab codes, I get a model that gives good answers in the editors.
But when I save it in GGUF format with llama.cp and push it to my Hugging Face repo, then download and use it in LMStudio, the model fails to answer any questions, it bugs out, doesn't work at all, and freezes.
Note that the output format gives me a 16GB file for a Llama3 7B, while the GGUF models in LMStudio are 5GB to 7GB.

Here's the part of the code that does the saving:

Save to 8bit Q8_0

if False: model.save_pretrained_gguf("model", tokenizer,) #if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

Save to 16bit GGUF

if False: model.save_pretrained_gguf("Llama3_7B_finetuned_lora_f16", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("Llama3_7B_finetuned_lora_f16", tokenizer, quantization_method = "f16", token = "")

Save to q4_k_m GGUF

if False: model.save_pretrained_gguf("Llama3_7B_finetuned_lora_q4_k_m", tokenizer, quantization_method = "q4_k_m") model.push_to_hub_gguf("Llama3_7B_finetuned_lora_q4_k_m", tokenizer, quantization_method = "q4_k_m", token = "")]

Please tell me how to save the model with a reasonable file size so that it works correctly locally.
Thank you.
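
For the file-size question: a 16 GB file is roughly what an f16 (16-bit) export of a 7B/8B model comes out to, while the 5-7 GB GGUFs you see in LM Studio are quantized. A minimal sketch of a quantized export, assuming the same Unsloth API as above; the output name is a placeholder:

# q4_k_m gives roughly a 5 GB file for an 8B-class model and q8_0 roughly 8-9 GB,
# whereas "f16" keeps the full 16-bit weights (~16 GB), which matches the file you are getting.
model.save_pretrained_gguf("Llama3_finetuned_q4_k_m", tokenizer, quantization_method = "q4_k_m")
model.push_to_hub_gguf("your-username/Llama3_finetuned_q4_k_m", tokenizer, quantization_method = "q4_k_m", token = "")  # use your HF write token

Then point LM Studio at the q4_k_m file rather than the f16 one.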
