This allows you to use both the normal and abliterated versions of popular models like Llama, Qwen, etc., without doubling the VRAM usage.
ngxson/gguf_lora_collection
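As a rough sketch of how switching between the two could look with llama.cpp's server: this assumes `llama-server` was started with the base model plus one adapter via `--lora`, and that your build exposes the `/lora-adapters` endpoint. Setting the adapter's scale to 0 should give the normal model, and 1 the abliterated one. The URL, the adapter id, and the helper names `lora_payload` and `set_lora` are made up for illustration.

```python
import json
import urllib.request


def lora_payload(adapter_id: int, enabled: bool) -> list:
    """Build the request body: scale 1.0 applies the adapter, 0.0 disables it."""
    return [{"id": adapter_id, "scale": 1.0 if enabled else 0.0}]


def set_lora(enabled: bool, adapter_id: int = 0,
             base_url: str = "http://localhost:8080") -> None:
    """POST the adapter scale to a running llama-server.

    Assumption: the server listens on base_url and supports /lora-adapters.
    """
    body = json.dumps(lora_payload(adapter_id, enabled)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/lora-adapters",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()  # drain the response; errors raise urllib.error.URLError
```

With a server running, `set_lora(True)` would switch to the abliterated behaviour and `set_lora(False)` back to the normal model, while only one copy of the base weights stays loaded.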
With my llama-cpp-python (0.3.4), the following PR may not have been merged yet, so an error occurs when applying the LoRA. I tried it with Qwen 2.5 14B Instruct. Well, it will be updated eventually.
https://github.com/ggerganov/llama.cpp/issues/9114
This is super cool!!! Would you mind sharing the process for making these GGUF LoRA adapters? Did you convert the LoRA into GGUF, or make the LoRA from the GGUF itself?
Yes, sure!
The first step is to generate a PEFT-compatible LoRA adapter; I used mergekit-extract-lora to do that. Please note that some bigger models (Qwen/Llama 70B) give errors that I don't know how to fix; hopefully they will be fixed soon. You can find more info about mergekit here: https://github.com/arcee-ai/mergekit
The next step is to convert the PEFT adapter to GGUF; I used this space: https://huggingface.co/spaces/ggml-org/gguf-my-lora
Then it's good to go!
Please note that the space can convert any PEFT LoRA adapter to GGUF, so if you're using something like unsloth, it will be straightforward to convert it into a GGUF LoRA (no need to merge it into the base model).
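Once you have the converted adapter, running it on top of the base model could look roughly like the sketch below. It only builds a `llama-cli` command line using the `-m`, `--lora`, and `-p` flags; the file names and the helper `llama_cli_command` are placeholders, not real files from the collection.

```python
import shlex


def llama_cli_command(base_gguf, lora_gguf=None, prompt="Hello"):
    """Build a llama-cli invocation; the adapter is optional."""
    cmd = ["llama-cli", "-m", base_gguf]
    if lora_gguf is not None:
        # Apply the GGUF LoRA adapter on top of the base model at load time.
        cmd += ["--lora", lora_gguf]
    cmd += ["-p", prompt]
    return cmd


# Placeholder file names; substitute your own base model and adapter.
print(shlex.join(llama_cli_command("base-model.gguf", "abliterated-lora.gguf")))
```

Leaving out `lora_gguf` gives the command for the plain base model, so the same base file serves both variants.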