What Works
| Loader         | Loading 1 LoRA | Loading 2 or more LoRAs | Training LoRAs | Multimodal extension | Perplexity evaluation |
|----------------|----------------|-------------------------|----------------|----------------------|-----------------------|
| Transformers   | ✅             | ✅\*\*\*                | ✅\*           | ✅                   | ✅                    |
| ExLlama_HF     | ✅             | ❌                      | ❌             | ❌                   | ✅                    |
| ExLlamav2_HF   | ✅             | ✅                      | ❌             | ❌                   | ✅                    |
| ExLlama        | ✅             | ❌                      | ❌             | ❌                   | use ExLlama_HF        |
| ExLlamav2      | ✅             | ✅                      | ❌             | ❌                   | use ExLlamav2_HF      |
| AutoGPTQ       | ✅             | ❌                      | ❌             | ✅                   | ✅                    |
| GPTQ-for-LLaMa | ✅\*\*         | ✅\*\*\*                | ✅             | ✅                   | ✅                    |
| llama.cpp      | ❌             | ❌                      | ❌             | ❌                   | use llamacpp_HF       |
| llamacpp_HF    | ❌             | ❌                      | ❌             | ❌                   | ✅                    |
| ctransformers  | ❌             | ❌                      | ❌             | ❌                   | ❌                    |
| AutoAWQ        | ?              | ❌                      | ?              | ?                    | ✅                    |

❌ = not implemented

✅ = implemented
\* Training LoRAs with GPTQ models also works with the Transformers loader. Make sure to check "auto-devices" and "disable_exllama" before loading the model. A rough Python equivalent is sketched below.

\*\* Requires the monkey-patch. The instructions can be found here.

\*\*\* Multi-LoRA in PEFT is tricky and the current implementation does not work reliably in all cases. See the PEFT sketch below.
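For footnote \*, the following is a minimal sketch of what the "auto-devices" and "disable_exllama" checkboxes correspond to when driving the Transformers and PEFT APIs directly. The model ID, LoRA rank, and target modules are placeholder assumptions, not values taken from this page.

```python
# Hedged sketch: training a LoRA on a GPTQ model through the Transformers
# loader. device_map="auto" mirrors "auto-devices"; disable_exllama=True
# mirrors "disable_exllama" (the ExLlama kernels are inference-only).
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "TheBloke/Llama-2-7B-GPTQ"  # placeholder GPTQ checkpoint

quant_config = GPTQConfig(bits=4, disable_exllama=True)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Prepare the quantized model for training and attach a LoRA adapter.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,                                  # placeholder rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # placeholder target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```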
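For context on footnote \*\*\*, this is a minimal sketch of loading more than one LoRA onto a base model with PEFT outside the webui. The base model and adapter paths are placeholders; combining adapters is exactly the part that tends to be unreliable.

```python
# Hedged sketch: attaching two LoRA adapters to one base model with PEFT.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder

# Load the first LoRA with an explicit adapter name.
model = PeftModel.from_pretrained(base, "path/to/lora-one", adapter_name="lora_one")

# Load a second LoRA onto the same model.
model.load_adapter("path/to/lora-two", adapter_name="lora_two")

# Only one adapter is active at a time unless they are explicitly combined;
# switching between adapters is the reliable path, merging them is the tricky one.
model.set_adapter("lora_two")
```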