This is a merge of TheBloke/MythoMax-L2-13B-GGUF and the LORA pxdde/altcb.

It can directly be used for inference using CPU+GPU also on low VRAM.

I am able to offload 18 layers to GPU on a RTX 3060 Ti (8GB)

GGUF

Model size

13B params

Architecture

llama

5-bit

Unable to determine this model's library. Check the docs .