This model is a QLoRA finetune of the LLaMA 2-13B base model on the FIMFiction archive, merged back into the base model (since most GGML loading apps don't support LoRAs) and quantized for llama.cpp-based frontends. It was trained with a 1024-token context length.
There are two options, depending on the resources you have:
- Q5_K_M: 5-bit K-quantized model with low quality loss. Max RAM consumption is 11.73 GB; recommended if you have 12 GB of VRAM and can offload 40 layers to the GPU (see the loading sketch after this list)
- Q4_K_S: Compact 4-bit K-quantized model. Max RAM consumption is 9.87 GB
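As a minimal sketch of how the quantized file might be loaded with GPU offload, here is a llama-cpp-python example. The filename is a placeholder (substitute whichever quant you downloaded), and the 40-layer offload and 1024 context follow the notes above:

```python
from llama_cpp import Llama

# Hypothetical filename; use the actual Q5_K_M or Q4_K_S file you downloaded.
llm = Llama(
    model_path="fimfiction-llama2-13b.q5_k_m.bin",
    n_ctx=1024,       # model was trained with a 1024-token context length
    n_gpu_layers=40,  # offload 40 layers, per the 12 GB VRAM recommendation
)
```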
This is not an instruction-tuned model; it was trained on raw text, so treat it like an autocomplete. It seems sensitive to formatting: I found it usually stays on topic better when the prompt uses double spacing.
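In practice that means prompting with the opening of a story and letting the model continue it. A minimal sketch (the filename is again a placeholder, and the double-spaced prompt follows the formatting tip above):

```python
from llama_cpp import Llama

llm = Llama(model_path="fimfiction-llama2-13b.q5_k_m.bin", n_ctx=1024)

# Raw-text completion: no instruction template, just story text with blank
# lines between paragraphs (double spacing), which seems to help it stay on topic.
prompt = (
    "Twilight Sparkle closed the last book on the pile and sighed.\n\n"
    "Outside the library window, Ponyville was settling into evening.\n\n"
)

out = llm(prompt, max_tokens=128, temperature=0.8)
print(out["choices"][0]["text"])
```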
The only tags excluded from the dataset are eqg and humanized.