---
license: apache-2.0
pipeline_tag: text-generation
tags:
- merge
base_model:
- state-spaces/mamba-130m
- state-spaces/mamba-370m
- state-spaces/mamba-790m
- state-spaces/mamba-1.4b
- state-spaces/mamba-2.8b
- state-spaces/mamba-2.8b-slimpj
---

# Mamba GGUF

These are the Mamba base models, converted to GGUF for use with [llama.cpp](https://github.com/ggerganov/llama.cpp), in a variety of precisions (2, 3, 4, 5, 6, 8, 16, and 32-bit).

Please click "Files and versions" at the top of the page to choose your desired model size, and then click the "`📦LFS ` ` ↓`" button next to your desired quantization.

Here is a table adapted from [TheBloke](https://huggingface.co/TheBloke) explaining the various precisions:

| Quant method | Use case |
| ---- | ---- |
| Q2_K | significant quality loss - not recommended for most purposes |
| Q3_K_S | very small, high quality loss |
| Q3_K_M | very small, high quality loss |
| Q3_K_L | small, substantial quality loss |
| Q4_0 | legacy; small, very high quality loss - prefer using Q3_K_M |
| Q4_K_S | small, greater quality loss |
| Q4_K_M | medium, balanced quality - recommended |
| Q5_0 | legacy; medium, balanced quality - prefer using Q4_K_M |
| Q5_K_S | large, low quality loss - recommended |
| Q5_K_M | large, very low quality loss - recommended |
| Q6_K | very large, extremely low quality loss |
| Q8_0 | very large, extremely low quality loss - not recommended |
| F16 | half precision - almost identical to the original |
| F32 | original precision - recommended by the Mamba authors |
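
To get a feel for how the precisions above trade off against download size, here is a rough Python sketch. It assumes the parameter counts implied by the model names and the nominal bit width implied by each quant name; both tables and the `estimate_size_gb` helper are illustrative, not part of llama.cpp. Real GGUF files will likely be somewhat larger, since quantized blocks carry scale data and some tensors may be kept at higher precision, so treat the result as a lower bound and check the actual sizes under "Files and versions".

```python
# Nominal parameter counts read off the model names (assumption: approximate).
NOMINAL_PARAMS = {
    "mamba-130m": 130e6,
    "mamba-370m": 370e6,
    "mamba-790m": 790e6,
    "mamba-1.4b": 1.4e9,
    "mamba-2.8b": 2.8e9,
}

# Nominal bits per weight implied by each quant name (assumption: the
# effective bits per weight of a real file is usually a little higher).
NOMINAL_BITS = {
    "Q2_K": 2, "Q3_K_S": 3, "Q3_K_M": 3, "Q3_K_L": 3,
    "Q4_0": 4, "Q4_K_S": 4, "Q4_K_M": 4,
    "Q5_0": 5, "Q5_K_S": 5, "Q5_K_M": 5,
    "Q6_K": 6, "Q8_0": 8, "F16": 16, "F32": 32,
}


def estimate_size_gb(model: str, quant: str) -> float:
    """Return a rough lower-bound file size in GB (10^9 bytes)."""
    total_bits = NOMINAL_PARAMS[model] * NOMINAL_BITS[quant]
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    # Compare a few precisions for the largest model.
    for quant in ("Q4_K_M", "Q8_0", "F16", "F32"):
        size = estimate_size_gb("mamba-2.8b", quant)
        print(f"mamba-2.8b {quant}: roughly {size:.1f} GB or more")
```

Because the estimate scales linearly with bit width, an F32 file is about eight times the size of a 4-bit one for the same model.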