devingulliver/mamba-gguf

	---
	license: apache-2.0
	pipeline_tag: text-generation
	tags:
	- merge
	base_model:
	- state-spaces/mamba-130m
	- state-spaces/mamba-370m
	- state-spaces/mamba-790m
	- state-spaces/mamba-1.4b
	- state-spaces/mamba-2.8b
	- state-spaces/mamba-2.8b-slimpj
	---

	# Mamba GGUF

	These are the Mamba base models, converted to GGUF for use with [llama.cpp](https://github.com/ggerganov/llama.cpp) and provided in a variety of precisions (2-, 3-, 4-, 5-, 6-, 8-, 16-, and 32-bit).

	To download a model, click "Files and versions" at the top of the page, choose your desired model size, and then click the "`📦 LFS ↓`" button next to your desired quantization.
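
The same download can also be scripted. The following is a minimal sketch, assuming the `huggingface_hub` Python package is installed and that each GGUF filename contains its quantization label (for example `Q4_K_M`); the exact file layout of this repository is not described here, so the helper lists the files first instead of guessing a path. The names `pick_gguf` and `download` are illustrative, not part of any library.

```python
def pick_gguf(filenames, quant, size=None):
    """Return the .gguf paths whose name contains the quant label.

    `size` is an optional substring such as "130m"; matching on it assumes
    (hypothetically) that the model size appears in the file path.
    """
    q = quant.lower()
    s = size.lower() if size else None
    matches = []
    for name in filenames:
        lower = name.lower()
        if not lower.endswith(".gguf") or q not in lower:
            continue
        if s is not None and s not in lower:
            continue
        matches.append(name)
    return matches


def download(repo_id="devingulliver/mamba-gguf", quant="Q4_K_M", size=None):
    """Download the first matching GGUF file and return its local path."""
    # Imported lazily so pick_gguf works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download, list_repo_files

    matches = pick_gguf(list_repo_files(repo_id), quant, size)
    if not matches:
        raise FileNotFoundError(f"no .gguf file matching {quant!r} in {repo_id}")
    return hf_hub_download(repo_id=repo_id, filename=matches[0])
```

Calling `download(quant="Q5_K_M", size="370m")` would then fetch the first file whose path contains both substrings, if one exists.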

	Here is a table adapted from [TheBloke](https://huggingface.co/TheBloke) explaining the various precisions:

	| Quant method | Use case |
	| ---- | ---- |
	| Q2_K | significant quality loss - not recommended for most purposes |
	| Q3_K_S | very small, high quality loss |
	| Q3_K_M | very small, high quality loss |
	| Q3_K_L | small, substantial quality loss |
	| Q4_0 | legacy; small, very high quality loss - prefer using Q3_K_M |
	| Q4_K_S | small, greater quality loss |
	| Q4_K_M | medium, balanced quality - recommended |
	| Q5_0 | legacy; medium, balanced quality - prefer using Q4_K_M |
	| Q5_K_S | large, low quality loss - recommended |
	| Q5_K_M | large, very low quality loss - recommended |
	| Q6_K | very large, extremely low quality loss |
	| Q8_0 | very large, extremely low quality loss - not recommended |
	| F16 | half precision - almost identical to the original |
	| F32 | original precision - recommended by the Mamba authors |
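
To get a feel for the size side of this trade-off, the nominal bit widths listed above can be turned into a rough estimate. This is only a back-of-the-envelope sketch: it assumes the parameter count implied by the model name (for example 130 million for `mamba-130m`) and uses the nominal bits per weight, so it gives a lower bound. Actual files are somewhat larger because quantized formats also store per-block scale data. The function name `approx_size_mb` is illustrative.

```python
# Nominal bits per weight, taken from the precisions listed in this README.
NOMINAL_BITS = {
    "Q2_K": 2,
    "Q3_K_M": 3,
    "Q4_K_M": 4,
    "Q5_K_M": 5,
    "Q6_K": 6,
    "Q8_0": 8,
    "F16": 16,
    "F32": 32,
}


def approx_size_mb(n_params, quant):
    """Rough lower bound on file size in megabytes (1 MB = 10**6 bytes)."""
    return n_params * NOMINAL_BITS[quant] / 8 / 1e6


if __name__ == "__main__":
    # Assumed parameter count for mamba-2.8b, read off the model name.
    for quant in NOMINAL_BITS:
        print(f"{quant:>7}: ~{approx_size_mb(2_800_000_000, quant):,.0f} MB")
```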