MADLAD-400-10B-MT - GGUF

Description

This repo contains GGUF format model files for MADLAD-400-10B-MT (a 10.7B-parameter, T5-architecture machine translation model) for use with llama.cpp and compatible software.
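
To grab a single quantized file, something along these lines should work (a minimal sketch using the Hugging Face CLI; the Q4_K_M file name is taken from the table below, so adjust it to whichever quant you want):

```bash
# Requires the Hugging Face CLI: pip install -U "huggingface_hub[cli]"
# Download one quantized file from this repo into the current directory
huggingface-cli download thirteenbit/madlad400-10b-mt-gguf \
    model-q4_k_m.gguf --local-dir .
```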

Converted to GGUF with llama.cpp's convert_hf_to_gguf.py and quantized with llama-quantize, both from llama.cpp release b3325.
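
For reference, this roughly corresponds to the standard llama.cpp convert-then-quantize workflow. The sketch below is illustrative only: the source checkpoint path and intermediate file name are placeholders, not the exact commands used for this repo.

```bash
# Convert the original Hugging Face checkpoint to a full-precision GGUF
# (/path/to/madlad400-10b-mt is a placeholder for the source model directory)
python convert_hf_to_gguf.py /path/to/madlad400-10b-mt \
    --outtype f16 --outfile model-f16.gguf

# Quantize the f16 GGUF to one of the provided quant types, e.g. Q4_K_M
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```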

Provided files

| Name | Quant method | Bits | Size | VRAM required |
| ---- | ------------ | ---- | ---- | ------------- |
| model-q3_k_m.gguf | Q3_K_M | 3 | 4.9 GB | 5.7 GB |
| model-q4_k_m.gguf | Q4_K_M | 4 | 6.3 GB | 7.1 GB |
| model-q5_k_m.gguf | Q5_K_M | 5 | 7.2 GB | 7.9 GB |
| model-q6_k.gguf | Q6_K | 6 | 8.2 GB | 8.9 GB |
| model-q8_0.gguf | Q8_0 | 8 | 11 GB | 11.3 GB |

Note: the VRAM usage figures above were observed with all layers offloaded to the GPU, on Linux with an NVIDIA GPU.
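
As a usage sketch, the files can be run with llama.cpp's llama-cli, offloading all layers to the GPU with -ngl. MADLAD-400 models are typically prompted with a <2xx> target-language tag at the start of the input; the language tag and prompt below are illustrative assumptions, not taken from this repo.

```bash
# Translate to German (<2de> target-language tag), all layers offloaded to the GPU
./llama-cli -m model-q4_k_m.gguf -ngl 99 \
    -p "<2de> How are you today?"
```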
