base_model: google/madlad400-10b-mt
inference: false
license: apache-2.0
model_name: madlad400-10b-mt-gguf
pipeline_tag: translation

MADLAD-400-10B-MT - GGUF

Description

This repo contains GGUF format model files for MADLAD-400-10B-MT for use with llama.cpp and compatible software.

Converted to GGUF with llama.cpp's convert_hf_to_gguf.py and quantized with llama.cpp's llama-quantize, both from llama.cpp version b3325.
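The two steps can be sketched as a pair of command lines. This is only an illustration, not the exact commands used for this repo: the directory names, the intermediate `model-f16.gguf` file, the choice of an f16 intermediate, and the location of the `llama-quantize` binary are all assumptions. The helper only builds the argument lists and does not run anything.

```python
from pathlib import Path


def build_commands(hf_model_dir, out_dir, quant_type="Q4_K_M", llama_cpp_dir="llama.cpp"):
    """Build (but do not run) the convert and quantize command lines.

    Assumptions (not stated in this README): an f16 GGUF is produced first,
    and llama-quantize sits in the llama.cpp checkout root (with a CMake
    build it is usually under build/bin instead).
    """
    out_dir = Path(out_dir)
    llama_cpp_dir = Path(llama_cpp_dir)
    f16_path = out_dir / "model-f16.gguf"  # hypothetical intermediate file name
    quant_path = out_dir / f"model-{quant_type.lower()}.gguf"

    # Step 1: Hugging Face checkpoint -> unquantized GGUF
    convert_cmd = [
        "python", str(llama_cpp_dir / "convert_hf_to_gguf.py"),
        str(hf_model_dir),
        "--outfile", str(f16_path),
        "--outtype", "f16",
    ]
    # Step 2: unquantized GGUF -> quantized GGUF
    quantize_cmd = [
        str(llama_cpp_dir / "llama-quantize"),
        str(f16_path),
        str(quant_path),
        quant_type,
    ]
    return convert_cmd, quantize_cmd


if __name__ == "__main__":
    for cmd in build_commands("madlad400-10b-mt", "out", "Q4_K_M"):
        print(" ".join(cmd))
```

To actually execute the steps, each list could be passed to `subprocess.run(cmd, check=True)` once the paths match the local setup.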

Provided files

| Name | Quant method | Bits | Size | VRAM required |
| --- | --- | --- | --- | --- |
| model-q3_k_m.gguf | Q3_K_M | 3 | 4.9 GB | 5.7 GB |
| model-q4_k_m.gguf | Q4_K_M | 4 | 6.3 GB | 7.1 GB |
| model-q5_k_m.gguf | Q5_K_M | 5 | 7.2 GB | 7.9 GB |
| model-q6_k.gguf | Q6_K | 6 | 8.2 GB | 8.9 GB |
| model-q8_0.gguf | Q8_0 | 8 | 11 GB | 11.3 GB |

Note: the VRAM figures above were observed with all layers offloaded to the GPU, on Linux with an NVIDIA GPU.
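As a rough aid for choosing a file, the sketch below picks the largest quantization whose observed VRAM figure fits a given budget. The file names and VRAM numbers are copied from the table above; the `headroom_gb` parameter and the function name are illustrative assumptions, and actual usage may differ with context size, driver, or platform.

```python
# (file name, observed VRAM in GB with all layers offloaded), from the table above
FILES = [
    ("model-q3_k_m.gguf", 5.7),
    ("model-q4_k_m.gguf", 7.1),
    ("model-q5_k_m.gguf", 7.9),
    ("model-q6_k.gguf", 8.9),
    ("model-q8_0.gguf", 11.3),
]


def largest_fitting(vram_gb, headroom_gb=0.0):
    """Return the file with the highest VRAM figure that still fits.

    headroom_gb is a hypothetical safety margin reserved for other GPU
    users (desktop, other processes). Returns None if nothing fits.
    """
    candidates = [
        (name, need) for name, need in FILES
        if need + headroom_gb <= vram_gb
    ]
    if not candidates:
        return None
    # Pick the candidate with the largest VRAM requirement, i.e. the
    # highest-precision quantization that fits.
    return max(candidates, key=lambda c: c[1])[0]


if __name__ == "__main__":
    for budget in (6, 8, 12):
        print(budget, "GB ->", largest_fitting(budget))
```

If no file fits, partial GPU offloading in llama.cpp (fewer layers on the GPU) is the usual fallback, at the cost of speed.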