
h2o-danube2-1.8b-chat-GGUF

Description

This repo contains GGUF format model files for h2o-danube2-1.8b-chat, quantized using the llama.cpp framework.

The table below summarizes the quantized versions of h2o-danube2-1.8b-chat and shows the trade-offs between model size, speed, and quality.

| Name | Quant method | Model size | MT-Bench AVG | Perplexity | Tokens per second |
|------|--------------|------------|--------------|------------|-------------------|
| h2o-danube2-1.8b-chat-F16.gguf | F16 | 3.66 GB | 5.60 | 8.02 | 797 |
| h2o-danube2-1.8b-chat-Q8_0.gguf | Q8_0 | 1.95 GB | 5.51 | 8.02 | 1156 |
| h2o-danube2-1.8b-chat-Q6_K.gguf | Q6_K | 1.50 GB | 5.51 | 8.03 | 1131 |
| h2o-danube2-1.8b-chat-Q5_K_M.gguf | Q5_K_M | 1.30 GB | 5.56 | 8.10 | 1172 |
| h2o-danube2-1.8b-chat-Q5_K_S.gguf | Q5_K_S | 1.27 GB | 5.49 | 8.12 | 1107 |
| h2o-danube2-1.8b-chat-Q4_K_M.gguf | Q4_K_M | 1.11 GB | 5.60 | 8.27 | 1162 |
| h2o-danube2-1.8b-chat-Q4_K_S.gguf | Q4_K_S | 1.06 GB | 5.59 | 8.34 | 1270 |
| h2o-danube2-1.8b-chat-Q3_K_L.gguf | Q3_K_L | 0.98 GB | 5.23 | 8.72 | 1442 |
| h2o-danube2-1.8b-chat-Q3_K_M.gguf | Q3_K_M | 0.91 GB | 4.91 | 8.81 | 1107 |
| h2o-danube2-1.8b-chat-Q3_K_S.gguf | Q3_K_S | 0.82 GB | 4.03 | 10.12 | 1103 |
| h2o-danube2-1.8b-chat-Q2_K.gguf | Q2_K | 0.71 GB | 3.03 | 12.56 | 1160 |

Columns in the table are:

  • Name -- model file name and link
  • Quant method -- quantization method
  • Model size -- size of the model file in gigabytes
  • MT-Bench AVG -- average MT-Bench benchmark score, on a scale from 1 to 10; higher is better
  • Perplexity -- perplexity on the WikiText-2 dataset, as reported by llama.cpp's perplexity test; lower is better
  • Tokens per second -- generation speed in tokens per second, as reported by llama.cpp's perplexity test; higher is better. Speed tests were run on a single H100 GPU
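As a rough sanity check on the table, the effective bits per weight of each quant can be estimated from the file size and the 1.83B parameter count. This is a back-of-the-envelope sketch: it assumes sizes are decimal gigabytes and ignores the non-weight metadata (tokenizer, headers) that GGUF files also carry.

```python
# Estimate effective bits per weight from the sizes in the table above.
# Assumes 1 GB = 1e9 bytes and 1.83e9 parameters; metadata in the file
# makes these estimates slightly optimistic.
N_PARAMS = 1.83e9

SIZES_GB = {
    "F16": 3.66,
    "Q8_0": 1.95,
    "Q4_K_M": 1.11,
    "Q2_K": 0.71,
}

def bits_per_weight(size_gb: float, n_params: float = N_PARAMS) -> float:
    """Convert a file size in GB to approximate bits per parameter."""
    return size_gb * 1e9 * 8 / n_params

for name, size in SIZES_GB.items():
    print(f"{name}: ~{bits_per_weight(size):.2f} bits/weight")
```

F16 works out to exactly 16 bits/weight and Q4_K_M to roughly 4.9, in line with the nominal precision of those quantization methods.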

Prompt template

```
<|prompt|>Why is drinking water so healthy?</s><|answer|>
```
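The template can be applied programmatically before handing the text to llama.cpp. A minimal sketch follows; the commented llama-cpp-python usage at the bottom assumes you have downloaded one of the GGUF files above, with `model.gguf` standing in as a placeholder local path.

```python
def format_prompt(question: str) -> str:
    """Wrap a user question in the h2o-danube2 chat template."""
    return f"<|prompt|>{question}</s><|answer|>"

prompt = format_prompt("Why is drinking water so healthy?")
print(prompt)  # <|prompt|>Why is drinking water so healthy?</s><|answer|>

# Example inference with llama-cpp-python (pip install llama-cpp-python).
# "model.gguf" is a placeholder for one of the files in the table above.
# from llama_cpp import Llama
# llm = Llama(model_path="model.gguf")
# out = llm(prompt, max_tokens=128, stop=["</s>"])
# print(out["choices"][0]["text"])
```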
Model size: 1.83B params
Architecture: llama
