Tags: Text Generation · GGUF · English · smol_llama · llama2 · ggml · quantized · q2_k · q3_k_m · q4_k_m · q5_k_m · q6_k · q8_0

# BEE-spoke-data/smol_llama-220M-GQA-GGUF

Quantized GGUF model files for smol_llama-220M-GQA by BEE-spoke-data.

| Name | Quant method | Size |
| --- | --- | --- |
| smol_llama-220m-gqa.fp16.gguf | fp16 | 436.50 MB |
| smol_llama-220m-gqa.q2_k.gguf | q2_k | 102.60 MB |
| smol_llama-220m-gqa.q3_k_m.gguf | q3_k_m | 115.70 MB |
| smol_llama-220m-gqa.q4_k_m.gguf | q4_k_m | 137.58 MB |
| smol_llama-220m-gqa.q5_k_m.gguf | q5_k_m | 157.91 MB |
| smol_llama-220m-gqa.q6_k.gguf | q6_k | 179.52 MB |
| smol_llama-220m-gqa.q8_0.gguf | q8_0 | 232.28 MB |
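As a rough sanity check, the file sizes above imply the effective bits stored per weight for each quantization. A minimal sketch, assuming the card's 220M total parameter count and interpreting the sizes as binary megabytes (MiB):

```python
# Estimate effective bits per weight from a GGUF file size.
# Assumption: 220M total parameters (from the card); sizes from the
# table above, treated as MiB (2**20 bytes).
N_PARAMS = 220_000_000

def bits_per_weight(size_mb: float, n_params: int = N_PARAMS) -> float:
    """File size in MiB -> approximate bits stored per parameter."""
    return size_mb * 2**20 * 8 / n_params

print(f"fp16:   {bits_per_weight(436.50):.2f} bpw")  # close to the nominal 16
print(f"q4_k_m: {bits_per_weight(137.58):.2f} bpw")
print(f"q2_k:   {bits_per_weight(102.60):.2f} bpw")
```

The estimates come out somewhat above each format's nominal bit width, which is expected: k-quant files carry per-block scale metadata, and some tensors (e.g. embeddings or the output layer) are typically kept at higher precision.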

Original Model Card:

## smol_llama: 220M GQA

Model card is a work in progress; more details to come.

A small decoder-only model with 220M total parameters. This is the first version of the model.

  • hidden size 1024, 10 layers
  • GQA (32 query heads, 8 key-value heads), context length 2048
  • trained from scratch on a single GPU :)
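The GQA configuration above fixes the attention projection shapes. A minimal sketch deriving them from the listed numbers (the head dimension and per-layer parameter count are computed here, not stated on the card):

```python
# Derive attention shapes from the card's config:
# hidden size 1024, 32 query heads, 8 key/value heads (GQA).
hidden_size = 1024
n_heads = 32      # query heads
n_kv_heads = 8    # key/value heads, each shared by 32/8 = 4 query heads

head_dim = hidden_size // n_heads  # dimension of each attention head
kv_dim = n_kv_heads * head_dim     # width of the K and V projections

# Per-layer attention weights (biases omitted):
# Q and O are hidden x hidden; K and V are hidden x kv_dim.
attn_params = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
print(head_dim, kv_dim, attn_params)  # 32 256 2621440
```

Compared with full multi-head attention, the K and V projections shrink by 4x, which also cuts the KV cache to a quarter of its size at the 2048-token context.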

