Tags: Text Generation, GGUF, English, ggml, quantized, q2_k, q3_k_m, q4_k_m, q5_k_m, q6_k, q8_0

afrideva/Llama-68M-Chat-v1-GGUF

Quantized GGUF model files for Llama-68M-Chat-v1 from Felladrin. A usage sketch follows the table below.

Name | Quant method | Size
llama-68m-chat-v1.fp16.gguf | fp16 | 136.79 MB
llama-68m-chat-v1.q2_k.gguf | q2_k | 35.88 MB
llama-68m-chat-v1.q3_k_m.gguf | q3_k_m | 40.66 MB
llama-68m-chat-v1.q4_k_m.gguf | q4_k_m | 46.10 MB
llama-68m-chat-v1.q5_k_m.gguf | q5_k_m | 51.16 MB
llama-68m-chat-v1.q6_k.gguf | q6_k | 56.54 MB
llama-68m-chat-v1.q8_0.gguf | q8_0 | 73.02 MB
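
These files run in any GGUF-compatible runtime. As a minimal sketch, the snippet below loads the q4_k_m file with llama-cpp-python; the local file path, context size, and the example messages are assumptions, and the chat-completion call relies on the chat template stored in the GGUF metadata (if it is missing, build the prompt by hand as shown in the prompt-format section further down).

```python
# Sketch: run the q4_k_m quant with llama-cpp-python (pip install llama-cpp-python).
# The local path is an assumption; download the file from this repo first.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-68m-chat-v1.q4_k_m.gguf",  # file listed in the table above
    n_ctx=2048,                                  # context window (assumption)
)

# create_chat_completion formats the messages with the model's chat template.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is a GGUF file?"},
    ],
    max_tokens=128,
    top_k=4,  # matches the recommended inference parameters below
)
print(response["choices"][0]["message"]["content"])
```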

Original Model Card:

A Llama Chat Model of 68M Parameters

Recommended Prompt Format

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
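
If you are sending raw text to a plain completion endpoint, you can assemble the ChatML prompt above yourself. A minimal sketch (the helper name and example messages are illustrative):

```python
# Sketch: assemble the recommended ChatML prompt by hand.
def build_prompt(system_message: str, user_message: str) -> str:
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_prompt(
    "You are a helpful assistant.",
    "Explain quantization in one sentence.",
)
# Pass `prompt` to the runtime's completion call and stop generation on "<|im_end|>".
```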

Recommended Inference Parameters

penalty_alpha: 0.5
top_k: 4
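
penalty_alpha combined with top_k selects contrastive search decoding in Hugging Face transformers; llama.cpp-based runtimes generally do not implement it, so these settings apply when running the original (non-GGUF) weights. A sketch under that assumption, with the prompt text as a placeholder:

```python
# Sketch: contrastive search with the recommended parameters, using the
# original Felladrin/Llama-68M-Chat-v1 weights (assumes transformers and torch).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Felladrin/Llama-68M-Chat-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWhat is a GGUF file?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt")

# penalty_alpha > 0 together with top_k > 1 enables contrastive search.
outputs = model.generate(
    **inputs,
    penalty_alpha=0.5,
    top_k=4,
    max_new_tokens=128,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```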

Quantized from: Felladrin/Llama-68M-Chat-v1
