
StableLM-2-Zephyr-1.6B-GGUF

Original Model

stabilityai/stablelm-2-zephyr-1_6b
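Before running the commands below, you need one of the quantized files from this repo on disk. A minimal sketch, assuming the standard Hugging Face `resolve/main` download-URL pattern and picking the Q5_K_M file used in the examples:

```shell
# Build the direct-download URL for one quantized file from this repo
# (any filename from the table below can be substituted for FILE).
REPO=second-state/stablelm-2-zephyr-1.6b-GGUF
FILE=stablelm-2-zephyr-1_6b-Q5_K_M.gguf
URL="https://huggingface.co/$REPO/resolve/main/$FILE"
echo "$URL"
# Then fetch it, e.g.:  curl -LO "$URL"
```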

Run with LlamaEdge

  • LlamaEdge version: v0.2.9 and above

  • Prompt template

    • Prompt type: stablelm-zephyr

    • Prompt string

      <|user|>
      {prompt}<|endoftext|>
      <|assistant|>
      
    • Reverse prompt: <|endoftext|>

  • Context size: 2048

  • Run as LlamaEdge service

    wasmedge --dir .:. --nn-preload default:GGML:AUTO:stablelm-2-zephyr-1_6b-Q5_K_M.gguf llama-api-server.wasm -p stablelm-zephyr -r '<|endoftext|>' -c 1024
    
  • Run as LlamaEdge command app

    wasmedge --dir .:. --nn-preload default:GGML:AUTO:stablelm-2-zephyr-1_6b-Q5_K_M.gguf llama-chat.wasm -p stablelm-zephyr -r '<|endoftext|>' --temp 0.5 -c 1024
    

Quantized GGUF Models

| Name | Quant method | Bits | Size | Use case |
| ---- | ------------ | ---- | ---- | -------- |
| stablelm-2-zephyr-1_6b-Q2_K.gguf | Q2_K | 2 | 694 MB | smallest, significant quality loss - not recommended for most purposes |
| stablelm-2-zephyr-1_6b-Q3_K_L.gguf | Q3_K_L | 3 | 915 MB | small, substantial quality loss |
| stablelm-2-zephyr-1_6b-Q3_K_M.gguf | Q3_K_M | 3 | 858 MB | very small, high quality loss |
| stablelm-2-zephyr-1_6b-Q3_K_S.gguf | Q3_K_S | 3 | 792 MB | very small, high quality loss |
| stablelm-2-zephyr-1_6b-Q4_0.gguf | Q4_0 | 4 | 983 MB | legacy; small, very high quality loss - prefer using Q3_K_M |
| stablelm-2-zephyr-1_6b-Q4_K_M.gguf | Q4_K_M | 4 | 1.03 GB | medium, balanced quality - recommended |
| stablelm-2-zephyr-1_6b-Q4_K_S.gguf | Q4_K_S | 4 | 989 MB | small, greater quality loss |
| stablelm-2-zephyr-1_6b-Q5_0.gguf | Q5_0 | 5 | 1.16 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
| stablelm-2-zephyr-1_6b-Q5_K_M.gguf | Q5_K_M | 5 | 1.19 GB | large, very low quality loss - recommended |
| stablelm-2-zephyr-1_6b-Q5_K_S.gguf | Q5_K_S | 5 | 1.16 GB | large, low quality loss - recommended |
| stablelm-2-zephyr-1_6b-Q6_K.gguf | Q6_K | 6 | 1.35 GB | very large, extremely low quality loss |
| stablelm-2-zephyr-1_6b-Q8_0.gguf | Q8_0 | 8 | 1.75 GB | very large, extremely low quality loss - not recommended |
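The sizes in the table follow roughly from the parameter count and the average bits per weight. A back-of-the-envelope check, assuming the 1.64B parameter count listed below and ignoring GGUF metadata and mixed-precision layers (K-quants store slightly more than their nominal bit width):

```python
PARAMS = 1.64e9  # parameter count of stablelm-2-zephyr-1_6b

def approx_size_gb(bits_per_weight: float) -> float:
    """Rough file size in GB for a given average bits-per-weight."""
    return PARAMS * bits_per_weight / 8 / 1e9

# Q8_0 stores ~8.5 bits per weight (8-bit values plus per-block scales),
# which lands close to the 1.75 GB listed in the table above.
print(f"{approx_size_gb(8.5):.2f} GB")
```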
Model size: 1.64B params
Architecture: stablelm
