
SmolLM-135M-Instruct

Original Model

HuggingFaceTB/SmolLM-135M-Instruct

Run with LlamaEdge

  • LlamaEdge version: v0.12.5 and above

  • Prompt template

    • Prompt type: chatml

    • Prompt string

      <|im_start|>system
      {system_message}<|im_end|>
      <|im_start|>user
      {prompt}<|im_end|>
      <|im_start|>assistant
      
  • Context size: 2048

  • Run as LlamaEdge service

    wasmedge --dir .:. --nn-preload default:GGML:AUTO:SmolLM-135M-Instruct-Q5_K_M.gguf \
      llama-api-server.wasm \
      --prompt-template chatml \
      --ctx-size 2048 \
      --model-name SmolLM-135M-Instruct
    
  • Run as LlamaEdge command app

    wasmedge --dir .:. --nn-preload default:GGML:AUTO:SmolLM-135M-Instruct-Q5_K_M.gguf \
      llama-chat.wasm \
      --prompt-template chatml \
      --ctx-size 2048
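
The chatml template above can also be filled in by hand, for example to see exactly what text the model receives. LlamaEdge applies this template itself when it is started with --prompt-template chatml, so this is only an illustration. It is a minimal bash sketch, and chatml_prompt is a hypothetical helper, not part of LlamaEdge.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill in the chatml prompt template shown above.
#   $1 = system message, $2 = user prompt
# The output ends with "<|im_start|>assistant" plus a newline, which is
# where the model begins generating its reply.
chatml_prompt() {
  local system_message="$1"
  local prompt="$2"
  # The values are passed as printf arguments (not spliced into the format
  # string), so a "%" inside a message is printed literally.
  printf '<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' \
    "$system_message" "$prompt"
}

chatml_prompt "You are a helpful assistant." "What is the capital of France?"
```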
    
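Once the service from the first command is running, it can be called over HTTP. The following is a minimal bash sketch. It assumes llama-api-server is listening on its default port 8080 on localhost and serves the OpenAI-style /v1/chat/completions route. chat_body is a hypothetical helper, and the "model" field reuses the --model-name value from the service command.

```shell
#!/usr/bin/env bash
# Assumption: llama-api-server runs on its default address (port 8080).
SERVER="http://localhost:8080"

# Hypothetical helper: build a JSON body with one system and one user message.
# No JSON escaping is done, so keep both strings free of " and \ characters.
chat_body() {
  printf '{"model":"SmolLM-135M-Instruct","messages":[{"role":"system","content":"%s"},{"role":"user","content":"%s"}]}' \
    "$1" "$2"
}

body="$(chat_body "You are a helpful assistant." "What is the capital of France?")"

# -s hides the progress meter; -f makes curl fail on HTTP errors.
# A failure here usually means the server is not running at $SERVER.
if ! curl -sf -X POST "$SERVER/v1/chat/completions" \
    -H 'Content-Type: application/json' \
    -d "$body"; then
  echo "request failed: is llama-api-server running at $SERVER?" >&2
fi
```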

Quantized GGUF Models

Name                              Quant method  Bits  Size     Use case
SmolLM-135M-Instruct-Q2_K.gguf    Q2_K          2     88.2 MB  smallest, significant quality loss - not recommended for most purposes
SmolLM-135M-Instruct-Q3_K_L.gguf  Q3_K_L        3     97.5 MB  small, substantial quality loss
SmolLM-135M-Instruct-Q3_K_M.gguf  Q3_K_M        3     93.5 MB  very small, high quality loss
SmolLM-135M-Instruct-Q3_K_S.gguf  Q3_K_S        3     88.2 MB  very small, high quality loss
SmolLM-135M-Instruct-Q4_0.gguf    Q4_0          4     91.7 MB  legacy; small, very high quality loss - prefer using Q3_K_M
SmolLM-135M-Instruct-Q4_K_M.gguf  Q4_K_M        4     105 MB   medium, balanced quality - recommended
SmolLM-135M-Instruct-Q4_K_S.gguf  Q4_K_S        4     102 MB   small, greater quality loss
SmolLM-135M-Instruct-Q5_0.gguf    Q5_0          5     105 MB   legacy; medium, balanced quality - prefer using Q4_K_M
SmolLM-135M-Instruct-Q5_K_M.gguf  Q5_K_M        5     112 MB   large, very low quality loss - recommended
SmolLM-135M-Instruct-Q5_K_S.gguf  Q5_K_S        5     110 MB   large, low quality loss - recommended
SmolLM-135M-Instruct-Q6_K.gguf    Q6_K          6     138 MB   very large, extremely low quality loss
SmolLM-135M-Instruct-Q8_0.gguf    Q8_0          8     145 MB   very large, extremely low quality loss - not recommended
SmolLM-135M-Instruct-f16.gguf     f16           16    271 MB

Quantized with llama.cpp b3445.
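
The run commands above load SmolLM-135M-Instruct-Q5_K_M.gguf from the current directory, so one of the files listed in the table has to be downloaded first. The following is a minimal bash sketch. It assumes the files are hosted in the Hugging Face repository second-state/SmolLM-135M-Instruct-GGUF and served from Hugging Face's usual resolve/main/<file> path. gguf_url is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Assumption: files live in second-state/SmolLM-135M-Instruct-GGUF and are
# downloadable from the standard resolve/main/<file> path.
REPO_URL="https://huggingface.co/second-state/SmolLM-135M-Instruct-GGUF/resolve/main"

# Hypothetical helper: print the download URL for a quant method from the
# table; names not in the table are rejected with a non-zero exit status.
gguf_url() {
  case "$1" in
    Q2_K | Q3_K_L | Q3_K_M | Q3_K_S | Q4_0 | Q4_K_M | Q4_K_S | Q5_0 | Q5_K_M | Q5_K_S | Q6_K | Q8_0 | f16)
      printf '%s/SmolLM-135M-Instruct-%s.gguf\n' "$REPO_URL" "$1"
      ;;
    *)
      echo "unknown quant method: $1" >&2
      return 1
      ;;
  esac
}

gguf_url Q5_K_M
```

For example, curl -L -O "$(gguf_url Q5_K_M)" follows the Hugging Face redirect (-L) and saves the file under its remote name (-O) in the current directory, which is where the run commands look for it.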

Model details

  • Format: GGUF
  • Model size: 135M params
  • Architecture: llama
  • Repository: second-state/SmolLM-135M-Instruct-GGUF
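
As a rough sanity check on the Size column in the quantized-models table, a GGUF file's size is dominated by its weights. It is therefore approximately the parameter count N times the bits stored per weight b, divided by 8. For the f16 file, and assuming MB means 10^6 bytes:

```latex
\text{size} \approx \frac{N \cdot b}{8}
  = \frac{135 \times 10^{6} \cdot 16}{8}\ \text{bytes}
  = 270 \times 10^{6}\ \text{bytes} \approx 270\ \text{MB}
```

This is close to the 271 MB listed. The small remainder is plausibly file metadata, such as the tokenizer, plus rounding in the 135M figure. For the quantized rows, the Bits column gives only the nominal width. llama.cpp formats also store per-block scale factors: Q8_0, for example, spends about 8.5 bits per weight, which gives roughly 143 MB against the 145 MB listed. Some tensors may also be kept at higher precision. This is why the low-bit files are larger than N·b/8 alone would predict.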