---
license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-135M
library_name: transformers
language:
  - en
tags:
  - quantllm
  - transformers
  - safetensors
pipeline_tag: text-generation
---

# 🤗 SmolLM2-135M-QuantLLM

*HuggingFaceTB/SmolLM2-135M converted to SafeTensors format*

## 📖 About This Model

This model is HuggingFaceTB/SmolLM2-135M converted to SafeTensors format for use with HuggingFace Transformers and PyTorch.

| Property | Value |
|----------|-------|
| Base Model | HuggingFaceTB/SmolLM2-135M |
| Format | SafeTensors |
| Quantization | None (Full Precision) |
| License | apache-2.0 |
| Created With | QuantLLM |

## 🚀 Quick Start

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("codewithdark/SmolLM2-135M-QuantLLM")
tokenizer = AutoTokenizer.from_pretrained("codewithdark/SmolLM2-135M-QuantLLM")

# Generate text
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### With QuantLLM

```python
from quantllm import TurboModel

# Load with automatic optimization
model = TurboModel.from_pretrained("codewithdark/SmolLM2-135M-QuantLLM")

# Generate
response = model.generate("Write a poem about coding")
print(response)
```

### Requirements

```bash
pip install transformers torch
```

## 📊 Model Details

| Property | Value |
|----------|-------|
| Original Model | HuggingFaceTB/SmolLM2-135M |
| Format | SafeTensors |
| Quantization | Full Precision |
| License | apache-2.0 |
| Export Date | 2026-04-29 |
| Exported By | QuantLLM v2.1 |

## 🚀 Created with QuantLLM


Convert any model to GGUF, ONNX, or MLX in one line!

```python
from quantllm import turbo

# Load any HuggingFace model
model = turbo("HuggingFaceTB/SmolLM2-135M")

# Export to any format
model.export("safetensors", quantization="Q4_K_M")

# Push to HuggingFace
model.push("your-repo", format="safetensors")
```

📚 Documentation · 🐛 Report Issue · 💡 Request Feature

## 📊 Export Details

Exported with QuantLLM from HuggingFaceTB/SmolLM2-135M (134.5M params).

| Property | Value |
|----------|-------|
| Format | SafeTensors |
| Size | 541.6 MB |
| Parameters | 134.5M |
| Dtype | float32 |
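As a quick sanity check, the reported file size is consistent with the parameter count at float32, i.e. 4 bytes per parameter; the small gap comes from rounding in the parameter count plus the SafeTensors header:

```python
# 4 bytes per float32 parameter; ignores the small SafeTensors JSON header.
params = 134.5e6
size_mb = params * 4 / 1e6  # bytes -> MB (decimal megabytes)
print(f"{size_mb:.1f} MB")  # 538.0 MB, close to the reported 541.6 MB
```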

## How to use