Instructions for using Surpem/Supertron2.1-0.6B with libraries and local apps.

Libraries

How to use Surpem/Supertron2.1-0.6B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Surpem/Supertron2.1-0.6B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so the output is also visible when run as a script
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Surpem/Supertron2.1-0.6B")
model = AutoModelForCausalLM.from_pretrained("Surpem/Supertron2.1-0.6B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens and drop special tokens such as the end-of-turn marker
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Local Apps

vLLM

How to use Surpem/Supertron2.1-0.6B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Surpem/Supertron2.1-0.6B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Surpem/Supertron2.1-0.6B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
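
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is reachable at `http://localhost:8000`. The helper names `build_chat_request` and `ask` and the `max_tokens` value are illustrative and not part of vLLM or this model card.

```python
import json
import urllib.request

MODEL_ID = "Surpem/Supertron2.1-0.6B"


def build_chat_request(prompt, base_url="http://localhost:8000", max_tokens=256):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # illustrative cap on the reply length
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000"):
    """Send the prompt to a running server and return the assistant's reply text."""
    request = build_chat_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires the server to be running):
# print(ask("What is the capital of France?"))
```

Because the endpoint path and payload follow the OpenAI-compatible format, passing a different `base_url` (for example one using port 30000) should work for any other server that exposes the same API.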

Use Docker

docker model run hf.co/Surpem/Supertron2.1-0.6B

SGLang

How to use Surpem/Supertron2.1-0.6B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Surpem/Supertron2.1-0.6B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Surpem/Supertron2.1-0.6B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Surpem/Supertron2.1-0.6B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Surpem/Supertron2.1-0.6B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Surpem/Supertron2.1-0.6B with Docker Model Runner:
```
docker model run hf.co/Surpem/Supertron2.1-0.6B
```

Supertron2.1-0.6B: A Compact, Efficient Instruction-Tuned Language Model

Model Description

Supertron2.1-0.6B is an instruction-tuned language model built on top of Qwen3-0.6B. It is designed to be a small, efficient daily-driver model for reasoning, math, coding, general knowledge, writing, and assistant-style conversation while remaining lightweight enough to run on consumer hardware.

The model keeps the Qwen3 architecture, tokenizer, and chat format, so it works with standard Transformers workflows. Supertron2.1-0.6B is intended for users who want a compact generalist model that can answer questions, explain concepts, write code, solve structured problems, and follow natural language instructions.

Developed by: Surpem
Model type: Causal Language Model
Architecture: Dense Transformer, 0.6B parameter class
Fine-tuned from: Qwen/Qwen3-0.6B
License: Apache 2.0

Capabilities

Reasoning

Supertron2.1-0.6B is designed for clear, structured reasoning. It can break down questions into useful steps, compare options, explain tradeoffs, and provide concise conclusions when asked.

Math

The model can assist with arithmetic, algebra, word problems, step-by-step explanations, and checking calculations. It is useful for learning, practice, and lightweight problem solving.

Coding

Supertron2.1-0.6B can write, debug, and explain code across common programming languages including Python, JavaScript, TypeScript, C++, Java, Rust, and shell scripting. It can help with implementation details, algorithmic reasoning, refactoring suggestions, and code explanations.

Science & General Knowledge

The model can explain concepts across STEM, technology, history, business, and general knowledge domains. It is suitable for short research assistance, study support, summaries, and clear explanations of technical ideas.

Instruction Following

Supertron2.1-0.6B follows direct natural language instructions and can adapt to requested formats such as concise answers, bullet lists, tables, JSON-like structures, code blocks, and longer explanations.
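
When asking for JSON-like output, a small model may wrap the structure in extra prose or a code block. The sketch below shows one defensive way to recover the first valid JSON value from a reply. The helper name `extract_json` is hypothetical, and nothing here is specific to this model.

```python
import json


def extract_json(reply):
    """Return the first valid JSON object or array found in a model reply, else None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(reply):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text
            value, _end = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this position, keep scanning
        return value
    return None
```

A caller can treat a `None` result as a signal to retry the request or to fall back to plain-text handling.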

Get Started

Install the required packages:

pip install -U transformers torch accelerate

Load the model:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Surpem/Supertron2.1-0.6B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

Generate a response:

messages = [
    {"role": "user", "content": "Explain the difference between LoRA and full fine-tuning."}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.8,
    do_sample=True,
)

print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Recommended Generation Settings

For coding, math, and deterministic answers:

generation_config = {
    "max_new_tokens": 512,
    "do_sample": False,
}

For general chat and writing:

generation_config = {
    "max_new_tokens": 768,
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "do_sample": True,
}
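
The two presets above can be selected per request and unpacked into `model.generate`. This is a minimal sketch: the names `GENERATION_PRESETS`, `generation_kwargs`, and `generate_reply` are illustrative, and `model` and `tokenizer` are assumed to be loaded as in the Get Started section.

```python
# Presets copied from the recommended settings above.
GENERATION_PRESETS = {
    "deterministic": {"max_new_tokens": 512, "do_sample": False},
    "chat": {
        "max_new_tokens": 768,
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 20,
        "do_sample": True,
    },
}


def generation_kwargs(task="chat", **overrides):
    """Return a copy of the preset for `task`, with optional per-call overrides."""
    if task not in GENERATION_PRESETS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(GENERATION_PRESETS)}")
    return {**GENERATION_PRESETS[task], **overrides}


def generate_reply(model, tokenizer, messages, task="chat", **overrides):
    """Apply the chat template, generate with the chosen preset, and decode only the new tokens."""
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, **generation_kwargs(task, **overrides))
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `generate_reply(model, tokenizer, messages, task="deterministic")` uses greedy decoding, while `generate_reply(model, tokenizer, messages, max_new_tokens=128)` keeps the chat sampling settings with a shorter output limit.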

Hardware Requirements

Precision	Minimum VRAM	Recommended VRAM
bfloat16 / float16	2 GB	4 GB+
8-bit quantized	1.5 GB	3 GB+
4-bit quantized	1 GB	2 GB+
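
As a rough sanity check on the table, the memory needed for the weights alone is the parameter count times the bytes per parameter. The sketch below assumes a nominal 0.6 billion parameters taken from the model name, not an exact count. The table figures are higher than these estimates, presumably because activations, the KV cache, and framework overhead also need memory.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed to hold the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NOMINAL_PARAMS = 0.6e9  # assumption: nominal size from the model name, not an exact count

for label, bits in [("bfloat16 / float16", 16), ("8-bit quantized", 8), ("4-bit quantized", 4)]:
    print(f"{label}: ~{weight_memory_gb(NOMINAL_PARAMS, bits):.1f} GB for weights")
# bfloat16 / float16: ~1.2 GB for weights
# 8-bit quantized: ~0.6 GB for weights
# 4-bit quantized: ~0.3 GB for weights
```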

For 4-bit quantized inference (requires the bitsandbytes package: pip install -U bitsandbytes):

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "Surpem/Supertron2.1-0.6B"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

Local Inference

The official checkpoint in this repository is the Transformers version. A separate GGUF repository is available for llama.cpp, Ollama, LM Studio, and other local inference runtimes:

Surpem/Supertron2.1-0.6B-GGUF

Use this repository when you want the original PyTorch/Transformers model. Use the GGUF repository when you want quantized local inference.

Intended Use

Supertron2.1-0.6B is intended for:

lightweight assistant experiments
local coding help
math practice and explanations
general question answering
summarization and rewriting
prototype agent workflows
educational and research use

Limitations

The model may hallucinate facts or produce outdated information.
Math and code answers can be incorrect and should be verified.
Complex reasoning tasks may exceed the capability of a 0.6B parameter model.
The model may produce repetitive or low-quality text with poor sampling settings.
It is not intended for legal, medical, financial, safety-critical, or identity-sensitive decisions without independent expert review.
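
One lightweight way to verify arithmetic answers is to recompute them outside the model. The following sketch evaluates plain arithmetic expressions without `eval()` and compares the result with the value the model claimed. The helper names `evaluate_arithmetic` and `answer_matches` are hypothetical, and the approach covers only basic arithmetic, not general math or code checking.

```python
import ast
import operator

# Only plain arithmetic is allowed; anything else raises ValueError.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def evaluate_arithmetic(expression):
    """Evaluate a numeric expression such as "3 * (4 + 5)" without using eval()."""

    def visit(node):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](visit(node.operand))
        raise ValueError("unsupported expression")

    return visit(ast.parse(expression, mode="eval").body)


def answer_matches(expression, claimed, tolerance=1e-9):
    """Check a model-claimed numeric answer against the recomputed value."""
    return abs(evaluate_arithmetic(expression) - claimed) <= tolerance
```

For instance, `answer_matches("17 * 24", 408)` returns `True`, while a claimed answer of 418 for the same expression returns `False`. Malformed input raises `SyntaxError` from `ast.parse`, so callers may want to catch both exception types.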

Citation

@misc{surpem2026supertron21_06b,
      title={Supertron2.1-0.6B -- Efficient Instruction-Tuned Language Model},
      author={Surpem},
      year={2026},
      url={https://huggingface.co/Surpem/Supertron2.1-0.6B},
}