Supertron2.1-0.6B: A Compact, Efficient Instruction-Tuned Language Model
Model Description
Supertron2.1-0.6B is an instruction-tuned language model built on top of Qwen3-0.6B. It is designed to be a small, efficient daily-driver model for reasoning, math, coding, general knowledge, writing, and assistant-style conversation while remaining lightweight enough to run on consumer hardware.
The model keeps the Qwen3 architecture, tokenizer, and chat format, which makes it easy to use with standard transformers workflows. Supertron2.1-0.6B is intended for users who want a compact generalist model that can answer questions, explain concepts, write code, solve structured problems, and follow natural language instructions.
- Developed by: Surpem
- Model type: Causal Language Model
- Architecture: Dense Transformer, 0.6B parameter class
- Fine-tuned from: Qwen/Qwen3-0.6B
- License: Apache 2.0
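Because the model keeps the Qwen3 chat format, `tokenizer.apply_chat_template` takes care of prompt construction. As a rough illustration of what that template produces for plain user and assistant turns, the sketch below renders messages in the ChatML-style layout used by Qwen models. The function name `render_chatml` is hypothetical, and the sketch ignores system prompts, tool calls, and thinking-mode tags, so the tokenizer's own template remains the authoritative source.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Approximate the ChatML-style prompt layout for simple chat turns."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # An open assistant turn tells the model to start its reply here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Who are you?"}])
print(prompt)
```

In practice there is no need to build this string by hand; the sketch is only meant to show what "chat format" refers to.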
Capabilities
Reasoning
Supertron2.1-0.6B is designed for clear, structured reasoning. It can break down questions into useful steps, compare options, explain tradeoffs, and provide concise conclusions when asked.
Math
The model can assist with arithmetic, algebra, word problems, step-by-step explanations, and checking calculations. It is useful for learning, practice, and lightweight problem solving.
Coding
Supertron2.1-0.6B can write, debug, and explain code across common programming languages including Python, JavaScript, TypeScript, C++, Java, Rust, and shell scripting. It can help with implementation details, algorithmic reasoning, refactoring suggestions, and code explanations.
Science & General Knowledge
The model can explain concepts across STEM, technology, history, business, and general knowledge domains. It is suitable for short research assistance, study support, summaries, and clear explanations of technical ideas.
Instruction Following
Supertron2.1-0.6B follows direct natural language instructions and can adapt to requested formats such as concise answers, bullet lists, tables, JSON-like structures, code blocks, and longer explanations.
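Since the output is described as JSON-like rather than guaranteed-valid JSON, downstream code may want to parse replies defensively. The sketch below is one possible approach, not part of the model or of any library: the helper `extract_json` is a hypothetical name, and it simply looks for the first decodable JSON value inside a fenced block or in the raw reply.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # a markdown code-fence marker, built indirectly for readability
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(reply: str) -> Optional[Any]:
    """Return the first JSON object or array found in a model reply, or None."""
    # Prefer the contents of fenced blocks, then fall back to the whole reply.
    candidates = FENCE_RE.findall(reply)
    candidates.append(reply)
    decoder = json.JSONDecoder()
    for text in candidates:
        for match in re.finditer(r"[\{\[]", text):
            try:
                value, _ = decoder.raw_decode(text, match.start())
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON from this position; try the next one
    return None


reply = "Sure:\n" + FENCE + 'json\n{"a": 1, "b": [2, 3]}\n' + FENCE + "\nDone."
print(extract_json(reply))
```

If the function returns `None`, a reasonable fallback is to re-prompt with a stricter format instruction.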
Get Started
Install the required packages:
pip install -U transformers torch accelerate
Load the model:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Surpem/Supertron2.1-0.6B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in bfloat16 and let accelerate place them on the available device.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
Generate a response:
messages = [
    {"role": "user", "content": "Explain the difference between LoRA and full fine-tuning."}
]

# Render the conversation with the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.8,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
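The snippet above covers a single turn. For a multi-turn conversation, the usual pattern is to keep appending user and assistant messages to the same list and re-apply the chat template each turn. The sketch below shows that bookkeeping only; `chat_turn` and `echo_reply` are hypothetical names, and `echo_reply` is a stand-in so the example runs without downloading the model.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(
    history: List[Message],
    user_text: str,
    generate_reply: Callable[[List[Message]], str],
) -> str:
    """Append the user message, generate a reply, and record it in the history."""
    history.append({"role": "user", "content": user_text})
    reply = generate_reply(history)
    history.append({"role": "assistant", "content": reply})
    return reply


def echo_reply(messages: List[Message]) -> str:
    """Stand-in generator; a real one would call the tokenizer and model."""
    return f"(reply to: {messages[-1]['content']})"


history: List[Message] = []
chat_turn(history, "What is LoRA?", echo_reply)
chat_turn(history, "How does it differ from full fine-tuning?", echo_reply)
print(len(history))
```

With the real model, `generate_reply` would wrap the `apply_chat_template`, `generate`, and `decode` calls from the snippet above and return the decoded text.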
Recommended Generation Settings
For coding, math, and deterministic answers:
generation_config = {
    "max_new_tokens": 512,
    "do_sample": False,
}
For general chat and writing:
generation_config = {
    "max_new_tokens": 768,
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "do_sample": True,
}
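One way to use these two configurations is to pick one per request and unpack it into `model.generate`. The sketch below assumes a simple task label decides which settings apply; the labels, the constant names, and the helper `settings_for` are illustrative choices, not part of the model or of Transformers.

```python
from typing import Any, Dict

# Values copied from the two recommended configurations.
DETERMINISTIC: Dict[str, Any] = {"max_new_tokens": 512, "do_sample": False}
CREATIVE: Dict[str, Any] = {
    "max_new_tokens": 768,
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "do_sample": True,
}

# Hypothetical task labels that should use greedy decoding.
DETERMINISTIC_TASKS = {"code", "math"}


def settings_for(task: str, **overrides: Any) -> Dict[str, Any]:
    """Return a fresh settings dict for the task, with optional overrides."""
    base = DETERMINISTIC if task in DETERMINISTIC_TASKS else CREATIVE
    return {**base, **overrides}


print(settings_for("math"))
print(settings_for("chat", max_new_tokens=256))
# With a loaded model: outputs = model.generate(**inputs, **settings_for("code"))
```

Returning a new dict on each call keeps per-request overrides from leaking into the shared defaults.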
Hardware Requirements
| Precision | Min VRAM | Recommended |
|---|---|---|
| bfloat16 / float16 | 2 GB | 4 GB+ |
| 8-bit quantized | 1.5 GB | 3 GB+ |
| 4-bit quantized | 1 GB | 2 GB+ |
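These figures are roughly consistent with a back-of-the-envelope estimate of weight memory: parameter count times bytes per parameter. The sketch below assumes a nominal 0.6 billion parameters (the exact count is not stated here) and counts weights only. Activations, the KV cache, framework overhead, and any layers that a quantization scheme keeps in higher precision account for the extra headroom in the table.

```python
NOMINAL_PARAMS = 0.6e9  # assumption: nominal size of the "0.6B parameter class"

# Approximate storage cost per parameter at each precision.
BYTES_PER_PARAM = {
    "bfloat16 / float16": 2.0,
    "8-bit quantized": 1.0,
    "4-bit quantized": 0.5,
}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Estimate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision}: ~{weight_memory_gb(NOMINAL_PARAMS, nbytes):.1f} GB for weights")
```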
For 4-bit quantized inference (this additionally requires the bitsandbytes package, installable with pip install bitsandbytes):
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "Surpem/Supertron2.1-0.6B"

# Store weights in 4-bit and run the matrix multiplications in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
Local Inference
The official checkpoint in this repository is the Transformers version. A separate GGUF repository is available for llama.cpp, Ollama, LM Studio, and other local inference runtimes.
Use this repository when you want the original PyTorch/Transformers model. Use the GGUF repository when you want quantized local inference.
Intended Use
Supertron2.1-0.6B is intended for:
- lightweight assistant experiments
- local coding help
- math practice and explanations
- general question answering
- summarization and rewriting
- prototype agent workflows
- educational and research use
Limitations
- The model may hallucinate facts or produce outdated information.
- Math and code answers can be incorrect and should be verified.
- Complex reasoning tasks may exceed the capability of a 0.6B parameter model.
- The model may produce repetitive or low-quality text with poor sampling settings.
- It is not intended for legal, medical, financial, safety-critical, or identity-sensitive decisions without independent expert review.
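Repetitive output, one of the failure modes listed above, can be flagged automatically before a reply is shown or stored. The sketch below is one simple heuristic, not something shipped with the model: the helper `repeated_ngram_ratio` is a hypothetical name, and any threshold applied to its result would be an assumption to tune per use case.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram in the text."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0  # text shorter than n words cannot repeat an n-gram
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a duplicate.
    duplicates = sum(count - 1 for count in counts.values())
    return duplicates / len(ngrams)


print(repeated_ngram_ratio("the cat sat on the mat"))
print(repeated_ngram_ratio("a b c a b c a b c"))
```

A high ratio suggests the reply has degenerated into a loop, in which case regenerating with the sampling settings recommended for general chat may help.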
Citation
@misc{surpem2026supertron21_06b,
  title={Supertron2.1-0.6B -- Efficient Instruction-Tuned Language Model},
  author={Surpem},
  year={2026},
  url={https://huggingface.co/Surpem/Supertron2.1-0.6B},
}