Supertron2.1-0.6B: A Compact, Efficient Instruction-Tuned Language Model

Model Description

Supertron2.1-0.6B is an instruction-tuned language model built on top of Qwen3-0.6B. It is designed to be a small, efficient daily-driver model for reasoning, math, coding, general knowledge, writing, and assistant-style conversation while remaining lightweight enough to run on consumer hardware.

The model keeps the Qwen3 architecture, tokenizer, and chat format, which makes it easy to use with standard transformers workflows. Supertron2.1-0.6B is intended for users who want a compact generalist model that can answer questions, explain concepts, write code, solve structured problems, and follow natural language instructions.

  • Developed by: Surpem
  • Model type: Causal Language Model
  • Architecture: Dense Transformer, 0.6B parameter class
  • Fine-tuned from: Qwen/Qwen3-0.6B
  • License: Apache 2.0

Capabilities

Reasoning

Supertron2.1-0.6B is designed for clear, structured reasoning. It can break down questions into useful steps, compare options, explain tradeoffs, and provide concise conclusions when asked.

Math

The model can assist with arithmetic, algebra, word problems, step-by-step explanations, and checking calculations. It is useful for learning, practice, and lightweight problem solving.

Coding

Supertron2.1-0.6B can write, debug, and explain code across common programming languages including Python, JavaScript, TypeScript, C++, Java, Rust, and shell scripting. It can help with implementation details, algorithmic reasoning, refactoring suggestions, and code explanations.

Science & General Knowledge

The model can explain concepts across STEM, technology, history, business, and general knowledge domains. It is suitable for short research assistance, study support, summaries, and clear explanations of technical ideas.

Instruction Following

Supertron2.1-0.6B follows direct natural language instructions and can adapt to requested formats such as concise answers, bullet lists, tables, JSON-like structures, code blocks, and longer explanations.
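
When a JSON-like answer is requested, a small model may still wrap the structure in a sentence or two of prose. The sketch below shows one way to recover the structured part from a reply string. The `extract_json` helper and the sample reply are hypothetical and are not part of the model or of any library.

```python
import json


def extract_json(reply: str):
    """Return the first JSON object or array found in `reply`, or None.

    Hypothetical helper: it scans for an opening brace or bracket and lets
    the standard-library decoder try to parse from that position.
    """
    decoder = json.JSONDecoder()
    for start, ch in enumerate(reply):
        if ch not in "{[":
            continue
        try:
            value, _end = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from here; keep scanning
        return value
    return None


# Made-up reply text, used only to illustrate the helper.
reply = (
    "Sure! Here is the result:\n"
    '{"language": "Python", "year": 1991}\n'
    "Let me know if you need more."
)
print(extract_json(reply))
```

Returning `None` instead of raising keeps the caller in control of what to do, such as retrying the request, when no valid structure is found.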


Get Started

Install the required packages:

pip install -U transformers torch accelerate

Load the model:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Surpem/Supertron2.1-0.6B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

Generate a response:

messages = [
    {"role": "user", "content": "Explain the difference between LoRA and full fine-tuning."}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.8,
    do_sample=True,
)

print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
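
Because the model keeps the Qwen3 chat format, the `apply_chat_template` call above turns the message list into a single prompt string. The sketch below approximates that layout by hand, assuming the ChatML-style markers (`<|im_start|>`, `<|im_end|>`) used by Qwen models. `render_chatml` is a hypothetical helper for illustration only; the template shipped with the tokenizer may insert additional content, so prefer `apply_chat_template` in real code.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate a ChatML-style prompt (assumed layout, not the real template)."""
    parts = []
    for message in messages:
        # Each turn is wrapped in start/end markers with the role on the first line.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # An open assistant turn cues the model to write its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "user", "content": "Explain the difference between LoRA and full fine-tuning."}
]
print(render_chatml(messages))
```

Comparing this output with the `text` variable from the example above is a quick way to see what the real template adds.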

Recommended Generation Settings

For coding, math, and deterministic answers:

generation_config = {
    "max_new_tokens": 512,
    "do_sample": False,
}

For general chat and writing:

generation_config = {
    "max_new_tokens": 768,
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "do_sample": True,
}
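
To make the sampling parameters above more concrete, the following sketch applies temperature, top-k, and top-p filtering to a toy next-token distribution. It is a simplified illustration, not the implementation used by `transformers`; the `filter_distribution` helper and the logit values are made up for the example.

```python
import math


def filter_distribution(logits, temperature=0.7, top_k=20, top_p=0.8):
    """Return {token: probability} after temperature, top-k, and top-p filtering."""
    # Temperature below 1.0 sharpens the distribution; above 1.0 flattens it.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}

    # Numerically stable softmax over the scaled logits.
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    probs = {tok: weight / total for tok, weight in weights.items()}

    # Top-k: keep only the k most likely tokens, then renormalize.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    kept_mass = sum(p for _, p in ranked)
    ranked = [(tok, p / kept_mass) for tok, p in ranked]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


# Toy logits for four candidate tokens (illustrative values only).
toy_logits = {"the": 2.0, "a": 1.0, "cat": 0.5, "zebra": -3.0}
print(filter_distribution(toy_logits))
```

With `do_sample` set to `False`, the highest-scoring token is always chosen, so none of these parameters apply. Either settings dictionary can be unpacked into the call from the Get Started example, as in `model.generate(**inputs, **generation_config)`.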

Hardware Requirements

Precision            Min VRAM   Recommended
bfloat16 / float16   2 GB       4 GB+
8-bit quantized      1.5 GB     3 GB+
4-bit quantized      1 GB       2 GB+
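
As a rough cross-check on these figures, the memory needed for the weights alone is approximately the parameter count multiplied by the bytes per parameter. The sketch below assumes a nominal 0.6 billion parameters (the exact count may differ) and ignores activations, the KV cache, and runtime overhead, which is presumably why the minimums in the table are higher than the numbers it prints.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"bfloat16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Estimate weight-only memory in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


NOMINAL_PARAMS = 0.6e9  # assumption: nominal size taken from the model name

for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_memory_gb(NOMINAL_PARAMS, precision):.2f} GB")
```

Long prompts and large `max_new_tokens` values increase memory use beyond this estimate because the KV cache grows with sequence length.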

For 4-bit quantized inference:

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "Surpem/Supertron2.1-0.6B"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

Local Inference

The official checkpoint in this repository is the Transformers version. A separate GGUF repository is available for llama.cpp, Ollama, LM Studio, and other local inference runtimes.

Use this repository when you want the original PyTorch/Transformers model. Use the GGUF repository when you want quantized local inference.


Intended Use

Supertron2.1-0.6B is intended for:

  • lightweight assistant experiments
  • local coding help
  • math practice and explanations
  • general question answering
  • summarization and rewriting
  • prototype agent workflows
  • educational and research use

Limitations

  • The model may hallucinate facts or produce outdated information.
  • Math and code answers can be incorrect and should be verified.
  • Complex reasoning tasks may exceed the capability of a 0.6B parameter model.
  • The model may produce repetitive or low-quality text with poor sampling settings.
  • It is not intended for legal, medical, financial, safety-critical, or identity-sensitive decisions without independent expert review.
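
One lightweight way to act on the advice to verify math answers is to recompute simple arithmetic independently of the model. The sketch below is a minimal example under that assumption; `safe_eval` and `check_claim` are hypothetical helpers that accept only numbers and basic operators, and they are not part of the model or of any library.

```python
import ast
import operator

# Operators the checker is willing to evaluate.
_ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def safe_eval(expression: str):
    """Evaluate a plain arithmetic expression without calling eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_OPS:
            return _ALLOWED_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _ALLOWED_OPS:
            return _ALLOWED_OPS[type(node.op)](walk(node.operand))
        # Names, function calls, and anything else are rejected.
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval"))


def check_claim(expression: str, claimed: float, tolerance: float = 1e-9) -> bool:
    """Return True if the claimed value matches the recomputed result."""
    return abs(safe_eval(expression) - claimed) <= tolerance


print(check_claim("17 * 23", 391))
print(check_claim("17 * 23", 381))
```

For generated code, the analogous check is to run it against known test cases before relying on it.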

Citation

@misc{surpem2026supertron21_06b,
      title={Supertron2.1-0.6B -- Efficient Instruction-Tuned Language Model},
      author={Surpem},
      year={2026},
      url={https://huggingface.co/Surpem/Supertron2.1-0.6B},
}