Instructions to use Surpem/Supertron-2.1-8B-A1B with libraries and local apps.

Libraries

How to use Surpem/Supertron-2.1-8B-A1B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Surpem/Supertron-2.1-8B-A1B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so the output is visible when run as a script
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Surpem/Supertron-2.1-8B-A1B")
model = AutoModelForCausalLM.from_pretrained("Surpem/Supertron-2.1-8B-A1B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Surpem/Supertron-2.1-8B-A1B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Surpem/Supertron-2.1-8B-A1B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Surpem/Supertron-2.1-8B-A1B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
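
The same request can be sent from Python. The following is a minimal sketch using only the standard library. It assumes the server started above is listening on localhost:8000, and the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "Surpem/Supertron-2.1-8B-A1B"


def build_chat_request(prompt, model=MODEL_ID):
    """Build the JSON body for an OpenAI-compatible chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt, base_url="http://localhost:8000"):
    """POST the prompt to /v1/chat/completions and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return payload["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` prints the reply. Passing `base_url="http://localhost:30000"` targets a server listening on port 30000 instead.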

Use Docker

docker model run hf.co/Surpem/Supertron-2.1-8B-A1B

SGLang

How to use Surpem/Supertron-2.1-8B-A1B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Surpem/Supertron-2.1-8B-A1B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Surpem/Supertron-2.1-8B-A1B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Surpem/Supertron-2.1-8B-A1B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Surpem/Supertron-2.1-8B-A1B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Surpem/Supertron-2.1-8B-A1B with Docker Model Runner:
```
docker model run hf.co/Surpem/Supertron-2.1-8B-A1B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Supertron-2.1-8B-A1B: An Efficient Generalist Instruction-Tuned Language Model

Model Description

Supertron-2.1-8B-A1B is an instruction-tuned language model built on top of LiquidAI/LFM2.5-8B-A1B. It is designed as an efficient generalist assistant model for reasoning, coding, math, general knowledge, writing, summarization, and natural conversation.

The model retains the LiquidAI base model format and works with standard Transformers workflows. It is intended for users who want an assistant-style model for everyday technical and general tasks.

Developed by: Surpem
Model type: Causal Language Model
Architecture: LiquidAI LFM2.5, 8B-class total parameters with roughly 1B active parameters per token (the "A1B" suffix)
Fine-tuned from: LiquidAI/LFM2.5-8B-A1B
License: Apache 2.0

Capabilities

Reasoning

Supertron-2.1-8B-A1B is tuned for clear assistant-style reasoning. It can explain decisions, compare options, break down multi-step questions, and produce structured answers when a task benefits from organization.

Math

The model can help with arithmetic, algebra, word problems, step-by-step explanations, and checking calculations. It is useful for practice, tutoring-style explanations, and lightweight quantitative reasoning.

Coding

Supertron-2.1-8B-A1B can write, debug, refactor, and explain code across common languages including Python, JavaScript, TypeScript, C++, Java, Rust, SQL, and shell scripting. It can assist with algorithms, implementation details, code review, and practical development questions.

Science & General Knowledge

The model can explain concepts across STEM, technology, history, business, and general knowledge domains. It is suitable for research assistance, summaries, educational explanations, and technical writing support.

Instruction Following

Supertron-2.1-8B-A1B follows direct natural language instructions and can adapt to requested formats such as concise answers, bullet lists, tables, code blocks, JSON-like structures, and longer explanatory responses.
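
When a structured reply is requested, the JSON may still arrive wrapped in prose or a code fence. A minimal sketch of defensive parsing follows, assuming the caller wants the first top-level JSON object in the reply. `extract_json` is a hypothetical helper, not part of any library.

```python
import json


def extract_json(reply):
    """Return the first JSON object found in a model reply, or None."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at index `start`
            # and ignores any text that follows it.
            value, _ = decoder.raw_decode(reply, start)
            return value
        except json.JSONDecodeError:
            # No valid JSON starts at this brace; try the next one.
            start = reply.find("{", start + 1)
    return None
```

Returning `None` instead of raising lets the caller decide whether to retry the request or fall back to plain text.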

Get Started

Install the required packages:

pip install -U transformers torch accelerate

Load the model:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Surpem/Supertron-2.1-8B-A1B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

Generate a response:

messages = [
    {"role": "user", "content": "Write a Python function that checks whether a number is prime."}
]

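# Tokenize through the chat template in one step; this avoids re-tokenizing
# the rendered text, which could add special tokens a second time.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)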

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Recommended Generation Settings

For coding, math, and deterministic answers:

generation_config = {
    "max_new_tokens": 512,
    "do_sample": False,
}

For general chat and writing:

generation_config = {
    "max_new_tokens": 768,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "do_sample": True,
}

For longer explanations:

generation_config = {
    "max_new_tokens": 1024,
    "temperature": 0.6,
    "top_p": 0.9,
    "do_sample": True,
}
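
One way to apply these presets is to keep them in a single mapping and unpack the chosen one into `generate`. The sketch below assumes three task labels, `"precise"`, `"chat"`, and `"long"`, which are illustrative names rather than anything defined by the model or by Transformers.

```python
GENERATION_PRESETS = {
    # Coding, math, and deterministic answers
    "precise": {"max_new_tokens": 512, "do_sample": False},
    # General chat and writing
    "chat": {
        "max_new_tokens": 768,
        "temperature": 0.7,
        "top_p": 0.9,
        "top_k": 40,
        "do_sample": True,
    },
    # Longer explanations
    "long": {
        "max_new_tokens": 1024,
        "temperature": 0.6,
        "top_p": 0.9,
        "do_sample": True,
    },
}


def generation_settings(task, **overrides):
    """Return a copy of the preset for `task`, with optional overrides applied."""
    if task not in GENERATION_PRESETS:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(GENERATION_PRESETS)}"
        )
    settings = dict(GENERATION_PRESETS[task])
    settings.update(overrides)
    return settings
```

With the `model` and `inputs` from Get Started, `model.generate(**inputs, **generation_settings("chat"))` applies the chat preset, and `generation_settings("precise", max_new_tokens=256)` shortens the deterministic one without modifying the shared mapping.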

Hardware Requirements

Precision	Minimum VRAM	Recommended VRAM
bfloat16 / float16	18 GB	24 GB+
8-bit quantized	10 GB	12 GB+
4-bit quantized	6 GB	8 GB+
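
The figures above are consistent with a simple weights-only estimate: parameter count times bytes per parameter. The sketch below assumes a round 8 billion parameters (the exact count is not stated here) and ignores activations, the KV cache, and framework overhead.

```python
BYTES_PER_PARAM = {
    "bfloat16 / float16": 2.0,  # 16 bits per weight
    "8-bit quantized": 1.0,     # 8 bits per weight
    "4-bit quantized": 0.5,     # 4 bits per weight
}


def weight_memory_gb(num_params, bytes_per_param):
    """Weights-only memory in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumed round parameter count for an "8B" model.
NUM_PARAMS = 8e9

for precision, size in BYTES_PER_PARAM.items():
    print(f"{precision}: ~{weight_memory_gb(NUM_PARAMS, size):.0f} GB of weights")
```

This gives about 16 GB, 8 GB, and 4 GB of weights, each roughly 2 GB below the corresponding minimum in the table. The remainder is headroom for everything the estimate leaves out.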

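For 4-bit quantized inference with bitsandbytes (install it first with pip install bitsandbytes; it is not included in the install command under Get Started):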

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "Surpem/Supertron-2.1-8B-A1B"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

Local Inference

The official checkpoint in this repository is the Transformers version. A separate GGUF repository is available for llama.cpp, Ollama, LM Studio, and other local inference runtimes:

Surpem/Supertron-2.1-8B-A1B-GGUF

Use this repository when you want the original PyTorch/Transformers model. Use the GGUF repository when you want quantized local inference.

Intended Use

Supertron-2.1-8B-A1B is intended for:

general assistant workflows
coding help and code explanation
math practice and structured problem solving
general question answering
summarization and rewriting
technical explanation and research support
prototype agent workflows
educational and research use

Limitations

The model may hallucinate facts or produce outdated information.
Math and code answers can be incorrect and should be verified.
Complex reasoning tasks may require additional checking.
The model may produce repetitive or low-quality text when sampling settings are poorly chosen.
It is not intended for legal, medical, financial, safety-critical, or identity-sensitive decisions without independent expert review.
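
Verification of generated code can often be automated. As a sketch, suppose the model returned the `is_prime` function below in response to the prompt used in Get Started. The function here is hand-written for illustration, not actual model output, and all helper names are hypothetical. It can be compared against a slow but obviously correct reference over a range of inputs.

```python
def is_prime(n):
    """Candidate implementation, standing in for model-generated code."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def is_prime_reference(n):
    """Brute-force reference: a prime has no divisor in [2, n - 1]."""
    return n >= 2 and all(n % d != 0 for d in range(2, n))


def find_mismatches(candidate, reference, limit=2000):
    """Return the inputs in [-5, limit) where the two functions disagree."""
    return [n for n in range(-5, limit) if candidate(n) != reference(n)]
```

An empty list from `find_mismatches(is_prime, is_prime_reference)` means the candidate agrees with the reference on every tested input. It does not prove correctness beyond that range.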

Citation

@misc{surpem2026supertron21_8b_a1b,
      title={Supertron-2.1-8B-A1B -- Efficient Generalist Instruction-Tuned Language Model},
      author={Surpem},
      year={2026},
      url={https://huggingface.co/Surpem/Supertron-2.1-8B-A1B},
}