Supertron-2.1-8B-A1B: An Efficient Generalist Instruction-Tuned Language Model
Model Description
Supertron-2.1-8B-A1B is an instruction-tuned language model built on top of LiquidAI/LFM2.5-8B-A1B. It is designed as an efficient generalist assistant model for reasoning, coding, math, general knowledge, writing, summarization, and natural conversation.
The model remains compatible with standard transformers workflows while using the LiquidAI base model format. It is intended for users who want an assistant-style model that is useful across everyday technical and general tasks.
- Developed by: Surpem
- Model type: Causal Language Model
- Architecture: LiquidAI LFM2.5, 8B-class total parameters with 1B-class active parameters (A1B)
- Fine-tuned from: LiquidAI/LFM2.5-8B-A1B
- License: Apache 2.0
Capabilities
Reasoning
Supertron-2.1-8B-A1B is tuned for clear assistant-style reasoning. It can explain decisions, compare options, break down multi-step questions, and produce structured answers when a task benefits from organization.
Math
The model can help with arithmetic, algebra, word problems, step-by-step explanations, and checking calculations. It is useful for practice, tutoring-style explanations, and lightweight quantitative reasoning.
Coding
Supertron-2.1-8B-A1B can write, debug, refactor, and explain code across common languages including Python, JavaScript, TypeScript, C++, Java, Rust, SQL, and shell scripting. It can assist with algorithms, implementation details, code review, and practical development questions.
Science & General Knowledge
The model can explain concepts across STEM, technology, history, business, and general knowledge domains. It is suitable for research assistance, summaries, educational explanations, and technical writing support.
Instruction Following
Supertron-2.1-8B-A1B follows direct natural language instructions and can adapt to requested formats such as concise answers, bullet lists, tables, code blocks, JSON-like structures, and longer explanatory responses.
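When a JSON-like structure is requested, the reply may still wrap the object in ordinary prose. The following is a minimal sketch of pulling the first parseable JSON object out of a reply string. The helper name `extract_json_object` and the sample reply are hypothetical and are not part of the model or of transformers; the reply is hard-coded so the sketch runs without loading the model.

```python
import json


def extract_json_object(text: str):
    """Return the first parseable JSON object found in `text`, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            obj, _ = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None


# Hypothetical reply text, standing in for real model output.
reply = (
    "Sure, here is the result:\n"
    '{"language": "Python", "is_prime": true}\n'
    "Let me know if you need more."
)
parsed = extract_json_object(reply)
print(parsed)
```

Returning `None` instead of raising keeps the caller in control of what to do when the reply contains no valid object, for example re-prompting with a stricter format instruction.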
Get Started
Install the required packages:
```bash
pip install -U transformers torch accelerate
```
Load the model:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Surpem/Supertron-2.1-8B-A1B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
```
Generate a response:
```python
messages = [
    {"role": "user", "content": "Write a Python function that checks whether a number is prime."}
]

# Render the conversation with the model's chat template as plain text.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```
Recommended Generation Settings
For coding, math, and deterministic answers:
```python
generation_config = {
    "max_new_tokens": 512,
    "do_sample": False,
}
```
For general chat and writing:
```python
generation_config = {
    "max_new_tokens": 768,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "do_sample": True,
}
```
For longer explanations:
```python
generation_config = {
    "max_new_tokens": 1024,
    "temperature": 0.6,
    "top_p": 0.9,
    "do_sample": True,
}
```
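The three settings above can be kept together and selected by task. This is a minimal sketch: the preset names (`deterministic`, `chat`, `long_form`) and the helper `get_generation_config` are hypothetical labels chosen for illustration, while the values are copied from the recommendations above.

```python
# Values copied from the recommended settings; the key names are illustrative.
GENERATION_PRESETS = {
    "deterministic": {"max_new_tokens": 512, "do_sample": False},
    "chat": {
        "max_new_tokens": 768,
        "temperature": 0.7,
        "top_p": 0.9,
        "top_k": 40,
        "do_sample": True,
    },
    "long_form": {
        "max_new_tokens": 1024,
        "temperature": 0.6,
        "top_p": 0.9,
        "do_sample": True,
    },
}


def get_generation_config(task: str, **overrides) -> dict:
    """Return a copy of the preset for `task`, with optional overrides applied."""
    if task not in GENERATION_PRESETS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(GENERATION_PRESETS)}")
    config = dict(GENERATION_PRESETS[task])  # copy so the preset stays unchanged
    config.update(overrides)
    return config


config = get_generation_config("chat", max_new_tokens=256)
print(config)

# With the model and inputs from "Get Started" already loaded, the dict
# would be unpacked into generate():
# outputs = model.generate(**inputs, **config)
```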
Hardware Requirements
| Precision | Min VRAM | Recommended |
|---|---|---|
| bfloat16 / float16 | 18 GB | 24 GB+ |
| 8-bit quantized | 10 GB | 12 GB+ |
| 4-bit quantized | 6 GB | 8 GB+ |
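As a rough cross-check of the table, the memory needed for the weights alone can be estimated from the parameter count and the bits stored per parameter. The sketch below assumes about 8 billion total parameters (inferred from the "8B" in the model name, not an exact count) and assumes every parameter stays resident in memory regardless of how many are active per token. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: ~8e9 total parameters, taken from the "8B" in the model name.
ASSUMED_PARAMS = 8e9

for label, bits in [
    ("bfloat16 / float16", 16),
    ("8-bit quantized", 8),
    ("4-bit quantized", 4),
]:
    print(f"{label}: ~{weight_memory_gb(ASSUMED_PARAMS, bits):.0f} GB for weights")
```

Under these assumptions the weights-only figures come to 16, 8, and 4 GB, each about 2 GB below the minimum VRAM listed in the table. The remaining headroom would plausibly cover activations, cache, and runtime overhead, though the actual split depends on context length and the runtime used.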
For 4-bit quantized inference:
```python
# 4-bit loading through BitsAndBytesConfig also needs the bitsandbytes package:
#   pip install -U bitsandbytes
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "Surpem/Supertron-2.1-8B-A1B"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```
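Which loading mode to use can be decided from the available VRAM and the minimums in the hardware table. The following is a minimal sketch; the helper `pick_precision` and its return labels are hypothetical, and the thresholds are the "Min VRAM" values from the table.

```python
# (label, minimum VRAM in GB), ordered from highest to lowest precision.
# Thresholds are the "Min VRAM" column of the hardware table.
MIN_VRAM_GB = [
    ("bfloat16", 18),
    ("8-bit", 10),
    ("4-bit", 6),
]


def pick_precision(available_vram_gb: float):
    """Return the highest-precision mode whose minimum VRAM fits, or None."""
    for name, min_gb in MIN_VRAM_GB:
        if available_vram_gb >= min_gb:
            return name
    return None


# On a CUDA machine the total could be read with, for example:
#   torch.cuda.get_device_properties(0).total_memory / 1e9
for vram in (24, 12, 8, 4):
    print(f"{vram} GB -> {pick_precision(vram)}")
```

A result of `None` means none of the listed modes meets its minimum, in which case the GGUF route described under Local Inference may be the more practical option.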
Local Inference
The official checkpoint in this repository is the Transformers version. A separate GGUF repository is available for llama.cpp, Ollama, LM Studio, and other local inference runtimes.
Use this repository when you want the original PyTorch/Transformers model. Use the GGUF repository when you want quantized local inference.
Intended Use
Supertron-2.1-8B-A1B is intended for:
- general assistant workflows
- coding help and code explanation
- math practice and structured problem solving
- general question answering
- summarization and rewriting
- technical explanation and research support
- prototype agent workflows
- educational and research use
Limitations
- The model may hallucinate facts or produce outdated information.
- Math and code answers can be incorrect and should be verified.
- Complex reasoning tasks may require additional checking.
- The model may produce repetitive or low-quality text with poor sampling settings.
- It is not intended for legal, medical, financial, safety-critical, or identity-sensitive decisions without independent expert review.
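Since code answers should be verified, one lightweight approach is to compare a generated function against a deliberately simple reference over a range of inputs. The sketch below uses the prime-checking prompt from Get Started as its example. `candidate_is_prime` is a hand-written stand-in for a model answer, not actual model output, and all function names are hypothetical.

```python
import math


def candidate_is_prime(n: int) -> bool:
    """Stand-in for a model-written answer (hypothetical, hand-written here)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    # Only odd divisors up to the integer square root need checking.
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def reference_is_prime(n: int) -> bool:
    """Slow but simple reference: try every divisor from 2 to n - 1."""
    return n >= 2 and all(n % d for d in range(2, n))


def find_mismatches(candidate, reference, inputs):
    """Return the inputs on which the candidate disagrees with the reference."""
    return [x for x in inputs if candidate(x) != reference(x)]


mismatches = find_mismatches(candidate_is_prime, reference_is_prime, range(-5, 500))
print("mismatches:", mismatches)
```

An empty list only shows agreement on the tested range; it does not prove the candidate correct for all inputs.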
Citation
```bibtex
@misc{surpem2026supertron21_8b_a1b,
  title={Supertron-2.1-8B-A1B -- Efficient Generalist Instruction-Tuned Language Model},
  author={Surpem},
  year={2026},
  url={https://huggingface.co/Surpem/Supertron-2.1-8B-A1B},
}
```