Supertron-2.1-8B-A1B: An Efficient Generalist Instruction-Tuned Language Model

Model Description

Supertron-2.1-8B-A1B is an instruction-tuned language model built on top of LiquidAI/LFM2.5-8B-A1B. It is designed as an efficient generalist assistant model for reasoning, coding, math, general knowledge, writing, summarization, and natural conversation.

The model works with standard Hugging Face Transformers workflows and uses the same model format as its LiquidAI base. Supertron-2.1-8B-A1B is intended for users who want a capable assistant-style model for everyday technical and general tasks.

  • Developed by: Surpem
  • Model type: Causal Language Model
  • Architecture: LiquidAI LFM2.5 mixture-of-experts; about 8B total parameters, of which roughly 1B are active per token (the "A1B" in the name)
  • Fine-tuned from: LiquidAI/LFM2.5-8B-A1B
  • License: Apache 2.0

Capabilities

Reasoning

Supertron-2.1-8B-A1B is tuned for clear assistant-style reasoning. It can explain decisions, compare options, break down multi-step questions, and produce structured answers when a task benefits from organization.

Math

The model can help with arithmetic, algebra, word problems, step-by-step explanations, and checking calculations. It is useful for practice, tutoring-style explanations, and lightweight quantitative reasoning.
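
When the correct result is already known (for example, on a set of practice problems), the model's math answers can be spot-checked automatically. The sketch below is illustrative only: it assumes the reply states its final answer as the last number in the text, and `final_number` and `answer_matches` are hypothetical helper names, not part of any library.

```python
import math
import re
from typing import Optional

# Matches integers or decimals with an optional sign and thousands separators (e.g. -1,250.5).
_NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")

def final_number(reply: str) -> Optional[float]:
    """Return the last number that appears in a model reply, or None if there is none."""
    matches = _NUMBER.findall(reply)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))

def answer_matches(reply: str, expected: float, rel_tol: float = 1e-9) -> bool:
    """Compare the reply's last number (assumed to be the final answer) with `expected`."""
    value = final_number(reply)
    return value is not None and math.isclose(value, expected, rel_tol=rel_tol)

print(answer_matches("3 boxes of 12 eggs hold 3 * 12 = 36 eggs. The answer is 36.", 36))
```

The last-number heuristic is fragile: a reply that ends with a units note or an extra check will defeat it, so treat a mismatch as a cue for manual review rather than proof of an error.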

Coding

Supertron-2.1-8B-A1B can write, debug, refactor, and explain code across common languages including Python, JavaScript, TypeScript, C++, Java, Rust, SQL, and shell scripting. It can assist with algorithms, implementation details, code review, and practical development questions.

Science & General Knowledge

The model can explain concepts across STEM, technology, history, business, and general knowledge domains. It is suitable for research assistance, summaries, educational explanations, and technical writing support.

Instruction Following

Supertron-2.1-8B-A1B follows direct natural language instructions and can adapt to requested formats such as concise answers, bullet lists, tables, code blocks, JSON-like structures, and longer explanatory responses.
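
Because the model produces JSON-like output rather than guaranteed-valid JSON, it is worth parsing replies defensively. A minimal sketch using only the Python standard library; `extract_json` is a hypothetical helper that returns the first `{...}` or `[...]` span that parses as valid JSON.

```python
import json
from typing import Any, Optional

def extract_json(reply: str) -> Optional[Any]:
    """Return the first valid JSON object or array embedded in `reply`, or None."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(reply):
        if ch not in "{[":
            continue
        try:
            # raw_decode parses one JSON value from the start of the string
            # and ignores whatever trailing prose follows it.
            value, _end = decoder.raw_decode(reply[start:])
            return value
        except json.JSONDecodeError:
            continue  # not valid JSON at this position; try the next bracket
    return None

reply = 'Here is the record: {"name": "Ada", "languages": ["Python", "Rust"]} Anything else?'
print(extract_json(reply))
```

Checking the parsed value against the structure you asked for (required keys, value types) is still up to the caller.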


Get Started

Install the required packages:

pip install -U transformers torch accelerate

Load the model:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Surpem/Supertron-2.1-8B-A1B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

Generate a response:

messages = [
    {"role": "user", "content": "Write a Python function that checks whether a number is prime."}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
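
To continue a conversation, send the full history back through `apply_chat_template`: append the model's decoded reply as an `assistant` message, then the next `user` message. A minimal sketch; `continue_conversation` is a hypothetical helper name, and the roles follow the `messages` format used above.

```python
from typing import Dict, List

Message = Dict[str, str]

def continue_conversation(
    history: List[Message], assistant_reply: str, next_user_message: str
) -> List[Message]:
    """Return a new message list with the model's reply and the next user turn appended.

    The input list is not modified, so earlier snapshots of the chat stay intact.
    """
    return history + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]

history = [{"role": "user", "content": "Write a Python function that checks whether a number is prime."}]
history = continue_conversation(history, "def is_prime(n): ...", "Now add type hints and a docstring.")
print([m["role"] for m in history])  # ['user', 'assistant', 'user']
```

Pass the returned list to `tokenizer.apply_chat_template(..., add_generation_prompt=True)` and call `model.generate` as before. Every turn adds to the prompt length, so very long chats may need older turns trimmed.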

Recommended Generation Settings

For coding, math, and deterministic answers:

generation_config = {
    "max_new_tokens": 512,
    "do_sample": False,
}

For general chat and writing:

generation_config = {
    "max_new_tokens": 768,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "do_sample": True,
}

For longer explanations:

generation_config = {
    "max_new_tokens": 1024,
    "temperature": 0.6,
    "top_p": 0.9,
    "do_sample": True,
}
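
The three settings above are plain keyword arguments, so they can be unpacked directly into `model.generate`. One way to keep them in one place is sketched here; the preset labels (`"precise"`, `"chat"`, `"long"`) and the `generation_kwargs` helper are hypothetical names, while the values are copied from the settings above.

```python
from typing import Any, Dict

# Values copied from the recommended settings; the preset names are illustrative.
GENERATION_PRESETS: Dict[str, Dict[str, Any]] = {
    "precise": {"max_new_tokens": 512, "do_sample": False},  # coding, math, deterministic answers
    "chat": {"max_new_tokens": 768, "temperature": 0.7, "top_p": 0.9, "top_k": 40, "do_sample": True},
    "long": {"max_new_tokens": 1024, "temperature": 0.6, "top_p": 0.9, "do_sample": True},
}

def generation_kwargs(preset: str, **overrides: Any) -> Dict[str, Any]:
    """Return a fresh copy of a preset with any per-call overrides applied on top."""
    if preset not in GENERATION_PRESETS:
        raise ValueError(f"unknown preset {preset!r}; choose from {sorted(GENERATION_PRESETS)}")
    return {**GENERATION_PRESETS[preset], **overrides}

print(generation_kwargs("chat", max_new_tokens=256))
```

With the `model` and `inputs` from Get Started, usage would be `outputs = model.generate(**inputs, **generation_kwargs("precise"))`. The precise preset uses greedy decoding, so sampling parameters are left out; Transformers may warn when they are set alongside `do_sample=False`.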

Hardware Requirements

Precision            | Minimum VRAM | Recommended VRAM
-------------------- | ------------ | ----------------
bfloat16 / float16   | 18 GB        | 24 GB+
8-bit quantized      | 10 GB        | 12 GB+
4-bit quantized      | 6 GB         | 8 GB+
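
The minimums above are consistent with simple arithmetic: weight memory is roughly parameter count × bits per parameter ÷ 8 bytes. A rough sketch, assuming a mixture-of-experts layout in which all ~8B parameters stay loaded even though only about 1B are active per token (active parameters reduce compute, not memory). It ignores the KV cache, activations, and runtime overhead that account for the extra headroom in the table. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    bytes_total = n_params * bits_per_param / 8
    return bytes_total / 1e9

# Nominal "8B params" as listed for this model. Assumes every expert is resident
# in memory, so the ~1B active parameters per token do not shrink this number.
N_PARAMS = 8e9

for precision, bits in [("bfloat16 / float16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{precision:<18} ~{weight_memory_gb(N_PARAMS, bits):.0f} GB of weights")
```

These are lower bounds: bitsandbytes typically leaves some modules (such as embeddings and the output head) unquantized, and the KV cache grows with batch size and sequence length, so budget headroom above the weight estimate as the table's minimums do.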

For 4-bit quantized inference (requires the bitsandbytes package, e.g. pip install -U bitsandbytes):

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "Surpem/Supertron-2.1-8B-A1B"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

Local Inference

The official checkpoint in this repository is the Transformers version. A separate GGUF repository is available for llama.cpp, Ollama, LM Studio, and other local inference runtimes:

Use this repository when you want the original PyTorch/Transformers model. Use the GGUF repository when you want quantized local inference.


Intended Use

Supertron-2.1-8B-A1B is intended for:

  • general assistant workflows
  • coding help and code explanation
  • math practice and structured problem solving
  • general question answering
  • summarization and rewriting
  • technical explanation and research support
  • prototype agent workflows
  • educational and research use

Limitations

  • The model may hallucinate facts or produce outdated information.
  • Math and code answers can be incorrect and should be verified.
  • Complex reasoning tasks may require additional checking.
  • The model may produce repetitive or low-quality text with poor sampling settings.
  • It is not intended for legal, medical, financial, safety-critical, or identity-sensitive decisions without independent expert review.
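
Code answers can be verified by running them against a trusted reference. A minimal sketch for the `is_prime` request used in Get Started, assuming the generated function is named `is_prime` and takes one integer; `reference_is_prime` and `find_mismatches` are hypothetical helpers. Only execute model-written code in an isolated environment.

```python
from typing import Callable, List

def reference_is_prime(n: int) -> bool:
    """Trusted trial-division primality test used as ground truth."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def find_mismatches(candidate: Callable[[int], bool], upper: int = 1000) -> List[int]:
    """Return each n in [0, upper) where `candidate` disagrees with the reference or raises."""
    mismatches = []
    for n in range(upper):
        try:
            agrees = bool(candidate(n)) == reference_is_prime(n)
        except Exception:
            agrees = False  # a crash on valid input counts as a failure
        if not agrees:
            mismatches.append(n)
    return mismatches

# Stand-in for model output with a common bug: no guard for n < 2.
generated_code = """
def is_prime(n):
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True
"""
namespace: dict = {}
exec(generated_code, namespace)  # runs untrusted code; sandbox this in real use
print(find_mismatches(namespace["is_prime"], upper=50))  # [0, 1]
```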

Citation

@misc{surpem2026supertron21_8b_a1b,
      title={Supertron-2.1-8B-A1B -- Efficient Generalist Instruction-Tuned Language Model},
      author={Surpem},
      year={2026},
      url={https://huggingface.co/Surpem/Supertron-2.1-8B-A1B},
}