Instructions for using AtomixLabs/AtomixS2-5M-v1.0-GGUF with libraries and local apps.

Libraries

How to use AtomixLabs/AtomixS2-5M-v1.0-GGUF with llama-cpp-python:

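```python
# Requires: pip install llama-cpp-python huggingface-hub
# (Llama.from_pretrained downloads the file through huggingface-hub)

from llama_cpp import Llama

# Download the F16 GGUF file from the Hub and load it.
llm = Llama.from_pretrained(
    repo_id="AtomixLabs/AtomixS2-5M-v1.0-GGUF",
    filename="aurelius-f16.gguf",
)

# Plain text completion; echo=True includes the prompt in the returned text.
output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True,
)
# The result is an OpenAI-style completion dict.
print(output["choices"][0]["text"])
```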

Local Apps

llama.cpp

How to use AtomixLabs/AtomixS2-5M-v1.0-GGUF with llama.cpp:

Install (macOS, Linux)

```
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
```

Install from WinGet (Windows)

```
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
```

Use pre-built binary

```
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
```

Build from source code

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
```

Use Docker

```
docker model run hf.co/AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
```

vLLM

How to use AtomixLabs/AtomixS2-5M-v1.0-GGUF with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AtomixLabs/AtomixS2-5M-v1.0-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AtomixLabs/AtomixS2-5M-v1.0-GGUF",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
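
The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on `localhost:8000`, and `build_payload` and `complete` are hypothetical helper names introduced here for illustration.

```python
import json
import urllib.request

MODEL_ID = "AtomixLabs/AtomixS2-5M-v1.0-GGUF"
# ASSUMPTION: the vLLM server from the step above is reachable at this address.
SERVER_URL = "http://localhost:8000/v1/completions"


def build_payload(prompt, max_tokens=512, temperature=0.5):
    """Build the same JSON body that the curl example sends."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt, url=SERVER_URL, **kwargs):
    """POST a completion request and return the generated text."""
    body = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # OpenAI-compatible completion responses put the text under choices[0].
    return result["choices"][0]["text"]
```

With the server running, `print(complete("Once upon a time,"))` should print the generated continuation.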

Ollama
How to use AtomixLabs/AtomixS2-5M-v1.0-GGUF with Ollama:
```
ollama run hf.co/AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
```

Unsloth Studio

How to use AtomixLabs/AtomixS2-5M-v1.0-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for AtomixLabs/AtomixS2-5M-v1.0-GGUF to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for AtomixLabs/AtomixS2-5M-v1.0-GGUF to start chatting
```

Using HuggingFace Spaces for Unsloth

No setup required. Open https://huggingface.co/spaces/unsloth/studio in your browser and search for AtomixLabs/AtomixS2-5M-v1.0-GGUF to start chatting.

Docker Model Runner
How to use AtomixLabs/AtomixS2-5M-v1.0-GGUF with Docker Model Runner:
```
docker model run hf.co/AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
```

Lemonade

How to use AtomixLabs/AtomixS2-5M-v1.0-GGUF with Lemonade:

Pull the model

```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
```

Run and chat with the model

```
lemonade run user.AtomixS2-5M-v1.0-GGUF-Q4_K_M
```

List all available models

```
lemonade list
```

AtomixS2-5M-v1.0-GGUF

A Quick Note on the GGUF Files

Since this is the GGUF version of the model, we wanted to share a quick heads-up about which files you should actually download.

We highly recommend sticking to the unquantized F32 or F16 files.

At 5.98 million parameters, there is simply no margin for error. While massive models can easily handle being squeezed down to 4-bit or 3-bit sizes, this tiny network gets scrambled very easily.

If you try running the smaller files (like Q8_0 or Q4_K_M), the rounding errors completely overwhelm the layers. The model will lose its grip on basic English grammar, spell words wrong, and get stuck in loops. We uploaded those smaller, heavily quantized files purely as experimental artifacts for you to play around with, but if you want to see the model actually follow grammar and reasoning, stick to the F16 or F32 files.
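
To give a feel for how rounding error grows as the bit width shrinks, here is a minimal sketch. It uses a simplified symmetric round-to-nearest scheme on made-up weights, not the actual Q8_0 or Q4_K_M layouts, and it measures only weight error, not output quality. The function names are hypothetical.

```python
import random


def quantize_dequantize(weights, bits):
    """Round weights to a symmetric signed grid with the given bit width."""
    levels = 2 ** (bits - 1) - 1          # 127 for 8-bit, 7 for 4-bit
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) * scale for w in weights]


def mean_relative_error(weights, bits):
    """Total absolute rounding error divided by total absolute weight."""
    restored = quantize_dequantize(weights, bits)
    error = sum(abs(w - r) for w, r in zip(weights, restored))
    return error / sum(abs(w) for w in weights)


random.seed(0)
# Made-up weights: one block of 32 values drawn from a normal distribution.
block = [random.gauss(0.0, 0.02) for _ in range(32)]

for bits in (8, 4, 3):
    error = mean_relative_error(block, bits)
    print(f"{bits}-bit: mean relative error = {error:.3f}")
```

Large models appear to absorb this kind of error more easily; the note above is that a network this small does not.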

At AtomixLabs, our research often focuses on the physical constraints of neural architectures. With AtomixS2-5M-v1.0, we wanted to explore the absolute floor of language acquisition: What happens when you restrict a model to just under 6 million parameters?

At this extreme scale, a model does not have the capacity to memorize the internet, store vast encyclopedias of trivia, or comprehend deep physical-world mechanics. Instead, it becomes a pure engine of syntax and structure. This model is our attempt to build a highly active, fluent, and structurally sound micro-model that runs easily on almost any hardware—from older consumer GPUs to standard laptop processors. It is an exploration of parameter density and careful data curation over sheer scale.

Architectural Constraints and Vocabulary Design

AtomixS2-5M is a standard decoder-only transformer, built with a very tight structural configuration:

Parameter Count: 5.98 Million
Layers: 4
Attention Heads: 8
Context Window: 512 Tokens

When working with a parameter budget this constrained, traditional methods of tokenization become a major liability. Standard vocabularies often leak capacity by assigning precious embedding parameters to unused symbols, broken formatting fragments, or redundant uppercase and lowercase variations of the exact same word.

To resolve this, we engineered a custom, highly dense 3,584-token vocabulary. We utilized custom token mapping to ensure the model doesn't waste parameters relearning basic capitalization patterns or storing empty placeholder slots. Every single parameter in the embedding matrix is designed to be an active, high-yield node. By structuring the vocabulary this way, the model is able to direct more of its limited capacity toward understanding grammatical rules and logical transitions.
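
A rough back-of-the-envelope calculation shows why vocabulary size matters so much at this scale. The sketch below assumes a hidden size of 256 and a 32,000-entry comparison vocabulary; neither number comes from this model card, and both are labeled as assumptions in the code.

```python
TOTAL_PARAMS = 5_980_000   # from the model card
VOCAB_SIZE = 3_584         # from the model card
HIDDEN_SIZE = 256          # ASSUMPTION: the hidden size is not stated in the card
GENERIC_VOCAB = 32_000     # ASSUMPTION: a typical general-purpose vocabulary size


def embedding_params(vocab_size, hidden_size):
    """One embedding row of `hidden_size` values per vocabulary entry."""
    return vocab_size * hidden_size


for name, vocab in (("custom", VOCAB_SIZE), ("generic", GENERIC_VOCAB)):
    params = embedding_params(vocab, HIDDEN_SIZE)
    share = params / TOTAL_PARAMS
    print(f"{name:>7}: {params:>9,} embedding params ({share:.0%} of the budget)")
```

Under these assumptions the 3,584-token table needs about 0.92M parameters, roughly 15% of the budget, while a 32,000-entry table alone would need about 8.2M, more than the entire model. If the output projection is not tied to the input embedding, both figures double.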

The Training Mixture: Syntax over Trivia

To get a model this small to output cohesive text, the training diet has to be incredibly deliberate. If you feed a micro-model nothing but raw, unfiltered web data, it tends to become noisy and chaotic. We trained AtomixS2 over a heavily curated, multi-domain mixture designed specifically to teach structure, narrative flow, and procedural formatting rather than just isolated facts.

Our final pre-training corpus was constructed using carefully balanced subsets from the following open-source datasets:

openbmb/Ultra-FineWeb-L3 (~54%)
HuggingFaceTB/smollm-corpus (~22%)
Aarushhh/finemath-refined (~14%)
openbmb/UltraData-Math (~10%)

By blending foundational web text with conversational narratives and a heavy dose of structured mathematics, we provided the model with a strong cognitive anchor. The math data, for instance, isn't there to teach a 6M parameter model advanced calculus. Instead, it forces the model's limited attention heads to learn how to track states, format markdown lists, close LaTeX brackets, and follow a strict sequential chain of thought, which cleanly transfers over to its general English syntax.
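
One common way to realize a mixture like this is to pick the source of each training document at random with the listed weights. The sketch below illustrates that idea only; the actual data pipeline is not described here, and whether the percentages are measured in documents or tokens is not stated. `sample_sources` is a hypothetical helper.

```python
import random
from collections import Counter

# Approximate mixture proportions from the list above.
MIXTURE = {
    "openbmb/Ultra-FineWeb-L3": 0.54,
    "HuggingFaceTB/smollm-corpus": 0.22,
    "Aarushhh/finemath-refined": 0.14,
    "openbmb/UltraData-Math": 0.10,
}


def sample_sources(num_documents, seed=0):
    """Draw a source dataset for each document according to MIXTURE."""
    rng = random.Random(seed)
    names = list(MIXTURE)
    weights = list(MIXTURE.values())
    return rng.choices(names, weights=weights, k=num_documents)


counts = Counter(sample_sources(10_000))
for name, count in counts.most_common():
    print(f"{name}: {count / 10_000:.1%}")
```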

Recommended Generation Settings

Because AtomixS2-5M has a narrow parameter space, its probability distributions can behave differently than those of massive models. To get the cleanest, most coherent text generation, we strongly recommend the following sampling parameters:

temperature: 0.5 — A slightly lower temperature helps keep the model grounded. It prevents the network from wandering into the noisy "tail" of its vocabulary.
min_p: 0.1 — This dynamically truncates the lowest-probability tokens. It acts as a great filter against sudden hallucinations or spelling breaks.
repetition_penalty: 1.05 to 1.1 — Small models can occasionally get caught in structural formatting loops (like generating continuous markdown tables). A light repetition penalty gently nudges the model to keep moving forward without destroying its natural vocabulary flow.
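
The settings above map onto llama-cpp-python keyword arguments as shown in the `SAMPLING` dictionary below (`repeat_penalty` is that library's name for the repetition penalty). The rest of the sketch illustrates what temperature and min_p do to a distribution, using made-up logits. It applies temperature first and min_p second, which is an assumption; real runtimes may order the two steps differently.

```python
import math

# Recommended settings from the list above, using llama-cpp-python's names.
SAMPLING = {"temperature": 0.5, "min_p": 0.1, "repeat_penalty": 1.1}
# Usage with the llm object from the llama-cpp-python example:
#   llm("Once upon a time,", max_tokens=256, **SAMPLING)


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                      # subtract the max for stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p):
    """Zero out tokens below min_p * (top probability), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


# Made-up logits for a five-token vocabulary, for illustration only.
logits = [4.0, 3.0, 2.0, 0.0, -2.0]
probs = softmax(logits, temperature=SAMPLING["temperature"])
filtered = min_p_filter(probs, SAMPLING["min_p"])
print([round(p, 3) for p in filtered])  # [0.881, 0.119, 0.0, 0.0, 0.0]
```

In this toy case only the two most likely tokens survive the filter, which is the "tail trimming" effect described above.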

Benchmark Performance

We evaluated AtomixS2-5M-v1.0 using the standard EleutherAI evaluation harness. The scores reported below are length-normalized accuracies (acc_norm), which reduce the advantage that shorter answer choices get under raw accuracy.

| Benchmark | Score (acc_norm) | What this means at the 5M scale |
| --- | --- | --- |
| HellaSwag | `28.27%` | Tests common-sense sentence completion. The score is a few points above the 25% random-guess baseline for this four-way task, which we attribute to the model's stable grammar and syntactic transitions. |
| ArithMark 2.0 | `27.92%` | Tests basic integer sequence prediction. The inclusion of procedural math data helps the model handle numeric formats and basic arithmetic. |
| ARC-Easy | `32.79%` | Tests basic grade-school science logic. |
| ARC-Challenge | `21.08%` | Tests advanced reasoning. Expectedly difficult for micro-models; this score is below the roughly 25% random-guess baseline. |
| PIQA | `53.70%` | Tests physical common sense (e.g., how water reacts with a sponge). With two answer choices, random guessing scores 50%. Because a 5.98M model lacks the capacity to store vast amounts of real-world physical knowledge, this score reflects our natural trade-off of dedicating limited parameters to syntax rather than database memorization. |
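
The difference between raw accuracy and the length-normalized acc_norm used above can be sketched as follows. This is an illustration rather than the harness's own code: it assumes each answer choice has already been scored with a total log-likelihood, and it divides by character length as a stand-in for the length normalization. The example data is made up.

```python
def pick_answer(log_likelihoods, choices, normalize=True):
    """Return the index of the highest-scoring answer choice.

    With normalize=True each log-likelihood is divided by the character
    length of its choice, so longer answers are not penalized simply for
    containing more text.
    """
    scores = [
        ll / len(choice) if normalize else ll
        for ll, choice in zip(log_likelihoods, choices)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(examples, normalize=True):
    """Fraction of examples where the picked choice matches the gold index."""
    correct = sum(
        pick_answer(ex["log_likelihoods"], ex["choices"], normalize) == ex["gold"]
        for ex in examples
    )
    return correct / len(examples)


# Made-up example: the correct answer (index 1) is longer, so its total
# log-likelihood is lower even though it is more likely per character.
examples = [
    {
        "choices": ["sink.", "soak up the water slowly."],
        "log_likelihoods": [-6.0, -15.0],
        "gold": 1,
    }
]
print(accuracy(examples, normalize=False))  # 0.0 (raw accuracy)
print(accuracy(examples, normalize=True))   # 1.0 (length-normalized)
```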

What It's Like to Use

Interacting with AtomixS2-5M is a unique experience. It leans heavily into a formal, academic, and highly structured tone. It spells words with high accuracy and reliably outputs clean punctuation, markdown steps, and logical clauses.

However, users should keep in mind that its factual grounding is incredibly thin. It will confidently hallucinate an entirely fabricated historical event or scientific theory, but it will do so using flawless grammar and excellent paragraph structure. It is a model that has mastered the shape of human language, but relies entirely on you to provide the factual context. It serves as an excellent, lightweight foundation for local research, educational testing, or syntax-parsing experiments.

Safety Disclaimer, Ethical Considerations & Limitations

Limitations and Operational Warnings

AtomixS2-5M-v1.0 is an experimental, research-oriented micro-model. Due to its extremely limited parameter scale, it fundamentally lacks the complex contextual grounding, broad world knowledge, and safety-alignment mechanisms integrated into larger, commercial language models. Please read the following operational guidelines and limitations carefully before deploying or interacting with this model.

1. Severe Factual Hallucination

This model is highly susceptible to generating plausible-sounding but entirely fabricated information. It has learned the grammatical rules of how facts are stated, but it does not have the parameter capacity to store the facts themselves.

No Critical Use: Under no circumstances should this model be used to generate or verify medical, legal, financial, or safety-critical information.
Verification Required: Any factual claims, citations, or mathematical calculations produced by the model must be independently verified by a human expert.

2. Absence of Safety Alignment (RLHF)

AtomixS2-5M-v1.0 is a foundational pre-trained model. It has not undergone Reinforcement Learning from Human Feedback (RLHF), Constitutional AI training, or any other adversarial safety-tuning process.

Unfiltered Outputs: The model may generate outputs that are biased, offensive, explicit, or otherwise inappropriate. It will generally comply with malicious or unethical prompts without triggering any refusal mechanisms.
Inherited Biases: The model was trained on subsets of public web data and synthetic datasets. It inevitably reflects and may amplify the historical, societal, and cultural biases present in that training data.

3. Contextual and Logical Drift

While the model demonstrates strong syntactic stability over short generations, its 512-token context window and small hidden dimensions limit its long-range focus. On longer generations, the model may drift off-topic, repeat structural formats, or devolve into logical contradictions. It is best utilized for short-form generation, syntax parsing, and foundational research rather than extended multi-turn conversations.
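
Because the prompt and the generated text share the same 512-token window, it can help to cap the requested output length before calling the model. The helper below is a hypothetical sketch of that check, not part of any library.

```python
CONTEXT_WINDOW = 512  # from the model card


def clamp_max_tokens(prompt_token_count, requested_max_tokens,
                     context_window=CONTEXT_WINDOW):
    """Limit generation so prompt plus output stays inside the context window."""
    if prompt_token_count >= context_window:
        raise ValueError("prompt alone fills the context window; shorten it")
    available = context_window - prompt_token_count
    return min(requested_max_tokens, available)


print(clamp_max_tokens(prompt_token_count=6, requested_max_tokens=512))    # 506
print(clamp_max_tokens(prompt_token_count=200, requested_max_tokens=128))  # 128
```

With llama-cpp-python, the prompt token count can be obtained with `len(llm.tokenize(prompt.encode("utf-8")))`.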

4. Not for Public or Unsupervised Deployment

Because the model lacks content filters and safety guardrails, it is highly unsuitable for unsupervised deployment in public-facing applications, customer service bots, or environments where it might interact with minors. Any integration into a software pipeline should include robust external filtering and safety moderation layers.
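
As a rough illustration of where an external filter sits in a pipeline, the sketch below wraps a generation function with a check on both the prompt and the output. Everything here is a stand-in: `moderated_generate`, `keyword_check`, and `fake_model` are hypothetical names, and a keyword list is far too weak for real moderation, which would need a dedicated classifier or service.

```python
def moderated_generate(generate, prompt, is_allowed, fallback="[output withheld]"):
    """Run generate(prompt) and return the text only if is_allowed accepts it."""
    if not is_allowed(prompt):
        return fallback
    text = generate(prompt)
    return text if is_allowed(text) else fallback


# Stand-ins for illustration only.
BLOCKED_TERMS = {"example-blocked-term"}


def keyword_check(text):
    """Return True when the text contains none of the blocked terms."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


def fake_model(prompt):
    """Placeholder for a real call to the model."""
    return prompt + " and they lived happily ever after."


print(moderated_generate(fake_model, "Once upon a time,", keyword_check))
```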

5. Liability and "As-Is" Provision

AtomixS2-5M-v1.0 is provided by AtomixLabs strictly "as is" for the purposes of academic research, efficiency testing, and local machine learning education. AtomixLabs makes no warranties regarding the safety, accuracy, or reliability of the model. AtomixLabs and its contributors take no responsibility and bear no liability for the outputs generated by the model, or for any downstream applications, damages, or consequences resulting from its use. Users assume full responsibility for how they deploy, fine-tune, and interact with the model weights.

Acknowledgements & Licensing

This model was trained on the following datasets:

Ultra-FineWeb-L3 (Apache 2.0)
SmolLM-Corpus (ODC-By)
FineMath-Refined (Based on FineMath, ODC-By)
UltraData-Math (Apache 2.0)

GGUF

Model size: 5.98M params
Architecture: llama
