Instructions to use AtomixLabs/AtomixS2-5M-v1.0-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use AtomixLabs/AtomixS2-5M-v1.0-GGUF with llama-cpp-python:
    # !pip install llama-cpp-python
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id="AtomixLabs/AtomixS2-5M-v1.0-GGUF",
        filename="aurelius-f16.gguf",
    )

    output = llm(
        "Once upon a time,",
        max_tokens=512,
        echo=True,
    )
    print(output)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use AtomixLabs/AtomixS2-5M-v1.0-GGUF with llama.cpp:
Install (macOS, Linux)
    curl -LsSf https://llama.app/install.sh | sh
    # Start a local OpenAI-compatible server with a web UI:
    llama serve -hf AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
    # Run inference directly in the terminal:
    llama cli -hf AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
Install from WinGet (Windows)
    winget install llama.cpp
    # Start a local OpenAI-compatible server with a web UI:
    llama serve -hf AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
    # Run inference directly in the terminal:
    llama cli -hf AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
Use pre-built binary
    # Download pre-built binary from:
    # https://github.com/ggerganov/llama.cpp/releases
    # Start a local OpenAI-compatible server with a web UI:
    ./llama-server -hf AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
    # Run inference directly in the terminal:
    ./llama-cli -hf AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
Build from source code
    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    cmake -B build
    cmake --build build -j --target llama-server llama-cli
    # Start a local OpenAI-compatible server with a web UI:
    ./build/bin/llama-server -hf AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
    # Run inference directly in the terminal:
    ./build/bin/llama-cli -hf AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
Use Docker
docker model run hf.co/AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use AtomixLabs/AtomixS2-5M-v1.0-GGUF with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm
    # Start the vLLM server:
    vllm serve "AtomixLabs/AtomixS2-5M-v1.0-GGUF"
    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "AtomixLabs/AtomixS2-5M-v1.0-GGUF",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
Use Docker
docker model run hf.co/AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
- Ollama
How to use AtomixLabs/AtomixS2-5M-v1.0-GGUF with Ollama:
ollama run hf.co/AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
- Unsloth Studio
How to use AtomixLabs/AtomixS2-5M-v1.0-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
    curl -fsSL https://unsloth.ai/install.sh | sh
    # Run unsloth studio
    unsloth studio -H 0.0.0.0 -p 8888
    # Then open http://localhost:8888 in your browser
    # Search for AtomixLabs/AtomixS2-5M-v1.0-GGUF to start chatting
Install Unsloth Studio (Windows)
    irm https://unsloth.ai/install.ps1 | iex
    # Run unsloth studio
    unsloth studio -H 0.0.0.0 -p 8888
    # Then open http://localhost:8888 in your browser
    # Search for AtomixLabs/AtomixS2-5M-v1.0-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
    # No setup required
    # Open https://huggingface.co/spaces/unsloth/studio in your browser
    # Search for AtomixLabs/AtomixS2-5M-v1.0-GGUF to start chatting
- Atomic Chat
- Docker Model Runner
How to use AtomixLabs/AtomixS2-5M-v1.0-GGUF with Docker Model Runner:
docker model run hf.co/AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
- Lemonade
How to use AtomixLabs/AtomixS2-5M-v1.0-GGUF with Lemonade:
Pull the model
    # Download Lemonade from https://lemonade-server.ai/
    lemonade pull AtomixLabs/AtomixS2-5M-v1.0-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.AtomixS2-5M-v1.0-GGUF-Q4_K_M
List all available models
lemonade list
AtomixS2-5M-v1.0-GGUF
A Quick Note on the GGUF Files
Since this is the GGUF version of the model, we wanted to share a quick heads-up about which files you should actually download.
We highly recommend sticking to the unquantized F32 or F16 files.
At 5.98 million parameters, there is simply no margin for error. While massive models can easily handle being squeezed down to 4-bit or 3-bit sizes, this tiny network gets scrambled very easily.
If you try running the smaller files (like Q8_0 or Q4_K_M), the rounding errors completely overwhelm the layers. The model will lose its grip on basic English grammar, spell words wrong, and get stuck in loops. We uploaded those smaller, heavily quantized files purely as experimental artifacts for you to play around with, but if you want to see the model actually follow grammar and reasoning, stick to the F16 or F32 files.
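To make the rounding-error argument concrete, here is a minimal sketch of how per-weight error grows as the bit width shrinks. It assumes a simple symmetric absmax scheme over blocks of 32 weights and uses random Gaussian values as a stand-in for real weights. Actual GGUF formats such as Q4_K_M use more elaborate block layouts, so treat the numbers as illustrative only; the function names are ours, not part of any library.

```python
import random

def quantize_block(values, bits):
    """Symmetric absmax round-to-nearest quantization of one block of floats."""
    levels = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit, 3 for 3-bit
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return list(values)
    scale = amax / levels
    # Snap each weight to the nearest representable level, then dequantize.
    return [round(v / scale) * scale for v in values]

def relative_rms_error(weights, bits, block_size=32):
    """RMS of the rounding error divided by RMS of the original weights."""
    err_sq = 0.0
    ref_sq = 0.0
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        for original, restored in zip(block, quantize_block(block, bits)):
            err_sq += (original - restored) ** 2
            ref_sq += original ** 2
    return (err_sq / ref_sq) ** 0.5

rng = random.Random(0)
# Hypothetical weights: Gaussian noise standing in for a real weight matrix.
weights = [rng.gauss(0.0, 0.02) for _ in range(4096)]

for bits in (8, 4, 3):
    print(f"{bits}-bit relative RMS error: {relative_rms_error(weights, bits):.4f}")
```

The sketch only shows that the error per weight rises as bits are removed. It does not model why a 4-layer network has less redundancy to absorb that error, which is the behaviour we observed in practice.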
At AtomixLabs, our research often focuses on the physical constraints of neural architectures. With AtomixS2-5M-v1.0, we wanted to explore the absolute floor of language acquisition: What happens when you restrict a model to just under 6 million parameters?
At this extreme scale, a model does not have the capacity to memorize the internet, store vast encyclopedias of trivia, or comprehend deep physical-world mechanics. Instead, it becomes a pure engine of syntax and structure. This model is our attempt to build a highly active, fluent, and structurally sound micro-model that runs easily on almost any hardware—from older consumer GPUs to standard laptop processors. It is an exploration of parameter density and careful data curation over sheer scale.
Architectural Constraints and Vocabulary Design
AtomixS2-5M is a standard decoder-only transformer, built with a very tight structural configuration:
- Parameter Count: 5.98 Million
- Layers: 4
- Attention Heads: 8
- Context Window: 512 Tokens
When working with a parameter budget this constrained, traditional methods of tokenization become a major liability. Standard vocabularies often leak capacity by assigning precious embedding parameters to unused symbols, broken formatting fragments, or redundant uppercase and lowercase variations of the exact same word.
To resolve this, we engineered a custom, highly dense 3,584-token vocabulary. We utilized custom token mapping to ensure the model doesn't waste parameters relearning basic capitalization patterns or storing empty placeholder slots. Every single parameter in the embedding matrix is designed to be an active, high-yield node. By structuring the vocabulary this way, the model is able to direct more of its limited capacity toward understanding grammatical rules and logical transitions.
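The cost of vocabulary size can be checked with simple arithmetic. The sketch below assumes a single (tied) embedding matrix and a hidden size of 256; the hidden size is a hypothetical value chosen for illustration, since it is not stated here, and the 32,000-token vocabulary is likewise just a comparison point.

```python
TOTAL_PARAMS = 5_980_000  # parameter count stated for this model
VOCAB_SIZE = 3_584        # custom vocabulary size stated for this model
D_MODEL = 256             # hypothetical hidden size, NOT taken from the model

def embedding_params(vocab_size, d_model):
    """Parameters in one embedding matrix of shape (vocab_size, d_model)."""
    return vocab_size * d_model

comparisons = (
    ("custom 3,584-token vocabulary", VOCAB_SIZE),
    ("hypothetical 32,000-token vocabulary", 32_000),
)
for name, vocab in comparisons:
    params = embedding_params(vocab, D_MODEL)
    share = params / TOTAL_PARAMS
    print(f"{name}: {params:,} embedding parameters ({share:.0%} of the budget)")
```

Under these assumptions the custom vocabulary needs 917,504 embedding parameters, while a 32,000-token vocabulary would need 8,192,000, which is more than the entire 5.98M budget before a single transformer layer is added.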
The Training Mixture: Syntax over Trivia
To get a model this small to output cohesive text, the training diet has to be incredibly deliberate. If you feed a micro-model nothing but raw, unfiltered web data, it tends to become noisy and chaotic. We trained AtomixS2 over a heavily curated, multi-domain mixture designed specifically to teach structure, narrative flow, and procedural formatting rather than just isolated facts.
Our final pre-training corpus was constructed using carefully balanced subsets from the following open-source datasets:
- openbmb/Ultra-FineWeb-L3 (~54%)
- HuggingFaceTB/smollm-corpus (~22%)
- Aarushhh/finemath-refined (~14%)
- openbmb/UltraData-Math (~10%)
By blending foundational web text with conversational narratives and a heavy dose of structured mathematics, we provided the model with a strong cognitive anchor. The math data, for instance, isn't there to teach a 6M parameter model advanced calculus. Instead, it forces the model's limited attention heads to learn how to track states, format markdown lists, close LaTeX brackets, and follow a strict sequential chain of thought, which cleanly transfers over to its general English syntax.
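One common way to realize a mixture like the one listed above is to pick the source dataset of each training document at random, weighted by the target proportions. The following is a minimal sketch of that idea, not our actual data pipeline; the `sample_sources` helper is hypothetical and only the dataset names and approximate percentages come from the list above.

```python
import random
from collections import Counter

# Approximate mixture proportions from the dataset list above.
MIXTURE = {
    "openbmb/Ultra-FineWeb-L3": 0.54,
    "HuggingFaceTB/smollm-corpus": 0.22,
    "Aarushhh/finemath-refined": 0.14,
    "openbmb/UltraData-Math": 0.10,
}

def sample_sources(mixture, n_samples, seed=0):
    """Draw the source dataset for each of n_samples documents."""
    rng = random.Random(seed)
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=n_samples)

counts = Counter(sample_sources(MIXTURE, 10_000))
for name, count in counts.most_common():
    print(f"{name}: {count / 10_000:.1%}")
```

With enough samples the observed shares converge on the target proportions, so the math-heavy sources together make up roughly a quarter of what the model sees.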
Recommended Generation Settings
Because AtomixS2-5M has a narrow parameter space, its probability distributions can behave differently than those of massive models. To get the cleanest, most coherent text generation, we strongly recommend the following sampling parameters:
- temperature: 0.5. A slightly lower temperature helps keep the model grounded. It prevents the network from wandering into the noisy "tail" of its vocabulary.
- min_p: 0.1. This dynamically truncates the lowest-probability tokens. It acts as a great filter against sudden hallucinations or spelling breaks.
- repetition_penalty: 1.05 to 1.1. Small models can occasionally get caught in structural formatting loops (like generating continuous markdown tables). A light repetition penalty gently nudges the model to keep moving forward without destroying its natural vocabulary flow.
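To show what the temperature and min_p settings do to a token distribution, here is a minimal pure-Python sketch. It assumes temperature scaling is applied before the min-p filter; real samplers may apply these steps in a different order, and the logits below are made up for illustration.

```python
import math

def apply_temperature_and_min_p(logits, temperature=0.5, min_p=0.1):
    """Return a renormalized distribution after temperature scaling and min-p filtering."""
    # Temperature below 1.0 sharpens the distribution around the top tokens.
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract peak for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    # min-p keeps only tokens at least min_p times as likely as the best token.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]

logits = [2.0, 1.5, 0.0, -1.0]  # hypothetical logits for a 4-token vocabulary
print(apply_temperature_and_min_p(logits))
```

In this example the two weakest tokens fall below the min-p threshold and are removed entirely, so sampling can only choose between the two strongest candidates.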
Benchmark Performance
We evaluated AtomixS2-5M-v1.0 using the standard EleutherAI evaluation harness (lm-evaluation-harness). The scores reported below are length-normalized accuracies (acc_norm), which reduce the bias toward shorter answer choices that raw likelihood scoring introduces.
| Benchmark | Score | What this means at the 5M Scale |
|---|---|---|
| HellaSwag | 28.27% | Tests common-sense sentence completion. The model performs exceptionally well here because its grammar and syntactic transitions are highly stable. |
| ArithMark 2.0 | 27.92% | Tests basic integer sequence prediction. The inclusion of procedural math data allows the model to handle numeric formats and basic arithmetic reliably. |
| ARC-Easy | 32.79% | Tests basic grade-school science logic. |
| ARC-Challenge | 21.08% | Tests advanced reasoning. Expectedly difficult for micro-models. |
| PIQA | 53.70% | Tests physical real-world trivia (e.g., how water reacts with a sponge). Because a 5.98M model lacks the capacity to store vast amounts of real-world physical trivia, this score reflects our natural trade-off of dedicating limited parameters to syntax rather than database memorization. |
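For readers unfamiliar with acc_norm, the sketch below shows the idea behind length-normalized scoring on a single multiple-choice item. It assumes the summed log-likelihood of each answer is divided by the answer's length in characters before the best one is picked. The answer choices and log-likelihood values are invented for illustration and are not taken from any benchmark.

```python
def pick_answer(loglikelihoods, choices, normalize=True):
    """Return the index of the highest-scoring answer choice."""
    if normalize:
        # Divide by answer length so long answers are not penalized
        # simply for containing more tokens.
        scores = [ll / len(choice) for ll, choice in zip(loglikelihoods, choices)]
    else:
        scores = list(loglikelihoods)
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical item: the long answer is correct but has a lower total score.
choices = ["melts", "slowly absorbs the water until it is saturated"]
loglikelihoods = [-6.0, -14.0]  # made-up summed log-likelihoods

print(pick_answer(loglikelihoods, choices, normalize=False))  # raw scoring
print(pick_answer(loglikelihoods, choices, normalize=True))   # length-normalized
```

Raw scoring picks the short answer (index 0), while length-normalized scoring picks the longer one (index 1), which is the effect the normalization is meant to have.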
What It's Like to Use
Interacting with AtomixS2-5M is a unique experience. It leans heavily into a formal, academic, and highly structured tone. It spells words with high accuracy and reliably outputs clean punctuation, markdown steps, and logical clauses.
However, users should keep in mind that its factual grounding is incredibly thin. It will confidently hallucinate an entirely fabricated historical event or scientific theory, but it will do so using flawless grammar and excellent paragraph structure. It is a model that has mastered the shape of human language, but relies entirely on you to provide the factual context. It serves as an excellent, lightweight foundation for local research, educational testing, or syntax-parsing experiments.
Safety Disclaimer, Ethical Considerations & Limitations
Limitations and Operational Warnings
AtomixS2-5M-v1.0 is an experimental, research-oriented micro-model. Due to its extremely limited parameter scale, it fundamentally lacks the complex contextual grounding, broad world knowledge, and safety-alignment mechanisms integrated into larger, commercial language models. Please read the following operational guidelines and limitations carefully before deploying or interacting with this model.
1. Severe Factual Hallucination
This model is highly susceptible to generating plausible-sounding but entirely fabricated information. It has learned the grammatical rules of how facts are stated, but it does not have the parameter capacity to store the facts themselves.
- No Critical Use: Under no circumstances should this model be used to generate or verify medical, legal, financial, or safety-critical information.
- Verification Required: Any factual claims, citations, or mathematical calculations produced by the model must be independently verified by a human expert.
2. Absence of Safety Alignment (RLHF)
AtomixS2-5M-v1.0 is a foundational pre-trained model. It has not undergone Reinforcement Learning from Human Feedback (RLHF), Constitutional AI training, or any other adversarial safety-tuning process.
- Unfiltered Outputs: The model may generate outputs that are biased, offensive, explicit, or otherwise inappropriate. It will generally comply with malicious or unethical prompts without triggering any refusal mechanisms.
- Inherited Biases: The model was trained on subsets of public web data and synthetic datasets. It inevitably reflects and may amplify the historical, societal, and cultural biases present in that training data.
3. Contextual and Logical Drift
While the model demonstrates strong syntactic stability over short generations, its 512-token context window and small hidden dimensions limit its long-range focus. On longer generations, the model may drift off-topic, repeat structural formats, or devolve into logical contradictions. It is best utilized for short-form generation, syntax parsing, and foundational research rather than extended multi-turn conversations.
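In practice, the 512-token window has to be shared between the prompt and the generated text. A minimal sketch of one way to budget it is shown below: keep only the most recent prompt tokens so that the requested number of new tokens still fits. The `fit_prompt` helper is hypothetical, and the token IDs are placeholders rather than real tokenizer output.

```python
CONTEXT_WINDOW = 512  # context window stated for this model

def fit_prompt(prompt_tokens, max_new_tokens, context_window=CONTEXT_WINDOW):
    """Keep the most recent prompt tokens that leave room for max_new_tokens."""
    if not 0 < max_new_tokens < context_window:
        raise ValueError("max_new_tokens must be between 1 and context_window - 1")
    budget = context_window - max_new_tokens
    # Slicing from the end drops the oldest tokens first; a prompt that is
    # already shorter than the budget is returned unchanged.
    return prompt_tokens[-budget:]

prompt_tokens = list(range(600))  # placeholder IDs for an over-long prompt
trimmed = fit_prompt(prompt_tokens, max_new_tokens=128)
print(len(trimmed))
```

Here a 600-token prompt is cut to its last 384 tokens, leaving 128 tokens for generation. Keeping generations short in this way also limits the drift described above.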
4. Not for Public or Unsupervised Deployment
Because the model lacks content filters and safety guardrails, it is highly unsuitable for unsupervised deployment in public-facing applications, customer service bots, or environments where it might interact with minors. Any integration into a software pipeline should include robust external filtering and safety moderation layers.
5. Liability and "As-Is" Provision
AtomixS2-5M-v1.0 is provided by AtomixLabs strictly "as is" for the purposes of academic research, efficiency testing, and local machine learning education. AtomixLabs makes no warranties regarding the safety, accuracy, or reliability of the model. AtomixLabs and its contributors take no responsibility and bear no liability for the outputs generated by the model, or for any downstream applications, damages, or consequences resulting from its use. Users assume full responsibility for how they deploy, fine-tune, and interact with the model weights.
Acknowledgements & Licensing
This model was trained on the following datasets:
- Ultra-FineWeb-L3 (Apache 2.0)
- SmolLM-Corpus (ODC-By)
- FineMath-Refined (Based on FineMath, ODC-By)
- UltraData-Math (Apache 2.0)
