Axon 250M

A 250M-class custom chat model by Axon Labs (actual parameter count: ~362M). Built by merging and reconfiguring SmolLM2-360M into an architecture optimized for lightweight chat.

Note: This model is NOT fine-tuned. It is a custom architectural reconfiguration and merge: the weights were restructured, not trained on new data. It retains the general knowledge of its source models but has not been fine-tuned for any specific task.

Model Details

  • Parameters: ~362M (F32); marketed as 250M class
  • Architecture: LlamaForCausalLM (custom reconfiguration)
  • Hidden size: 960
  • Layers: 32
  • Attention heads: 15
  • KV heads: 5 (GQA)
  • Intermediate size: 2560
  • Max context: 8192 tokens
  • Vocab size: 49,152
  • Activation: SiLU
  • Tokenizer: SmolLM2 tokenizer with ChatML formatting (<|im_start|> / <|im_end|>)
  • License: MIT
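
The ~362M figure can be cross-checked against the configuration listed above. The sketch below is a back-of-the-envelope count for a Llama-style decoder; it assumes no bias terms, input and output embeddings that share weights, and RMSNorm weights of size `hidden`. None of these are stated in this card, and the function name `llama_param_count` is hypothetical.

```python
def llama_param_count(hidden, layers, heads, kv_heads, intermediate, vocab,
                      tie_embeddings=True):
    """Approximate parameter count for a Llama-style decoder (no biases assumed)."""
    head_dim = hidden // heads                 # 960 // 15 = 64
    kv_dim = kv_heads * head_dim               # 5 * 64 = 320 (GQA shrinks K and V)
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim   # q_proj, o_proj + k_proj, v_proj
    mlp = 3 * hidden * intermediate            # gate, up, and down projections
    norms = 2 * hidden                         # two norm weight vectors per layer
    per_layer = attn + mlp + norms
    embed = vocab * hidden
    lm_head = 0 if tie_embeddings else vocab * hidden  # assumption: tied by default
    final_norm = hidden
    return embed + layers * per_layer + final_norm + lm_head


total = llama_param_count(hidden=960, layers=32, heads=15, kv_heads=5,
                          intermediate=2560, vocab=49152)
print(f"{total:,} parameters")                   # 361,821,120 parameters
print(f"{total * 4 / 1e9:.2f} GB of weights at F32")  # 1.45 GB of weights at F32
```

Under these assumptions the count lands at roughly 362M, consistent with the parameter figure listed above.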

Key Differences from Source

Unlike the base SmolLM2-360M, Axon 250M was created through architectural merging and reconfiguration:

  • Restructured layer count and attention configuration
  • GQA with 5 KV heads for efficient inference
  • Custom head dimension of 64
  • RoPE with theta=100000
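
To illustrate why GQA with 5 KV heads helps inference, the sketch below estimates the key/value cache size at full context. It assumes the cache stores one key and one value vector per layer, KV head, and token, at 4 bytes per value (F32); `kv_cache_bytes` is a hypothetical helper, not part of any library.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=4, batch=1):
    """Estimate KV cache size: keys + values for every layer, KV head, and token."""
    return 2 * batch * layers * kv_heads * head_dim * seq_len * bytes_per_value


head_dim = 960 // 15  # hidden size / attention heads = 64

# GQA as configured: 5 KV heads shared across 15 query heads
gqa = kv_cache_bytes(layers=32, kv_heads=5, head_dim=head_dim, seq_len=8192)
# Hypothetical comparison: one KV head per query head (standard multi-head attention)
mha = kv_cache_bytes(layers=32, kv_heads=15, head_dim=head_dim, seq_len=8192)

print(f"GQA: {gqa / 2**20:.0f} MiB")  # GQA: 640 MiB
print(f"MHA: {mha / 2**20:.0f} MiB")  # MHA: 1920 MiB
```

With these numbers the cache at 8192 tokens is a third of what a 15-KV-head layout would need. Real memory use also depends on the runtime and cache dtype.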

Usage

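from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "axonlabsai/axon-250m"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "Hey, what's up?"}]
# add_generation_prompt=True appends the assistant header so the model answers
# instead of continuing the user's turn
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt
new_tokens = output[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))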
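
For reference, the ChatML layout that the chat template produces can be approximated by hand. This is a minimal sketch using the `<|im_start|>` / `<|im_end|>` markers named in this card; `to_chatml` is a hypothetical helper, and the tokenizer's real template may differ in whitespace or insert a default system message, so treat `apply_chat_template` as the authoritative source.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model generates the reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([{"role": "user", "content": "Hey, what's up?"}])
print(prompt)
```

Comparing this string against `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is a quick way to see what the real template adds.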

Limitations

  • NOT fine-tuned: no task-specific training was performed
  • Very small model with limited reasoning and factual knowledge
  • Prone to hallucination and incoherent outputs on complex prompts
  • Best suited for simple chat and experimentation, not production use
  • The "250M" branding reflects its model class; the actual parameter count is ~362M

About Axon Labs

Axon Labs builds AI models and tools. This is our tiny model: small enough to run anywhere, dumb enough to be funny.
