maxx — On-Device Agentic LLM (1.5B)

A fine-tuned Qwen2.5-1.5B-Instruct model optimized for agentic tasks, instruction following, and real-world offline use on phones and laptops. First checkpoint in an ongoing research project targeting the best open-source agentic model under 3B parameters.


Model Details

| Field | Details |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B-Instruct |
| Parameters | 1.5B |
| Fine-tune method | QLoRA (4-bit, rank 16) |
| Framework | Unsloth + TRL |
| Context window | 2048 tokens |
| License | Apache 2.0 |
| Developer | bolajiev (Independent Researcher) |
| Status | EXP-001 (active research) |

Benchmark Results (EXP-001)

Evaluated using lm-evaluation-harness with 5-shot prompting.

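| Benchmark | maxx (1.5B) | Qwen2.5-1.5B-Instruct | SmolLM2-1.7B-Instruct |
|---|---|---|---|
| ARC-Challenge ↑ | 52.47% | 53.92% | 51.88% |
| HellaSwag ↑ | 67.02% | 67.71% | 72.20% |
| WinoGrande ↑ | 65.51% | 64.64% | 68.98% |
| TruthfulQA ↑ | 45.99% | 46.61% | 39.96% |
| MMLU ↑ | 59.87% | not reported | not reported |
| Average (excluding MMLU) | 57.75% | 58.22% | 58.26% |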
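The evaluation setup described above can be approximated through the harness's Python entry point. This is a hedged sketch rather than the author's script: only the model id and the 5-shot setting come from this card, while the task names (in particular `truthfulqa_mc2` as the TruthfulQA variant), the batch size, and the dtype are assumptions.

```python
"""Sketch: re-run the 5-shot benchmarks with lm-evaluation-harness (pip install lm-eval)."""

# Assumed task names; the card does not say which TruthfulQA variant was used.
ASSUMED_TASKS = ["arc_challenge", "hellaswag", "winogrande", "truthfulqa_mc2", "mmlu"]


def build_eval_kwargs(model_id: str, num_fewshot: int = 5) -> dict:
    """Collect keyword arguments for lm_eval.simple_evaluate."""
    return {
        "model": "hf",  # Hugging Face transformers backend
        "model_args": f"pretrained={model_id},dtype=float16",
        "tasks": list(ASSUMED_TASKS),
        "num_fewshot": num_fewshot,
        "batch_size": 8,  # assumption; lower it if the GPU runs out of memory
    }


def run_eval(model_id: str) -> dict:
    """Run the harness and return its per-task results dictionary."""
    import lm_eval  # imported lazily so build_eval_kwargs works without the package

    results = lm_eval.simple_evaluate(**build_eval_kwargs(model_id))
    return results["results"]
```

Calling `run_eval("bolajiev/maxx-1-1.5B")` downloads the model and the benchmark datasets, so it needs network access and a GPU with enough memory for a 1.5B model in half precision.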

Key findings:

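  • The four-benchmark average is within about 0.5 percentage points of both reference models on the first training run (0.47 below Qwen2.5-1.5B-Instruct, 0.51 below SmolLM2-1.7B-Instruct)
  • Beats the larger SmolLM2-1.7B-Instruct on TruthfulQA by about 6 points (45.99% vs 39.96%)
  • MMLU of 59.87% is above the published reference scores for both models; those published scores are not reproduced in the table
  • Commonsense and knowledge performance is largely retained from the Qwen2.5 base: every shared benchmark is within 1.5 points of Qwen2.5-1.5B-Instruct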
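The averages and gaps behind these findings can be recomputed from the benchmark table. The sketch below copies the scores as printed and leaves out MMLU, since the table gives an MMLU value for maxx only; the variable names are illustrative.

```python
# Scores copied from the benchmark table (percent accuracy); MMLU is omitted
# because the table lists it for maxx only.
SCORES = {
    "maxx (1.5B)": {
        "ARC-Challenge": 52.47, "HellaSwag": 67.02,
        "WinoGrande": 65.51, "TruthfulQA": 45.99,
    },
    "Qwen2.5-1.5B-Instruct": {
        "ARC-Challenge": 53.92, "HellaSwag": 67.71,
        "WinoGrande": 64.64, "TruthfulQA": 46.61,
    },
    "SmolLM2-1.7B-Instruct": {
        "ARC-Challenge": 51.88, "HellaSwag": 72.20,
        "WinoGrande": 68.98, "TruthfulQA": 39.96,
    },
}


def average(scores: dict) -> float:
    """Unweighted mean over the benchmarks in one column."""
    return sum(scores.values()) / len(scores)


AVERAGES = {name: average(cols) for name, cols in SCORES.items()}

# Gap of each reference model over maxx, in percentage points.
GAPS = {
    name: AVERAGES[name] - AVERAGES["maxx (1.5B)"]
    for name in SCORES
    if name != "maxx (1.5B)"
}

# TruthfulQA lead of maxx over SmolLM2, in percentage points.
TRUTHFULQA_LEAD = (
    SCORES["maxx (1.5B)"]["TruthfulQA"] - SCORES["SmolLM2-1.7B-Instruct"]["TruthfulQA"]
)

for name, avg in AVERAGES.items():
    print(f"{name}: average {avg:.2f}%")
for name, gap in GAPS.items():
    print(f"{name}: {gap:.2f} points above maxx")
print(f"TruthfulQA lead over SmolLM2: {TRUTHFULQA_LEAD:.2f} points")
```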

Intended Use

Primary use cases

  • On-device AI assistant for phones and laptops (no internet required)
  • Instruction following and task completion offline
  • Summarization, email writing, scheduling, planning
  • Agentic multi-step reasoning for everyday tasks
  • Privacy-first AI — all compute runs locally

Out of scope

  • High-stakes medical, legal, or financial decisions
  • Tasks requiring real-time internet access
  • Complex multi-modal tasks

Training Details

Data

  • OpenHermes-2.5 — instruction following
  • UltraChat-200k — conversational quality
  • Glaive Function Calling v2 — tool use and agentic tasks
  • Alpaca Cleaned — general instructions
  • Synthetic data generated via open-source teacher model (Qwen2.5-7B)

Total: ~35,000 curated examples (EXP-001 small run)
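Because the mix includes function-calling data, a host application will usually need to parse tool calls out of the model's text and run them. The card does not document the output format, so the sketch below assumes Qwen2.5-style `<tool_call>` blocks containing JSON; the `get_time` tool and the sample output are hypothetical and only illustrate the flow.

```python
import json
import re
from typing import Any, Callable

# Assumed output format (Qwen2.5 chat-template style); not confirmed for maxx.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list:
    """Return well-formed tool calls found in the model output."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(raw.strip())
        except json.JSONDecodeError:
            continue  # malformed JSON is skipped rather than raised
        if not isinstance(call, dict) or not isinstance(call.get("name"), str):
            continue
        args = call.get("arguments")
        calls.append({"name": call["name"], "arguments": args if isinstance(args, dict) else {}})
    return calls


def dispatch(calls: list, tools: dict) -> list:
    """Run each call against a registry of allowed Python functions."""
    results = []
    for call in calls:
        fn: Callable[..., Any] = tools.get(call["name"])
        if fn is None:
            results.append({"error": f"unknown tool: {call['name']}"})
        else:
            results.append(fn(**call["arguments"]))
    return results


# Hypothetical tool and hypothetical model output, for illustration only.
def get_time(city: str) -> str:
    return f"time lookup requested for {city}"


sample_output = (
    "Let me check.\n"
    '<tool_call>\n{"name": "get_time", "arguments": {"city": "Lagos"}}\n</tool_call>'
)
CALLS = extract_tool_calls(sample_output)
RESULTS = dispatch(CALLS, {"get_time": get_time})
print(RESULTS)
```

Restricting dispatch to an explicit registry keeps the model from invoking arbitrary functions, which matters for an on-device assistant with access to local data.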

Hyperparameters

| Parameter | Value |
|---|---|
| Learning rate | 2e-4 |
| Batch size | 4 |
| Gradient accumulation | 4 (effective 16) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Max steps | 200 |
| Optimizer | AdamW 8-bit |
| Scheduler | Cosine |
| Warmup steps | 20 |
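These settings can be written down as a configuration sketch. The run itself used Unsloth + TRL; the version below swaps in plain `peft` and `transformers` objects for illustration. The quantization type, LoRA dropout, and target modules are assumptions not stated in this card. The same values also show how much data the run touched: 200 steps at an effective batch of 16 is 3,200 sequences, roughly 9% of the ~35,000 examples, assuming a single GPU and no sequence packing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exp001Hyperparams:
    """Values copied from the hyperparameter table."""

    learning_rate: float = 2e-4
    per_device_batch_size: int = 4
    gradient_accumulation_steps: int = 4
    lora_rank: int = 16
    lora_alpha: int = 32
    max_steps: int = 200
    warmup_steps: int = 20

    @property
    def effective_batch_size(self) -> int:
        return self.per_device_batch_size * self.gradient_accumulation_steps

    @property
    def sequences_seen(self) -> int:
        # Assumes one GPU and one example per sequence (no packing).
        return self.max_steps * self.effective_batch_size


def build_configs(hp: Exp001Hyperparams):
    """Translate the table into library objects (needs torch, peft, transformers)."""
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # assumption: quant type not stated
        bnb_4bit_compute_dtype=torch.float16,  # T4 has no native bfloat16
    )
    lora_config = LoraConfig(
        r=hp.lora_rank,
        lora_alpha=hp.lora_alpha,
        lora_dropout=0.0,                      # assumption
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],  # assumption
        task_type="CAUSAL_LM",
    )
    # Keyword arguments accepted by transformers.TrainingArguments / trl.SFTConfig.
    training_kwargs = {
        "learning_rate": hp.learning_rate,
        "per_device_train_batch_size": hp.per_device_batch_size,
        "gradient_accumulation_steps": hp.gradient_accumulation_steps,
        "max_steps": hp.max_steps,
        "warmup_steps": hp.warmup_steps,
        "lr_scheduler_type": "cosine",
        "optim": "adamw_8bit",
        "fp16": True,
    }
    return quant_config, lora_config, training_kwargs


hp = Exp001Hyperparams()
coverage = hp.sequences_seen / 35_000  # fraction of the ~35,000 examples
print(hp.effective_batch_size, hp.sequences_seen, round(coverage, 3))  # 16 3200 0.091
```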

Hardware

  • GPU: Kaggle T4 (16GB VRAM)
  • Training time: ~1.5 hours
  • Compute: ~3 GPU hours (the GPU count is not stated; two T4s running for ~1.5 hours would account for this figure)

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "bolajiev/maxx-1-1.5B"

# Load the tokenizer and the model in half precision. device_map="auto"
# places the weights on a GPU when one is available (requires accelerate).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Format the conversation with the model's chat template.
messages = [{"role": "user", "content": "Write a short email to my boss saying I will be 10 minutes late."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample a reply without tracking gradients.
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=300, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
reply = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)
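The snippet above sends a single short message, so it never approaches the 2048-token context window listed under Model Details. In a multi-turn assistant the history grows, and something has to trim it. The helper below is a hypothetical sketch of one simple policy (drop the oldest non-system turns first); `trim_history` and `word_count` are illustrative names, and `word_count` is only a stand-in for a real token counter.

```python
from typing import Callable

CONTEXT_WINDOW = 2048  # from the Model Details table
MAX_NEW_TOKENS = 300   # matches the generate() call above


def trim_history(
    messages: list,
    count_tokens: Callable[[list], int],
    budget: int = CONTEXT_WINDOW - MAX_NEW_TOKENS,
) -> list:
    """Drop the oldest non-system messages until the prompt fits the budget.

    The most recent message is never dropped; if it cannot fit, raise.
    """
    kept = list(messages)
    while count_tokens(kept) > budget:
        # Oldest droppable message: not a system prompt and not the latest turn.
        idx = next(
            (i for i, m in enumerate(kept[:-1]) if m["role"] != "system"),
            None,
        )
        if idx is None:
            raise ValueError("prompt cannot fit in the context window")
        del kept[idx]
    return kept


# Stand-in counter for illustration: counts whitespace-separated words, not tokens.
# With the real tokenizer, one option is:
#   lambda msgs: len(tokenizer(tokenizer.apply_chat_template(
#       msgs, tokenize=False, add_generation_prompt=True))["input_ids"])
def word_count(msgs: list) -> int:
    return sum(len(m["content"].split()) for m in msgs)


history = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Plan my Monday morning please."},
    {"role": "assistant", "content": "Gym, then email."},
    {"role": "user", "content": "Add lunch."},
]
trimmed = trim_history(history, word_count, budget=5)
print([m["role"] for m in trimmed])
```

Reserving `MAX_NEW_TOKENS` inside the budget leaves room for the reply, so prompt plus generation stays within the stated window.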

On-Device with Ollama (GGUF)

# Use the quantized GGUF version for on-device inference
ollama run bolajiev/maxx-merged-gguf
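Once the model is available in Ollama, an application can talk to it over Ollama's local HTTP API instead of the interactive prompt. The sketch below uses only the standard library and assumes the Ollama server is running on its default port with the model name from the command above; `build_chat_request` and `chat` are illustrative helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint
MODEL_NAME = "bolajiev/maxx-merged-gguf"        # name used in the `ollama run` command above


def build_chat_request(prompt: str, model: str = MODEL_NAME) -> urllib.request.Request:
    """Build a non-streaming chat request for the local Ollama server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

A call such as `chat("Summarize my notes in three bullet points.")` stays entirely on the local machine, in line with the privacy-first use case.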

Limitations

  • EXP-001 is a small training run (200 steps, ~35k examples) — not a final model
  • Safety alignment is limited — some harmful requests may not be refused correctly
  • Context window limited to 2048 tokens in this checkpoint
  • Not evaluated on coding tasks yet
  • HellaSwag trails SmolLM2-1.7B-Instruct by about 5 points (67.02% vs 72.20%), so commonsense reasoning has room to improve



Built with Unsloth 🦥 | Trained on Kaggle T4
