maxx — On-Device Agentic LLM (1.5B)

A fine-tuned Qwen2.5-1.5B-Instruct model optimized for agentic tasks, instruction following, and real-world offline use on phones and laptops. This is the first checkpoint of an ongoing research project that aims to build the best open-source agentic model under 3B parameters.

Model Details

Field	Details
Base model	Qwen/Qwen2.5-1.5B-Instruct
Parameters	1.5B
Fine-tune method	QLoRA (4-bit, rank 16)
Framework	Unsloth + TRL
Context window	2048 tokens
License	Apache 2.0
Developer	bolajiev (Independent Researcher)
Status	EXP-001 — active research
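The sketch below counts the trainable adapter parameters behind "QLoRA (4-bit, rank 16)". It assumes LoRA wraps all seven attention and MLP projections in every layer, which is a common Unsloth setup; the card does not list the target modules. The layer shapes come from Qwen2.5-1.5B's config.

```python
# Qwen2.5-1.5B shapes, taken from its published config
HIDDEN = 1536
INTERMEDIATE = 8960
LAYERS = 28
KV_DIM = 2 * 128  # 2 key/value heads x head_dim 128 (grouped-query attention)

# (in_features, out_features) of each projection LoRA is ASSUMED to wrap
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int) -> int:
    """Trainable params added by LoRA: A is (r x in) and B is (out x r) for each wrapped layer."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in PROJECTIONS.values())
    return per_layer * LAYERS


RANK, ALPHA = 16, 32  # LoRA rank and alpha from the Hyperparameters table
print(f"trainable LoRA params: {lora_params(RANK):,}")  # 18,464,768
print(f"update scaling alpha/r: {ALPHA / RANK}")        # 2.0
```

Under these assumptions, the adapters add about 18.5M trainable parameters, roughly 1.2% of the 1.54B in the base model. LoRA alpha 32 scales each low-rank update by alpha/r = 2.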

Benchmark Results (EXP-001)

Evaluated using lm-evaluation-harness with 5-shot prompting.

Benchmark	maxx (1.5B)	Qwen2.5-1.5B-Instruct	SmolLM2-1.7B-Instruct
ARC-Challenge ↑	52.47%	53.92%	51.88%
HellaSwag ↑	67.02%	67.71%	72.20%
WinoGrande ↑	65.51%	64.64%	68.98%
TruthfulQA ↑	45.99%	46.61%	39.96%
MMLU ↑	59.87%	—	—
Average (excl. MMLU)	57.75%	58.22%	58.26%

Key findings:

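Average within about half a point of both better-resourced baselines on the first training run (−0.47 vs Qwen2.5-1.5B-Instruct, −0.51 vs SmolLM2-1.7B-Instruct)
Beats the larger SmolLM2-1.7B-Instruct on TruthfulQA by about 6 points (45.99% vs 39.96%)
MMLU of 59.87% outperforms the published reference scores for both competitors (competitor MMLU was not run in this evaluation, so those table cells are empty)
Strong commonsense and general knowledge retained from the Qwen2.5 foundation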
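The Average row can be reproduced from the four benchmarks that all three models report. MMLU is left out because only maxx has a score. Here is a small check that uses the table values:

```python
# Per-benchmark scores (%) copied from the EXP-001 table; MMLU omitted
# because only maxx has an MMLU score there.
SCORES = {
    "maxx (1.5B)": {"ARC-Challenge": 52.47, "HellaSwag": 67.02, "WinoGrande": 65.51, "TruthfulQA": 45.99},
    "Qwen2.5-1.5B-Instruct": {"ARC-Challenge": 53.92, "HellaSwag": 67.71, "WinoGrande": 64.64, "TruthfulQA": 46.61},
    "SmolLM2-1.7B-Instruct": {"ARC-Challenge": 51.88, "HellaSwag": 72.20, "WinoGrande": 68.98, "TruthfulQA": 39.96},
}


def average(model: str) -> float:
    """Unweighted mean over the four benchmarks reported for every model."""
    values = list(SCORES[model].values())
    return sum(values) / len(values)


def gap(model: str, baseline: str) -> float:
    """Difference in average score, in percentage points (negative means behind)."""
    return average(model) - average(baseline)


if __name__ == "__main__":
    for name in SCORES:
        print(f"{name:<24} {average(name):.4f}")
    for baseline in ("Qwen2.5-1.5B-Instruct", "SmolLM2-1.7B-Instruct"):
        print(f"maxx vs {baseline}: {gap('maxx (1.5B)', baseline):+.4f} pts")
```

The means come out to about 57.75, 58.22 and 58.26. That puts maxx roughly 0.47 and 0.51 points behind.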

Intended Use

Primary use cases

On-device AI assistant for phones and laptops (no internet required)
Instruction following and task completion offline
Summarization, email writing, scheduling, planning
Agentic multi-step reasoning for everyday tasks
Privacy-first AI — all compute runs locally

Out of scope

High-stakes medical, legal, or financial decisions
Tasks requiring real-time internet access
Multimodal tasks (image, audio, or video input); the model is text-only

Training Details

Data

OpenHermes-2.5 — instruction following
UltraChat-200k — conversational quality
Glaive Function Calling v2 — tool use and agentic tasks
Alpaca Cleaned — general instructions
Synthetic data generated with an open-source teacher model (Qwen2.5-7B)

Total: ~35,000 curated examples (EXP-001 small run)
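Combined with the Hyperparameters settings (batch size 4, gradient accumulation 4, max steps 200), this total means the run sees only a fraction of the data. The quick check below assumes a single device and no sequence packing, so each optimizer step consumes exactly the effective batch of 16 examples.

```python
DATASET_SIZE = 35_000  # approximate total from the Data section ("~35,000")


def examples_seen(max_steps: int, per_device_batch: int, grad_accum: int) -> int:
    """Examples consumed by a run that stops after `max_steps` optimizer steps."""
    effective_batch = per_device_batch * grad_accum  # 4 * 4 = 16, as in the table
    return max_steps * effective_batch


seen = examples_seen(max_steps=200, per_device_batch=4, grad_accum=4)
print(seen)                                        # 3200
print(f"{seen / DATASET_SIZE:.1%} of one epoch")   # 9.1% of one epoch
```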

Hyperparameters

Parameter	Value
Learning rate	2e-4
Batch size	4
Gradient accumulation	4 (effective 16)
LoRA rank	16
LoRA alpha	32
Max steps	200
Optimizer	AdamW 8-bit
Scheduler	Cosine
Warmup steps	20
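The sketch below computes the per-step learning rate from the Learning rate, Warmup steps, Max steps and Scheduler rows. It assumes "Cosine" means the Hugging Face schedule used by TRL's trainer: linear warmup from 0, then a half-cosine decay to 0 at the final step. The card does not confirm this.

```python
import math


def lr_at(step: int, base_lr: float = 2e-4, warmup_steps: int = 20, total_steps: int = 200) -> float:
    """Learning rate after `step` scheduler steps: linear warmup, then cosine decay to 0.

    Mirrors the shape of Hugging Face's cosine-with-warmup schedule
    (a single half-cosine); ASSUMED, not confirmed, to match this run.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for s in (0, 10, 20, 110, 200):
    print(s, f"{lr_at(s):.2e}")
```

Under that assumption, the rate peaks at 2e-4 at step 20 and is back to 1e-4 at step 110, the midpoint of the decay. It reaches 0 at step 200.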

Hardware

GPU: Kaggle T4 (16GB VRAM)
Training time: ~1.5 hours
Compute: ~3 GPU hours

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "bolajiev/maxx-1-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# float16 suits GPUs without bfloat16 support (e.g. the T4); device_map="auto" needs the accelerate package
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a short email to my boss saying I will be 10 minutes late."}]
# Render the conversation with the model's chat template and open an assistant turn
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=300, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt
reply = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)
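The card lists tool use (Glaive Function Calling v2) and agentic multi-step tasks. It does not document how tool calls appear in the model's output. The sketch below assumes maxx keeps the Qwen2.5 chat-template convention of wrapping each call as JSON inside <tool_call>...</tool_call> tags. If the fine-tune emits a different format, only TOOL_CALL_RE needs to change. `parse_tool_calls`, `run_tool_calls` and the `add` tool in the usage note are hypothetical helpers, not part of any library.

```python
import json
import re
from typing import Any, Callable

# ASSUMED output format (Qwen2.5 convention):
# <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(reply: str) -> list[dict[str, Any]]:
    """Extract every well-formed JSON tool call from a model reply."""
    calls = []
    for block in TOOL_CALL_RE.findall(reply):
        try:
            call = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed calls instead of crashing the agent loop
        if isinstance(call, dict) and "name" in call and isinstance(call.get("arguments", {}), dict):
            call.setdefault("arguments", {})
            calls.append(call)
    return calls


def run_tool_calls(reply: str, tools: dict[str, Callable[..., Any]]) -> list[dict[str, str]]:
    """Execute parsed calls and return "tool" messages to append to the conversation."""
    messages = []
    for call in parse_tool_calls(reply):
        fn = tools.get(call["name"])
        result = fn(**call["arguments"]) if fn else f"unknown tool: {call['name']}"
        messages.append({"role": "tool", "content": json.dumps(result)})
    return messages
```

In an agent loop, append the assistant reply and the returned tool messages to `messages`. Then apply the chat template and generate again. For example, with `tools={"add": lambda a, b: a + b}`, a reply containing `<tool_call>{"name": "add", "arguments": {"a": 2, "b": 3}}</tool_call>` yields one tool message with content "5". Recent transformers versions also accept a `tools=` list in `apply_chat_template`, so the template can describe the available functions to the model.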

On-Device with Ollama (GGUF)

# Use the quantized GGUF version for on-device inference
ollama run bolajiev/maxx-merged-gguf
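For sizing on-device deployments, weight memory scales with bits per weight. The rough estimate below makes two assumptions. First, it uses the 1.54B total parameters that Qwen reports for Qwen2.5-1.5B. Second, it assumes about 4.5 bits per weight for a 4-bit GGUF quant; the card does not say which quantization maxx-merged-gguf uses. KV cache and runtime overhead come on top of these figures.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


NUM_PARAMS = 1.54e9  # Qwen2.5-1.5B total parameter count, including embeddings

# bits per weight: 4.5 is an ASSUMED typical value for a 4-bit k-quant
for label, bits in [("FP16/BF16", 16), ("8-bit", 8), ("~4.5-bit (4-bit GGUF)", 4.5)]:
    print(f"{label:<24} {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

That works out to about 2.9 GiB of weights at 16-bit versus about 0.8 GiB at ~4.5 bits, which is why the GGUF build is the practical choice for phones.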

Limitations

EXP-001 is a small training run (200 steps, ~35k examples) — not a final model
Safety alignment is limited — some harmful requests may not be refused correctly
Context window limited to 2048 tokens in this checkpoint
Not evaluated on coding tasks yet
The 5.18-point HellaSwag gap vs SmolLM2-1.7B-Instruct (67.02% vs 72.20%) suggests room to improve commonsense reasoning

Built with Unsloth 🦥 | Trained on Kaggle T4


Weights: Safetensors, BF16

Model tree: Qwen/Qwen2.5-1.5B (base) → Qwen/Qwen2.5-1.5B-Instruct (fine-tuned) → bolajiev/maxx-1-1.5B (adapter, this model)