How to use bolajiev/maxx-1-1.5B with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for bolajiev/maxx-1-1.5B to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for bolajiev/maxx-1-1.5B to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for bolajiev/maxx-1-1.5B to start chatting
```
Load model with FastModel
```python
# pip install unsloth
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="bolajiev/maxx-1-1.5B",
    max_seq_length=2048,
)
```
maxx — On-Device Agentic LLM (1.5B)
A fine-tuned Qwen2.5-1.5B-Instruct model optimized for agentic tasks, instruction following, and real-world offline use on phones and laptops. This is the first checkpoint in an ongoing research project that aims to build the best open-source agentic model under 3B parameters.
Model Details
| Field | Details |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B-Instruct |
| Parameters | 1.5B |
| Fine-tune method | QLoRA (4-bit, rank 16) |
| Framework | Unsloth + TRL |
| Context window | 2048 tokens |
| License | Apache 2.0 |
| Developer | bolajiev (Independent Researcher) |
| Status | EXP-001 — active research |
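For on-device use, the parameter count in the table translates into a rough memory budget. The sketch below is a back-of-the-envelope, weights-only estimate that assumes the nominal 1.5B parameter count; it ignores the KV cache, activations, and the per-block scale overhead that real 4-bit formats carry, so actual usage will be somewhat higher. The function name is illustrative only.

```python
# Rough weight-only memory estimate for a 1.5B-parameter model.
# Assumption: nominal parameter count; quantization overhead is ignored.
PARAMS = 1.5e9

BYTES_PER_PARAM = {
    "float16": 2.0,  # half precision, as used in the Transformers example
    "int8": 1.0,
    "4-bit": 0.5,    # idealized 4-bit storage
}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


estimates = {name: weight_memory_gb(PARAMS, b) for name, b in BYTES_PER_PARAM.items()}

for name, gb in estimates.items():
    print(f"{name:>8}: ~{gb:.2f} GB of weights")
```

Under these assumptions, half-precision weights need about 3 GB and idealized 4-bit weights well under 1 GB, which is why a quantized build is the more plausible option for phones.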
Benchmark Results (EXP-001)
Evaluated using lm-evaluation-harness with 5-shot prompting.
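A comparable evaluation could be set up roughly as follows. This is a sketch, not the exact command used for EXP-001: the task names (in particular `truthfulqa_mc2` as the TruthfulQA variant), the `dtype` setting, and the batch size are assumptions, and the harness may override the few-shot count for tasks that fix it in their own config. `build_eval_kwargs` and `run_eval` are hypothetical helper names.

```python
def build_eval_kwargs(model_id: str = "bolajiev/maxx-1-1.5B") -> dict:
    """Keyword arguments for lm_eval.simple_evaluate.

    Task list, dtype, and batch size are assumptions, not taken from the card.
    """
    return {
        "model": "hf",
        "model_args": f"pretrained={model_id},dtype=float16",
        "tasks": ["arc_challenge", "hellaswag", "winogrande", "truthfulqa_mc2", "mmlu"],
        "num_fewshot": 5,  # the card states 5-shot prompting
        "batch_size": 8,   # assumption; lower it on small GPUs
    }


def run_eval(model_id: str = "bolajiev/maxx-1-1.5B") -> dict:
    """Run the harness and return its per-task results (downloads the model)."""
    import lm_eval  # pip install lm-eval

    results = lm_eval.simple_evaluate(**build_eval_kwargs(model_id))
    return results["results"]
```

Calling `run_eval()` downloads the model and datasets, so it is left out of the block above.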
| Benchmark | maxx (1.5B) | Qwen2.5-1.5B-Instruct | SmolLM2-1.7B-Instruct |
|---|---|---|---|
| ARC-Challenge ↑ | 52.47% | 53.92% | 51.88% |
| HellaSwag ↑ | 67.02% | 67.71% | 72.20% |
| WinoGrande ↑ | 65.51% | 64.64% | 68.98% |
| TruthfulQA ↑ | 45.99% | 46.61% | 39.96% |
| MMLU ↑ | 59.87% | — | — |
| Average (4 tasks, excluding MMLU) | 57.75% | 58.22% | 58.26% |
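The averages in the last row can be reproduced from the four benchmarks that have scores for all three models (MMLU is left out because only maxx has a value). The short check below recomputes them and the gap between maxx and each reference model; the variable names are illustrative.

```python
# Scores copied from the table: ARC-Challenge, HellaSwag, WinoGrande, TruthfulQA.
scores = {
    "maxx (1.5B)": [52.47, 67.02, 65.51, 45.99],
    "Qwen2.5-1.5B-Instruct": [53.92, 67.71, 64.64, 46.61],
    "SmolLM2-1.7B-Instruct": [51.88, 72.20, 68.98, 39.96],
}

# Unweighted mean over the four shared benchmarks.
averages = {name: sum(vals) / len(vals) for name, vals in scores.items()}

# How far each reference model sits above maxx, in percentage points.
gaps = {
    name: averages[name] - averages["maxx (1.5B)"]
    for name in scores
    if name != "maxx (1.5B)"
}

for name, avg in averages.items():
    print(f"{name}: {avg:.4f}")
for name, gap in gaps.items():
    print(f"gap vs {name}: {gap:.4f} points")
```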
Key findings:
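- Average is within about 0.5 percentage points of both reference models on the first training run (0.47 below the Qwen2.5-1.5B-Instruct base, 0.51 below SmolLM2-1.7B-Instruct)
- Beats SmolLM2-1.7B-Instruct, a larger model, on TruthfulQA by about 6 points (45.99% vs 39.96%)
- MMLU of 59.87% is higher than the published reference scores for both models (those scores are not reproduced in the table above)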
- Strong commonsense and knowledge base retained from Qwen2.5 foundation
Intended Use
Primary use cases
- On-device AI assistant for phones and laptops (no internet required)
- Instruction following and task completion offline
- Summarization, email writing, scheduling, planning
- Agentic multi-step reasoning for everyday tasks
- Privacy-first AI — all compute runs locally
Out of scope
- High-stakes medical, legal, or financial decisions
- Tasks requiring real-time internet access
- Complex multi-modal tasks
Training Details
Data
- OpenHermes-2.5 — instruction following
- UltraChat-200k — conversational quality
- Glaive Function Calling v2 — tool use and agentic tasks
- Alpaca Cleaned — general instructions
- Synthetic data generated via open-source teacher model (Qwen2.5-7B)
Total: ~35,000 curated examples (EXP-001 small run)
Hyperparameters
| Parameter | Value |
|---|---|
| Learning rate | 2e-4 |
| Batch size | 4 |
| Gradient accumulation | 4 (effective batch size 16) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Max steps | 200 |
| Optimizer | AdamW 8-bit |
| Scheduler | Cosine |
| Warmup steps | 20 |
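The values in the table imply a few derived quantities that help put the run in context. The sketch below works them out under stated assumptions: a single device, one effective batch of distinct examples per optimizer step, the standard LoRA scaling of alpha divided by rank, and a linear warmup followed by cosine decay to zero (the default behavior of the cosine schedule in Transformers). `lr_at` is a hypothetical helper, not code from the training script.

```python
import math

# Values from the hyperparameter table and the data section.
LEARNING_RATE = 2e-4
BATCH_SIZE = 4
GRAD_ACCUM = 4
MAX_STEPS = 200
WARMUP_STEPS = 20
LORA_RANK = 16
LORA_ALPHA = 32
DATASET_SIZE = 35_000  # approximate ("~35,000 curated examples")

effective_batch = BATCH_SIZE * GRAD_ACCUM          # examples per optimizer step
examples_seen = effective_batch * MAX_STEPS        # assumes a single device
fraction_of_data = examples_seen / DATASET_SIZE    # share of the dataset visited
lora_scale = LORA_ALPHA / LORA_RANK                # standard LoRA scaling factor


def lr_at(step: int) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch size: {effective_batch}")
print(f"examples seen in {MAX_STEPS} steps: {examples_seen}")
print(f"fraction of ~35k dataset: {fraction_of_data:.1%}")
print(f"LoRA scale (alpha / rank): {lora_scale}")
```

If these assumptions hold, 200 steps cover 3,200 examples, or roughly 9% of the ~35k-example dataset, which is consistent with describing EXP-001 as a small run.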
Hardware
- GPU: Kaggle T4 (16GB VRAM)
- Training time: ~1.5 hours
- Compute: ~3 GPU hours
How to Use
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bolajiev/maxx-1-1.5B"

# Load the tokenizer and the model in half precision
# (device_map="auto" requires the accelerate package)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Format the conversation with the model's chat template
messages = [{"role": "user", "content": "Write a short email to my boss saying I will be 10 minutes late."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample a reply without tracking gradients
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=300, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt
reply = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)
```
On-Device with Ollama (GGUF)
```shell
# Use the quantized GGUF version for on-device inference
ollama run bolajiev/maxx-merged-gguf
```
Limitations
- EXP-001 is a small training run (200 steps, ~35k examples) and is not a final model
- Safety alignment is limited — some harmful requests may not be refused correctly
- Context window limited to 2048 tokens in this checkpoint
- Not evaluated on coding tasks yet
- The HellaSwag gap versus SmolLM2-1.7B-Instruct (67.02% vs 72.20%) suggests room to improve commonsense reasoning
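Because this checkpoint is limited to 2048 tokens, the prompt and the generated reply have to share that budget. One simple way to stay inside it is to keep only the most recent prompt tokens before generating. The helper below is a hypothetical sketch operating on a plain list of token IDs; it is not part of the model or of Transformers, and naive left-truncation can cut into the chat template header, so a real application would more likely drop whole earlier turns.

```python
CONTEXT_WINDOW = 2048  # context limit stated for this checkpoint


def fit_to_context(input_ids, max_new_tokens, context_window=CONTEXT_WINDOW):
    """Keep the most recent prompt tokens so prompt + generation fits the window."""
    if max_new_tokens >= context_window:
        raise ValueError("max_new_tokens must be smaller than the context window")
    budget = context_window - max_new_tokens  # tokens left for the prompt
    return list(input_ids[-budget:])


# Example: a 3000-token prompt with room reserved for 300 generated tokens.
long_prompt = list(range(3000))
trimmed = fit_to_context(long_prompt, max_new_tokens=300)
print(len(trimmed))
```

With `max_new_tokens=300`, the prompt budget is 2048 - 300 = 1748 tokens; shorter prompts pass through unchanged.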
Built with Unsloth 🦥 | Trained on Kaggle T4