---
license: apache-2.0
language:
  - en
  - de
base_model: Qwen/Qwen3-4B
tags:
  - mimi
  - tool-calling
  - function-calling
  - agent
  - gguf
  - fine-tuned
  - wllama
  - browser-inference
  - on-device-ai
  - local-ai
  - privacy-first
model-index:
  - name: MIMI Pro
    results:
      - task:
          type: function-calling
          name: Tool Calling
        dataset:
          type: gorilla-llm/Berkeley-Function-Calling-Leaderboard
          name: BFCL V4
        metrics:
          - type: accuracy
            value: 60.8
            name: Simple Function Calling (Python)
            verified: false
          - type: accuracy
            value: 57.5
            name: Multiple Sequential Calls
            verified: false
          - type: accuracy
            value: 90.0
            name: Irrelevance Detection
            verified: false
pipeline_tag: text-generation
---

# MIMI Pro

MIMI Pro is a 4-billion parameter AI agent model optimized for structured tool calling and autonomous task execution — designed to run entirely on-device, in the browser, with zero cloud dependencies.

Part of the MIMI Model Family by Mimi Tech AI.

> 🔬 **V1 — Experimental Release.** This model is fine-tuned for the MIMI Agent's custom tool-calling format. For standard tool calling, the base Qwen3-4B may perform equally well or better with native `<tool_call>` prompting. V2 with official BFCL scores and Qwen3-native format support is in development.

## Performance

### BFCL V4 Benchmark (Partial — Single-Turn; per-category sample sizes shown in the table, base Qwen3-4B evaluated on 20 samples/category)

| Category | MIMI Pro V1 | Base Qwen3-4B | Notes |
|---|---|---|---|
| Simple Python | 60.8% (400 tests) | 80.0% (20 tests) | Base outperforms |
| Simple Java | 21.0% (100 tests) | 60.0% (20 tests) | Base outperforms |
| Multiple (Sequential) | 57.5% (200 tests) | 75.0% (20 tests) | Base outperforms |
| Parallel | 2.0% (200 tests) | 75.0% (20 tests) | Fine-tune degraded |
| Irrelevance | 90.0% (20 tests) | 100.0% (20 tests) | Both strong |
| Live Simple | — | 90.0% (20 tests) | Base only |

> ⚠️ **Important Context:** The previously reported "97.7% accuracy" was a training validation metric (token-level accuracy on the training/eval split), not a standardized benchmark score. The table above shows actual BFCL V4 results. We are working on a full official evaluation.

### Training Metrics (Internal)

| Metric | Value |
|---|---|
| Training Token Accuracy | 97.66% |
| Eval Token Accuracy | 97.29% |
| Training Loss | 0.084 |
| Parameters | 4.02 Billion |
| Quantized Size | 2.3 GB (Q4_K_M) |

## Architecture

MIMI Pro is built on Qwen3-4B, fine-tuned with LoRA (rank=64, alpha=128) on 1,610 curated tool-calling examples using Unsloth on NVIDIA DGX Spark.

**Key Design Decisions:**

- Custom tool-calling format optimized for the MIMI Agent browser environment
- 19 tool types covering web search, code execution, file operations, and browser automation
- Trained on NVIDIA DGX Spark (Grace Blackwell GB10, 128 GB unified memory)

**Known Limitations of V1:**

- Fine-tuning with aggressive hyperparameters (LoRA r=64, 3 epochs, LR 2e-4) caused some capability degradation vs. the base model, particularly for parallel tool calling
- The custom `{"tool": ..., "parameters": ...}` format diverges from Qwen3's native `<tool_call>` format
- V2 will address these issues with conservative fine-tuning and Qwen3-native format support

## Supported Tools

| Category | Tools |
|---|---|
| 🌐 Web | `web_search`, `browse_url`, `browser_action` |
| 💻 Code | `execute_python`, `create_file`, `edit_file` |
| 🔬 Research | `deep_research`, `generate_document` |
| 📁 System | `read_file`, `list_directory`, `run_terminal` |
| 🧠 Reasoning | Multi-step orchestration |
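On the host side, routing a tool call emitted by the model to the right tool is a simple dispatch-table lookup. A minimal sketch, assuming the V1 JSON format from this card — the `web_search` handler here is a placeholder, not part of the model or the MIMI Agent:

```python
import json

def web_search(query, limit=5):
    # Placeholder handler; a real host would call an actual search backend.
    return f"results for {query!r} (top {limit})"

# Map tool names (as emitted by the model) to host-side handlers.
TOOL_HANDLERS = {"web_search": web_search}

def dispatch(raw_model_output):
    """Parse a MIMI V1 tool call and route it to the matching handler."""
    call = json.loads(raw_model_output)
    handler = TOOL_HANDLERS[call["tool"]]
    return handler(**call["parameters"])

print(dispatch('{"tool": "web_search", "parameters": {"query": "AI news", "limit": 3}}'))
```

A real agent loop would feed the handler's return value back into the conversation as a tool result.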

## Quick Start

### Browser (wllama/WebAssembly)

```javascript
// wllama is published on npm as @wllama/wllama
import { Wllama } from '@wllama/wllama';

const wllama = new Wllama(wasmPaths); // wasmPaths: map of wllama's wasm asset URLs
await wllama.loadModelFromUrl(
  'https://huggingface.co/MimiTechAI/mimi-pro/resolve/main/mimi-qwen3-4b-q4km.gguf',
  { n_ctx: 4096 }
);

const response = await wllama.createChatCompletion([
  { role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
  { role: 'user', content: 'Search for the latest AI news and summarize it' }
]);
```

### llama.cpp

```bash
./llama-cli -m mimi-qwen3-4b-q4km.gguf \
  -p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.6
```
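The prompt string above is the standard ChatML template used by Qwen models. If you build prompts yourself rather than relying on a chat-template API, it can be assembled programmatically — a minimal sketch (the helper name is ours, not part of any library):

```python
def build_chatml_prompt(messages):
    """Render a message list into the ChatML format used in the command above."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    # Open the assistant turn so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"},
])
print(prompt)
```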

### Python

```python
from llama_cpp import Llama

llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
output = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"}
])
```

## Output Format

MIMI Pro V1 uses a custom format (V2 will support the Qwen3-native `<tool_call>` format):

```json
{"tool": "web_search", "parameters": {"query": "latest AI news March 2026", "limit": 5}}
```
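Since model output is not guaranteed to be well-formed, hosts should validate a call before executing it. A minimal validation sketch, assuming the format above; the tool whitelist is taken from the Supported Tools table on this card:

```python
import json

# Tool names listed on this card (the "Reasoning" row is orchestration, not a callable tool).
KNOWN_TOOLS = {
    "web_search", "browse_url", "browser_action",
    "execute_python", "create_file", "edit_file",
    "deep_research", "generate_document",
    "read_file", "list_directory", "run_terminal",
}

def parse_tool_call(text):
    """Validate a raw V1 tool call and return (tool, parameters)."""
    call = json.loads(text)  # raises ValueError on malformed JSON
    if call.get("tool") not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {call.get('tool')!r}")
    if not isinstance(call.get("parameters"), dict):
        raise ValueError("parameters must be a JSON object")
    return call["tool"], call["parameters"]

tool, params = parse_tool_call(
    '{"tool": "web_search", "parameters": {"query": "latest AI news March 2026", "limit": 5}}'
)
```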

## The MIMI Model Family

| Model | Parameters | Size | Target Device | Status |
|---|---|---|---|---|
| MIMI Nano | 0.6B | ~400 MB | Any device, IoT | 🔜 Coming |
| MIMI Small | 1.7B | ~1.0 GB | Mobile & tablets | 🔜 Coming |
| MIMI Pro | 4.02B | 2.3 GB | Desktop & laptop | ✅ Available |
| MIMI Max | 8B | ~4.5 GB | Workstations | 🔜 Coming |

All models share the same tool-calling format, are quantized to GGUF Q4_K_M, and run in the browser via WebAssembly.

## Training Details

```yaml
method: LoRA (PEFT) via Unsloth
base_model: Qwen/Qwen3-4B
lora_rank: 64
lora_alpha: 128
lora_dropout: 0.05
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
learning_rate: 2.0e-04
epochs: 3
effective_batch_size: 8
max_seq_length: 2048
optimizer: adamw_8bit
precision: bf16
gradient_checkpointing: true
packing: true
dataset: 1,610 curated tool-calling examples (178K tokens)
hardware: NVIDIA DGX Spark (GB10 Grace Blackwell, 128 GB unified memory)
```

## Why MIMI?

- 🔒 **Privacy First** — Your data never leaves your device. Period.
- 💰 **Zero Cost** — No API keys, no subscriptions, no per-token billing.
- ⚡ **Fast** — Runs at native speed via WebAssembly, no server round-trips.
- 🌍 **Works Offline** — Once downloaded, no internet required.
- 🔧 **Tool Native** — Purpose-built for autonomous tool calling.

## Limitations

- V1 uses a custom tool-calling format (not Qwen3-native `<tool_call>`)
- Parallel tool calling (multiple simultaneous calls) is degraded vs. the base model
- Context window: 4,096 tokens (training config); the base architecture supports 32K
- Requires ~3 GB RAM for in-browser inference
- Q4_K_M quantization trades a small quality loss for a ~3.5x size reduction
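The ~3.5x figure follows from back-of-envelope arithmetic, assuming a bf16 baseline (2 bytes per weight) and the published file size from this card:

```python
# Numbers from this model card; bf16 baseline is an assumption.
params_b = 4.02               # parameters, in billions
bf16_gb = params_b * 2        # 2 bytes/weight in bf16 -> ~8.0 GB unquantized
q4km_gb = 2.3                 # published Q4_K_M GGUF file size

print(round(bf16_gb / q4km_gb, 1))  # -> 3.5
```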

## Roadmap

- **V1** — Custom format, 19 tools, browser-optimized (current release)
- **V2** — Qwen3-native `<tool_call>` format, official BFCL V4 scores, conservative fine-tuning
- **Model Family** — Nano (0.6B), Small (1.7B), Max (8B) releases
- **Multi-Turn** — Agentic conversation chains with tool result feedback

## About Mimi Tech AI

Mimi Tech AI builds on-device AI — no cloud, no data leaks, full user control.

## License

Apache 2.0 — free for commercial and personal use.

## Citation

```bibtex
@misc{mimitechai2026mimi,
  title={MIMI Pro: On-Device AI Agent Model for Browser-Based Tool Calling},
  author={Bemler, Michael and Soppa, Michael},
  year={2026},
  publisher={Mimi Tech AI},
  url={https://huggingface.co/MimiTechAI/mimi-pro}
}
```