---
license: apache-2.0
language:
  - en
  - de
base_model: Qwen/Qwen3-4B
tags:
  - mimi
  - tool-calling
  - function-calling
  - agent
  - gguf
  - fine-tuned
  - wllama
  - browser-inference
  - on-device-ai
  - local-ai
  - privacy-first
model-index:
  - name: MIMI Pro
    results:
      - task:
          type: function-calling
          name: Tool Calling
        dataset:
          type: gorilla-llm/Berkeley-Function-Calling-Leaderboard
          name: BFCL V4
        metrics:
          - type: accuracy
            value: 60.8
            name: Simple Function Calling (Python)
            verified: false
          - type: accuracy
            value: 57.5
            name: Multiple Sequential Calls
            verified: false
          - type: accuracy
            value: 90.0
            name: Irrelevance Detection
            verified: false
pipeline_tag: text-generation
---

# MIMI Pro

MIMI Pro is a 4-billion parameter AI agent model optimized for structured tool calling and autonomous task execution — designed to run entirely on-device, in the browser, with zero cloud dependencies.

Part of the MIMI Model Family by Mimi Tech AI.

> 🔬 **V1 — Experimental Release.** This model is fine-tuned for the MIMI Agent's custom tool-calling format. For standard tool calling, the base Qwen3-4B may perform equally well or better with native `<tool_call>` prompting. V2 with official BFCL scores and Qwen3-native format support is in development.

## Performance

### BFCL V4 Benchmark (Partial — Single-Turn; per-category sample sizes shown in the table, base Qwen3-4B evaluated on 20 samples/category)

| Category | MIMI Pro V1 | Base Qwen3-4B | Notes |
|---|---|---|---|
| Simple Python | 60.8% (400 tests) | 80.0% (20 tests) | Base outperforms |
| Simple Java | 21.0% (100 tests) | 60.0% (20 tests) | Base outperforms |
| Multiple (Sequential) | 57.5% (200 tests) | 75.0% (20 tests) | Base outperforms |
| Parallel | 2.0% (200 tests) | 75.0% (20 tests) | Fine-tune degraded |
| Irrelevance | 90.0% (20 tests) | 100.0% (20 tests) | Both strong |
| Live Simple | — | 90.0% (20 tests) | Base only |

> ⚠️ **Important Context:** The previously reported "97.7% accuracy" was a training validation metric (token-level accuracy on the training/eval split), not a standardized benchmark score. The table above shows actual BFCL V4 results. We are working on a full official evaluation.

### Training Metrics (Internal)

| Metric | Value |
|---|---|
| Training Token Accuracy | 97.66% |
| Eval Token Accuracy | 97.29% |
| Training Loss | 0.084 |
| Parameters | 4.02 Billion |
| Quantized Size | 2.3 GB (Q4_K_M) |

## Architecture

MIMI Pro is built on Qwen3-4B, fine-tuned with LoRA (rank=64, alpha=128) on 1,610 curated tool-calling examples using Unsloth on NVIDIA DGX Spark.

**Key Design Decisions:**

- Custom tool-calling format optimized for the MIMI Agent browser environment
- 19 tool types covering web search, code execution, file operations, and browser automation
- Trained on NVIDIA DGX Spark (Grace Blackwell GB10, 128 GB unified memory)

**Known Limitations of V1:**

- Fine-tuning with aggressive hyperparameters (LoRA r=64, 3 epochs, LR 2e-4) caused some capability degradation vs. the base model, particularly for parallel tool calling
- The custom `{"tool": ..., "parameters": ...}` format diverges from Qwen3's native `<tool_call>` format
- V2 will address these issues with conservative fine-tuning and Qwen3-native format support

## Supported Tools

| Category | Tools |
|---|---|
| 🌐 Web | `web_search`, `browse_url`, `browser_action` |
| 💻 Code | `execute_python`, `create_file`, `edit_file` |
| 🔬 Research | `deep_research`, `generate_document` |
| 📁 System | `read_file`, `list_directory`, `run_terminal` |
| 🧠 Reasoning | Multi-step orchestration |
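On the host side, routing a tool call emitted by the model to the right tool is a simple dispatch-table lookup. A minimal sketch, assuming the V1 JSON format from this card — the `web_search` handler here is a placeholder, not part of the model or the MIMI Agent:

```python
import json

def web_search(query, limit=5):
    # Placeholder handler; a real host would call an actual search backend.
    return f"results for {query!r} (top {limit})"

# Map tool names (as emitted by the model) to host-side handlers.
TOOL_HANDLERS = {"web_search": web_search}

def dispatch(raw_model_output):
    """Parse a MIMI V1 tool call and route it to the matching handler."""
    call = json.loads(raw_model_output)
    handler = TOOL_HANDLERS[call["tool"]]
    return handler(**call["parameters"])

print(dispatch('{"tool": "web_search", "parameters": {"query": "AI news", "limit": 3}}'))
```

A real agent loop would feed the handler's return value back into the conversation as a tool result.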

## Quick Start

### Browser (wllama/WebAssembly)

```javascript
// wllama is published on npm as @wllama/wllama
import { Wllama } from '@wllama/wllama';

const wllama = new Wllama(wasmPaths); // wasmPaths: map of wllama's wasm asset URLs
await wllama.loadModelFromUrl(
  'https://huggingface.co/MimiTechAI/mimi-pro/resolve/main/mimi-qwen3-4b-q4km.gguf',
  { n_ctx: 4096 }
);

const response = await wllama.createChatCompletion([
  { role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
  { role: 'user', content: 'Search for the latest AI news and summarize it' }
]);
```

### llama.cpp

```bash
./llama-cli -m mimi-qwen3-4b-q4km.gguf \
  -p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
  -n 512 --temp 0.6
```
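The prompt string above is the standard ChatML template used by Qwen models. If you build prompts yourself rather than relying on a chat-template API, it can be assembled programmatically — a minimal sketch (the helper name is ours, not part of any library):

```python
def build_chatml_prompt(messages):
    """Render a message list into the ChatML format used in the command above."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    # Open the assistant turn so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"},
])
print(prompt)
```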

### Python

```python
from llama_cpp import Llama

llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
output = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
    {"role": "user", "content": "Search for the latest AI news"}
])
```

## Output Format

MIMI Pro V1 uses a custom format (V2 will support the Qwen3-native `<tool_call>` format):

```json
{"tool": "web_search", "parameters": {"query": "latest AI news March 2026", "limit": 5}}
```
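Since model output is not guaranteed to be well-formed, hosts should validate a call before executing it. A minimal validation sketch, assuming the format above; the tool whitelist is taken from the Supported Tools table on this card:

```python
import json

# Tool names listed on this card (the "Reasoning" row is orchestration, not a callable tool).
KNOWN_TOOLS = {
    "web_search", "browse_url", "browser_action",
    "execute_python", "create_file", "edit_file",
    "deep_research", "generate_document",
    "read_file", "list_directory", "run_terminal",
}

def parse_tool_call(text):
    """Validate a raw V1 tool call and return (tool, parameters)."""
    call = json.loads(text)  # raises ValueError on malformed JSON
    if call.get("tool") not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {call.get('tool')!r}")
    if not isinstance(call.get("parameters"), dict):
        raise ValueError("parameters must be a JSON object")
    return call["tool"], call["parameters"]

tool, params = parse_tool_call(
    '{"tool": "web_search", "parameters": {"query": "latest AI news March 2026", "limit": 5}}'
)
```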

## The MIMI Model Family

| Model | Parameters | Size | Target Device | Status |
|---|---|---|---|---|
| MIMI Nano | 0.6B | ~400 MB | Any device, IoT | 🔜 Coming |
| MIMI Small | 1.7B | ~1.0 GB | Mobile & tablets | 🔜 Coming |
| MIMI Pro | 4.02B | 2.3 GB | Desktop & laptop | ✅ Available |
| MIMI Max | 8B | ~4.5 GB | Workstations | 🔜 Coming |

All models share the same tool-calling format, are quantized to GGUF Q4_K_M, and run in the browser via WebAssembly.

## Training Details

```yaml
method: LoRA (PEFT) via Unsloth
base_model: Qwen/Qwen3-4B
lora_rank: 64
lora_alpha: 128
lora_dropout: 0.05
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
learning_rate: 2.0e-04
epochs: 3
effective_batch_size: 8
max_seq_length: 2048
optimizer: adamw_8bit
precision: bf16
gradient_checkpointing: true
packing: true
dataset: 1,610 curated tool-calling examples (178K tokens)
hardware: NVIDIA DGX Spark (GB10 Grace Blackwell, 128 GB unified memory)
```

## Why MIMI?

- 🔒 **Privacy First** — Your data never leaves your device. Period.
- 💰 **Zero Cost** — No API keys, no subscriptions, no per-token billing.
- ⚡ **Fast** — Runs at native speed via WebAssembly, no server round-trips.
- 🌍 **Works Offline** — Once downloaded, no internet required.
- 🔧 **Tool Native** — Purpose-built for autonomous tool calling.

## Limitations

- V1 uses a custom tool-calling format (not Qwen3-native `<tool_call>`)
- Parallel tool calling (multiple simultaneous calls) is degraded vs. the base model
- Context window: 4,096 tokens (training config); the base architecture supports 32K
- Requires ~3 GB RAM for in-browser inference
- Q4_K_M quantization trades a small quality loss for a ~3.5x size reduction
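The ~3.5x figure follows from back-of-envelope arithmetic, assuming a bf16 baseline (2 bytes per weight) and the published file size from this card:

```python
# Numbers from this model card; bf16 baseline is an assumption.
params_b = 4.02               # parameters, in billions
bf16_gb = params_b * 2        # 2 bytes/weight in bf16 -> ~8.0 GB unquantized
q4km_gb = 2.3                 # published Q4_K_M GGUF file size

print(round(bf16_gb / q4km_gb, 1))  # -> 3.5
```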

## Roadmap

- **V1** — Custom format, 19 tools, browser-optimized (current release)
- **V2** — Qwen3-native `<tool_call>` format, official BFCL V4 scores, conservative fine-tuning
- **Model Family** — Nano (0.6B), Small (1.7B), Max (8B) releases
- **Multi-Turn** — Agentic conversation chains with tool result feedback

## About Mimi Tech AI

Mimi Tech AI builds on-device AI — no cloud, no data leaks, full user control.

## License

Apache 2.0 — free for commercial and personal use.

## Citation

```bibtex
@misc{mimitechai2026mimi,
  title={MIMI Pro: On-Device AI Agent Model for Browser-Based Tool Calling},
  author={Bemler, Michael and Soppa, Michael},
  year={2026},
  publisher={Mimi Tech AI},
  url={https://huggingface.co/MimiTechAI/mimi-pro}
}
```