MimiTechAI committed on
Commit 8d272fa · verified · 1 Parent(s): 7d2ab9f

Upload README.md with huggingface_hub

Files changed (1): README.md +79 -83

README.md CHANGED
@@ -5,18 +5,19 @@ language:
  - de
  base_model: Qwen/Qwen3-4B
  tags:
  - tool-calling
  - function-calling
  - agent
- - qwen3
  - gguf
  - fine-tuned
  - wllama
  - browser-inference
  - on-device-ai
- - mimi-agent
  model-index:
- - name: mimi-qwen3-4b-tool-calling
    results:
    - task:
        type: text-generation
@@ -31,95 +32,89 @@ model-index:
    - type: loss
      value: 0.084
      name: Training Loss
- datasets:
- - MimiTechAI/mimi-tool-calling-v3
  library_name: transformers
  pipeline_tag: text-generation
  ---

- # MIMI Qwen3-4B Tool Calling

  <p align="center">
  <img src="https://img.shields.io/badge/Accuracy-97.7%25-brightgreen?style=for-the-badge" alt="Accuracy"/>
- <img src="https://img.shields.io/badge/Quantization-Q4__K__M-blue?style=for-the-badge" alt="Quantization"/>
  <img src="https://img.shields.io/badge/Size-2.3GB-orange?style=for-the-badge" alt="Size"/>
- <img src="https://img.shields.io/badge/Inference-Browser%20(WASM)-purple?style=for-the-badge" alt="Browser"/>
  </p>

- A fine-tuned [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) optimized for **structured tool calling and function invocation** — designed to run entirely in the browser via WebAssembly (wllama/llama.cpp).

- Built by [Mimi Tech AI](https://mimitechai.com) for the [MIMI Agent](https://github.com/MimiTechAi/mimi-website) — a fully local, privacy-first AI agent that runs on-device with zero cloud dependencies.

- ## Key Results

  | Metric | Value |
  |--------|-------|
  | **Token Accuracy** | 97.66% |
  | **Eval Accuracy** | 97.29% |
  | **Training Loss** | 0.084 |
  | **Training Time** | 46 minutes |
- | **Hardware** | NVIDIA DGX Spark (GB10, Grace Blackwell) |
-
- ## Model Details

- - **Base Model:** [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) (4.02B parameters)
- - **Fine-Tuning Method:** LoRA (PEFT) via [Unsloth](https://github.com/unslothai/unsloth)
- - **LoRA Config:** rank=64, alpha=128, dropout=0.05
- - **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- - **Quantization:** GGUF Q4_K_M (4.95 bits per weight)
- - **Format:** ChatML with `<think>` reasoning blocks
- - **Languages:** English (primary), German

- ## Training Data

- 1,610 high-quality examples covering 19 tool types:

- | Category | Tools | Examples |
- |----------|-------|----------|
- | **Web** | `web_search`, `browse_url`, `browser_action` | Search queries, URL extraction, DOM interaction |
- | **Code** | `execute_python`, `create_file`, `edit_file` | Code generation, file manipulation |
- | **Research** | `deep_research`, `generate_document` | Multi-source analysis, report generation |
- | **System** | `read_file`, `list_directory`, `run_terminal` | File I/O, system commands |
- | **Reasoning** | Multi-step chains | Tool orchestration, error recovery |

- Each example includes structured tool calls in JSON format with parameter validation and multi-turn conversations.

- ## Usage

- ### Browser (wllama — recommended)

  ```typescript
  import { Wllama } from '@wllama/wllama';

- const wllama = new Wllama({
-   'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
-   'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
- });
-
  await wllama.loadModelFromUrl(
-   'https://huggingface.co/MimiTechAI/mimi-qwen3-4b-tool-calling/resolve/main/mimi-qwen3-4b-q4km.gguf',
-   { n_ctx: 4096, n_threads: 4 }
  );

  const response = await wllama.createChatCompletion([
    { role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
-   { role: 'user', content: 'Search for the latest AI news' }
  ]);
  ```

- ### llama.cpp (CLI)

  ```bash
  ./llama-cli -m mimi-qwen3-4b-q4km.gguf \
    -p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
-   -n 512 --temp 0.6 --top-p 0.95
  ```

- ### Python (llama-cpp-python)

  ```python
  from llama_cpp import Llama
-
  llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
  output = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
@@ -127,21 +122,21 @@ output = llm.create_chat_completion(messages=[
  ])
  ```

- ## Expected Output Format

- The model generates structured tool calls:

- ```json
  <tool_call>
  {"name": "web_search", "arguments": {"query": "latest AI news March 2026", "num_results": 5}}
  </tool_call>
  ```

- Multi-tool chains are supported:

- ```json
  <tool_call>
- {"name": "web_search", "arguments": {"query": "NVIDIA DGX Spark specs"}}
  </tool_call>

  <tool_call>
@@ -149,73 +144,74 @@ Multi-tool chains are supported:
  </tool_call>
  ```

- ## LoRA Hyperparameters

  ```yaml
  base_model: Qwen/Qwen3-4B
  lora_rank: 64
  lora_alpha: 128
  lora_dropout: 0.05
- target_modules:
- - q_proj
- - k_proj
- - v_proj
- - o_proj
- - gate_proj
- - up_proj
- - down_proj
  learning_rate: 2.0e-04
- lr_scheduler: linear
- warmup_steps: 5
  epochs: 3
- batch_size: 2
- gradient_accumulation_steps: 4
  effective_batch_size: 8
  max_seq_length: 2048
  optimizer: adamw_8bit
- weight_decay: 0.01
- bf16: true
  gradient_checkpointing: true
  packing: true
  ```

- ## MIMI Agent Model Family

- | Model | Parameters | Size (GGUF Q4_K_M) | Use Case | Status |
- |-------|-----------|---------------------|----------|--------|
- | mimi-qwen3-0.6b-tool-calling | 0.6B | ~400 MB | Ultra-lightweight, any device | 🔜 Coming |
- | mimi-qwen3-1.7b-tool-calling | 1.7B | ~1.0 GB | Mobile & tablets | 🔜 Coming |
- | **mimi-qwen3-4b-tool-calling** | **4.02B** | **2.3 GB** | **Desktop & laptop** | **✅ Released** |
- | mimi-qwen3-8b-tool-calling | 8B | ~4.5 GB | Power users | 🔜 Coming |

  ## Limitations

- - **Optimized for tool calling** — not a general-purpose chat model. For open-ended conversations, use the base Qwen3-4B.
- - **Context window:** 4,096 tokens (inherited from training config). Base model supports up to 32K.
- - **Quantization trade-offs:** Q4_K_M reduces quality slightly vs F16. For maximum accuracy, use the full-precision LoRA adapter.
- - **Browser memory:** Requires ~3 GB RAM for inference. Devices with <4 GB available memory may experience issues.

  ## About Mimi Tech AI

- [Mimi Tech AI](https://mimitechai.com) builds on-device AI solutions — no cloud, no data leaks, full user control.

- - 🌐 [Website](https://mimitechai.com)
  - 🐙 [GitHub](https://github.com/MimiTechAi)
  - 💼 [LinkedIn](https://linkedin.com/company/mimitechai)
- - 🟢 Member of the [NVIDIA Connect Program](https://www.nvidia.com/en-us/industries/nvidia-connect-program/)

  ## License

- This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0), consistent with the base Qwen3-4B license.

  ## Citation

  ```bibtex
  @misc{mimitechai2026mimi,
-   title={MIMI Qwen3-4B Tool Calling: Fine-Tuned Small Language Model for Browser-Based Agent Tool Invocation},
    author={Bemler, Michael and Soppa, Michael},
    year={2026},
    publisher={Mimi Tech AI},
-   url={https://huggingface.co/MimiTechAI/mimi-qwen3-4b-tool-calling}
  }
  ```
 
  - de
  base_model: Qwen/Qwen3-4B
  tags:
+ - mimi
  - tool-calling
  - function-calling
  - agent
  - gguf
  - fine-tuned
  - wllama
  - browser-inference
  - on-device-ai
+ - local-ai
+ - privacy-first
  model-index:
+ - name: MIMI Pro
    results:
    - task:
        type: text-generation
 
    - type: loss
      value: 0.084
      name: Training Loss
  library_name: transformers
  pipeline_tag: text-generation
  ---

+ # MIMI Pro

  <p align="center">
+ <img src="https://img.shields.io/badge/MIMI-Pro-black?style=for-the-badge&labelColor=000000" alt="MIMI Pro"/>
  <img src="https://img.shields.io/badge/Accuracy-97.7%25-brightgreen?style=for-the-badge" alt="Accuracy"/>
  <img src="https://img.shields.io/badge/Size-2.3GB-orange?style=for-the-badge" alt="Size"/>
+ <img src="https://img.shields.io/badge/Runs_In-Browser-purple?style=for-the-badge" alt="Browser"/>
+ <img src="https://img.shields.io/badge/Cloud-Zero-red?style=for-the-badge" alt="Zero Cloud"/>
  </p>

+ **MIMI Pro** is a 4-billion-parameter AI agent model optimized for **structured tool calling and autonomous task execution** — designed to run entirely on-device, in the browser, with zero cloud dependencies.

+ Part of the **MIMI Model Family** by [Mimi Tech AI](https://mimitechai.com).

+ > 💡 MIMI Pro achieves **97.7% tool-calling accuracy** while running completely locally. Your data never leaves your device.
+
+ ## Performance

  | Metric | Value |
  |--------|-------|
  | **Token Accuracy** | 97.66% |
  | **Eval Accuracy** | 97.29% |
  | **Training Loss** | 0.084 |
+ | **Parameters** | 4.02 billion |
+ | **Quantized Size** | 2.3 GB (Q4_K_M) |
  | **Training Time** | 46 minutes |
+ | **Training Hardware** | NVIDIA DGX Spark (Grace Blackwell) |

+ ## Architecture

+ MIMI Pro is built on the [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) architecture, fine-tuned with LoRA (rank=64, alpha=128) on 1,610 curated tool-calling examples using [Unsloth](https://github.com/unslothai/unsloth) on NVIDIA DGX Spark.

+ **Key Design Decisions:**
+ - **ChatML format** with `<think>` reasoning blocks for chain-of-thought
+ - **19 tool types** covering web search, code execution, file operations, browser automation, and deep research
+ - **Multi-step chains** — the model plans and executes sequences of tools autonomously
+ - **Error recovery** — trained on failure cases to self-correct

+ ## Supported Tools

+ | Category | Tools |
+ |----------|-------|
+ | 🌐 **Web** | `web_search`, `browse_url`, `browser_action` |
+ | 💻 **Code** | `execute_python`, `create_file`, `edit_file` |
+ | 🔬 **Research** | `deep_research`, `generate_document` |
+ | 📁 **System** | `read_file`, `list_directory`, `run_terminal` |
+ | 🧠 **Reasoning** | Multi-step orchestration, error recovery |
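
The tool names in the table above have to be wired to real implementations by the host application. A minimal dispatch sketch; the handler functions and their signatures here are hypothetical stand-ins for illustration, not part of the model or any MIMI runtime:

```python
# Hypothetical host-side registry: maps tool names from the table above to
# handler functions. Real handlers would perform I/O; these stubs echo their
# arguments so the dispatch flow is visible.
def web_search(query: str, num_results: int = 5) -> dict:
    return {"tool": "web_search", "query": query, "num_results": num_results}

def read_file(path: str) -> dict:
    return {"tool": "read_file", "path": path}

TOOL_REGISTRY = {
    "web_search": web_search,
    "read_file": read_file,
    # ...register the remaining tools the same way
}

def dispatch(call: dict) -> dict:
    """Invoke the handler for a parsed tool call of the form
    {"name": ..., "arguments": {...}}."""
    handler = TOOL_REGISTRY.get(call["name"])
    if handler is None:
        raise KeyError(f"unknown tool: {call['name']}")
    return handler(**call["arguments"])
```

Keeping the registry as plain data makes it easy to reject calls to tools the host has not enabled.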
 
87
+ ## Quick Start
88
 
89
+ ### Browser (wllama/WebAssembly)
90
 
91
  ```typescript
92
  import { Wllama } from '@anthropic-ai/wllama';
93
 
94
+ const wllama = new Wllama(wasmPaths);
 
 
 
 
95
  await wllama.loadModelFromUrl(
96
+ 'https://huggingface.co/MimiTechAI/mimi-pro/resolve/main/mimi-qwen3-4b-q4km.gguf',
97
+ { n_ctx: 4096 }
98
  );
99
 
100
  const response = await wllama.createChatCompletion([
101
  { role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
102
+ { role: 'user', content: 'Search for the latest AI news and summarize it' }
103
  ]);
104
  ```

+ ### llama.cpp

  ```bash
  ./llama-cli -m mimi-qwen3-4b-q4km.gguf \
    -p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
+   -n 512 --temp 0.6
  ```
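
The `-p` string above is the model's ChatML template written out by hand. For illustration, a small helper (hypothetical, not part of any shipped tooling) that renders a message list into that same format, leaving the assistant turn open for generation:

```python
def build_chatml_prompt(messages: list[dict]) -> str:
    """Render {"role", "content"} messages into the ChatML format used in
    the llama-cli example above, ending with an open assistant turn."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    return "".join(parts) + "<|im_start|>assistant\n"
```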

+ ### Python

  ```python
  from llama_cpp import Llama
  llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
  output = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},

  ])
  ```

+ ## Output Format

+ MIMI Pro generates structured tool calls:

+ ```xml
  <tool_call>
  {"name": "web_search", "arguments": {"query": "latest AI news March 2026", "num_results": 5}}
  </tool_call>
  ```

+ Multi-tool chains for complex tasks:

+ ```xml
  <tool_call>
+ {"name": "web_search", "arguments": {"query": "NVIDIA DGX Spark specifications"}}
  </tool_call>

  <tool_call>

  </tool_call>
  ```
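
A host application has to extract these blocks from the raw generated text before dispatching anything. A minimal parser sketch, assuming each `<tool_call>` body is a single JSON object as in the examples above:

```python
import json
import re

# Matches a <tool_call>...</tool_call> block whose body is one JSON object.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(text: str) -> list[dict]:
    """Extract every <tool_call> block from model output and decode its
    JSON payload; blocks with malformed JSON are skipped."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue
    return calls
```

Skipping malformed blocks rather than raising keeps a multi-call chain usable even if one call fails to decode.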

+ ## The MIMI Model Family
+
+ | Model | Parameters | Size | Target Device | Status |
+ |-------|-----------|------|---------------|--------|
+ | **MIMI Nano** | 0.6B | ~400 MB | Any device, IoT | 🔜 Coming |
+ | **MIMI Small** | 1.7B | ~1.0 GB | Mobile & tablets | 🔜 Coming |
+ | **MIMI Pro** | 4.02B | 2.3 GB | Desktop & laptop | ✅ **Available** |
+ | **MIMI Max** | 8B | ~4.5 GB | Workstations | 🔜 Coming |
+
+ All models share the same tool-calling format, are quantized to GGUF Q4_K_M, and run in the browser via WebAssembly.
+
+ ## Training Details

  ```yaml
+ method: LoRA (PEFT) via Unsloth
  base_model: Qwen/Qwen3-4B
  lora_rank: 64
  lora_alpha: 128
  lora_dropout: 0.05
+ target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
  learning_rate: 2.0e-04
  epochs: 3
  effective_batch_size: 8
  max_seq_length: 2048
  optimizer: adamw_8bit
+ precision: bf16
  gradient_checkpointing: true
  packing: true
+ dataset: 1,610 curated tool-calling examples (178K tokens)
+ hardware: NVIDIA DGX Spark (GB10 Grace Blackwell, 128 GB unified memory)
  ```

+ ## Why MIMI?

+ - **🔒 Privacy First** — Your data never leaves your device. Period.
+ - **💰 Zero Cost** — No API keys, no subscriptions, no per-token billing.
+ - **⚡ Fast** — Runs at native speed via WebAssembly, no server round-trips.
+ - **🌍 Works Offline** — Once downloaded, no internet required.
+ - **🔧 Tool Native** — Purpose-built for autonomous tool calling, not retrofitted.

  ## Limitations

+ - Optimized for tool calling — for general chat, use the base model directly.
+ - Context window: 4,096 tokens (training config). Base architecture supports 32K.
+ - Requires ~3 GB RAM for inference in browser.
+ - Q4_K_M quantization trades minimal quality for a 3.5x size reduction.

  ## About Mimi Tech AI

+ [Mimi Tech AI](https://mimitechai.com) builds on-device AI — no cloud, no data leaks, full user control.

+ - 🌐 [mimitechai.com](https://mimitechai.com)
  - 🐙 [GitHub](https://github.com/MimiTechAi)
  - 💼 [LinkedIn](https://linkedin.com/company/mimitechai)
+ - 🟢 [NVIDIA Connect Program](https://www.nvidia.com/en-us/industries/nvidia-connect-program/) Member

  ## License

+ Apache 2.0 — free for commercial and personal use.

  ## Citation

  ```bibtex
  @misc{mimitechai2026mimi,
+   title={MIMI Pro: On-Device AI Agent Model for Browser-Based Tool Calling},
    author={Bemler, Michael and Soppa, Michael},
    year={2026},
    publisher={Mimi Tech AI},
+   url={https://huggingface.co/MimiTechAI/mimi-pro}
  }
  ```