MimiTechAI committed on
Commit 8d272fa · verified · 1 Parent(s): 7d2ab9f

Upload README.md with huggingface_hub

Files changed (1): README.md +79 -83

README.md CHANGED
@@ -5,18 +5,19 @@ language:
  - de
  base_model: Qwen/Qwen3-4B
  tags:
  - tool-calling
  - function-calling
  - agent
- - qwen3
  - gguf
  - fine-tuned
  - wllama
  - browser-inference
  - on-device-ai
- - mimi-agent
  model-index:
- - name: mimi-qwen3-4b-tool-calling
    results:
    - task:
        type: text-generation
@@ -31,95 +32,89 @@ model-index:
    - type: loss
      value: 0.084
      name: Training Loss
- datasets:
- - MimiTechAI/mimi-tool-calling-v3
  library_name: transformers
  pipeline_tag: text-generation
  ---

- # MIMI Qwen3-4B Tool Calling

  <p align="center">
  <img src="https://img.shields.io/badge/Accuracy-97.7%25-brightgreen?style=for-the-badge" alt="Accuracy"/>
- <img src="https://img.shields.io/badge/Quantization-Q4__K__M-blue?style=for-the-badge" alt="Quantization"/>
  <img src="https://img.shields.io/badge/Size-2.3GB-orange?style=for-the-badge" alt="Size"/>
- <img src="https://img.shields.io/badge/Inference-Browser%20(WASM)-purple?style=for-the-badge" alt="Browser"/>
  </p>

- A fine-tuned [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) optimized for **structured tool calling and function invocation** — designed to run entirely in the browser via WebAssembly (wllama/llama.cpp).

- Built by [Mimi Tech AI](https://mimitechai.com) for the [MIMI Agent](https://github.com/MimiTechAi/mimi-website) — a fully local, privacy-first AI agent that runs on-device with zero cloud dependencies.

- ## Key Results

  | Metric | Value |
  |--------|-------|
  | **Token Accuracy** | 97.66% |
  | **Eval Accuracy** | 97.29% |
  | **Training Loss** | 0.084 |
  | **Training Time** | 46 minutes |
- | **Hardware** | NVIDIA DGX Spark (GB10, Grace Blackwell) |
-
- ## Model Details

- - **Base Model:** [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) (4.02B parameters)
- - **Fine-Tuning Method:** LoRA (PEFT) via [Unsloth](https://github.com/unslothai/unsloth)
- - **LoRA Config:** rank=64, alpha=128, dropout=0.05
- - **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- - **Quantization:** GGUF Q4_K_M (4.95 bits per weight)
- - **Format:** ChatML with `<think>` reasoning blocks
- - **Languages:** English (primary), German

- ## Training Data

- 1,610 high-quality examples covering 19 tool types:

- | Category | Tools | Examples |
- |----------|-------|----------|
- | **Web** | `web_search`, `browse_url`, `browser_action` | Search queries, URL extraction, DOM interaction |
- | **Code** | `execute_python`, `create_file`, `edit_file` | Code generation, file manipulation |
- | **Research** | `deep_research`, `generate_document` | Multi-source analysis, report generation |
- | **System** | `read_file`, `list_directory`, `run_terminal` | File I/O, system commands |
- | **Reasoning** | Multi-step chains | Tool orchestration, error recovery |

- Each example includes structured tool calls in JSON format with parameter validation and multi-turn conversations.

- ## Usage

- ### Browser (wllama — recommended)

  ```typescript
  import { Wllama } from '@wllama/wllama';

- const wllama = new Wllama({
-   'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
-   'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
- });
-
  await wllama.loadModelFromUrl(
-   'https://huggingface.co/MimiTechAI/mimi-qwen3-4b-tool-calling/resolve/main/mimi-qwen3-4b-q4km.gguf',
-   { n_ctx: 4096, n_threads: 4 }
  );

  const response = await wllama.createChatCompletion([
    { role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
-   { role: 'user', content: 'Search for the latest AI news' }
  ]);
  ```

- ### llama.cpp (CLI)

  ```bash
  ./llama-cli -m mimi-qwen3-4b-q4km.gguf \
    -p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
-   -n 512 --temp 0.6 --top-p 0.95
  ```

- ### Python (llama-cpp-python)

  ```python
  from llama_cpp import Llama
-
  llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
  output = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},
@@ -127,21 +122,21 @@ output = llm.create_chat_completion(messages=[
  ])
  ```

- ## Expected Output Format

- The model generates structured tool calls:

- ```json
  <tool_call>
  {"name": "web_search", "arguments": {"query": "latest AI news March 2026", "num_results": 5}}
  </tool_call>
  ```

- Multi-tool chains are supported:

- ```json
  <tool_call>
- {"name": "web_search", "arguments": {"query": "NVIDIA DGX Spark specs"}}
  </tool_call>

  <tool_call>
@@ -149,73 +144,74 @@ Multi-tool chains are supported:
  </tool_call>
  ```

- ## LoRA Hyperparameters

  ```yaml
  base_model: Qwen/Qwen3-4B
  lora_rank: 64
  lora_alpha: 128
  lora_dropout: 0.05
- target_modules:
- - q_proj
- - k_proj
- - v_proj
- - o_proj
- - gate_proj
- - up_proj
- - down_proj
  learning_rate: 2.0e-04
- lr_scheduler: linear
- warmup_steps: 5
  epochs: 3
- batch_size: 2
- gradient_accumulation_steps: 4
  effective_batch_size: 8
  max_seq_length: 2048
  optimizer: adamw_8bit
- weight_decay: 0.01
- bf16: true
  gradient_checkpointing: true
  packing: true
  ```

- ## MIMI Agent Model Family

- | Model | Parameters | Size (GGUF Q4_K_M) | Use Case | Status |
- |-------|-----------|---------------------|----------|--------|
- | mimi-qwen3-0.6b-tool-calling | 0.6B | ~400 MB | Ultra-lightweight, any device | 🔜 Coming |
- | mimi-qwen3-1.7b-tool-calling | 1.7B | ~1.0 GB | Mobile & tablets | 🔜 Coming |
- | **mimi-qwen3-4b-tool-calling** | **4.02B** | **2.3 GB** | **Desktop & laptop** | **✅ Released** |
- | mimi-qwen3-8b-tool-calling | 8B | ~4.5 GB | Power users | 🔜 Coming |

  ## Limitations

- - **Optimized for tool calling** — not a general-purpose chat model. For open-ended conversations, use the base Qwen3-4B.
- - **Context window:** 4,096 tokens (inherited from training config). Base model supports up to 32K.
- - **Quantization trade-offs:** Q4_K_M reduces quality slightly vs F16. For maximum accuracy, use the full-precision LoRA adapter.
- - **Browser memory:** Requires ~3 GB RAM for inference. Devices with <4 GB available memory may experience issues.

  ## About Mimi Tech AI

- [Mimi Tech AI](https://mimitechai.com) builds on-device AI solutions — no cloud, no data leaks, full user control.

- - 🌐 [Website](https://mimitechai.com)
  - 🐙 [GitHub](https://github.com/MimiTechAi)
  - 💼 [LinkedIn](https://linkedin.com/company/mimitechai)
- - 🟢 Member of the [NVIDIA Connect Program](https://www.nvidia.com/en-us/industries/nvidia-connect-program/)

  ## License

- This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0), consistent with the base Qwen3-4B license.

  ## Citation

  ```bibtex
  @misc{mimitechai2026mimi,
-   title={MIMI Qwen3-4B Tool Calling: Fine-Tuned Small Language Model for Browser-Based Agent Tool Invocation},
    author={Bemler, Michael and Soppa, Michael},
    year={2026},
    publisher={Mimi Tech AI},
-   url={https://huggingface.co/MimiTechAI/mimi-qwen3-4b-tool-calling}
  }
  ```
 
  - de
  base_model: Qwen/Qwen3-4B
  tags:
+ - mimi
  - tool-calling
  - function-calling
  - agent
  - gguf
  - fine-tuned
  - wllama
  - browser-inference
  - on-device-ai
+ - local-ai
+ - privacy-first
  model-index:
+ - name: MIMI Pro
    results:
    - task:
        type: text-generation
 
    - type: loss
      value: 0.084
      name: Training Loss
  library_name: transformers
  pipeline_tag: text-generation
  ---

+ # MIMI Pro

  <p align="center">
+ <img src="https://img.shields.io/badge/MIMI-Pro-black?style=for-the-badge&labelColor=000000" alt="MIMI Pro"/>
  <img src="https://img.shields.io/badge/Accuracy-97.7%25-brightgreen?style=for-the-badge" alt="Accuracy"/>
  <img src="https://img.shields.io/badge/Size-2.3GB-orange?style=for-the-badge" alt="Size"/>
+ <img src="https://img.shields.io/badge/Runs_In-Browser-purple?style=for-the-badge" alt="Browser"/>
+ <img src="https://img.shields.io/badge/Cloud-Zero-red?style=for-the-badge" alt="Zero Cloud"/>
  </p>

+ **MIMI Pro** is a 4-billion-parameter AI agent model optimized for **structured tool calling and autonomous task execution** — designed to run entirely on-device, in the browser, with zero cloud dependencies.

+ Part of the **MIMI Model Family** by [Mimi Tech AI](https://mimitechai.com).

+ > 💡 MIMI Pro achieves **97.7% tool-calling accuracy** while running completely locally. Your data never leaves your device.
+
+ ## Performance

  | Metric | Value |
  |--------|-------|
  | **Token Accuracy** | 97.66% |
  | **Eval Accuracy** | 97.29% |
  | **Training Loss** | 0.084 |
+ | **Parameters** | 4.02 billion |
+ | **Quantized Size** | 2.3 GB (Q4_K_M) |
  | **Training Time** | 46 minutes |
+ | **Training Hardware** | NVIDIA DGX Spark (Grace Blackwell) |

+ ## Architecture

+ MIMI Pro is built on the [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) architecture, fine-tuned with LoRA (rank=64, alpha=128) on 1,610 curated tool-calling examples using [Unsloth](https://github.com/unslothai/unsloth) on NVIDIA DGX Spark.

+ **Key Design Decisions:**
+ - **ChatML format** with `<think>` reasoning blocks for chain-of-thought
+ - **19 tool types** covering web search, code execution, file operations, browser automation, and deep research
+ - **Multi-step chains** — the model plans and executes sequences of tools autonomously
+ - **Error recovery** — trained on failure cases to self-correct

+ ## Supported Tools

+ | Category | Tools |
+ |----------|-------|
+ | 🌐 **Web** | `web_search`, `browse_url`, `browser_action` |
+ | 💻 **Code** | `execute_python`, `create_file`, `edit_file` |
+ | 🔬 **Research** | `deep_research`, `generate_document` |
+ | 📁 **System** | `read_file`, `list_directory`, `run_terminal` |
+ | 🧠 **Reasoning** | Multi-step orchestration, error recovery |
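
The tool names in the table above have to be wired to real implementations by the host application. A minimal dispatch sketch; the handler functions and their signatures here are hypothetical stand-ins for illustration, not part of the model or any MIMI runtime:

```python
# Hypothetical host-side registry: maps tool names from the table above to
# handler functions. Real handlers would perform I/O; these stubs echo their
# arguments so the dispatch flow is visible.
def web_search(query: str, num_results: int = 5) -> dict:
    return {"tool": "web_search", "query": query, "num_results": num_results}

def read_file(path: str) -> dict:
    return {"tool": "read_file", "path": path}

TOOL_REGISTRY = {
    "web_search": web_search,
    "read_file": read_file,
    # ...register the remaining tools the same way
}

def dispatch(call: dict) -> dict:
    """Invoke the handler for a parsed tool call of the form
    {"name": ..., "arguments": {...}}."""
    handler = TOOL_REGISTRY.get(call["name"])
    if handler is None:
        raise KeyError(f"unknown tool: {call['name']}")
    return handler(**call["arguments"])
```

Keeping the registry as plain data makes it easy to reject calls to tools the host has not enabled.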
 
87
+ ## Quick Start
88
 
89
+ ### Browser (wllama/WebAssembly)
90
 
91
  ```typescript
92
  import { Wllama } from '@anthropic-ai/wllama';
93
 
94
+ const wllama = new Wllama(wasmPaths);
 
 
 
 
95
  await wllama.loadModelFromUrl(
96
+ 'https://huggingface.co/MimiTechAI/mimi-pro/resolve/main/mimi-qwen3-4b-q4km.gguf',
97
+ { n_ctx: 4096 }
98
  );
99
 
100
  const response = await wllama.createChatCompletion([
101
  { role: 'system', content: 'You are MIMI, an AI agent with tool access.' },
102
+ { role: 'user', content: 'Search for the latest AI news and summarize it' }
103
  ]);
104
  ```

+ ### llama.cpp

  ```bash
  ./llama-cli -m mimi-qwen3-4b-q4km.gguf \
    -p "<|im_start|>system\nYou are MIMI, an AI agent with tool access.<|im_end|>\n<|im_start|>user\nSearch for the latest AI news<|im_end|>\n<|im_start|>assistant\n" \
+   -n 512 --temp 0.6
  ```
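
The `-p` string above is the model's ChatML template written out by hand. For illustration, a small helper (hypothetical, not part of any shipped tooling) that renders a message list into that same format, leaving the assistant turn open for generation:

```python
def build_chatml_prompt(messages: list[dict]) -> str:
    """Render {"role", "content"} messages into the ChatML format used in
    the llama-cli example above, ending with an open assistant turn."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    return "".join(parts) + "<|im_start|>assistant\n"
```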

+ ### Python

  ```python
  from llama_cpp import Llama
  llm = Llama(model_path="mimi-qwen3-4b-q4km.gguf", n_ctx=4096)
  output = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are MIMI, an AI agent with tool access."},

  ])
  ```

+ ## Output Format

+ MIMI Pro generates structured tool calls:

+ ```xml
  <tool_call>
  {"name": "web_search", "arguments": {"query": "latest AI news March 2026", "num_results": 5}}
  </tool_call>
  ```

+ Multi-tool chains for complex tasks:

+ ```xml
  <tool_call>
+ {"name": "web_search", "arguments": {"query": "NVIDIA DGX Spark specifications"}}
  </tool_call>

  <tool_call>

  </tool_call>
  ```
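
A host application has to extract these blocks from the raw generated text before dispatching anything. A minimal parser sketch, assuming each `<tool_call>` body is a single JSON object as in the examples above:

```python
import json
import re

# Matches a <tool_call>...</tool_call> block whose body is one JSON object.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(text: str) -> list[dict]:
    """Extract every <tool_call> block from model output and decode its
    JSON payload; blocks with malformed JSON are skipped."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue
    return calls
```

Skipping malformed blocks rather than raising keeps a multi-call chain usable even if one call fails to decode.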

+ ## The MIMI Model Family
+
+ | Model | Parameters | Size | Target Device | Status |
+ |-------|-----------|------|---------------|--------|
+ | **MIMI Nano** | 0.6B | ~400 MB | Any device, IoT | 🔜 Coming |
+ | **MIMI Small** | 1.7B | ~1.0 GB | Mobile & tablets | 🔜 Coming |
+ | **MIMI Pro** | 4.02B | 2.3 GB | Desktop & laptop | ✅ **Available** |
+ | **MIMI Max** | 8B | ~4.5 GB | Workstations | 🔜 Coming |
+
+ All models share the same tool-calling format, are quantized to GGUF Q4_K_M, and run in the browser via WebAssembly.
+
+ ## Training Details

  ```yaml
+ method: LoRA (PEFT) via Unsloth
  base_model: Qwen/Qwen3-4B
  lora_rank: 64
  lora_alpha: 128
  lora_dropout: 0.05
+ target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
  learning_rate: 2.0e-04
  epochs: 3
  effective_batch_size: 8
  max_seq_length: 2048
  optimizer: adamw_8bit
+ precision: bf16
  gradient_checkpointing: true
  packing: true
+ dataset: 1,610 curated tool-calling examples (178K tokens)
+ hardware: NVIDIA DGX Spark (GB10 Grace Blackwell, 128 GB unified memory)
  ```

+ ## Why MIMI?

+ - **🔒 Privacy First** — Your data never leaves your device. Period.
+ - **💰 Zero Cost** — No API keys, no subscriptions, no per-token billing.
+ - **⚡ Fast** — Runs at native speed via WebAssembly, no server round-trips.
+ - **🌍 Works Offline** — Once downloaded, no internet required.
+ - **🔧 Tool Native** — Purpose-built for autonomous tool calling, not retrofitted.

  ## Limitations

+ - Optimized for tool calling — for general chat, use the base model directly.
+ - Context window: 4,096 tokens (training config). Base architecture supports 32K.
+ - Requires ~3 GB RAM for inference in browser.
+ - Q4_K_M quantization trades minimal quality for a 3.5x size reduction.

  ## About Mimi Tech AI

+ [Mimi Tech AI](https://mimitechai.com) builds on-device AI — no cloud, no data leaks, full user control.

+ - 🌐 [mimitechai.com](https://mimitechai.com)
  - 🐙 [GitHub](https://github.com/MimiTechAi)
  - 💼 [LinkedIn](https://linkedin.com/company/mimitechai)
+ - 🟢 [NVIDIA Connect Program](https://www.nvidia.com/en-us/industries/nvidia-connect-program/) Member

  ## License

+ Apache 2.0 — free for commercial and personal use.

  ## Citation

  ```bibtex
  @misc{mimitechai2026mimi,
+   title={MIMI Pro: On-Device AI Agent Model for Browser-Based Tool Calling},
    author={Bemler, Michael and Soppa, Michael},
    year={2026},
    publisher={Mimi Tech AI},
+   url={https://huggingface.co/MimiTechAI/mimi-pro}
  }
  ```