# dispatchAI SDK

**Small. Mobile. Free. UAE-built.**

`pip install dispatchai`: run mobile-optimized LLMs on your phone, edge device, or laptop. 31 verified models, 24 of them phone-verified on Snapdragon 865, all free.

## Quick Start

```bash
pip install dispatchai[gguf]
```

### Chat with a model

```python
from dispatchai import load_model

model = load_model("SmolLM2-135M-Instruct-mobile", backend="gguf")
response = model.chat("What is the capital of France?")
print(response)
# → "The capital of France is Paris."
```

## 🌐 Inference API

Use dispatchAI models via REST API (OpenAI-compatible):

```python
import openai

client = openai.OpenAI(
    base_url="https://api.dispatchai.ai/v1",
    api_key="da-demo-key-0001"
)

response = client.chat.completions.create(
    model="dispatchAI/SmolLM2-135M-Instruct-mobile",
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
print(response.choices[0].message.content)
# → "The capital of France is Paris."
```

**Pricing:** $0.001/1K input tokens, $0.002/1K output tokens (10x cheaper than OpenAI)
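
To make the per-token pricing concrete, the sketch below estimates the cost of a single request from the rates quoted above. The helper `estimate_request_cost` and its parameters are hypothetical and not part of the dispatchAI SDK; the token counts are illustrative only.

```python
# Hypothetical helper, not part of the dispatchAI SDK.
# Rates are the per-1K-token prices quoted above, in USD.
INPUT_RATE_PER_1K = 0.001
OUTPUT_RATE_PER_1K = 0.002


def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD of one API request."""
    input_cost = input_tokens / 1000 * INPUT_RATE_PER_1K
    output_cost = output_tokens / 1000 * OUTPUT_RATE_PER_1K
    return input_cost + output_cost


# Example: a 500-token prompt with a 200-token reply.
cost = estimate_request_cost(500, 200)
print(f"${cost:.4f}")
# → $0.0009
```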

**Endpoint:** `https://api.dispatchai.ai/v1`

**Available Models:**
- dispatchAI/SmolLM2-135M-Instruct-mobile (101MB, 46 t/s on phone)
- dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4 (469MB, 23 t/s on phone)
- dispatchAI/Llama-3.2-1B-Instruct-Q4-mobile (770MB, 5.4 t/s on phone)

## Local Inference

### Find the best model for your phone

```python
from dispatchai import recommend

rec = recommend(ram_mb=2048, task="chat")
print(f"Best model: {rec['recommended']['name']}")
```

### List all models

```python
from dispatchai import list_models

for m in list_models(task="chat"):
    print(f"  {m['name']}: {m['size_mb']}MB, {m['speed_tps']} t/s")
```

### Estimate latency

```python
from dispatchai import estimate_latency

lat = estimate_latency("1B", "Q4_K_M")
print(f"{lat['tokens_per_sec']} t/s on Snapdragon 865")
```

### Calculate cost savings

```python
from dispatchai import calculate_cost

result = calculate_cost(daily_queries=10000, cloud_cost_per_1k=0.50)
print(f"Annual savings: ${result['savings']}")
```

## Installation Options

```bash
pip install dispatchai                    # Core (model catalog, recommendations)
pip install dispatchai[torch]             # + transformers/torch backend
pip install dispatchai[gguf]              # + llama.cpp GGUF backend
pip install dispatchai[full]              # + everything
```
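
Because the backends are optional extras, it can help to check which ones are importable before calling `load_model`. The sketch below rests on assumptions: the helper `installed_backends` is hypothetical, and the import names `torch` and `llama_cpp` are guesses at what the `[torch]` and `[gguf]` extras install, which the package metadata would need to confirm.

```python
from importlib.util import find_spec

# Assumed mapping from backend name to the module its extra provides.
# These import names are guesses, not taken from the dispatchAI docs.
BACKEND_MODULES = {"torch": "torch", "gguf": "llama_cpp"}


def installed_backends():
    """Return the backend names whose assumed module can be imported."""
    return [
        name
        for name, module in BACKEND_MODULES.items()
        if find_spec(module) is not None
    ]


print(installed_backends())
```

An empty list would suggest that only the core package is installed, in which case the catalog and recommendation functions should still work.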

## Verified Models (June 2026)

- ✅ 31 models fully working (0 broken, 0 partial)
- 📱 24 models phone-verified on Snapdragon 865
- All have correct chat formats documented

## Top 3 Models

| Model | Size | Phone Speed | Use Case |
|-------|------|-------------|----------|
| SmolLM2-135M | 101MB | 46.0 t/s | Ultra-fast, budget phones |
| Qwen2.5-0.5B-int4 | 469MB | 23.2 t/s | Best balance for mobile |
| Llama-3.2-1B-Q4 | 770MB | 5.4 t/s | Best quality under 1GB |
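
The figures in the table can be used for rough capacity planning. The sketch below picks the largest listed model that fits a file-size budget and estimates generation time from the measured phone speed. The `TOP_MODELS` list and both helpers are hypothetical, and they are not the logic behind `recommend` or `estimate_latency`. The estimate ignores prompt processing and model load time, and file size is only a lower bound on memory use.

```python
# Hypothetical helpers built on the table above; not part of the dispatchAI SDK.
TOP_MODELS = [
    {"name": "SmolLM2-135M", "size_mb": 101, "speed_tps": 46.0},
    {"name": "Qwen2.5-0.5B-int4", "size_mb": 469, "speed_tps": 23.2},
    {"name": "Llama-3.2-1B-Q4", "size_mb": 770, "speed_tps": 5.4},
]


def largest_model_within(budget_mb):
    """Return the largest listed model whose file size fits the budget, or None."""
    fitting = [m for m in TOP_MODELS if m["size_mb"] <= budget_mb]
    return max(fitting, key=lambda m: m["size_mb"]) if fitting else None


def generation_seconds(model, output_tokens):
    """Estimate decode time as output tokens divided by tokens per second."""
    return output_tokens / model["speed_tps"]


choice = largest_model_within(500)
print(choice["name"])
# → Qwen2.5-0.5B-int4
print(f"{generation_seconds(choice, 100):.1f} s for 100 tokens")
# → 4.3 s for 100 tokens
```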

## About

Dispatch AI (FZE), Sharjah Free Zone, UAE. License No. 10818.

🌐 [dispatchai.ai](https://www.dispatchai.ai) | πŸ€— [huggingface.co/dispatchAI](https://huggingface.co/dispatchAI) | API: [api.dispatchai.ai](https://api.dispatchai.ai)

*I think, therefore I ship.*