# dispatchAI SDK
**Small. Mobile. Free. UAE-built.**
`pip install dispatchai`: run mobile-optimized LLMs on your phone, edge device, or laptop. 31 verified models, all tested on real Snapdragon hardware, all free.
## Quick Start
```bash
pip install "dispatchai[gguf]"  # quotes keep shells like zsh from globbing the brackets
```
### Chat with a model
```python
from dispatchai import load_model
model = load_model("SmolLM2-135M-Instruct-mobile", backend="gguf")
response = model.chat("What is the capital of France?")
print(response)
# → "The capital of France is Paris."
```
## Inference API
Use dispatchAI models via REST API (OpenAI-compatible):
```python
import openai
client = openai.OpenAI(
    base_url="https://api.dispatchai.ai/v1",
    api_key="da-demo-key-0001",
)
response = client.chat.completions.create(
    model="dispatchAI/SmolLM2-135M-Instruct-mobile",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
# → "The capital of France is Paris."
```
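"OpenAI-compatible" suggests the endpoint accepts the same request shape as OpenAI's Chat Completions route, so a plain HTTP client should work as well. The sketch below builds such a request with the standard library only and does not send it. The helper name `build_chat_request` is hypothetical, and the `/chat/completions` path is the one the `openai` client appends to `base_url`; whether the service accepts every OpenAI parameter is not stated here.

```python
import json
import urllib.request

BASE_URL = "https://api.dispatchai.ai/v1"  # endpoint given in this README


def build_chat_request(prompt, model, api_key):
    """Build (but do not send) a Chat Completions request. Hypothetical helper."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "What is the capital of France?",
    model="dispatchAI/SmolLM2-135M-Instruct-mobile",
    api_key="da-demo-key-0001",
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` and decoding the JSON body would complete the call.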
**Pricing:** $0.001/1K input tokens, $0.002/1K output tokens (10x cheaper than OpenAI)
**Endpoint:** `https://api.dispatchai.ai/v1`
**Available Models:**
- dispatchAI/SmolLM2-135M-Instruct-mobile (101MB, 46 t/s on phone)
- dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4 (469MB, 23 t/s on phone)
- dispatchAI/Llama-3.2-1B-Instruct-Q4-mobile (770MB, 5.4 t/s on phone)
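To make the per-token pricing concrete, here is a small arithmetic sketch. The rates are the ones quoted above; the helper name `estimate_request_cost` and the example token counts are illustrative assumptions, not part of the SDK.

```python
INPUT_USD_PER_1K = 0.001   # quoted price per 1K input tokens
OUTPUT_USD_PER_1K = 0.002  # quoted price per 1K output tokens


def estimate_request_cost(input_tokens, output_tokens):
    """Return the estimated USD cost of one request (hypothetical helper)."""
    input_cost = (input_tokens / 1000) * INPUT_USD_PER_1K
    output_cost = (output_tokens / 1000) * OUTPUT_USD_PER_1K
    return input_cost + output_cost


# Example: a 500-token prompt that produces a 200-token reply.
cost = estimate_request_cost(500, 200)
print(f"${cost:.4f} per request")
```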
## Local Inference
### Find the best model for your phone
```python
from dispatchai import recommend
rec = recommend(ram_mb=2048, task="chat")
print(f"Best model: {rec['recommended']['name']}")
```
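How `recommend` chooses a model is not documented here. As a rough mental model only, one plausible rule is "pick the largest model whose file fits in a fraction of device RAM". The sketch below implements that rule with the sizes listed in this README; the function `pick_model_for_ram` and the 50% headroom factor are assumptions for illustration, not the SDK's logic.

```python
# Sizes come from the model list in this README; the selection rule is assumed.
CATALOG = [
    {"name": "SmolLM2-135M-Instruct-mobile", "size_mb": 101},
    {"name": "Qwen2.5-0.5B-Instruct-mobile-int4", "size_mb": 469},
    {"name": "Llama-3.2-1B-Instruct-Q4-mobile", "size_mb": 770},
]


def pick_model_for_ram(ram_mb, headroom=0.5):
    """Return the largest catalog entry that fits in ram_mb * headroom, or None."""
    budget_mb = ram_mb * headroom
    candidates = [m for m in CATALOG if m["size_mb"] <= budget_mb]
    return max(candidates, key=lambda m: m["size_mb"], default=None)


choice = pick_model_for_ram(2048)
print(choice["name"] if choice else "No model fits")
```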
### List all models
```python
from dispatchai import list_models
for m in list_models(task="chat"):
    print(f"  {m['name']}: {m['size_mb']}MB, {m['speed_tps']} t/s")
```
### Estimate latency
```python
from dispatchai import estimate_latency
lat = estimate_latency("1B", "Q4_K_M")
print(f"{lat['tokens_per_sec']} t/s on Snapdragon 865")
```
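A tokens-per-second figure is easier to judge once converted into wait time for a typical reply. The sketch below does that conversion for the phone speeds quoted in this README. It ignores prompt processing and model load time, and `generation_seconds` is a hypothetical helper rather than an SDK function.

```python
# Phone speeds (tokens/sec) as quoted in this README.
PHONE_SPEEDS_TPS = {
    "SmolLM2-135M": 46.0,
    "Qwen2.5-0.5B-int4": 23.2,
    "Llama-3.2-1B-Q4": 5.4,
}


def generation_seconds(num_tokens, tokens_per_sec):
    """Time to generate num_tokens at a steady decode speed."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


for name, tps in PHONE_SPEEDS_TPS.items():
    print(f"{name}: {generation_seconds(200, tps):.1f} s for a 200-token reply")
```

At these speeds a 200-token reply takes roughly 4 seconds on the smallest model and roughly 37 seconds on the 1B model, which is the trade-off to weigh against answer quality.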
### Calculate cost savings
```python
from dispatchai import calculate_cost
result = calculate_cost(daily_queries=10000, cloud_cost_per_1k=0.50)
print(f"Annual savings: ${result['savings']}")
```
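The formula behind `calculate_cost` is not shown here. A back-of-the-envelope version, assuming `cloud_cost_per_1k` means USD per 1,000 queries and that on-device inference has no per-query cost, might look like the sketch below. The function `annual_cloud_cost` is hypothetical.

```python
DAYS_PER_YEAR = 365


def annual_cloud_cost(daily_queries, cloud_cost_per_1k):
    """Yearly cloud spend, assuming the rate is USD per 1,000 queries."""
    daily_cost = (daily_queries / 1000) * cloud_cost_per_1k
    return daily_cost * DAYS_PER_YEAR


# Assumption: running locally costs nothing per query, so savings equal the cloud bill.
savings = annual_cloud_cost(daily_queries=10000, cloud_cost_per_1k=0.50)
print(f"Estimated annual savings: ${savings:,.2f}")
```

Under those assumptions, 10,000 queries a day at $0.50 per thousand comes to $5 a day, or $1,825 a year.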
## Installation Options
```bash
pip install dispatchai            # Core (model catalog, recommendations)
pip install "dispatchai[torch]"   # + transformers/torch backend
pip install "dispatchai[gguf]"    # + llama.cpp GGUF backend
pip install "dispatchai[full]"    # + everything
```
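Because backends are optional extras, it can help to check which ones are importable before loading a model. The sketch below is one way to do that with the standard library. The module names (`llama_cpp`, `torch`, `transformers`) and the backend label `"torch"` are assumptions about what the extras install; only `backend="gguf"` appears in this README.

```python
from importlib.util import find_spec

# Assumed mapping from backend name to the modules it needs; not taken from the SDK.
BACKEND_MODULES = {
    "gguf": ["llama_cpp"],
    "torch": ["torch", "transformers"],
}


def available_backends(is_installed=None):
    """Return backend names whose required modules can all be imported."""
    if is_installed is None:
        is_installed = lambda name: find_spec(name) is not None
    return [
        backend
        for backend, modules in BACKEND_MODULES.items()
        if all(is_installed(module) for module in modules)
    ]


print("Usable backends:", available_backends() or "none (core install only)")
```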
## Verified Models (June 2026)
- ✅ 31 models fully working (0 broken, 0 partial)
- 📱 24 models phone-verified on Snapdragon 865
- All have correct chat formats documented
## Top 3 Models
| Model | Size | Phone Speed | Use Case |
|-------|------|-------------|----------|
| SmolLM2-135M | 101MB | 46.0 t/s | Ultra-fast, budget phones |
| Qwen2.5-0.5B-int4 | 469MB | 23.2 t/s | Best balance for mobile |
| Llama-3.2-1B-Q4 | 770MB | 5.4 t/s | Best quality under 1GB |
## About
Dispatch AI (FZE), Sharjah Free Zone, UAE. License No. 10818.
[dispatchai.ai](https://www.dispatchai.ai) | 🤗 [huggingface.co/dispatchAI](https://huggingface.co/dispatchAI) | API: [api.dispatchai.ai](https://api.dispatchai.ai)
*I think, therefore I ship.*