hmunachii committed on
Commit 08dd57b · verified · 1 Parent(s): b3311a6

Update README.md

Files changed (1)
  1. README.md +29 -30
README.md CHANGED
@@ -1,13 +1,5 @@
- ---
- title: Cactus-Compute
- sdk: static
- pinned: true
- ---
-
  # Cactus

- <img src="assets/banner.jpg" alt="Logo" style="border-radius: 30px; width: 100%;">
-
  [![Docs][docs-shield]][docs-url]
  [![Website][website-shield]][website-url]
  [![GitHub][github-shield]][github-url]
@@ -15,7 +7,13 @@ pinned: true
  [![Reddit][reddit-shield]][reddit-url]
  [![Blog][blog-shield]][blog-url]

- A hybrid low-latency energy-efficient AI engine for mobile devices & wearables.

  ```
  ┌─────────────────┐
@@ -31,7 +29,7 @@ A hybrid low-latency energy-efficient AI engine for mobile devices & wearables.
  └─────────────────┘ Custom attention, KV-cache quant, chunked prefill
  ```

- ## Quick Demo

  - Step 1: `brew install cactus-compute/cactus/cactus`
  - Step 2: `cactus transcribe` or `cactus run`
@@ -39,11 +37,12 @@ A hybrid low-latency energy-efficient AI engine for mobile devices & wearables.
  ## Cactus Engine

  ```cpp
- #include cactus.h

  cactus_model_t model = cactus_init(
      "path/to/weight/folder",
      "path to txt or dir of txts for auto-rag",
  );

  const char* messages = R"([
@@ -91,7 +90,7 @@ Example response from Gemma3-270m
  ## Cactus Graph

  ```cpp
- #include cactus.h

  CactusGraph graph;
  auto a = graph.input({2, 3}, Precision::FP16);
@@ -117,8 +116,8 @@ graph.hard_reset();

  | Reference | Language | Description |
  |-----------|----------|-------------|
- | [Engine API](cactus_engine.md) | C | Chat completion, streaming, tool calling, transcription, embeddings, RAG, vision, VAD, vector index, cloud handoff |
- | [Graph API](cactus_graph.md) | C++ | Tensor operations, matrix multiplication, attention, normalization, activation functions |
  | [Python SDK](/python/) | Python | Mac, Linux |
  | [Swift SDK](/apple/) | Swift | iOS, macOS, tvOS, watchOS, Android |
  | [Kotlin SDK](/android/) | Kotlin | Android, iOS (via KMP) |
@@ -126,7 +125,9 @@ graph.hard_reset();
  | [Rust SDK](/rust/) | Rust | Mac, Linux |
  | [React Native](https://github.com/cactus-compute/cactus-react-native) | JavaScript | iOS, Android |

- ## Benchmarks

  - All weights INT4 quantised
  - LFM: 1k-prefill / 100-decode, values are prefill tps / decode tps
@@ -134,7 +135,7 @@ graph.hard_reset();
  - Parakeet: 20s audio input, values are latency / decode tps
  - Missing latency = no NPU support yet

- | Device | LFM 1.2B | LFMVL 1.6B | Parakeet 1.1B | RAM |
  |--------|----------|------------|---------------|-----|
  | Mac M4 Pro | 582/100 | 0.2s/98 | 0.1s/900k+ | 76MB |
  | iPad/Mac M3 | 350/60 | 0.3s/69 | 0.3s/800k+ | 70MB |
@@ -153,20 +154,20 @@ graph.hard_reset();

  | Model | Params | End2End ms | Latency ms | Decode toks/sec | NPU | RTF | WER |
  |-------|--------|------------|------------|-----------------|-----|-----|-----|
- | UsefulSensors/moonshine-base | 61M | 361.35 | 182 | 262 | yes | 0.0180 | 0.1395 |
- | openai/whisper-tiny | 39M | 232.03 | 137.38 | 581 | yes | 0.0116 | 0.1860 |
- | openai/whisper-base | 74M | 329.37 | 178.65 | 358 | yes | 0.0164 | 0.1628 |
- | openai/whisper-small | 244M | 856.79 | 332.63 | 108 | yes | 0.0428 | 0.0930 |
- | openai/whisper-medium | 769M | 2085.87 | 923.33 | 49 | yes | 0.1041 | 0.0930 |
- | nvidia/parakeet-ctc-0.6b | 600M | 201.77 | 201.44 | 5214285 | yes | 0.0101 | 0.0930 |
- | nvidia/parakeet-tdt-0.6b-v3 | 600M | 718.91 | 718.82 | 3583333 | no | 0.0359 | 0.0465 |
- | nvidia/parakeet-ctc-1.1b | 1.1B | 279.03 | 278.92 | 4562500 | yes | 0.0139 | 0.1628 |
  | snakers4/silero-vad | - | - | - | - | - | - | - |

  ## Supported LLMs

  - Gemma weights are often **gated** on HuggingFace and need an access token
- - Run `hf auth login` and input your huggingface token

  | Model | Features |
  |-------|----------|
@@ -188,6 +189,7 @@ graph.hard_reset();
  | LiquidAI/LFM2-2.6B | completion, tools, embed |
  | LiquidAI/LFM2-VL-450M | vision, txt & img embed, Apple NPU |
  | LiquidAI/LFM2.5-VL-1.6B | vision, txt & img embed, Apple NPU |
  | nomic-ai/nomic-embed-text-v2-moe | embed |

  ## Roadmap
@@ -202,11 +204,8 @@ graph.hard_reset();
  | Feb 2026 | Done | Hybrid inference, INT4, lossless Quant (1.5x) |
  | Mar 2026 | Coming | Qualcomm/Google NPUs, 5-11x faster Android |
  | Apr 2026 | Coming | Mediatek/Exynos NPUs, Cactus@ICLR |
- | May 2026 | Coming | Kernel→C++, Graph/Engine→Rust, Mac GPU & VR |
  | Jun 2026 | Coming | Torch/JAX model transpilers |
- | Jul 2026 | Coming | Wearables optimisations, Cactus@ICML |
- | Aug 2026 | Coming | Orchestration |
- | Sep 2026 | Coming | Full Cactus paper, chip manufacturer partners |

  ## Using this repo
@@ -278,7 +277,7 @@ graph.hard_reset();
  2. [UCLA's BruinAI](https://bruinai.org/)
  3. [Char (YC S25)](https://char.com/)
  4. [Yale's AI Society](https://www.yale-ai.org/team)
- 5. [National Unoversity of Singapore's AI Society](https://www.nusaisociety.org/)
  6. [UC Irvine's AI@UCI](https://aiclub.ics.uci.edu/)
  7. [Imperial College's AI Society](https://www.imperialcollegeunion.org/csp/1391)
  8. [University of Pennsylvania's AI@Penn](https://ai-at-penn-main-105.vercel.app/)

  # Cactus

  [![Docs][docs-shield]][docs-url]
  [![Website][website-shield]][website-url]
  [![GitHub][github-shield]][github-url]

  [![Reddit][reddit-shield]][reddit-url]
  [![Blog][blog-shield]][blog-url]

+ A low-latency AI engine for mobile devices & wearables. Main features:
+
+ - **Fast:** fastest inference on ARM CPUs
+ - **Low RAM:** zero-copy memory mapping gives roughly 10x lower RAM use than comparable engines
+ - **Multimodal:** one SDK for speech, vision, and language models
+ - **Cloud fallback:** automatically routes requests to cloud models when needed
+ - **Energy-efficient:** NPU-accelerated prefill
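The "Low RAM" bullet above rests on zero-copy memory mapping: weight files are mapped into the process address space rather than read into heap buffers, so pages are faulted in lazily and read-only mappings are shared across processes. A minimal sketch of the idea in Python (illustrative only, not Cactus internals):

```python
import mmap
import os
import tempfile

# Create a dummy 1 MiB "weights" file.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(bytes(1 << 20))

# Map it read-only: nothing is copied into a Python buffer up front;
# the OS faults pages in on demand as they are touched.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    with memoryview(mm) as view:  # zero-copy view over the mapping
        mapped_len, first_byte = len(view), view[0]

print(mapped_len, first_byte)  # 1048576 0
```

Resident memory here stays far below the file size until the pages are actually read, which is why memory-mapped engines report small RAM footprints for multi-hundred-MB weight files.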

  ```
  ┌─────────────────┐

  └─────────────────┘ Custom attention, KV-cache quant, chunked prefill
  ```

+ ## Quick Demo (Mac)

  - Step 1: `brew install cactus-compute/cactus/cactus`
  - Step 2: `cactus transcribe` or `cactus run`
 

  ## Cactus Engine

  ```cpp
+ #include "cactus.h"

  cactus_model_t model = cactus_init(
      "path/to/weight/folder",
      "path to txt or dir of txts for auto-rag",
+     false
  );

  const char* messages = R"([
 
  ## Cactus Graph

  ```cpp
+ #include "cactus.h"

  CactusGraph graph;
  auto a = graph.input({2, 3}, Precision::FP16);
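For intuition, `graph.input({2, 3}, Precision::FP16)` declares a 2×3 half-precision tensor node. A NumPy sketch of the same shapes and dtype (the second operand and the matmul are hypothetical stand-ins for the rest of the graph example, which is truncated in this diff):

```python
import numpy as np

# Mirrors graph.input({2, 3}, Precision::FP16)
a = np.zeros((2, 3), dtype=np.float16)
# Hypothetical second operand for a matmul node
b = np.ones((3, 4), dtype=np.float16)

c = a @ b  # (2, 3) @ (3, 4) -> (2, 4), stays FP16
print(c.shape, c.dtype)  # (2, 4) float16
```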
 

  | Reference | Language | Description |
  |-----------|----------|-------------|
+ | [Engine API](docs/cactus_engine.md) | C | Chat completion, streaming, tool calling, transcription, embeddings, RAG, vision, VAD, vector index, cloud handoff |
+ | [Graph API](docs/cactus_graph.md) | C++ | Tensor operations, matrix multiplication, attention, normalization, activation functions |
  | [Python SDK](/python/) | Python | Mac, Linux |
  | [Swift SDK](/apple/) | Swift | iOS, macOS, tvOS, watchOS, Android |
  | [Kotlin SDK](/android/) | Kotlin | Android, iOS (via KMP) |

  | [Rust SDK](/rust/) | Rust | Mac, Linux |
  | [React Native](https://github.com/cactus-compute/cactus-react-native) | JavaScript | iOS, Android |

+ > **Model weights:** Pre-converted weights for all supported models are at [huggingface.co/Cactus-Compute](https://huggingface.co/Cactus-Compute).
+
+ ## Benchmarks (CPU-only, no GPU)

  - All weights INT4 quantised
  - LFM: 1k-prefill / 100-decode, values are prefill tps / decode tps
  - Parakeet: 20s audio input, values are latency / decode tps
  - Missing latency = no NPU support yet

+ | Device | LFM 1.2B | LFMVL 1.6B | Parakeet 1.1B | VL RAM Usage |
  |--------|----------|------------|---------------|-----|
  | Mac M4 Pro | 582/100 | 0.2s/98 | 0.1s/900k+ | 76MB |
  | iPad/Mac M3 | 350/60 | 0.3s/69 | 0.3s/800k+ | 70MB |
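The "All weights INT4 quantised" note means weights are stored as 4-bit integers plus a scale factor recovered at load time. A toy symmetric round-trip in Python, with one scale per small weight group (illustrative only; Cactus's actual grouping and scheme are not specified here):

```python
# Toy symmetric INT4 quantisation: one FP scale per weight group,
# 4-bit signed codes in [-8, 7].
def quantise_int4(weights, group=4):
    out = []
    for i in range(0, len(weights), group):
        g = weights[i:i + group]
        scale = max(abs(w) for w in g) / 7.0 or 1.0
        codes = [max(-8, min(7, round(w / scale))) for w in g]
        out.append((scale, codes))
    return out

def dequantise(groups):
    # Reconstruct approximate FP weights: code * scale
    return [c * scale for scale, codes in groups for c in codes]

w = [0.12, -0.5, 0.33, 0.07]
approx = [round(x, 2) for x in dequantise(quantise_int4(w))]
print(approx)
```

The reconstruction is lossy (each value snaps to one of 16 levels per group), which is why INT4 engines trade a little accuracy for a 4x smaller weight footprint.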
 

  | Model | Params | End2End ms | Latency ms | Decode toks/sec | NPU | RTF | WER |
  |-------|--------|------------|------------|-----------------|-----|-----|-----|
+ | UsefulSensors/moonshine-base | 61M | 361 | 182 | 262 | yes | 0.0180 | 0.1395 |
+ | openai/whisper-tiny | 39M | 232 | 137 | 581 | yes | 0.0116 | 0.1860 |
+ | openai/whisper-base | 74M | 329 | 178 | 358 | yes | 0.0164 | 0.1628 |
+ | openai/whisper-small | 244M | 856 | 332 | 108 | yes | 0.0428 | 0.0930 |
+ | openai/whisper-medium | 769M | 2085 | 923 | 49 | yes | 0.1041 | 0.0930 |
+ | nvidia/parakeet-ctc-0.6b | 600M | 201 | 201 | 5214285 | yes | 0.0101 | 0.0930 |
+ | nvidia/parakeet-tdt-0.6b-v3 | 600M | 718 | 718 | 3583333 | yes | 0.0359 | 0.0465 |
+ | nvidia/parakeet-ctc-1.1b | 1.1B | 279 | 278 | 4562500 | yes | 0.0139 | 0.1628 |
  | snakers4/silero-vad | - | - | - | - | - | - | - |
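The RTF (real-time factor) column is processing time divided by audio duration; values below 1.0 mean faster than real time. Assuming the 20 s clip from the benchmark notes applies to every row (the table's numbers are consistent with that), two rows check out:

```python
def rtf(end2end_ms: float, audio_seconds: float = 20.0) -> float:
    """Real-time factor: processing time over audio duration."""
    return end2end_ms / (audio_seconds * 1000.0)

print(round(rtf(232), 4))  # 0.0116 -> matches the whisper-tiny row
print(round(rtf(856), 4))  # 0.0428 -> matches the whisper-small row
```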

  ## Supported LLMs

  - Gemma weights are often **gated** on HuggingFace and need an access token
+ - Run `huggingface-cli login` and enter your Hugging Face token

  | Model | Features |
  |-------|----------|
 
  | LiquidAI/LFM2-2.6B | completion, tools, embed |
  | LiquidAI/LFM2-VL-450M | vision, txt & img embed, Apple NPU |
  | LiquidAI/LFM2.5-VL-1.6B | vision, txt & img embed, Apple NPU |
+ | tencent/Youtu-LLM-2B | completion, tools, embed |
  | nomic-ai/nomic-embed-text-v2-moe | embed |

  ## Roadmap

  | Feb 2026 | Done | Hybrid inference, INT4, lossless Quant (1.5x) |
  | Mar 2026 | Coming | Qualcomm/Google NPUs, 5-11x faster Android |
  | Apr 2026 | Coming | Mediatek/Exynos NPUs, Cactus@ICLR |
+ | May 2026 | Coming | Wearables & custom chips optimisations |
  | Jun 2026 | Coming | Torch/JAX model transpilers |
 
 
 

  ## Using this repo

  2. [UCLA's BruinAI](https://bruinai.org/)
  3. [Char (YC S25)](https://char.com/)
  4. [Yale's AI Society](https://www.yale-ai.org/team)
+ 5. [National University of Singapore's AI Society](https://www.nusaisociety.org/)
  6. [UC Irvine's AI@UCI](https://aiclub.ics.uci.edu/)
  7. [Imperial College's AI Society](https://www.imperialcollegeunion.org/csp/1391)
  8. [University of Pennsylvania's AI@Penn](https://ai-at-penn-main-105.vercel.app/)