mustafaaljadery committed
Commit 3861cc6
1 Parent(s): 807fe15

Update README.md

Files changed (1): README.md +4 -4
README.md CHANGED
@@ -5,14 +5,14 @@ license: mit
 
 Gemma 2B with recurrent local attention with context length of up to 10M. Our implemenation uses **<32GB** of memory!
 
-![Graphic of our implementation context](./images/graphic.png)
+![Graphic of our implementation context](./graphic.png)
 
 **Features:**
 
 - 10M sequence length on Gemma 2B.
-- Runs on less then 32GB of memory.
-- Native inference on Apple Silicon using MLX.
-- Highly performing retrieval - needle in hay stack.
+- Runs on less than 32GB of memory.
+- Native inference optimized for cuda.
+- Recurrent local attention for O(N) memory.
 
 ## Quick Start
 
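For context on the new "O(N) memory" feature bullet: recurrent local attention avoids materializing the full N×N attention matrix by attending within fixed-size blocks and carrying a small amount of state forward between blocks. Below is a minimal PyTorch sketch of that idea, for illustration only; it is not the repository's actual implementation, and names like `block_size` and the single-block lookback window are assumptions.

```python
import torch
import torch.nn.functional as F

def recurrent_local_attention(q, k, v, block_size=2048):
    """Block-wise local attention with a one-block recurrent carry (sketch).

    q, k, v: (seq_len, d). Each block attends to itself plus the previous
    block, so the score matrix is at most (block_size, 2 * block_size)
    instead of (seq_len, seq_len). Causal masking is omitted for brevity.
    """
    seq_len, d = q.shape
    out = torch.empty_like(q)
    prev_k = prev_v = None
    for start in range(0, seq_len, block_size):
        end = min(start + block_size, seq_len)
        qb = q[start:end]
        # Keys/values visible to this block: the carried previous block
        # (if any) concatenated with the current block.
        kb = k[start:end] if prev_k is None else torch.cat([prev_k, k[start:end]])
        vb = v[start:end] if prev_v is None else torch.cat([prev_v, v[start:end]])
        scores = qb @ kb.T / d ** 0.5                 # (block, <= 2 * block)
        out[start:end] = F.softmax(scores, dim=-1) @ vb
        prev_k, prev_v = k[start:end], v[start:end]   # recurrent carry
    return out
```

Because the per-step score matrix is bounded by the block size rather than the sequence length, peak attention memory stays constant as N grows, which is what makes sub-32GB inference at multi-million-token context lengths plausible.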