mustafaaljadery/gemma-2B-10M

mustafaaljadery committed on May 9

Commit 3861cc6 • 1 Parent(s): 807fe15

Update README.md

Files changed (1)

README.md +4 -4

@@ -5,14 +5,14 @@ license: mit
 Gemma 2B with recurrent local attention with context length of up to 10M. Our implemenation uses **<32GB** of memory!
-![Graphic of our implementation context](./images/graphic.png)
+![Graphic of our implementation context](./graphic.png)
 **Features:**
 - 10M sequence length on Gemma 2B.
-- Runs on less then 32GB of memory.
-- Native inference on Apple Silicon using MLX.
-- Highly performing retrieval - needle in hay stack.
+- Runs on less than 32GB of memory.
+- Native inference optimized for cuda.
+- Recurrent local attention for O(N) memory.
 ## Quick Start
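
Reviewer note on the new "Recurrent local attention for O(N) memory" bullet: the sketch below is one possible illustration of why block-local attention keeps the attention score buffer bounded as the sequence grows. It is not taken from this repository. The function name `local_attention_blocks`, the `block_size` parameter, and the choice to carry only the previous block's keys and values forward are all assumptions made for illustration; the model's actual recurrence may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_attention_blocks(queries, keys, values, block_size):
    """Causal attention where each block sees itself plus the previous block.

    Hypothetical sketch: returns the per-token outputs and the largest number
    of attention scores computed for any single query.
    """
    outputs = []
    prev_k, prev_v = [], []  # cache carried over from the previous block
    peak_scores = 0
    for start in range(0, len(queries), block_size):
        q_blk = queries[start:start + block_size]
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        ctx_k = prev_k + k_blk
        ctx_v = prev_v + v_blk
        offset = len(prev_k)
        for i, q in enumerate(q_blk):
            visible = offset + i + 1  # causal mask: no future tokens
            scale = math.sqrt(len(q))
            scores = [
                sum(a * b for a, b in zip(q, k)) / scale
                for k in ctx_k[:visible]
            ]
            weights = softmax(scores)
            peak_scores = max(peak_scores, len(scores))
            dim = len(ctx_v[0])
            outputs.append([
                sum(w * v[d] for w, v in zip(weights, ctx_v[:visible]))
                for d in range(dim)
            ])
        # Only the most recent block is kept as context for the next one.
        prev_k, prev_v = k_blk, v_blk
    return outputs, peak_scores


if __name__ == "__main__":
    tokens = [[float(i % 3), 1.0] for i in range(64)]
    out, peak = local_attention_blocks(tokens, tokens, tokens, block_size=4)
    print(len(out), peak)
```

In this sketch no query ever scores more than `2 * block_size` keys, so the score buffer stays constant in size while the stored inputs and outputs grow linearly with sequence length, instead of the quadratic score matrix of full attention.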