Upload folder using huggingface_hub
- README.md +53 -0
- config.json +1 -0
- model.safetensors +3 -0
README.md
ADDED
@@ -0,0 +1,53 @@
---
license: mit
tags:
- sparse-first
- helix
- mongoose
language:
- en
---

# Sparse-First Trained Transformer (dim=2048)

A byte-level transformer trained with sparse-first training, a framework in which every stage of the pipeline operates on the active parameter subset rather than the full model.
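
To make "operates on the active parameter subset" concrete, here is a minimal sketch of a row-sparse update step. It assumes plain SGD in place of the Helix optimizer, and `hot_rows` is a hypothetical index list standing in for however the real pipeline selects active rows. Only the selected rows are read, updated, and written back.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def sparse_row_sgd_step(
    weights: Matrix,
    hot_grads: Sequence[Sequence[float]],
    hot_rows: Sequence[int],
    lr: float,
) -> None:
    """Update only the rows listed in hot_rows, in place.

    hot_grads[i] is the gradient for row hot_rows[i]; gradients for
    inactive rows are never materialised. Plain SGD is an assumption
    made for illustration, not the optimizer used for this model.
    """
    if len(hot_grads) != len(hot_rows):
        raise ValueError("need exactly one gradient row per hot row")
    for row_idx, grad in zip(hot_rows, hot_grads):
        row = weights[row_idx]
        for col, g in enumerate(grad):
            row[col] -= lr * g


weights = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
hot_rows = [1, 3]                       # hypothetical active subset
hot_grads = [[0.5, 0.5], [1.0, -1.0]]   # gradients for rows 1 and 3 only
sparse_row_sgd_step(weights, hot_grads, hot_rows, lr=0.5)
print(weights)
```

Rows 0 and 2 are left untouched, so the cost of the step scales with the number of hot rows rather than with the size of the matrix.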

## Architecture

- **dim**: 2048
- **layers**: 4
- **heads**: 64 (GQA, 32 KV heads)
- **FFN**: 4096
- **vocab**: 256 (byte-level)
- **params**: 152M
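
The 152M figure can be cross-checked from the other numbers in the list. The sketch below assumes a Llama-style layout, as suggested by `LlamaForCausalLM` and `tie_word_embeddings` in the `config.json` of this commit: bias-free attention projections, a gate/up/down FFN, two RMSNorm weight vectors per layer, one final norm, and an output head that shares the embedding matrix. The helper name `count_params` is illustrative.

```python
def count_params(dim: int, layers: int, heads: int, kv_heads: int,
                 ffn: int, vocab: int, tied_embeddings: bool = True) -> int:
    """Count weights for an assumed Llama-style decoder (no biases)."""
    head_dim = dim // heads
    kv_dim = kv_heads * head_dim
    attn = 2 * dim * dim + 2 * dim * kv_dim  # wq and wo, then wk and wv
    mlp = 3 * dim * ffn                      # gate, up, down
    norms = 2 * dim                          # two RMSNorm vectors per layer
    embed = vocab * dim
    lm_head = 0 if tied_embeddings else vocab * dim
    final_norm = dim
    return layers * (attn + mlp + norms) + embed + lm_head + final_norm


total = count_params(dim=2048, layers=4, heads=64, kv_heads=32,
                     ffn=4096, vocab=256)
print(total)      # 151537664
print(total * 4)  # 606150656 bytes if stored as float32 (an assumption)
```

Under these assumptions the count is about 151.5M, which rounds to the stated 152M. At 4 bytes per weight the total is 4,225 bytes short of the 606,154,881-byte `model.safetensors` size, a gap that would be consistent with a file header, though the storage dtype is not stated in the source.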

## Training

Trained with the [Helix DNA optimizer](https://github.com/open-ai-org/helix) on an RTX 5090:
- **74 steps/s** at dim=2048
- 3 DNA pairs per layer: gate↔up (G≡C, 3 H-bonds), wq↔wo (A≡T), wk↔wv (A≡T)
- Conductor-driven sparsity: only hot rows get gradients, optimizer updates, and weight writeback
- Immune system: automatic checkpoint at loss floors, revert on rebound
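
The "checkpoint at loss floors, revert on rebound" behaviour can be sketched as a small guard object. Everything here is hypothetical: the class name `LossGuard`, the `rebound_ratio` threshold, and the choice to define a rebound as the loss exceeding the best loss by a fixed factor are assumptions for illustration, not the Helix implementation. The ratio test also assumes a positive loss.

```python
import copy


class LossGuard:
    """Hypothetical checkpoint-at-floor, revert-on-rebound guard."""

    def __init__(self, rebound_ratio: float = 1.5) -> None:
        self.rebound_ratio = rebound_ratio  # assumed rebound threshold
        self.best_loss = float("inf")
        self.checkpoint = None

    def step(self, state: dict, loss: float) -> dict:
        """Return the state training should continue from."""
        if loss < self.best_loss:
            # New loss floor: remember this state.
            self.best_loss = loss
            self.checkpoint = copy.deepcopy(state)
            return state
        if self.checkpoint is not None and loss > self.best_loss * self.rebound_ratio:
            # Loss rebounded well above the floor: roll back.
            return copy.deepcopy(self.checkpoint)
        return state


guard = LossGuard(rebound_ratio=1.5)
state = guard.step({"w": 1.0}, loss=2.0)  # first floor, checkpointed
state = guard.step({"w": 0.8}, loss=1.0)  # lower floor, checkpointed
state = guard.step({"w": 5.0}, loss=1.2)  # small rise, kept as-is
kept = state
state = guard.step({"w": 9.0}, loss=3.0)  # rebound, reverted to the floor
print(kept, state)
```

A real trainer would also need to restore optimizer state alongside the weights; the sketch copies a single `state` dict to keep that detail out of the way.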

## Multi-GPU

On dual H100 SXM NVLink with Helix Dispatch (interleaved position parallelism):
- **21.7 steps/s** at dim=4096, 1.54x faster than PyTorch DDP

## Usage

```bash
# Install the ai CLI from the project's Homebrew tap
brew install open-ai-org/tap/ai
# Download the model
ai pull open-ai-org/sparse-first-2048
# Run inference on a prompt
ai infer sparse-first-2048 "Hello"
```
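
Because the vocabulary is byte-level with 256 entries, a prompt such as `"Hello"` plausibly maps to one token per UTF-8 byte. The sketch below assumes that token ids are the raw byte values with no special tokens; the source does not show how the `ai` CLI actually tokenizes, so treat `encode` and `decode` as illustrative helpers.

```python
from typing import List


def encode(text: str) -> List[int]:
    """Map text to byte-level token ids (assumed: id == UTF-8 byte value)."""
    return list(text.encode("utf-8"))


def decode(ids: List[int]) -> str:
    """Map byte-level token ids back to text, replacing invalid sequences."""
    return bytes(ids).decode("utf-8", errors="replace")


ids = encode("Hello")
print(ids)          # [72, 101, 108, 108, 111]
print(decode(ids))  # Hello
```

Non-ASCII characters take several ids each under this scheme, so sequence length is measured in bytes rather than characters.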

## Paper

[Sparse-First Training: A Biologically-Inspired Framework](https://github.com/open-ai-org/ai/blob/master/docs/sparse-first-training.md)

## Framework

- [mongoose](https://github.com/open-ai-org/mongoose): GPU compute engine
- [ai](https://github.com/open-ai-org/ai): CLI
- [helix](https://github.com/open-ai-org/helix): DNA optimizer
config.json
ADDED
@@ -0,0 +1 @@
{"architectures":["LlamaForCausalLM"],"hidden_size":2048,"num_hidden_layers":4,"num_attention_heads":64,"num_key_value_heads":32,"intermediate_size":4096,"vocab_size":256,"max_position_embeddings":2048,"rope_theta":10000.0,"rms_norm_eps":1e-6,"hidden_act":"silu","tie_word_embeddings":true}
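
The attention shapes implied by this config can be derived with the standard library alone. The sketch below parses the JSON above and computes the per-head width and the grouped-query sharing factor. It assumes the usual convention that `hidden_size` is split evenly across `num_attention_heads`, since the config sets no explicit head dimension.

```python
import json

CONFIG_JSON = (
    '{"architectures":["LlamaForCausalLM"],"hidden_size":2048,'
    '"num_hidden_layers":4,"num_attention_heads":64,"num_key_value_heads":32,'
    '"intermediate_size":4096,"vocab_size":256,"max_position_embeddings":2048,'
    '"rope_theta":10000.0,"rms_norm_eps":1e-6,"hidden_act":"silu",'
    '"tie_word_embeddings":true}'
)

cfg = json.loads(CONFIG_JSON)

# Assumed convention: head width = hidden_size / num_attention_heads.
head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
# Grouped-query attention: how many query heads share one KV head.
queries_per_kv = cfg["num_attention_heads"] // cfg["num_key_value_heads"]
# Output width of the key and value projections.
kv_width = cfg["num_key_value_heads"] * head_dim

print(head_dim, queries_per_kv, kv_width)  # 32 2 1024
```

With two query heads per KV head, the key and value projections are half the width of the query projection.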
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7a483c37ff92d35ebedc3f4214d07aee9a597480fdc4426e631a6f069ef22cef
size 606154881
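
The three lines above are a Git LFS pointer: the spec version, the SHA-256 of the real file, and its size in bytes. A downloaded copy of `model.safetensors` can be checked against them with a short script. The helper names `parse_lfs_pointer` and `matches_pointer` are illustrative, not part of any tool mentioned here.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and hash both match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False  # cheap size check before hashing
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:7a483c37ff92d35ebedc3f4214d07aee9a597480fdc4426e631a6f069ef22cef
size 606154881
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])  # sha256 606154881
```

Calling `matches_pointer(Path("model.safetensors"), pointer)` on a local download then reports whether the file is intact; the path is a placeholder for wherever the file was saved.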