swizzley88 committed
Commit 655d5a2 · verified · 1 Parent(s): 82da716

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +53 -0
  2. config.json +1 -0
  3. model.safetensors +3 -0
README.md ADDED
---
license: mit
tags:
- sparse-first
- helix
- mongoose
language:
- en
---

# Sparse-First Trained Transformer (dim=2048)

A byte-level transformer trained with sparse-first training, a framework in which every stage of the pipeline operates on the active parameter subset rather than the full model.

## Architecture

- **dim**: 2048
- **layers**: 4
- **heads**: 64 (GQA, 32 KV heads)
- **FFN**: 4096
- **vocab**: 256 (byte-level)
- **params**: 152M
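The stated parameter count can be reproduced from the fields above. The sketch below assumes a standard Llama-style layout (SwiGLU FFN with gate/up/down projections, GQA with half-width K/V, tied embeddings per config.json) and ignores the small RMSNorm weights; it is an editor's sanity check, not part of the release.

```python
# Approximate parameter count from the architecture table above.
# Assumes Llama-style SwiGLU FFN, GQA (32 KV heads), tied embeddings;
# RMSNorm weights (a few thousand params) are ignored.
dim, layers, ffn, vocab = 2048, 4, 4096, 256
n_heads, n_kv_heads = 64, 32
head_dim = dim // n_heads                    # 32

embed = vocab * dim                          # input embedding, tied with lm_head
attn = 2 * dim * dim                         # wq + wo (full width)
attn += 2 * dim * (n_kv_heads * head_dim)    # wk + wv (GQA: half width)
mlp = 3 * dim * ffn                          # gate + up + down (SwiGLU)
total = embed + layers * (attn + mlp)
print(f"{total / 1e6:.1f}M parameters")      # 151.5M, matching the stated ~152M
```

The GQA K/V projections are half the size of Q/O here (32 KV heads × head_dim 32 = 1024 output features), which is where the per-layer attention count of ~12.6M comes from.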
## Training

Trained with the [Helix DNA optimizer](https://github.com/open-ai-org/helix) on an RTX 5090:
- **74 steps/s** at dim=2048
- gate↔up (G≡C, 3 H-bonds), wq↔wo (A≡T), wk↔wv (A≡T): 3 DNA pairs per layer
- Conductor-driven sparsity: only hot rows receive gradients, optimizer updates, and weight writeback
- Immune system: automatic checkpoint at loss floors, revert on rebound

## Multi-GPU

On dual H100 SXM NVLink with Helix Dispatch (interleaved position parallelism):
- **21.7 steps/s** at dim=4096, 1.54x faster than PyTorch DDP

## Usage

```bash
brew install open-ai-org/tap/ai
ai pull open-ai-org/sparse-first-2048
ai infer sparse-first-2048 "Hello"
```

## Paper

[Sparse-First Training: A Biologically-Inspired Framework](https://github.com/open-ai-org/ai/blob/master/docs/sparse-first-training.md)

## Framework

- [mongoose](https://github.com/open-ai-org/mongoose): GPU compute engine
- [ai](https://github.com/open-ai-org/ai): CLI
- [helix](https://github.com/open-ai-org/helix): DNA optimizer
config.json ADDED

{"architectures":["LlamaForCausalLM"],"hidden_size":2048,"num_hidden_layers":4,"num_attention_heads":64,"num_key_value_heads":32,"intermediate_size":4096,"vocab_size":256,"max_position_embeddings":2048,"rope_theta":10000.0,"rms_norm_eps":1e-6,"hidden_act":"silu","tie_word_embeddings":true}
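A quick internal-consistency check on the config above (an editor's sketch, not a loader): the per-head dimension and GQA group size follow directly from the listed fields and match the README's "64 heads (GQA, 32 KV heads)".

```python
import json

# Parse the config.json contents shown above and derive the values
# implied by them: head_dim = hidden_size / num_attention_heads,
# and the GQA group size = query heads per KV head.
cfg = json.loads(
    '{"architectures":["LlamaForCausalLM"],"hidden_size":2048,'
    '"num_hidden_layers":4,"num_attention_heads":64,'
    '"num_key_value_heads":32,"intermediate_size":4096,'
    '"vocab_size":256,"max_position_embeddings":2048,'
    '"rope_theta":10000.0,"rms_norm_eps":1e-06,'
    '"hidden_act":"silu","tie_word_embeddings":true}'
)
head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]        # 32
gqa_group = cfg["num_attention_heads"] // cfg["num_key_value_heads"]  # 2 query heads share each KV head
print(head_dim, gqa_group)
```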
model.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:7a483c37ff92d35ebedc3f4214d07aee9a597480fdc4426e631a6f069ef22cef
size 606154881
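The LFS pointer's size is consistent with the stated parameter count. A rough check, assuming the ~151.5M-parameter figure derived from the architecture fields (the small remainder is the safetensors header plus norm weights):

```python
# Bytes per parameter implied by the checkpoint size in the LFS
# pointer above and the ~151.5M parameters computed from the
# architecture (editor's estimate, not an official figure).
size_bytes = 606_154_881
params = 151_519_232
print(size_bytes / params)   # ~4.0 bytes/param, consistent with float32 weights
```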