Upload folder using huggingface_hub

Files changed (3)

README.md +53 -0
config.json +1 -0
model.safetensors +3 -0

README.md ADDED

	@@ -0,0 +1,53 @@

+---
+license: mit
+tags:
+- sparse-first
+- helix
+- mongoose
+language:
+- en
+---
+# Sparse-First Trained Transformer (dim=2048)
+A byte-level transformer trained with sparse-first training, a framework in which every stage of the pipeline operates on the active parameter subset rather than on the full model.
+## Architecture
+- **dim**: 2048
+- **layers**: 4
+- **heads**: 64 (GQA, 32 KV heads)
+- **FFN**: 4096
+- **vocab**: 256 (byte-level)
+- **params**: 152M
+## Training
+Trained with the [Helix DNA optimizer](https://github.com/open-ai-org/helix) on a single RTX 5090:
+- **74 steps/s** at dim=2048
+- 3 DNA pairs per layer: gate↔up (G≡C, 3 H-bonds), wq↔wo (A≡T), wk↔wv (A≡T)
+- Conductor-driven sparsity: only hot rows get gradients, optimizer updates, and weight writeback
+- Immune system: automatic checkpoint at loss floors, revert on rebound
+## Multi-GPU
+On two NVLink-connected H100 SXM GPUs with Helix Dispatch (interleaved position parallelism):
+- **21.7 steps/s** at dim=4096, 1.54x faster than PyTorch DDP
+## Usage
+```bash
+brew install open-ai-org/tap/ai
+ai pull open-ai-org/sparse-first-2048
+ai infer sparse-first-2048 "Hello"
+```
+## Paper
+[Sparse-First Training: A Biologically-Inspired Framework](https://github.com/open-ai-org/ai/blob/master/docs/sparse-first-training.md)
+## Framework
+- [mongoose](https://github.com/open-ai-org/mongoose) — GPU compute engine
+- [ai](https://github.com/open-ai-org/ai) — CLI
+- [helix](https://github.com/open-ai-org/helix) — DNA optimizer

config.json ADDED

	@@ -0,0 +1 @@


+{"architectures":["LlamaForCausalLM"],"hidden_size":2048,"num_hidden_layers":4,"num_attention_heads":64,"num_key_value_heads":32,"intermediate_size":4096,"vocab_size":256,"max_position_embeddings":2048,"rope_theta":10000.0,"rms_norm_eps":1e-6,"hidden_act":"silu","tie_word_embeddings":true}
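The config is enough to sanity-check the README's 152M parameter figure. The following is a minimal sketch, assuming the standard Llama layout that `LlamaForCausalLM` implies: bias-free q/k/v/o projections, a gate/up/down FFN, two RMSNorm weight vectors per layer, one final norm, and an output head that shares the embedding matrix when `tie_word_embeddings` is true. The helper name `count_params` is illustrative and not part of any library.

```python
import json

# The config.json added in this commit, copied verbatim.
CONFIG = json.loads("""
{"architectures":["LlamaForCausalLM"],"hidden_size":2048,"num_hidden_layers":4,
 "num_attention_heads":64,"num_key_value_heads":32,"intermediate_size":4096,
 "vocab_size":256,"max_position_embeddings":2048,"rope_theta":10000.0,
 "rms_norm_eps":1e-6,"hidden_act":"silu","tie_word_embeddings":true}
""")


def count_params(cfg):
    """Count weights under an assumed standard Llama layout (no biases)."""
    d = cfg["hidden_size"]
    head_dim = d // cfg["num_attention_heads"]        # 2048 / 64 = 32
    kv_dim = head_dim * cfg["num_key_value_heads"]    # 32 * 32 = 1024 (GQA)

    attn = d * d + 2 * d * kv_dim + d * d             # wq, wk + wv, wo
    ffn = 3 * d * cfg["intermediate_size"]            # gate, up, down
    norms = 2 * d                                     # two RMSNorm vectors
    per_layer = attn + ffn + norms

    embed = cfg["vocab_size"] * d
    # With tied embeddings the output head reuses the embedding matrix.
    lm_head = 0 if cfg.get("tie_word_embeddings") else embed
    final_norm = d
    return cfg["num_hidden_layers"] * per_layer + embed + lm_head + final_norm


total = count_params(CONFIG)
print(total)                   # 151537664
print(f"{total / 1e6:.1f}M")   # 151.5M
```

Under these assumptions the count rounds to the 152M stated in the README. The 256-entry byte vocabulary contributes only about 0.5M of it, so nearly all of the weights sit in the four transformer layers.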

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7a483c37ff92d35ebedc3f4214d07aee9a597480fdc4426e631a6f069ef22cef
+size 606154881
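The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 of the real file, and its size in bytes. Below is a minimal sketch that parses the pointer and compares the size against a float32 payload. Two things here are assumptions rather than facts from the commit: that the tensors are stored as float32, and that the model has 151,537,664 parameters (the count that config.json's shapes give under a standard bias-free Llama layout with tied embeddings). The helper name `parse_lfs_pointer` is illustrative.

```python
# Git LFS pointer for model.safetensors, copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:7a483c37ff92d35ebedc3f4214d07aee9a597480fdc4426e631a6f069ef22cef
size 606154881
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of an LFS pointer into named fields."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


ptr = parse_lfs_pointer(POINTER)

# Assumptions (not stated in the commit): float32 storage and a parameter
# count derived from config.json under a standard Llama layout.
ASSUMED_PARAMS = 151_537_664
BYTES_PER_FLOAT32 = 4

payload = ASSUMED_PARAMS * BYTES_PER_FLOAT32
overhead = ptr["size"] - payload

print(ptr["hash_algo"], ptr["size"])   # sha256 606154881
print(payload)                         # 606150656
print(overhead)                        # 4225
```

If both assumptions hold, the remaining 4,225 bytes would be the safetensors header (an 8-byte length prefix followed by JSON metadata describing each tensor), which is a plausible size for a 4-layer model.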