Update README.md - Add Model Details

README.md

---
license: apache-2.0
---

# Grok-1

_This repository contains the weights of the Grok-1 open-weights model._

```
╔══════════════════════════╗
║                   _____  ║
║            /\    |_   _| ║
║  __  __   /  \     | |   ║
║  \ \/ /  / /\ \    | |   ║
║   >  <  / ____ \  _| |_  ║
║  /_/\_\/_/    \_\|_____| ║
║                          ║
║ Understand the Universe  ║
║     [https://x.ai]       ║
╚══════════════════════════╝
╔═══════════════════╗
║ xAI Grok-1 (314B) ║
╚═══════════════════╝
╔════════════════════════════════════════════╗
║ 314B parameter Mixture of Experts model    ║
║ - Base model (not finetuned)               ║
║ - 8 experts (2 active)                     ║
║ - 86B active parameters                    ║
║ - Apache 2.0 license                       ║
║ - Code: https://github.com/xai-org/grok-1  ║
║ - Happy coding!                            ║
╚════════════════════════════════════════════╝
```

## Model Configuration Details

**Vocabulary Size**: 131,072

**Special Tokens**:
- Pad Token: 0
- End of Sequence Token: 2

**Sequence Length**: 8192

### Model Architecture: MoE
- **Embedding Size**: 6,144
- **Layers**: 64
- **Experts**: 8
- **Selected Experts**: 2
- **Widening Factor**: 8
- **Key Size**: 128
- **Query Heads**: 48
- **Key Value Heads**: 8
- **Activation Sharding**: Data-wise, Model-wise
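
To make these numbers concrete, here is a minimal sketch that collects the hyperparameters above in one place and checks the sizes they imply; `GrokConfig` is a hypothetical name for illustration, not a class from the grok-1 codebase.

```python
from dataclasses import dataclass

# Hypothetical container for the hyperparameters listed above; the real
# grok-1 code defines its own configuration classes.
@dataclass(frozen=True)
class GrokConfig:
    vocab_size: int = 131_072
    pad_token_id: int = 0
    eos_token_id: int = 2
    sequence_len: int = 8_192
    emb_size: int = 6_144
    num_layers: int = 64
    num_experts: int = 8
    num_selected_experts: int = 2
    widening_factor: int = 8
    key_size: int = 128
    num_q_heads: int = 48
    num_kv_heads: int = 8

cfg = GrokConfig()
# The embedding width is exactly query heads times key size: 48 * 128 = 6,144.
assert cfg.emb_size == cfg.num_q_heads * cfg.key_size
# Each expert widens the FFN by 8x: 8 * 6,144 = 49,152 hidden units.
ffn_hidden = cfg.widening_factor * cfg.emb_size
# Only 2 of the 8 experts run per token, which is why roughly 86B of the
# 314B parameters are active for any given token.
print(f"FFN hidden size per expert: {ffn_hidden:,}")
```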

### Inference Configuration
- Batch Size per Device: 0.125 (see the mesh sketch below)
- Tokenizer: `./tokenizer.model`
- Local Mesh: 1x8
- Between Hosts: 1x1
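
A batch size of 0.125 per device works out to a single batch shared across the eight devices of the 1x8 local mesh (0.125 × 8 = 1). Below is a minimal JAX sketch of such a mesh, assuming a single host with exactly eight accelerators; the axis names are illustrative, chosen to mirror the data-wise/model-wise activation sharding listed earlier.

```python
import numpy as np
import jax
from jax.sharding import Mesh

# Assumes a single host with exactly 8 accelerators, matching the 1x8
# local mesh above (Between Hosts: 1x1 means no cross-host axis).
devices = np.array(jax.devices()).reshape(1, 8)
mesh = Mesh(devices, axis_names=("data", "model"))

# 0.125 batch per device times 8 devices = 1 batch across the whole mesh.
global_batch = int(0.125 * devices.size)
print(mesh, global_batch)
```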

## Inference Details

Make sure to download the `int8` checkpoint to the `checkpoints` directory and run the example code.
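
(The install-and-launch commands themselves are unchanged by this commit and collapsed here; in the xai-org/grok-1 repository the example is typically started with `pip install -r requirements.txt` followed by `python run.py`.)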

You should be seeing output from the language model.

Due to the large size of the model (314B parameters), a multi-GPU machine is required to test the model with the example code.
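
As rough arithmetic (assuming about one byte per weight for the `int8` checkpoint), the weights alone exceed any single GPU's memory:

```python
# Back-of-the-envelope memory estimate; real usage also needs room for
# activations, any KV cache, and framework overhead.
total_params = 314e9   # 314B parameters
bytes_per_param = 1    # int8 checkpoint: roughly one byte per weight
weights_gb = total_params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB for the weights alone")  # ~314 GB
```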

**p.s. we're hiring: https://x.ai/career**