gradientai/Llama-3-8B-Instruct-262k

michaelfeil committed on Apr 25

Commit 990790f (1 parent: bdf8c7d)

Update README.md

Files changed (1)

README.md +11 -12

@@ -30,18 +30,17 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
 **Progressive Training Details:**
-| Parameter                   | 65K        | 262K       |
-|-----------------------------|------------|------------|
-| Initialize From             | LLaMA-3 8B | 65K        |
-| Sequence Length             | 2^16       | 2^18       |
-| RoPE theta                  | 15.3 M     | 207.1 M    |
-| Batch Size                  | 1          | 1          |
-| Gradient Accumulation Steps | 32         | 16         |
-| Steps                       | 30         | 24         |
-| Total Tokens                | 63 M       | 101 M      |
-| Learning Rate               | 2.00E-05   | 2.00E-05   |
-| # GPUs                      | 32         | 32         |
-| GPU Type                    | NVIDIA L40S| NVIDIA L40S|
+| Parameter                   | 65K            | 262K       |
+|-----------------------------|----------------|------------|
+| Initialize From             | LLaMA-3-8B-Inst| 65K        |
+| Sequence Length             | 2^16           | 2^18       |
+| RoPE theta                  | 15.3 M         | 207.1 M    |
+| Batch Size (Tokens / Step)  | 2M             | 4M         |
+| Steps                       | 30             | 24         |
+| Total Tokens                | 63 M           | 101 M      |
+| Learning Rate               | 2.00E-05       | 2.00E-05   |
+| # GPUs                      | 32             | 32         |
+| GPU Type                    | NVIDIA L40S    | NVIDIA L40S|
 ## The Gradient AI Team
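
Review note: the new "Batch Size (Tokens / Step)" row appears to fold the removed "Batch Size" and "Gradient Accumulation Steps" rows together with the sequence length. The sketch below checks that arithmetic. It assumes tokens per step equals sequence length × batch size × gradient accumulation steps, with no extra factor for the GPU count; that assumption is inferred from the numbers in the two tables and is not stated in the README. The function and variable names are illustrative only.

```python
# Sanity check of the table arithmetic in this diff.
# Assumption (not stated in the README): tokens per optimizer step is
# seq_len * batch_size * grad_accum_steps, with no multiplier for # GPUs.

def tokens_per_step(seq_len: int, batch_size: int, grad_accum_steps: int) -> int:
    """Tokens consumed in one optimizer step under the assumption above."""
    return seq_len * batch_size * grad_accum_steps

# Values copied from the removed (-) and added (+) table rows.
stages = {
    "65K": {"seq_len": 2**16, "batch_size": 1, "grad_accum_steps": 32, "steps": 30},
    "262K": {"seq_len": 2**18, "batch_size": 1, "grad_accum_steps": 16, "steps": 24},
}

summary = {}
for name, s in stages.items():
    per_step = tokens_per_step(s["seq_len"], s["batch_size"], s["grad_accum_steps"])
    total = per_step * s["steps"]
    summary[name] = (per_step, total)
    # Report per-step tokens in units of 2^20 and totals in units of 10^6.
    print(f"{name}: {per_step / 2**20:.0f}M tokens/step, {total / 1e6:.1f} M total tokens")
```

Under that assumption, the per-step values come out to 2^21 and 2^22 tokens (the table's 2M and 4M), and the products with the step counts round to the 63 M and 101 M listed under "Total Tokens", so the new row is consistent with the rows it replaces.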