gradientai/Llama-3-70B-Instruct-Gradient-262k

leo-pekelis-gradient committed on May 3

Commit 7484062 • 1 Parent(s): ea25e66

Update README.md

Files changed (1)

README.md +13 -15

@@ -37,21 +37,19 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
 **Progressive Training Details:**
-|                        | 65K       | 262K      | 524K       |
-|------------------------|-----------|-----------|------------|
-| Initialize From        | Llama-3-70B-Instruct | 65K   | 262K  |
-| Sequence Length 2^N    | 16        | 18        | 19         |
-| RoPE theta             | 15296098  | 207112184 | 1062356830 |
-| Batch Size             | 1         | 1         | 1          |
-| Gradient Accumulation Steps | 1   | 1         | 2          |
-| Steps                  | 20        | 25        | 25         |
-| Total Tokens           | 83886080  | 104857600 | 209715200  |
-| Learning rate          | 2.00E-05  | 2.00E-05  | 2.00E-05   |
-| # GPUs                 | 512       | 512       | 512        |
-| Ring parallelism       | 64        | 16        | 8          |
-| GPU Type               | NVIDIA L40S | NVIDIA L40S | NVIDIA L40S |
-| Minutes to Train (Wall)| 100       | 170       | 284        |
+| Initialize From         | 65K                  | 262K       |
+|-------------------------|----------------------|------------|
+| Sequence Length 2^N     | 16                   | 18         |
+| RoPE theta              | 15,296,098           | 207,112,184|
+| Batch Size              | 1                    | 1          |
+| Gradient Accumulation Steps | 1               | 1          |
+| Steps                   | 20                   | 25         |
+| Total Tokens            | 83,886,080           | 104,857,600|
+| Learning rate           | 0.00002              | 0.00002    |
+| # GPUs                  | 512                  | 512        |
+| Ring parallelism        | 64                   | 16         |
+| GPU Type                | NVIDIA L40S          | NVIDIA L40S|
+| Minutes to Train (Wall) | 100                  | 170        |
 **Evaluation Details:**
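
The figures in the updated table can be cross-checked with a little arithmetic. The sketch below assumes that Total Tokens equals sequence length (2^N) × batch size × gradient accumulation steps × steps × the value in the "Ring parallelism" row. That relation is inferred from the numbers only; the README does not state it, and the dictionary keys and the helper name `implied_total_tokens` are hypothetical.

```python
# Values copied from the updated table. The formula below is an inferred
# assumption, not something stated in the README.
stages = {
    "65K": {
        "seq_len_log2": 16,
        "batch_size": 1,
        "grad_accum_steps": 1,
        "steps": 20,
        "ring_parallelism": 64,
        "total_tokens": 83_886_080,
    },
    "262K": {
        "seq_len_log2": 18,
        "batch_size": 1,
        "grad_accum_steps": 1,
        "steps": 25,
        "ring_parallelism": 16,
        "total_tokens": 104_857_600,
    },
}


def implied_total_tokens(stage: dict) -> int:
    """Token count implied by the assumed relation between the table rows."""
    return (
        2 ** stage["seq_len_log2"]
        * stage["batch_size"]
        * stage["grad_accum_steps"]
        * stage["steps"]
        * stage["ring_parallelism"]
    )


for name, stage in stages.items():
    implied = implied_total_tokens(stage)
    # Report whether the implied count matches the table's Total Tokens row.
    print(name, implied, implied == stage["total_tokens"])
```

Under this assumption both stages reproduce the Total Tokens row exactly. If the relation does not hold for other checkpoints, the formula should be treated as a coincidence of these two columns.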