gradientai/Llama-3-70B-Instruct-Gradient-262k

Text Generation

text-generation-inference

Inference Endpoints


leo-pekelis-gradient committed on May 3

Commit b8050a4 (1 parent: 42476c1)

Update README.md

Files changed (1)

README.md +14 -13

README.md CHANGED

@@ -37,19 +37,20 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
 **Progressive Training Details:**
-| Initialize From         | 65K                  | 262K       |
-|-------------------------|----------------------|------------|
-| Sequence Length 2^N     | 16                   | 18         |
-| RoPE Theta              | 15,296,098           | 207,112,184|
-| Batch Size              | 1                    | 1          |
-| Gradient Accumulation Steps | 1               | 1          |
-| Steps                   | 20                   | 25         |
-| Total Tokens            | 83,886,080           | 104,857,600|
-| Learning Rate           | 0.00002              | 0.00002    |
-| # GPUs                  | 512                  | 512        |
-| Ring Parallelism        | 64                   | 16         |
-| GPU Type                | NVIDIA L40S          | NVIDIA L40S|
-| Minutes to Train (Wall) | 100                  | 170        |
+|           | 65K             | 262K            |
+|--------------------------|-----------------|-----------------|
+| Initialize From          | Llama-3-70B-Instruct             | 65K            |
+| Sequence Length 2^N      | 16              | 18              |
+| RoPE theta               | 15,296,098      | 207,112,184     |
+| Batch Size               | 1               | 1               |
+| Gradient Accumulation Steps | 1           | 1               |
+| Steps                    | 20              | 25              |
+| Total Tokens             | 83,886,080      | 104,857,600     |
+| Learning rate            | 0.00002         | 0.00002         |
+| # GPUs                   | 512             | 512             |
+| Ring parallelism         | 64              | 16              |
+| GPU Type                 | NVIDIA L40S     | NVIDIA L40S     |
+| Minutes to Train (Wall)  | 100             | 170             |
 **Evaluation Details:**
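The columns of the updated table can be cross-checked with a little arithmetic. The sketch below is a hypothetical helper, not part of the repository: it copies the values from the "+" rows and derives the actual sequence length (2^N), tokens per optimizer step (Total Tokens ÷ Steps) and GPU-hours (# GPUs × wall-clock minutes ÷ 60). The `Stage` class and its field names are illustrative assumptions.

```python
"""Cross-check a few derived quantities from the Progressive Training Details table."""

from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    # Hypothetical container; values are copied from the "+" rows of the README table.
    name: str
    log2_seq_len: int      # "Sequence Length 2^N"
    steps: int             # "Steps"
    total_tokens: int      # "Total Tokens"
    ring_parallelism: int  # "Ring parallelism"
    gpus: int              # "# GPUs"
    wall_minutes: int      # "Minutes to Train (Wall)"

    @property
    def seq_len(self) -> int:
        # 2^16 = 65,536 ("65K") and 2^18 = 262,144 ("262K").
        return 2 ** self.log2_seq_len

    @property
    def tokens_per_step(self) -> int:
        # Total Tokens / Steps; raise if the two columns do not divide evenly.
        per_step, remainder = divmod(self.total_tokens, self.steps)
        if remainder:
            raise ValueError(f"{self.name}: Total Tokens not divisible by Steps")
        return per_step

    @property
    def gpu_hours(self) -> float:
        # GPU count times wall-clock minutes, converted to hours.
        return self.gpus * self.wall_minutes / 60


STAGES = [
    Stage("65K", 16, 20, 83_886_080, 64, 512, 100),
    Stage("262K", 18, 25, 104_857_600, 16, 512, 170),
]

if __name__ == "__main__":
    for s in STAGES:
        print(
            f"{s.name}: seq_len={s.seq_len:,} "
            f"tokens/step={s.tokens_per_step:,} "
            f"seq_len*ring={s.seq_len * s.ring_parallelism:,} "
            f"gpu_hours={s.gpu_hours:,.0f}"
        )
```

Both stages work out to 4,194,304 (2^22) tokens per step, which in each case also equals the sequence length multiplied by the Ring parallelism value; the commit does not state whether Total Tokens was computed that way. The wall-clock figures correspond to roughly 853 and 1,451 GPU-hours for the two stages.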