Make some quick consistency fixes to model card
README.md
CHANGED
@@ -9,7 +9,7 @@ license: llama3
 ---
 <a href="https://www.gradient.ai" target="_blank"><img src="https://cdn-uploads.huggingface.co/production/uploads/655bb613e8a8971e89944f3e/TSa3V8YpoVagnTYgxiLaO.png" width="200"/></a>
 
-# Llama-3 70B Gradient
+# Llama-3 70B Instruct Gradient 262K
 
 Gradient incorporates your data to deploy autonomous assistants that power critical operations across your business. If you're looking to build custom AI models or agents, email us a message contact@gradient.ai.
 
@@ -40,14 +40,14 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
 | Initialize From | 65K | 262K |
 |-------------------------|----------------------|------------|
 | Sequence Length 2^N | 16 | 18 |
-| RoPE
+| RoPE Theta | 15,296,098 | 207,112,184|
 | Batch Size | 1 | 1 |
 | Gradient Accumulation Steps | 1 | 1 |
 | Steps | 20 | 25 |
 | Total Tokens | 83,886,080 | 104,857,600|
-| Learning
+| Learning Rate | 0.00002 | 0.00002 |
 | # GPUs | 512 | 512 |
-| Ring
+| Ring Parallelism | 64 | 16 |
 | GPU Type | NVIDIA L40S | NVIDIA L40S|
 | Minutes to Train (Wall) | 100 | 170 |
 
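The training table in this change lists Steps, Sequence Length 2^N, and Total Tokens for each stage, but not how Total Tokens is derived. The sketch below is only a consistency check on those three rows: it divides Total Tokens by Steps and by the sequence length to see how many full-length sequences each step must have consumed. The `STAGES` dictionary and the function name `implied_sequences_per_step` are hypothetical, introduced here for illustration; the numbers are copied from the table.

```python
# Values copied from the training table in the diff above.
# The relationship "total = steps * sequences_per_step * sequence_length"
# is an assumption used for this check, not something the card states.
STAGES = {
    "65K": {"seq_len_log2": 16, "steps": 20, "total_tokens": 83_886_080},
    "262K": {"seq_len_log2": 18, "steps": 25, "total_tokens": 104_857_600},
}


def implied_sequences_per_step(stage: dict) -> int:
    """Return how many full-length sequences one step covers, per the table."""
    tokens_per_sequence = 2 ** stage["seq_len_log2"]

    # Tokens consumed in a single optimizer step.
    tokens_per_step, remainder = divmod(stage["total_tokens"], stage["steps"])
    assert remainder == 0, "Total Tokens is not a multiple of Steps"

    # Full-length sequences needed to make up one step.
    sequences, remainder = divmod(tokens_per_step, tokens_per_sequence)
    assert remainder == 0, "tokens per step is not a multiple of sequence length"
    return sequences


for name, stage in STAGES.items():
    print(name, implied_sequences_per_step(stage))
# prints:
# 65K 64
# 262K 16
```

Both stages work out to 4,194,304 tokens per step. The Batch Size and Gradient Accumulation Steps rows (both 1) do not account for the implied factors of 64 and 16. Those factors coincide numerically with the Ring Parallelism row, but the card does not say how the rows relate, so this should be read as an observation about the table rather than a description of the training setup.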