Update README.md
README.md
CHANGED
@@ -17,6 +17,6 @@ paricularly in the Science Fiction and Fantasy genres.
 The following hyperparameters were used
 |Batch Size|Epochs|Context length|Learning rate|Scheduler|Weight decay|Warmup ratio|
 |----------|------|--------------|-------------|---------|------------|------------|
-|
+| 128 | 3 | 8192 | 2e-5 | Cosine | 0. | 0.03 |
 
 The model reached a training loss of 2.008 and took approximately 8 hours on 8x A100 80 GB GPUs.
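
The added table row gives a learning rate of 2e-5, a cosine scheduler, and a warmup ratio of 0.03. As a rough illustration of how those three values might combine, the sketch below computes the learning rate at a given step. It assumes linear warmup and decay to zero, neither of which the README states, and `lr_at` and `total_steps` are hypothetical names introduced here rather than taken from the training code.

```python
import math

# Values from the hyperparameter table in the diff.
BASE_LR = 2e-5
WARMUP_RATIO = 0.03


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at `step`, assuming linear warmup then cosine decay to zero.

    The warmup shape and the final value of zero are assumptions; the README
    only names the scheduler ("Cosine") and the warmup ratio.
    """
    warmup_steps = round(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear ramp from 0 up to BASE_LR over the warmup steps.
        return BASE_LR * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical step count, not from the README
    for s in (0, 15, 30, 500, 1000):
        print(s, lr_at(s, total))
```

With a hypothetical 1000 total steps, the first 30 steps are warmup, the rate peaks at 2e-5 at step 30, and it falls back to zero at the final step.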