Update README.md
README.md
CHANGED
@@ -19,7 +19,7 @@ Jam-CGPT is a GPT2-like model that follows [jam](https://huggingface.co/apcl/jam
 |a | accumulation steps | 2 |
 |d | dropout | 0.20 |
 |r | learning rate | 3e-5 |
-|y |
+|y | weight decay | 1e-5 |
 |iter | number of iterations after pretraining | 757,000 |

 ## Jam-CGPT 110 million parameters model
@@ -33,7 +33,7 @@ Jam-CGPT is a GPT2-like model that follows [jam](https://huggingface.co/apcl/jam
 |a | accumulation steps | 4 |
 |d | dropout | 0.20 |
 |r | learning rate | 3e-5 |
-|y |
+|y | weight decay | 1e-5 |
 |iter | number of iterations after pretraining | 762,000 |

@@ -49,7 +49,7 @@ Jam-CGPT is a GPT2-like model that follows [jam](https://huggingface.co/apcl/jam
 |d | dropout | 0.20 |
 |r | learning rate | 3e-5 |
 |y | weight decay | 1e-5 |
-|iter |
+|iter | number of iterations after pretraining | 272,000 |

 - Note that you can adjust the batch size and accumulation steps based on your GPU memory, but batch size * accumulation steps should equal 128.
 - If you finetune your models with multiple GPUs, you can turn down the accumulation steps. For example, if you finetune with 2 GPUs, you will need to halve the accumulation steps.
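For readers reproducing the finetuning setup, the sketch below shows one way the hyperparameters from the tables above could be written down, assuming a nanoGPT-style Python config. The variable names, the config filename, and the batch size of 64 (chosen so that batch size * accumulation steps = 128) are illustrative assumptions, not the repo's exact interface.

```python
# finetune_jamcgpt.py -- hypothetical config sketch, not the repo's actual file.
# Values come from the first hyperparameter table above; variable names follow
# nanoGPT-style conventions and are assumptions, as is the batch size of 64.

batch_size = 64                      # per-step batch size; shrink if GPU memory is tight
gradient_accumulation_steps = 2      # a : accumulation steps
dropout = 0.20                       # d : dropout
learning_rate = 3e-5                 # r : learning rate
weight_decay = 1e-5                  # y : weight decay
max_iters = 757_000                  # iter : number of iterations after pretraining
```

If you shrink the batch size to fit memory, raise the accumulation steps in the same proportion so their product stays at 128.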
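The two notes above reduce to one constraint on the effective batch: per-GPU batch size times accumulation steps times number of GPUs should stay at 128. A small sanity check, where the helper name and the batch size of 64 are chosen purely for illustration:

```python
def check_effective_batch(batch_size: int, accumulation_steps: int, n_gpus: int = 1) -> int:
    """Return the effective batch and assert it matches the README's target of 128."""
    effective = batch_size * accumulation_steps * n_gpus
    assert effective == 128, f"effective batch is {effective}, expected 128"
    return effective

# Single GPU: 64 * 2 * 1 = 128
check_effective_batch(batch_size=64, accumulation_steps=2, n_gpus=1)

# Two GPUs: halve the accumulation steps so 64 * 1 * 2 = 128 still holds
check_effective_batch(batch_size=64, accumulation_steps=1, n_gpus=2)
```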