Update README.md
The Pile was deduplicated before being used to train Pile-T5.
#### Training procedure

Pile-T5 was trained with the span-corruption objective, using a batch size of approximately 1M tokens (2048 sequences of 512 tokens each), for a total of 2,000,000 steps.
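To make the span-corruption objective concrete, here is a minimal illustrative sketch (not the actual Pile-T5 training code, and deterministic rather than randomly sampled): selected spans are replaced by sentinel tokens in the input, and the target reconstructs each sentinel followed by the dropped span.

```python
# Illustrative T5-style span corruption on word-level tokens.
# Real training samples spans randomly over subword tokens; here the
# spans are given explicitly so the example is deterministic.

def span_corrupt(tokens, spans):
    """tokens: list of tokens; spans: sorted, non-overlapping
    (start, length) pairs. Returns (corrupted_input, target)."""
    corrupted, target = [], []
    cursor = 0
    for i, (start, length) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        corrupted.extend(tokens[cursor:start])   # keep text before the span
        corrupted.append(sentinel)               # drop the span, mark it
        target.append(sentinel)                  # target: sentinel + span
        target.extend(tokens[start:start + length])
        cursor = start + length
    corrupted.extend(tokens[cursor:])            # keep the tail
    target.append(f"<extra_id_{len(spans)}>")    # closing sentinel
    return corrupted, target

tokens = "the quick brown fox jumps over the lazy dog".split()
inp, tgt = span_corrupt(tokens, [(1, 2), (5, 1)])
# inp: the <extra_id_0> fox jumps <extra_id_1> the lazy dog
# tgt: <extra_id_0> quick brown <extra_id_1> over <extra_id_2>
```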
#### Training checkpoints

Intermediate checkpoints for Pile-T5 are accessible within this repository. There are 200 checkpoints in total, spaced 10,000 steps apart. For T5x-native checkpoints that can be used for finetuning with the T5x library, refer to [here](https://huggingface.co/lintang/pile-t5-base-t5x/tree/main).
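A sketch of how the intermediate checkpoints could be enumerated and loaded with the `transformers` library. Note the branch naming scheme (`step10000` … `step2000000`) and the repo id `lintang/pile-t5-base` are assumptions for illustration, not confirmed by this README; check the repository's branch list for the actual revision names.

```python
# Hypothetical checkpoint enumeration: 200 checkpoints, every 10,000 steps.
# The "step<N>" branch-name convention is an assumption, not confirmed here.

def checkpoint_revisions(num_checkpoints=200, spacing=10_000):
    """Enumerate assumed branch names for evenly spaced checkpoints."""
    return [f"step{(i + 1) * spacing}" for i in range(num_checkpoints)]

revisions = checkpoint_revisions()
# revisions[0] == "step10000", revisions[-1] == "step2000000"

# Loading one checkpoint (requires network access; repo id is an assumption):
# from transformers import AutoModelForSeq2SeqLM
# model = AutoModelForSeq2SeqLM.from_pretrained(
#     "lintang/pile-t5-base", revision=revisions[0]
# )
```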
### Evaluations