sberbank-ai committed
Commit 7b8f31a · 1 Parent(s): c5ba01c
Update README.md
README.md CHANGED
@@ -20,6 +20,8 @@ First half of the time model trained on the small part of all datasets (1%,3GB)
 
 For RSG we trained as described in the T5 paper. First, we trained multitask for all tasks. Then we took the best checkpoint for the task and trained it further.
 
+Total training time was around 45 days on 112 A100 GPUs.
+
 Training loss:
 ![Screenshot 2023-01-21 at 11.36.52.png](https://s3.amazonaws.com/moonup/production/uploads/1674290304538-5f91b1208a61a359f44e1851.png)
 