sberbank-ai committed
Commit f158b54
1 Parent(s): 2daa064

Update README.md

Files changed (1):
1. README.md +3 -2
README.md CHANGED
@@ -14,15 +14,16 @@ Model trained on a mixture of 7 denoisers like UL2 with several differences (htt
 
 It was trained on a Russian-language corpus (300 GB). The dataset is the same as for the ruT5 models.
 
-BBPE tokenizer.
+BBPE tokenizer: 50257 tokens + 107 special tokens. Prefix tokens: '<LM>', '<SC1>'...'<SC6>'.
 
 For the first half of training, the model was trained on a small part of all datasets (1%, 3 GB) and without task prefixes.
 
 For RSG, we trained as described in the T5 paper: first we trained in multitask mode on all tasks, then we took the best checkpoint for each task and trained it further.
+RSG submission: https://russiansuperglue.com/login/submit_info/1936
 
 Total training time was around 45 days on 112 A100 GPUs.
 
-Training loss:
+Training loss
 ![Screenshot 2023-01-21 at 11.36.52.png](https://s3.amazonaws.com/moonup/production/uploads/1674290304538-5f91b1208a61a359f44e1851.png)
 
 We continue to experiment...
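
As a usage note for the tokenizer and prefix tokens added in this commit: the sketch below shows how they could be exercised with `transformers`. It assumes the checkpoint is published on the Hugging Face Hub and loads as a T5-style seq2seq model; `MODEL_ID` is a placeholder, not a repository name taken from this README, and the `<LM>` prefix usage follows the description above rather than a verified API.

```python
# Minimal sketch, not the authors' reference code.
# Assumptions (not stated in this diff): the checkpoint is on the
# Hugging Face Hub as a T5-style seq2seq model; MODEL_ID is a placeholder.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

MODEL_ID = "your-org/your-checkpoint"  # placeholder, substitute the real repo

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = T5ForConditionalGeneration.from_pretrained(MODEL_ID)

# The README states 50257 BBPE tokens plus 107 special tokens,
# so the full vocabulary should come out to 50364 if those counts are exact.
print(len(tokenizer))

# Each task is selected by prepending a prefix token to the input;
# per the README, <LM> asks for left-to-right continuation of the text.
text = "<LM>Москва - столица"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The `<SC1>`...`<SC6>` prefixes presumably select the span-corruption denoisers in the same way; only the prefix string changes.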