ai-forever/FRED-T5-large

Text2Text Generation


Commit 0121fdb (parent fea3169) by sberbank-ai on Feb 28, 2023: Update README.md


---
language:
- ru
license: apache-2.0
---
# FRED-T5 large 800M (Full-scale Russian Enhanced Denoisers T5)

The architecture is based on T5. The model has 24 layers and a hidden size of 1024; see config.json for details.

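The depth and width can be read straight from config.json. The sketch below is a minimal standard-library example that assumes the usual Hugging Face `T5Config` field names (`num_layers`, `num_decoder_layers`, `d_model`). `load_config` and `summarize_t5_config` are hypothetical helpers, and treating the "24 layers" above as `num_layers` is an assumption.

```python
import json
from pathlib import Path


def load_config(path):
    """Read a Hugging Face config.json file into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_t5_config(cfg):
    """Pick out depth and width from a T5-style config dict."""
    num_layers = cfg["num_layers"]  # encoder depth in T5Config
    # T5Config reuses num_layers for the decoder when num_decoder_layers is unset.
    num_decoder_layers = cfg.get("num_decoder_layers") or num_layers
    return {
        "num_layers": num_layers,
        "num_decoder_layers": num_decoder_layers,
        "d_model": cfg["d_model"],  # hidden size
    }
```

After downloading config.json from the model repository, `summarize_t5_config(load_config("config.json"))["d_model"]` should match the hidden size of 1024 stated above.
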
The model was trained on a mixture of 7 denoisers, similar to UL2 but with several differences (https://arxiv.org/abs/2205.05131).

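In UL2, the denoisers are T5-style span corruption variants that differ mainly in span length and corruption rate, together with a sequential (prefix-LM) denoiser. The sketch below covers only the span-corruption part. The `<extra_id_N>` sentinel format follows the T5 convention and is an assumption for this model. `span_corrupt` and `sample_spans` are hypothetical helpers. This card does not specify the seven denoiser settings actually used for FRED-T5.

```python
import random


def span_corrupt(tokens, spans, sentinel="<extra_id_{}>"):
    """Replace each (start, length) span with a sentinel; return (inputs, targets)."""
    inputs, targets = [], []
    pos = 0
    for i, (start, length) in enumerate(sorted(spans)):
        if start < pos:
            raise ValueError("spans must not overlap")
        mark = sentinel.format(i)
        inputs.extend(tokens[pos:start])  # keep the uncorrupted tokens
        inputs.append(mark)               # one sentinel stands in for the whole span
        targets.append(mark)              # the target repeats the sentinel...
        targets.extend(tokens[start:start + length])  # ...followed by the hidden tokens
        pos = start + length
    inputs.extend(tokens[pos:])
    return inputs, targets


def sample_spans(n_tokens, corruption_rate, span_len, rng):
    """Place disjoint spans of span_len tokens covering about corruption_rate of the input."""
    n_spans = max(1, round(n_tokens * corruption_rate / span_len))
    n_spans = min(n_spans, n_tokens)
    seg = n_tokens // n_spans  # one span per segment keeps the spans disjoint
    length = min(span_len, seg)
    return [(k * seg + rng.randrange(seg - length + 1), length) for k in range(n_spans)]
```

For example, `span_corrupt(["a", "b", "c", "d", "e", "f"], [(1, 2), (4, 1)])` keeps `a`, `d` and `f` in the input. It moves `b c` and `e` into the target after their sentinels. Calling `sample_spans` with different `corruption_rate` and `span_len` values (for instance the standard T5 setting of 15% and spans of about 3 tokens) yields different denoisers in this simplified scheme.
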
It was trained on a Russian-language corpus (300 GB). The dataset is the same as for the ruT5 models.

Tokenizer: BBPE (byte-level BPE) with 50257 tokens plus 107 special tokens. Prefix tokens: `<LM>`, `<SC1>`, ..., `<SC6>`.

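The vocabulary arithmetic and the prefix convention are written out below. The card lists seven prefix tokens, matching the seven denoisers. That suggests one prefix per denoiser, though the card does not state it. Prepending the prefix directly to the text, with no separator, is an assumption. `add_prefix` is a hypothetical helper, not part of the released tokenizer.

```python
BASE_VOCAB = 50257    # BBPE vocabulary size stated in the card
SPECIAL_TOKENS = 107  # additional special tokens stated in the card
TOTAL_VOCAB = BASE_VOCAB + SPECIAL_TOKENS  # 50364

# '<LM>' plus '<SC1>' ... '<SC6>': seven task prefixes in total
PREFIXES = ["<LM>"] + [f"<SC{i}>" for i in range(1, 7)]


def add_prefix(prefix, text):
    """Prepend one of the known task prefixes to a raw input string."""
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return prefix + text
```

The `vocab_size` recorded in config.json is the authoritative value for the embedding matrix. Check it there rather than relying on this sum.
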
For the first half of training time, the model was trained on a small part of the full dataset (1%, 3 GB) and without the task prefixes.

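The schedule can be expressed as a small piece of configuration. This sketch makes two assumptions. First, "half of the time" is measured in optimizer steps. Second, the second half used the full corpus with prefixes, which the card implies but does not state. `phase_settings` is a hypothetical helper.

```python
FULL_CORPUS_GB = 300
SUBSET_PERCENT = 1  # 1% of 300 GB = 3 GB


def phase_settings(step, total_steps):
    """Return the data budget and prefix flag for a training step (illustrative only)."""
    if step < total_steps // 2:
        # Phase 1: small subset of the corpus, no task prefixes
        return {"corpus_gb": FULL_CORPUS_GB * SUBSET_PERCENT / 100, "use_prefix": False}
    # Phase 2 (assumed): full corpus, task prefixes enabled
    return {"corpus_gb": FULL_CORPUS_GB, "use_prefix": True}
```
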
For RSG (Russian SuperGLUE), we followed the approach described in the T5 paper. First, we trained a multitask model on all tasks. Then, for each task, we took the best checkpoint and trained it further on that task.

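The selection step of this two-stage recipe picks, for every task, the multitask checkpoint with the best validation score. A minimal sketch follows, assuming higher scores are better. The function name and the nested-dict score layout are hypothetical.

```python
def pick_best_checkpoints(val_scores):
    """Map each task to the multitask checkpoint with the highest validation score.

    val_scores: {task_name: {checkpoint_name: score}}
    """
    return {task: max(scores, key=scores.get) for task, scores in val_scores.items()}
```

Each selected checkpoint would then be trained further on its own task, as described above.
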
RSG submission: https://russiansuperglue.com/login/submit_info/1936

Total training time was around 35 days on 160 V100 GPUs.

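Taking the "around 35 days" figure at face value, the compute budget is roughly:

$$
35\ \text{days} \times 24\ \tfrac{\text{h}}{\text{day}} \times 160\ \text{GPUs} \approx 134{,}400\ \text{V100 GPU-hours}
$$
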
We will release the checkpoint to the public soon.