language:
  - ru

FRED-T5 1.7B (Full-scale Russian Enhanced Denoisers T5)

The architecture is based on T5.

It has 24 layers and a hidden size of 1536.

The model was trained on a mixture of 7 denoisers, similar to UL2, with several differences.
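To make the idea of a denoiser mixture concrete, here is a minimal sketch of span-corruption denoising in plain Python. It is not the actual FRED-T5 training code. The seven settings in `DENOISERS`, the prefix tokens `<D1>`...`<D7>`, and the helper names `corrupt_spans`, `reconstruct`, and `make_example` are all hypothetical and chosen only for illustration. Only the `<extra_id_N>` sentinel naming follows the usual T5 convention. The sketch covers span corruption only and leaves out UL2's sequential (prefix-LM) denoiser.

```python
import random

# Hypothetical mixture of 7 denoisers. The real FRED-T5 settings are not
# given in this README; these values only illustrate the idea.
DENOISERS = [
    {"prefix": "<D1>", "mean_span": 3, "rate": 0.15},
    {"prefix": "<D2>", "mean_span": 8, "rate": 0.15},
    {"prefix": "<D3>", "mean_span": 3, "rate": 0.50},
    {"prefix": "<D4>", "mean_span": 8, "rate": 0.50},
    {"prefix": "<D5>", "mean_span": 16, "rate": 0.15},
    {"prefix": "<D6>", "mean_span": 16, "rate": 0.50},
    {"prefix": "<D7>", "mean_span": 32, "rate": 0.25},
]


def corrupt_spans(tokens, mean_span, rate, rng):
    """Mask random spans of a non-empty token list; return (inputs, targets)."""
    n = len(tokens)
    budget = max(1, round(n * rate))  # how many tokens we try to mask
    masked = [False] * n
    covered, attempts = 0, 0
    while covered < budget and attempts < 10 * n:
        attempts += 1
        length = round(rng.expovariate(1.0 / mean_span))
        length = max(1, min(length, budget - covered))
        start = rng.randrange(0, n - length + 1)
        # Reject spans that overlap or touch an existing masked span.
        if any(masked[max(0, start - 1):min(n, start + length + 1)]):
            continue
        for i in range(start, start + length):
            masked[i] = True
        covered += length

    inputs, targets, sentinel, i = [], [], 0, 0
    while i < n:
        if masked[i]:
            tag = f"<extra_id_{sentinel}>"
            inputs.append(tag)    # the span is replaced by one sentinel
            targets.append(tag)   # the target lists sentinel + hidden tokens
            while i < n and masked[i]:
                targets.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets


def reconstruct(inputs, targets):
    """Invert corrupt_spans: splice the target spans back into the inputs."""
    spans, current = {}, None
    for tok in targets:
        if tok.startswith("<extra_id_"):
            current = tok
            spans[current] = []
        else:
            spans[current].append(tok)
    out = []
    for tok in inputs:
        out.extend(spans[tok] if tok in spans else [tok])
    return out


def make_example(tokens, rng):
    """Sample one denoiser and prepend its (hypothetical) prefix token."""
    d = rng.choice(DENOISERS)
    inputs, targets = corrupt_spans(tokens, d["mean_span"], d["rate"], rng)
    return [d["prefix"]] + inputs, targets
```

Calling `make_example("a b c d e f g h".split(), random.Random(0))` returns a prefixed, corrupted input and the matching target. `reconstruct` is included only to check that no tokens are lost.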

It was trained on a Russian-language corpus (300 GB). The dataset is the same as the one used for the ruT5 models.

The tokenizer is BBPE (byte-level BPE). For the first half of training, the model was trained on a small part of all datasets (1%).
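Assuming the tokenizer mentioned above is byte-level BPE, the toy sketch below shows the core idea: text is first split into UTF-8 bytes (so each Cyrillic letter starts as two tokens), and the most frequent adjacent pair is merged repeatedly. The function names are hypothetical, and this is not the tokenizer shipped with the model.

```python
from collections import Counter


def to_byte_tokens(text):
    """Split text into single-byte tokens of its UTF-8 encoding."""
    return [bytes([b]) for b in text.encode("utf-8")]


def most_frequent_pair(tokens):
    """Return the most common adjacent token pair, or None if there is none."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(tokens, pair):
    """Replace every non-overlapping occurrence of `pair` with one token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_merges(text, num_merges):
    """Greedily learn up to `num_merges` byte-pair merges from `text`."""
    tokens, merges = to_byte_tokens(text), []
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        tokens = merge_pair(tokens, pair)
        merges.append(pair)
    return tokens, merges
```

Because the base vocabulary is the 256 possible byte values, any input string can be encoded without an unknown token. Joining the tokens and decoding as UTF-8 always recovers the original text.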

We continue to experiment...

We'll tell you more and release the checkpoint to the public soon.