metadata
library_name: transformers
license: apache-2.0
datasets:
- Vikhrmodels/Flan_translated_300k
- d0rj/OpenOrca-ru
language:
- ru
- en
Model Card for ru-rope-t5-small-instruct
The small version of the Russian T5 model with Rotary Position Embedding (RoPE), after instruction tuning.
Model Details
The model was trained on a Russian corpus mixed with some English, using the Mixture-of-Denoisers pre-training method from UL2 on sequences of length 1024. Because the position bias is replaced with rotary encoding, training with Flash Attention 2 is supported.
- Model type: RoPE T5
- Language(s) (NLP): Russian, English
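
To illustrate what replacing the position bias with rotary encoding means, here is a minimal sketch of rotary position embedding in plain Python. It is not taken from this model's code: the function name `rope_rotate`, the interleaved pairing of dimensions, and the base of 10000 are assumptions chosen for illustration. The point it shows is that the attention score between a rotated query and key depends only on their relative offset, so no additive bias term has to be passed into the attention kernel.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive pairs (vec[i], vec[i+1]) by an angle that grows with position.

    Hypothetical helper for illustration; the layout of pairs in the real model may differ.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector dimension must be even")
    out = []
    for i in range(0, d, 2):
        # Frequency for this pair: base ** (-i / d), with i = 0, 2, 4, ...
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Same relative offset (4) at two different absolute positions.
score_near = dot(rope_rotate(q, 7), rope_rotate(k, 3))
score_far = dot(rope_rotate(q, 104), rope_rotate(k, 100))
print(score_near, score_far)  # the two scores agree up to floating-point error
```

Since the positional information lives in the rotated queries and keys themselves, the attention computation reduces to an ordinary scaled dot product.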
Uses
Fine-tuning for downstream tasks.
Bias, Risks, and Limitations
Despite the instruction tuning, using the model in zero-shot mode is not recommended because of its small size.
Training Details
Training Data
A corpus of Russian texts from Vikhr, filtered by FRED-T5-1.7B perplexity. The instruction data is an English instruction set translated into Russian.
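
As a rough sketch of what perplexity-based filtering looks like, the snippet below computes perplexity from per-token log-probabilities and keeps documents under a threshold. Everything specific here is an assumption: the helper names, the toy log-probabilities (which in practice would come from a scoring model such as FRED-T5-1.7B), the threshold value, and the choice to keep low-perplexity texts. The actual filtering criteria used for this corpus are not stated.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def filter_by_perplexity(scored_docs, max_ppl):
    """Keep texts whose perplexity does not exceed max_ppl (hypothetical criterion)."""
    return [text for text, logprobs in scored_docs if perplexity(logprobs) <= max_ppl]

# Toy data: (text, per-token log-probabilities assigned by a scoring model).
scored_docs = [
    ("fluent sentence", [math.log(0.5)] * 4),   # perplexity 2
    ("noisy fragment", [math.log(0.01)] * 4),   # perplexity 100
]

kept = filter_by_perplexity(scored_docs, max_ppl=20.0)  # threshold is illustrative only
print(kept)
```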
Training Procedure
AdamWScale was used instead of Adafactor for stable training without loss explosions.
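
The sketch below shows one way such an optimizer step can work, assuming AdamWScale refers to the AdamW variant that borrows Adafactor's idea of scaling the step size by the root mean square (RMS) of the parameter tensor. This is an illustrative pure-Python version on a flat list of floats, not the training code used here; the function names, default hyperparameters, and the `min_rms` floor are assumptions.

```python
import math

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def init_state(n):
    return {"step": 0, "m": [0.0] * n, "v": [0.0] * n}

def adamw_scale_step(params, grads, state, lr=1e-2, beta1=0.9, beta2=0.999,
                     eps=1e-6, weight_decay=0.0, min_rms=1e-3):
    """One AdamW-style update whose step size is multiplied by RMS(params)."""
    state["step"] += 1
    t = state["step"]
    # Adam bias correction folded into the step size.
    step_size = lr * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    # Adafactor-style relative step: scale by the parameter RMS, with a floor.
    step_size *= max(min_rms, rms(params))
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        p = p - step_size * state["m"][i] / (math.sqrt(state["v"][i]) + eps)
        p = p - lr * weight_decay * p  # decoupled weight decay
        new_params.append(p)
    return new_params

params = [3.0, -4.0]
grads = [1.0, -1.0]
updated = adamw_scale_step(params, grads, init_state(2))

# Same gradients, parameters 10x larger: the step grows with the parameter scale.
big_params = [30.0, -40.0]
big_updated = adamw_scale_step(big_params, grads, init_state(2))
print(params[0] - updated[0], big_params[0] - big_updated[0])
```

Because the update is proportional to the magnitude of the weights, large and small tensors move by a similar relative amount, which is the property usually credited for Adafactor's stability on T5-style models.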