---
library_name: transformers
license: apache-2.0
datasets:
- Vikhrmodels/Flan_translated_300k
- d0rj/OpenOrca-ru
language:
- ru
- en
---

# Model Card for ru-rope-t5-small-instruct

A small-size Russian Rotary Position Embedding (RoPE) T5 model after instruction tuning.

## Model Details

The model was trained on a Russian corpus with a mix of English, using the [Mixture-of-Denoisers](https://arxiv.org/abs/2205.05131v1) pre-training method from [UL2](https://huggingface.co/google/ul2) on sequences of length 1024.
Because the relative position bias is replaced with rotary position embeddings, the model can be trained with Flash Attention 2.
- **Model type:** [RoPE T5](https://huggingface.co/melmoth/ru-rope-t5-small-instruct/blob/main/t5.py)
- **Language(s) (NLP):** Russian, English
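
A minimal loading sketch, with assumptions: `trust_remote_code=True` is presumed necessary for the custom RoPE T5 class in `t5.py`, and the prompt is illustrative since the card does not document an instruction format.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "melmoth/ru-rope-t5-small-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code=True is assumed to be required for the custom modeling file
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

# Hypothetical prompt; the card does not specify a prompt template.
inputs = tokenizer("Переведи на английский: привет, мир", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```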

## Uses

Fine-tuning for downstream tasks.
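
A minimal fine-tuning sketch with `Seq2SeqTrainer`; the toy dataset, prompt format, and hyperparameters are all illustrative assumptions, not the author's recipe.

```python
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_id = "melmoth/ru-rope-t5-small-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

# Toy instruction -> response pairs; replace with a real downstream dataset.
raw = Dataset.from_dict({
    "input": ["Переведи на английский: привет"],
    "target": ["hello"],
})

def preprocess(batch):
    enc = tokenizer(batch["input"], truncation=True, max_length=1024)
    enc["labels"] = tokenizer(batch["target"], truncation=True,
                              max_length=1024)["input_ids"]
    return enc

train_ds = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="ru-rope-t5-small-finetuned",
        per_device_train_batch_size=8,
        learning_rate=1e-4,  # illustrative
        num_train_epochs=3,
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```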

## Bias, Risks, and Limitations

Despite the instruction tuning, the model is not recommended for zero-shot use due to its small size.

## Training Details

### Training Data

A corpus of Russian texts from [Vikhr](https://huggingface.co/Vikhrmodels), filtered by [FRED-T5-1.7B](https://huggingface.co/ai-forever/FRED-T5-1.7B) perplexity. The instruction data are translated English instruction sets.
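
One plausible way to reproduce such filtering is to score each document with the filter model's loss and drop high-perplexity texts. This sketch is an assumption: the card gives neither the exact scoring recipe nor a threshold, and the `<LM>` prefix and cutoff below are hypothetical.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

filter_id = "ai-forever/FRED-T5-1.7B"
tokenizer = AutoTokenizer.from_pretrained(filter_id)
model = T5ForConditionalGeneration.from_pretrained(filter_id).eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    # Condition on the <LM> denoiser prefix and score the text as the target;
    # the returned loss is the mean per-token negative log-likelihood.
    enc = tokenizer("<LM>", return_tensors="pt")
    labels = tokenizer(text, return_tensors="pt").input_ids
    loss = model(input_ids=enc.input_ids, labels=labels).loss
    return torch.exp(loss).item()

PPL_THRESHOLD = 100.0  # hypothetical cutoff
corpus = ["Пример связного русского текста.", "фыва олдж 123 ###"]
filtered = [t for t in corpus if perplexity(t) < PPL_THRESHOLD]
```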

### Training Procedure

AdamWScale was used instead of Adafactor for stable training without loss spikes.
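
A minimal sketch of the AdamWScale idea, assuming the nanoT5-style variant: plain AdamW whose per-tensor step size is additionally scaled by the parameter's root mean square, mimicking Adafactor's relative step size. The defaults and clamping constant are assumptions.

```python
import torch
from torch.optim import Optimizer

class AdamWScale(Optimizer):
    """AdamW with Adafactor-style parameter scaling: each tensor's update is
    multiplied by max(rms(param), eps), making the step relative to the
    parameter's scale (assumed to follow the nanoT5 variant)."""

    def __init__(self, params, lr=1e-2, betas=(0.9, 0.999),
                 eps=1e-6, weight_decay=0.0):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps,
                                      weight_decay=weight_decay))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:
                    state["step"] = 0
                    state["exp_avg"] = torch.zeros_like(p)
                    state["exp_avg_sq"] = torch.zeros_like(p)
                state["step"] += 1
                exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]
                # Standard Adam moment estimates.
                exp_avg.mul_(beta1).add_(p.grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(p.grad, p.grad, value=1 - beta2)
                bias1 = 1 - beta1 ** state["step"]
                bias2 = 1 - beta2 ** state["step"]
                step_size = group["lr"] * (bias2 ** 0.5) / bias1
                # Adafactor-style relative step: scale by the parameter RMS.
                rms = p.pow(2).mean().sqrt().item()
                step_size *= max(rms, group["eps"])
                denom = exp_avg_sq.sqrt().add_(group["eps"])
                p.addcdiv_(exp_avg, denom, value=-step_size)
                if group["weight_decay"] > 0.0:
                    # Decoupled weight decay, as in AdamW.
                    p.add_(p, alpha=-group["lr"] * group["weight_decay"])
```

Usage: `optimizer = AdamWScale(model.parameters(), lr=1e-2)` in place of Adafactor.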

#### Metrics

Results on the Russian SuperGLUE (RSG) benchmark:

![rsg](rsg_results.png)

## Model Card Contact

[@TheMelmoth](https://t.me/TheMelmoth)