Shavrina committed
Commit 9a40628
1 Parent(s): ea325ea

Update README.md

Files changed (1)
  1. README.md +23 -4
README.md CHANGED
@@ -1,12 +1,31 @@
- ## RusEnQA
-
- QA for Russian and Englisha based on the [rugpt3xl](https://huggingface.co/sberbank-ai/rugpt3xl) model
-
  ### About ruGPT-3 XL model
- Model was trained with 512 sequence length using Deepspeed and Megatron code by SberDevices team, on 80B tokens dataset for 4 epochs. After that model was finetuned 1 epoch with sequence length 2048.
- Note! Model has sparse attention blocks.

  Total training time was around 10 days on 256 GPUs.
  Final perplexity on the test set is 12.05. Model parameters: 1.3B.
 
+ ---
+ language:
+ - ru
+ - en
+ pipeline_tag: text2text-generation
+ tags:
+ - PyTorch
+ - Transformers
+ - gpt2
+ - squad
+ - lm-head
+ - causal-lm
+ thumbnail: "https://github.com/RussianNLP/RusEnQA"
+
+ ---

+ ## RusEnQA
+
+ QA for Russian and English based on the [rugpt3xl](https://huggingface.co/sberbank-ai/rugpt3xl) model
+
+ #### Fine-tuning format:
+ ```
+ "<s>paragraph: " + eng_context + "\nlang: rus\nquestion: " + rus_question + " answer: " + rus_answer + "</s>"
+ ```
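+
+ For illustration, here is a minimal sketch of assembling one example in this format. The helper name `build_qa_example` and the sample texts are hypothetical; only the template itself comes from this README:
+
+ ```python
+ def build_qa_example(eng_context, rus_question, rus_answer):
+     # Assemble one cross-lingual example: English paragraph,
+     # Russian question and answer, in the RusEnQA prompt format.
+     return (
+         "<s>paragraph: " + eng_context
+         + "\nlang: rus"
+         + "\nquestion: " + rus_question
+         + " answer: " + rus_answer
+         + "</s>"
+     )
+
+ # Hypothetical sample (question: "Which city is the capital of Russia?").
+ print(build_qa_example(
+     "Moscow is the capital of Russia.",
+     "Какой город является столицей России?",
+     "Москва",
+ ))
+ ```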

  ### About ruGPT-3 XL model
+ The model was trained with a sequence length of 512, using [Deepspeed](https://github.com/microsoft/DeepSpeed) and [Megatron](https://github.com/NVIDIA/Megatron-LM) code by the [SberDevices](https://sberdevices.ru/) team, on a dataset of 80B tokens for 4 epochs. After that, it was fine-tuned for 1 epoch with a sequence length of 2048.
+ *Note: the model has sparse attention blocks.*

  Total training time was around 10 days on 256 GPUs.
  Final perplexity on the test set is 12.05. Model parameters: 1.3B.
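+
+ Because of the sparse attention blocks, the model is normally loaded through the wrapper code from the [ru-gpts](https://github.com/sberbank-ai/ru-gpts) repository rather than through plain `transformers`. Below is a minimal usage sketch, assuming that wrapper's API; the prompt simply follows the fine-tuning format above, with the answer left for the model to complete:
+
+ ```python
+ import sys
+ sys.path.append("ru-gpts/")  # local clone of https://github.com/sberbank-ai/ru-gpts
+ from src.xl_wrapper import RuGPT3XL
+
+ # Load the 1.3B-parameter model with the sequence length it was trained on.
+ gpt = RuGPT3XL.from_pretrained("sberbank-ai/rugpt3xl", seq_len=512)
+
+ # QA prompt in the fine-tuning format, with "answer:" left open.
+ # (Question: "Which city is the capital of Russia?")
+ prompt = (
+     "<s>paragraph: Moscow is the capital of Russia."
+     "\nlang: rus"
+     "\nquestion: Какой город является столицей России?"
+     " answer:"
+ )
+ print(gpt.generate(prompt, max_length=64))
+ ```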