sberbank-ai committed
Commit 983aa94 • 1 parent: ccf6bc4

Update README.md

Files changed (1): README.md (+3 −2)
README.md CHANGED
```diff
@@ -2,6 +2,7 @@
 tags:
 - RUDOLPH
 - text-image
+- image-text
 - decoder
 datasets:
 - sberquad
@@ -28,7 +29,7 @@ This is a fine-tuned version of the pre-trained [RuDOLPH 2.7B model](https://hug
 * Training Data Volume: `119 million text-image pairs, 60 million text paragraphs`
 * Fine-tuning Data Volume: `43 334 text question-answer pairs, 100 000 math tasks, 85 000 text-image pairs (for captioning, generation), 85 759 visual question-answer pairs, 140 000 image-text pairs for text recognition`
 
-The model was prepared as a baseline for FusionBrain Challenge 2.0 (as a part of AI Journey Contest 2022) and is a fine-tuned version of the pre-trained [RuDOLPH 2.7B model](https://huggingface.co/sberbank-ai/RuDOLPH-2.7B) using 6 tasks:
+The model was prepared as a baseline for FusionBrain Challenge 2.0 (as a part of AI Journey Contest 2022) and is a fine-tuned version of the pre-trained [RuDOLPH 2.7B model](https://huggingface.co/sberbank-ai/RUDOLPH-2.7B) using 6 tasks:
 
 * Text QA – [SberQUaD dataset](https://huggingface.co/datasets/sberquad).
 * Math QA – [DeepMind Mathematics Dataset](https://github.com/deepmind/mathematics_dataset).
@@ -55,7 +56,7 @@ RUDOLPH 2.7B is a Transformer-based decoder model with the following parameters:
 
 The primary proposed method is to modify the sparse transformer's attention mask to better control modalities. It lets the model handle transitions between modalities in both directions, unlike similar work such as the DALL-E transformer, which supports only one direction, "text to image". The proposed "image to right text" direction is achieved by extending the sparse attention mask to the right, enabling auto-regressive text generation conditioned on both the image and the left text.
 
-![rudolph27b_masks.png](https://s3.amazonaws.com/moonup/production/uploads/1663662426135-5f91b1208a61a359f44e1851.png)
+<img src="https://raw.githubusercontent.com/ai-forever/ru-dolph/master/pics/attention_mask_27b.png" height="20" border="2"/>
 
 # Authors
```
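The "image to right text" direction described in the README's context lines can be illustrated with a minimal sketch. This is not RUDOLPH's actual sparse mask (the real one also uses row/column sparsity over image tokens, and the segment lengths below are made up for illustration); it only shows why a causal mask over a [left text | image | right text] sequence lets right-text tokens condition on the image, which a text-to-image-only decoder never does.

```python
import numpy as np

# Illustrative segment lengths, not the model's real layout.
L_TXT, IMG, R_TXT = 4, 6, 4
n = L_TXT + IMG + R_TXT

# Causal (lower-triangular) mask over the concatenated sequence:
# 1 = "this query position may attend to this key position".
mask = np.tril(np.ones((n, n), dtype=int))

# Right-text rows start after the image block, so every right-text token
# attends to all image and left-text tokens -- the "image to right text"
# direction. A model that stops generation at the image never has these rows.
first_right = L_TXT + IMG
print(mask[first_right, :first_right])  # all ones: full image + left-text context
```

The "extension to the right" in the README amounts to appending these extra lower-triangular rows for the right-text segment, so text decoding after the image remains ordinary auto-regressive generation.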