sberbank-ai committed
Commit 6b9d45c • Parent(s): 7cd35ec

Update README.md

Files changed (1): README.md +5 -5
README.md CHANGED
@@ -34,14 +34,14 @@ The model was prepared as a baseline for FusionBrain Challenge 2.0 (as a part of
  * Math QA – [DeepMind Mathematics Dataset](https://github.com/deepmind/mathematics_dataset).
  * Image Captioning – [COCO dataset](https://cocodataset.org/#home) (with automated translation).
  * Image Generation – [COCO dataset](https://cocodataset.org/#home) (with automated translation).
- * VQA12-layer, 768-hidden, 12-heads, 110M parameters.[COCO dataset](https://cocodataset.org/#home) with prepared question set.
- * Text Recognition in the Wild – the dataset consisting of synthetic and real-world human-annotated data for text recognition task.
+ * VQA – [COCO dataset](https://cocodataset.org/#home) with prepared question set.
+ * [Text Recognition in the Wild](https://n-ws-f21jf.s3pd02.sbercloud.ru/b-ws-f21jf-ny6/FBC2/titw_dataset.zip) – a dataset consisting of synthetic and real-world human-annotated data for the text recognition task.
 
  # Details of architecture
 
  ### Parameters
 
- ![rudolph27b_masks.png](https://s3.amazonaws.com/moonup/production/uploads/1663662426135-5f91b1208a61a359f44e1851.png)
+ <img src=https://github.com/ai-forever/ru-dolph/blob/master/pics/scheme-rudolph_27B.jpg>
 
  The maximum sequence length this model can be used with depends on the modality: 384, 576, and 128 tokens for the left text, image, and right text, respectively.
@@ -49,13 +49,13 @@ RUDOLPH 2.7B is a Transformer-based decoder model with the following parameters:
 
  * num-layers (32) — Number of hidden layers in the Transformer decoder.
  * hidden-size (2560) — Dimensionality of the hidden layers.
- * num_attention_heads (32) — Number of attention heads for each attention layer.
+ * num\_attention\_heads (32) — Number of attention heads for each attention layer.
 
  ### Sparse Attention Mask
 
  The primary proposed method is to modify the sparse transformer's attention mask to better control the modalities. It lets the model handle transitions between modalities in both directions, unlike the similar DALL-E Transformer, which used only one direction, "text to image". The proposed "image to right text" direction is achieved by extending the sparse attention mask to the right for auto-regressive text generation conditioned on both the image and the left text.
 
- <img src=https://github.com/lizagonch/ru-dolph-fbc2/blob/develop_v1/pics/scheme-rudolph_27B.jpg>
+ ![rudolph27b_masks.png](https://s3.amazonaws.com/moonup/production/uploads/1663662426135-5f91b1208a61a359f44e1851.png)
 
  # Authors
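The "image to right text" idea in the changed section can be sketched as a decoder attention mask over the `[left text | image | right text]` layout. This is an illustrative simplification, not RUDOLPH's actual implementation: the real model uses a sparse row/column mask for image tokens, and the function name below is hypothetical — only the 384/576/128 token budget comes from the README.

```python
import numpy as np

def build_decoder_mask(left_text=384, image=576, right_text=128):
    """Causal mask over [left text | image | right text] tokens.

    A plain lower-triangular mask is a simplification of RUDOLPH's sparse
    mask, but it shows both modality directions: image tokens condition on
    the left text ("text to image"), and right-text tokens condition on the
    image and the left text ("image to right text").
    """
    n = left_text + image + right_text
    return np.tril(np.ones((n, n), dtype=bool))  # True = may attend

mask = build_decoder_mask()

# The first image token (position 384) sees the entire left text.
assert mask[384, :384].all()
# The first right-text token (position 960) sees the image and the left
# text, so right text is generated auto-regressively with both as condition.
assert mask[960, :960].all()
# No token attends to future positions.
assert not mask[0, 1]
```

Extending the mask "to the right" simply means the right-text rows exist at all: in a text-to-image-only model such as DALL-E, generation stops after the image block, so those rows are never materialized.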