aoxo committed · Commit 720961d · verified · 1 Parent(s): 9549f55

Update README.md

Files changed (1)
  1. README.md +22 -2
README.md CHANGED
@@ -78,7 +78,7 @@ Use the code below to get started with the model.
78
 
79
  ```python
80
  # Instantiate the model
81
- model = RealFormerv3(img_size=256, patch_size=8, emb_dim=768, num_heads=42, num_layers=16, hidden_dim=3072)
82
 
83
  # Move model to GPU if available
84
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
@@ -168,6 +168,21 @@ Images and their corresponding style semantic maps were resized to **512 x 512**
168
  - Swin Shift Size: 2
169
  - Style Transfer Module: Style Adaptive Layer Normalization (SALN)
170
 
171
  #### Speeds, Sizes, Times
172
 
173
  **Model size:** There are currently five versions of the model:
@@ -176,6 +191,7 @@ Images and their corresponding style semantic maps were resized to **512 x 512**
176
  - v1_3: 93M params
177
  - v2_1: 2.9M params
178
  - v3: 252.6M params
179
 
180
  **Training hardware:** Each of the models was trained on 2 x T4 GPUs (multi-GPU training). For this reason, linear attention modules were implemented as ring (distributed) attention during training.
181
 
@@ -196,6 +212,10 @@ Images and their corresponding style semantic maps were resized to **512 x 512**
196
  - v3_fp16: 505M
197
  - v3_bf16: 505M
198
  - v3_int8: 344M
199
 
200
  ## Evaluation Data, Metrics & Results
201
 
@@ -203,7 +223,7 @@ This section covers information on how the model was evaluated at each stage.
203
 
204
  ### Evaluation Data
205
 
206
- Evaluation was performed on real-time footage captured from Grand Theft Auto V, Cyberpunk 2077 and WatchDogs 2.
207
 
208
  ### Metrics
209
 
 
78
 
79
  ```python
80
  # Instantiate the model
81
+ model = RealFormerAGA(img_size=256, patch_size=8, emb_dim=768, num_heads=32, num_layers=16, hidden_dim=3072)
82
 
83
  # Move model to GPU if available
84
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
168
  - Swin Shift Size: 2
169
  - Style Transfer Module: Style Adaptive Layer Normalization (SALN)
170
 
171
+ **v4**
172
+ - Precision: FP32, FP16, BF16, INT8
173
+ - Embedding Dimensions: 768
174
+ - Hidden Dimensions: 3072
175
+ - Attention Type: Location-Based Multi-Head Attention (Linear Attention) and Cross-Attention (pretrained attention-conditioned)
176
+ - Number of Attention Heads: 32
177
+ - Number of Attention Layers: 16
178
+ - Number of Transformer Encoder Layers (Feed-Forward): 16
179
+ - Number of Transformer Decoder Layers (Feed-Forward): 16
180
+ - Activation Functions: ReLU, GeLU
181
+ - Patch Size: 8
182
+ - Swin Window Size: 7
183
+ - Swin Shift Size: 2
184
+ - Style Transfer Module: Style Adaptive Layer Normalization (SALN)
185
+
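The v4 hyperparameters listed above imply a few derived quantities that are easy to sanity-check. The sketch below is illustrative only: `V4Config` and its field names are hypothetical and not part of the repository, `img_size=256` is taken from the constructor call in the usage example rather than from this list, and the even split of the embedding across heads is the usual multi-head convention, assumed here rather than confirmed by the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class V4Config:
    """Hypothetical container for the v4 hyperparameters listed above."""

    img_size: int = 256   # assumption: value from the usage example
    patch_size: int = 8
    emb_dim: int = 768
    num_heads: int = 32
    num_layers: int = 16
    hidden_dim: int = 3072

    @property
    def head_dim(self) -> int:
        # Assumed convention: emb_dim is split evenly across the heads.
        if self.emb_dim % self.num_heads != 0:
            raise ValueError("emb_dim must be divisible by num_heads")
        return self.emb_dim // self.num_heads

    @property
    def num_patches(self) -> int:
        # Non-overlapping square patches on a square input image.
        if self.img_size % self.patch_size != 0:
            raise ValueError("img_size must be divisible by patch_size")
        return (self.img_size // self.patch_size) ** 2


cfg = V4Config()
print(f"head_dim={cfg.head_dim}, num_patches={cfg.num_patches}")
```

Under these assumptions each of the 32 heads works on 24 channels and a 256 x 256 input yields 1024 patches. A head count that does not divide 768 evenly, such as 42, fails the first check.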
186
  #### Speeds, Sizes, Times
187
 
188
  **Model size:** There are currently five versions of the model:
 
191
  - v1_3: 93M params
192
  - v2_1: 2.9M params
193
  - v3: 252.6M params
194
+ - v4: 454.2M params
195
 
196
  **Training hardware:** Each of the models was trained on 2 x T4 GPUs (multi-GPU training). For this reason, linear attention modules were implemented as ring (distributed) attention during training.
197
 
 
212
  - v3_fp16: 505M
213
  - v3_bf16: 505M
214
  - v3_int8: 344M
215
+ - v4: 1.69 GB
216
+ - v4_fp16: 866M
217
+ - v4_bf16: 866M
218
+ - v4_int8: 578M
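As a rough cross-check of the v4 file sizes above, the parameter count can be multiplied by the storage width of each precision. This is a minimal sketch that assumes the checkpoint is dominated by raw weights (no optimizer state or significant metadata). `estimate_weight_bytes` is a hypothetical helper, not part of the repository, and INT8 is left out because quantized checkpoints often keep some tensors at higher precision.

```python
# Storage width per parameter for the unquantized precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def estimate_weight_bytes(num_params: float, precision: str) -> float:
    """Estimate raw weight storage in bytes (hypothetical helper)."""
    return num_params * BYTES_PER_PARAM[precision]


v4_params = 454.2e6  # parameter count listed for v4

for precision in ("fp32", "fp16", "bf16"):
    size = estimate_weight_bytes(v4_params, precision)
    # Report in binary units (GiB / MiB) for comparison with file sizes.
    print(f"{precision}: {size / 2**30:.2f} GiB ({size / 2**20:.0f} MiB)")
```

For v4 this gives about 1.69 GiB at FP32 and about 866 MiB at FP16 or BF16, in line with the listed sizes when those are read as binary units.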
219
 
220
  ## Evaluation Data, Metrics & Results
221
 
 
223
 
224
  ### Evaluation Data
225
 
226
+ Evaluation was performed on real-time footage captured from Grand Theft Auto IV, Grand Theft Auto V, Cyberpunk 2077, Watch Dogs, Marvel's Spider-Man, Far Cry 6, Red Dead Redemption 2 and Control.
227
 
228
  ### Metrics
229