Update README.md
README.md
CHANGED
@@ -78,7 +78,7 @@ Use the code below to get started with the model.
 
 ```python
 # Instantiate the model
-model =
+model = RealFormerAGA(img_size=256, patch_size=8, emb_dim=768, num_heads=32, num_layers=16, hidden_dim=3072)
 
 # Move model to GPU if available
 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
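
A minimal sketch of how the completed snippet might be used end to end. The `(content, style)` forward signature and the input shapes are assumptions for illustration, not confirmed by the README:

```python
import torch

# Hypothetical continuation of the snippet above ('model' and 'device'
# come from that snippet).
model = model.to(device).eval()

# Dummy inputs sized to the constructor's img_size=256; the model card
# elsewhere mentions 512 x 512 training inputs, so verify before relying on this.
content = torch.randn(1, 3, 256, 256, device=device)
style_map = torch.randn(1, 3, 256, 256, device=device)  # style semantic map (assumed input)

with torch.no_grad():
    stylized = model(content, style_map)  # assumed forward signature
```
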
@@ -168,6 +168,21 @@ Images and their corresponding style semantic maps were resized to **512 x 512**
 - Swin Shift Size: 2
 - Style Transfer Module: Style Adaptive Layer Normalization (SALN)
 
+**v4**
+- Precision: FP32, FP16, BF16, INT8
+- Embedding Dimensions: 768
+- Hidden Dimensions: 3072
+- Attention Type: Location-Based Multi-Head Attention (Linear Attention) and Cross-Attention (pretrained attention-conditioned)
+- Number of Attention Heads: 32
+- Number of Attention Layers: 16
+- Number of Transformer Encoder Layers (Feed-Forward): 16
+- Number of Transformer Decoder Layers (Feed-Forward): 16
+- Activation Functions: ReLU, GeLU
+- Patch Size: 8
+- Swin Window Size: 7
+- Swin Shift Size: 2
+- Style Transfer Module: Style Adaptive Layer Normalization (SALN)
+
 #### Speeds, Sizes, Times
 
 **Model size:** There are currently five versions of the model:
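
The diff does not include the SALN implementation itself. Below is a minimal sketch of Style Adaptive Layer Normalization as commonly described: the feature map is layer-normalized without learned affine parameters, and a gain and bias predicted from a style embedding are applied instead. The class name and `style_dim` are hypothetical:

```python
import torch
import torch.nn as nn

class SALN(nn.Module):
    """Illustrative sketch of Style Adaptive Layer Normalization."""

    def __init__(self, emb_dim: int, style_dim: int):
        super().__init__()
        # Normalize without fixed learned gain/bias ...
        self.norm = nn.LayerNorm(emb_dim, elementwise_affine=False)
        # ... and predict a per-sample (gain, bias) pair from the style vector.
        self.affine = nn.Linear(style_dim, 2 * emb_dim)

    def forward(self, x: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, emb_dim), style: (batch, style_dim)
        gain, bias = self.affine(style).chunk(2, dim=-1)
        return gain.unsqueeze(1) * self.norm(x) + bias.unsqueeze(1)
```

Because the gain and bias are computed per sample from the style embedding, a single network can render many styles, unlike standard LayerNorm with fixed affine parameters.
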
@@ -176,6 +191,7 @@ Images and their corresponding style semantic maps were resized to **512 x 512**
 - v1_3: 93M params
 - v2_1: 2.9M params
 - v3: 252.6M params
+- v4: 454.2M params
 
 **Training hardware:** Each of the models was trained on 2 x T4 GPUs (multi-GPU training). For this reason, linear attention modules were implemented as ring (distributed) attention during training.
 
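
The ring-attention code is not shown in the diff, but the reason linear attention distributes cheaply is worth a sketch: its key/value summary is a plain sum over positions, so each GPU can reduce its local shard and all-reduce the partial results, which is the effect a ring schedule achieves. All names below are hypothetical, and the feature map is one common choice (`elu + 1`):

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def distributed_linear_attention(q, k, v):
    """Sketch: non-causal linear attention with k/v sharded across GPUs.

    q, k, v: (batch, heads, seq, dim); k and v hold this rank's shard.
    Assumes the default process group is already initialized.
    """
    phi = lambda t: F.elu(t) + 1                 # positive feature map
    q, k = phi(q), phi(k)
    kv = torch.einsum("bhnd,bhne->bhde", k, v)   # local key/value summary
    z = k.sum(dim=2)                             # local normalizer
    if dist.is_initialized():
        dist.all_reduce(kv)                      # sum partial summaries across ranks
        dist.all_reduce(z)
    out = torch.einsum("bhnd,bhde->bhne", q, kv)
    denom = torch.einsum("bhnd,bhd->bhn", q, z).unsqueeze(-1)
    return out / (denom + 1e-6)
```
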
@@ -196,6 +212,10 @@ Images and their corresponding style semantic maps were resized to **512 x 512**
 - v3_fp16: 505M
 - v3_bf16: 505M
 - v3_int8: 344M
+- v4: 1.69 GB
+- v4_fp16: 866M
+- v4_bf16: 866M
+- v4_int8: 578M
 
 ## Evaluation Data, Metrics & Results
 
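
As a sanity check on the v4 numbers, a back-of-the-envelope estimate (checkpoint size ≈ parameter count × bytes per parameter, in binary units) reproduces the listed sizes:

```python
# Rough size check for v4 (454.2M params).
params = 454.2e6
print(f"fp32:      {params * 4 / 2**30:.2f} GiB")  # ~1.69 -> matches "1.69 GB"
print(f"fp16/bf16: {params * 2 / 2**20:.0f} MiB")  # ~866  -> matches "866M"
print(f"int8:      {params * 1 / 2**20:.0f} MiB")  # ~433; the listed 578M suggests
# some tensors (e.g. norms or embeddings) stay in higher precision after quantization
```
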
@@ -203,7 +223,7 @@ This section covers information on how the model was evaluated at each stage.
 
 ### Evaluation Data
 
-Evaluation was performed on real-time footage captured from Grand Theft Auto V, Cyberpunk 2077
+Evaluation was performed on real-time footage captured from Grand Theft Auto IV, Grand Theft Auto V, Cyberpunk 2077, Watch Dogs, Marvel's Spider-Man, Far Cry 6, Red Dead Redemption 2 and Control.
 
 ### Metrics
 