doc: add experiments and badge
README.md
CHANGED
@@ -16,6 +16,7 @@ app_port: 7860
|
|
16 |
<img alt="License" src="https://img.shields.io/github/license/JeffersonQin/YuzuMarker.FontDetection"/>
|
17 |
<img alt="Contributors" src="https://img.shields.io/github/contributors/JeffersonQin/YuzuMarker.FontDetection"/>
|
18 |
</p>
|
|
|
19 |
</div>
|
20 |
|
21 |
## Scene Text Font Dataset Generation
|
@@ -204,7 +205,7 @@ On our synthesized dataset,
|
|
204 |
| ResNet-18 | ❌ | ❌ | ❌ | ❌ | Sigmoid | 512x512 | I | 18.58% | 5c43f60 | I | float32 |
|
205 |
| ResNet-18 | ❌ | ❌ | ❌ | ❌ | Sigmoid | 512x512 | II<sup>2</sup> | 14.39% | 5a85fd3 | I | bfloat16_3x |
|
206 |
| ResNet-18 | ❌ | ❌ | ❌ | ❌ | Tanh | 512x512 | II | 16.24% | ff82fe6 | I | bfloat16_3x |
|
207 |
-
| ResNet-18 | β
*<sup>
|
208 |
| ResNet-18 | ✅* | ❌ | ❌ | ❌ | Tanh | 512x512 | I | 29.95% | 8364103 | I | bfloat16_3x |
|
209 |
| ResNet-18 | ✅* | ❌ | ❌ | ❌ | Sigmoid | 512x512 | I | 29.37% [Early stop] | 8d2e833 | I | bfloat16_3x |
|
210 |
| ResNet-18 | ✅* | ❌ | ❌ | ❌ | Sigmoid | 416x416 | I | [Lower Trend] | d5a3215 | I | bfloat16_3x |
|
@@ -214,11 +215,12 @@ On our synthesized dataset,
|
|
214 |
| ResNet-50 | ✅* | ❌ | ❌ | ❌ | Sigmoid | 512x512 | I | 34.21% | e980b66 | I | bfloat16_3x |
|
215 |
| ResNet-18 | ✅* | ✅ | ❌ | ❌ | Sigmoid | 512x512 | I | 31.24% | 416c7bb | I | bfloat16_3x |
|
216 |
| ResNet-18 | ✅* | ✅ | ✅ | ❌ | Sigmoid | 512x512 | I | 34.69% | 855e240 | I | bfloat16_3x |
|
217 |
-
| ResNet-18 | βοΈ*<sup>
|
218 |
| ResNet-18 | ✔️* | ✅ | ✅ | ❌ | Sigmoid | 512x512 | III<sup>3</sup> | 38.87% | 0693434 | I | bfloat16_3x |
|
219 |
| ResNet-50 | ✔️* | ✅ | ✅ | ❌ | Sigmoid | 512x512 | III | 48.99% | bc0f7fc | II<sup>6</sup> | bfloat16_3x |
|
220 |
-
| ResNet-50 | βοΈ | β
| β
|
|
221 |
-
| ResNet-50 |
|
|
|
222 |
| ResNet-50 | β | β
| β
| β
| Sigmoid | 512x512 | III | 41.35% | 0f071a5 | II | bfloat16 |
|
223 |
|
224 |
* \* Bug in implementation
|
@@ -227,11 +229,12 @@ On our synthesized dataset,
|
|
227 |
* <sup>3</sup> `learning rate = 0.001, lambda = (2, 0.5, 1)`
|
228 |
* <sup>4</sup> `learning rate = 0.01, lambda = (2, 0.5, 1)`
|
229 |
* <sup>5</sup> Initial version of synthesized dataset
|
230 |
-
* <sup>6</sup> Doubled synthesized dataset
|
231 |
-
* <sup>7</sup>
|
232 |
-
* <sup>8</sup> Data Augmentation
|
233 |
-
* <sup>9</sup> Data Augmentation
|
234 |
-
* <sup>10</sup>
|
|
|
235 |
|
236 |
## Pretrained Models
|
237 |
|
|
|
16 |
<img alt="License" src="https://img.shields.io/github/license/JeffersonQin/YuzuMarker.FontDetection"/>
|
17 |
<img alt="Contributors" src="https://img.shields.io/github/contributors/JeffersonQin/YuzuMarker.FontDetection"/>
|
18 |
</p>
|
19 |
+
<a href="https://www.buymeacoffee.com/gyrojeff" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;" ></a>
|
20 |
</div>
|
21 |
|
22 |
## Scene Text Font Dataset Generation
|
|
|
205 |
| ResNet-18 | ❌ | ❌ | ❌ | ❌ | Sigmoid | 512x512 | I | 18.58% | 5c43f60 | I | float32 |
|
206 |
| ResNet-18 | ❌ | ❌ | ❌ | ❌ | Sigmoid | 512x512 | II<sup>2</sup> | 14.39% | 5a85fd3 | I | bfloat16_3x |
|
207 |
| ResNet-18 | ❌ | ❌ | ❌ | ❌ | Tanh | 512x512 | II | 16.24% | ff82fe6 | I | bfloat16_3x |
|
208 |
+
| ResNet-18 | ✅*<sup>8</sup> | ❌ | ❌ | ❌ | Tanh | 512x512 | II | 27.71% | a976004 | I | bfloat16_3x |
|
209 |
| ResNet-18 | ✅* | ❌ | ❌ | ❌ | Tanh | 512x512 | I | 29.95% | 8364103 | I | bfloat16_3x |
|
210 |
| ResNet-18 | ✅* | ❌ | ❌ | ❌ | Sigmoid | 512x512 | I | 29.37% [Early stop] | 8d2e833 | I | bfloat16_3x |
|
211 |
| ResNet-18 | ✅* | ❌ | ❌ | ❌ | Sigmoid | 416x416 | I | [Lower Trend] | d5a3215 | I | bfloat16_3x |
|
|
|
215 |
| ResNet-50 | ✅* | ❌ | ❌ | ❌ | Sigmoid | 512x512 | I | 34.21% | e980b66 | I | bfloat16_3x |
|
216 |
| ResNet-18 | ✅* | ✅ | ❌ | ❌ | Sigmoid | 512x512 | I | 31.24% | 416c7bb | I | bfloat16_3x |
|
217 |
| ResNet-18 | ✅* | ✅ | ✅ | ❌ | Sigmoid | 512x512 | I | 34.69% | 855e240 | I | bfloat16_3x |
|
218 |
+
| ResNet-18 | ✔️*<sup>9</sup> | ✅ | ✅ | ❌ | Sigmoid | 512x512 | I | 38.32% | 1750035 | I | bfloat16_3x |
|
219 |
| ResNet-18 | ✔️* | ✅ | ✅ | ❌ | Sigmoid | 512x512 | III<sup>3</sup> | 38.87% | 0693434 | I | bfloat16_3x |
|
220 |
| ResNet-50 | ✔️* | ✅ | ✅ | ❌ | Sigmoid | 512x512 | III | 48.99% | bc0f7fc | II<sup>6</sup> | bfloat16_3x |
|
221 |
+
| ResNet-50 | ✔️ | ✅ | ✅ | ❌ | Sigmoid | 512x512 | III | 48.45% | 0f071a5 | II | bfloat16_3x |
|
222 |
+
| ResNet-50 | ✔️ | ✅ | ✅ | ✅<sup>11</sup> | Sigmoid | 512x512 | III | 46.12% | 0f071a5 | II | bfloat16 |
|
223 |
+
| ResNet-50 | β<sup>10</sup> | β
| β
| β | Sigmoid | 512x512 | III | 43.86% | 0f071a5 | II | bfloat16 |
|
224 |
| ResNet-50 | β | β
| β
| β
| Sigmoid | 512x512 | III | 41.35% | 0f071a5 | II | bfloat16 |
|
225 |
|
226 |
* \* Bug in implementation
|
|
|
229 |
* <sup>3</sup> `learning rate = 0.001, lambda = (2, 0.5, 1)`
|
230 |
* <sup>4</sup> `learning rate = 0.01, lambda = (2, 0.5, 1)`
|
231 |
* <sup>5</sup> Initial version of synthesized dataset
|
232 |
+
* <sup>6</sup> Doubled synthesized dataset (2x)
|
233 |
+
* <sup>7</sup> Quadrupled synthesized dataset (4x)
|
234 |
+
* <sup>8</sup> Data Augmentation v1: Color Jitter + Random Crop [81%-100%]
|
235 |
+
* <sup>9</sup> Data Augmentation v2: Color Jitter + Random Crop [30%-130%] + Random Gaussian Blur + Random Gaussian Noise + Random Rotation [-15°, 15°]
|
236 |
+
* <sup>10</sup> Data Augmentation v3: Color Jitter + Random Crop [30%-130%] + Random Gaussian Blur + Random Gaussian Noise + Random Rotation [-15°, 15°] + Random Horizontal Flip + Random Downsample [1, 2]
|
237 |
+
* <sup>11</sup> Preserve Aspect Ratio by Random Cropping
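
The augmentation footnotes give only parameter ranges, so the following is a minimal sketch of how one draw of the v3 parameters might be sampled. It is not the repository's code: `AugParams`, `sample_v3_params`, and `flip_prob` are hypothetical names, the crop range is assumed to be a scale factor between 0.30 and 1.30, the rotation is assumed to be in degrees, and the downsample value is assumed to be a continuous factor between 1 and 2. Color jitter, blur, and noise are left out because the footnotes state no ranges for them.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AugParams:
    """Hypothetical container for one draw of augmentation parameters."""
    crop_scale: float       # crop size relative to the original (assumed reading of [30%-130%])
    rotation_deg: float     # rotation angle in degrees
    horizontal_flip: bool   # whether the image is mirrored
    downsample: float       # resolution reduction factor (assumed continuous)


def sample_v3_params(rng: random.Random, flip_prob: float = 0.5) -> AugParams:
    """Draw one parameter set from the ranges quoted in footnote 10.

    flip_prob is an assumption; the README does not state it.
    """
    return AugParams(
        crop_scale=rng.uniform(0.30, 1.30),
        rotation_deg=rng.uniform(-15.0, 15.0),
        horizontal_flip=rng.random() < flip_prob,
        downsample=rng.uniform(1.0, 2.0),
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs draw the same values
    for _ in range(3):
        print(sample_v3_params(rng))
```

Passing an explicit `random.Random` instance keeps the draws reproducible per experiment, which helps when comparing runs that differ only in the augmentation recipe.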
|
238 |
|
239 |
## Pretrained Models
|
240 |
|