---
license: mit
datasets:
- deepghs/monochrome_danbooru
metrics:
- accuracy
pipeline_tag: image-classification
tags:
- art
---

These models classify whether an anime image is monochrome. All of them were trained at an image size of 384.

|               Model              |  FLOPs | Accuracy |                                                          Confusion Matrix                                                          | Description                                                                                                                                            |
|:--------------------------------:|:------:|:--------:|:----------------------------------------------------------------------------------------------------------------------------------:|--------------------------------------------------------------------------------------------------------------------------------------------------------|
|           caformer_s36           | 22.10G |  95.63%  |           [Confusion Matrix](https://huggingface.co/deepghs/monochrome_detect/blob/main/caformer_s36/plot_confusion.png)           | Model: caformer_s36 from timm                                                                                                                          |
|        caformer_s36_safe2        | 22.10G |  95.52%  |        [Confusion Matrix](https://huggingface.co/deepghs/monochrome_detect/blob/main/caformer_s36_safe2/plot_confusion.png)        | Model: caformer_s36 from timm, with higher precision but lower recall than caformer_s36                                                                |
|         caformer_s36_plus        | 22.10G |  97.31%  |         [Confusion Matrix](https://huggingface.co/deepghs/monochrome_detect/blob/main/caformer_s36_plus/plot_confusion.png)        | Model: caformer_s36.sail_in22k_ft_in1k_384 from timm (pretrained)                                                                                      |
|      caformer_s36_plus_safe2     | 22.10G |  97.09%  |      [Confusion Matrix](https://huggingface.co/deepghs/monochrome_detect/blob/main/caformer_s36_plus_safe2/plot_confusion.png)     | Model: caformer_s36.sail_in22k_ft_in1k_384 from timm (pretrained), with higher precision but lower recall than caformer_s36_plus                       |
|       mobilenetv3_large_100      |  0.63G |  95.40%  |       [Confusion Matrix](https://huggingface.co/deepghs/monochrome_detect/blob/main/mobilenetv3_large_100/plot_confusion.png)      | Model: mobilenetv3_large_100 from timm                                                                                                                 |
|    mobilenetv3_large_100_dist    |  0.63G |  96.30%  |    [Confusion Matrix](https://huggingface.co/deepghs/monochrome_detect/blob/main/mobilenetv3_large_100_dist/plot_confusion.png)    | Distilled from caformer_s36_plus into mobilenetv3_large_100                                                                                            |
|    mobilenetv3_large_100_safe2   |  0.63G |  94.62%  |    [Confusion Matrix](https://huggingface.co/deepghs/monochrome_detect/blob/main/mobilenetv3_large_100_safe2/plot_confusion.png)   | Model: mobilenetv3_large_100 from timm, with higher precision but lower recall than mobilenetv3_large_100                                              |
| mobilenetv3_large_100_dist_safe2 |  0.63G |  95.85%  | [Confusion Matrix](https://huggingface.co/deepghs/monochrome_detect/blob/main/mobilenetv3_large_100_dist_safe2/plot_confusion.png) | Distilled from caformer_s36_plus_safe2 into mobilenetv3_large_100                                                                                      |
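The `_safe2` variants trade some recall for higher precision, which is why their accuracy is slightly lower. To make that trade-off concrete when reading the linked confusion matrices, here is a minimal sketch that computes accuracy, precision, and recall from a 2x2 confusion matrix. It assumes "monochrome" is the positive class, and the counts in the example are hypothetical, not taken from the published plots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryConfusion:
    """2x2 confusion matrix counts. Assumption: 'positive' means 'monochrome'."""
    tp: int  # monochrome images predicted as monochrome
    fp: int  # non-monochrome images predicted as monochrome
    fn: int  # monochrome images predicted as non-monochrome
    tn: int  # non-monochrome images predicted as non-monochrome

    @property
    def accuracy(self) -> float:
        # Fraction of all images classified correctly.
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    @property
    def precision(self) -> float:
        # Of everything flagged as monochrome, the fraction that really is.
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        # Of all truly monochrome images, the fraction that got flagged.
        actual = self.tp + self.fn
        return self.tp / actual if actual else 0.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only (1000 images each).
    base = BinaryConfusion(tp=470, fp=30, fn=30, tn=470)
    safe = BinaryConfusion(tp=455, fp=16, fn=45, tn=484)  # fewer false positives
    for name, cm in [("base", base), ("safe", safe)]:
        print(f"{name}: acc={cm.accuracy:.3f} "
              f"precision={cm.precision:.3f} recall={cm.recall:.3f}")
```

With these made-up numbers, the "safe" matrix flags fewer non-monochrome images as monochrome (higher precision) but misses more monochrome ones (lower recall), and its overall accuracy drops slightly, which matches the pattern between each base model and its `_safe2` counterpart in the table.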