license: mit
datasets:
  - deepghs/chafen_arknights
  - deepghs/monochrome_danbooru
metrics:
  - accuracy

imgutils-models

This repository hosts all the models used by deepghs/imgutils.

LPIPS

This model is used for clustering anime images that are variants of one another (difference images, 差分 in Chinese). It is based on richzhang/PerceptualSimilarity and trained on the dataset deepghs/chafen_arknights (private).

With a threshold of 0.45, the adjusted Rand score can reach 0.995.
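To illustrate how a distance threshold turns pairwise scores into clusters, here is a minimal sketch. It assumes the pairwise LPIPS distances have already been computed and stored in a symmetric matrix; the numbers below are made up, and `cluster_by_threshold` is a hypothetical helper that merges any two images whose distance falls below the threshold (simple connected components). The clustering used in deepghs/imgutils may well differ.

```python
from typing import List, Sequence


def cluster_by_threshold(
    dist: Sequence[Sequence[float]], threshold: float = 0.45
) -> List[int]:
    """Assign a cluster id to each item; pairs closer than `threshold` are merged."""
    n = len(dist)
    parent = list(range(n))  # union-find forest, one node per image

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    # Relabel roots as consecutive ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Illustrative distances only (not real model output):
# images 0 and 1 are near-duplicates, as are images 2 and 3.
distances = [
    [0.00, 0.10, 0.80, 0.90],
    [0.10, 0.00, 0.85, 0.95],
    [0.80, 0.85, 0.00, 0.30],
    [0.90, 0.95, 0.30, 0.00],
]
labels = cluster_by_threshold(distances, threshold=0.45)
print(labels)  # [0, 0, 1, 1]
```

Lowering the threshold splits clusters and raising it merges them, so 0.45 is best treated as a starting point to validate on your own data.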

Files:

  • lpips_diff.onnx, which computes the difference between features.
  • lpips_feature.onnx, which extracts features.

Monochrome

These models are used for monochrome image classification. They are based on CNNs and Transformers and trained on the dataset deepghs/monochrome_danbooru (private).

The following are the checkpoints that have been formally put into use, all based on the Caformer architecture:

| Checkpoint | Algorithm | Safe Level | Accuracy | False Negative | False Positive |
|---|---|---|---|---|---|
| monochrome-caformer-40 | caformer | 0 | 96.41% | 2.69% | 0.89% |
| monochrome-caformer-110 | caformer | 0 | 96.97% | 1.57% | 1.46% |
| monochrome-caformer_safe2-80 | caformer | 2 | 94.84% | 1.12% | 4.03% |
| monochrome-caformer_safe4-70 | caformer | 4 | 94.28% | 0.67% | 5.04% |

monochrome-caformer-110 has the best overall accuracy among them. However, this model is often used to screen out monochrome images, where the goal is to miss as few as possible, so we have also introduced weighted models (safe2 and safe4). Their overall accuracy is slightly lower, but their false negative rate (a monochrome image misidentified as a colored one) is also lower, which makes them more suitable for batch screening.
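The trade-off can be made concrete with a small calculation. The sketch below copies the false negative and false positive rates from the table and picks the checkpoint with the lowest weighted error, `fn_weight * FN + FP`. The weighting scheme and the `best_checkpoint` helper are illustrative assumptions, not how the safe2 and safe4 models were trained or selected.

```python
# (checkpoint, false negative %, false positive %), copied from the table above.
CHECKPOINTS = [
    ("monochrome-caformer-40", 2.69, 0.89),
    ("monochrome-caformer-110", 1.57, 1.46),
    ("monochrome-caformer_safe2-80", 1.12, 4.03),
    ("monochrome-caformer_safe4-70", 0.67, 5.04),
]


def best_checkpoint(fn_weight: float) -> str:
    """Return the checkpoint minimizing fn_weight * FN + FP.

    fn_weight is a hypothetical knob: how many times worse a missed
    monochrome image is than a colored image flagged by mistake.
    """
    return min(CHECKPOINTS, key=lambda row: fn_weight * row[1] + row[2])[0]


# Both error types cost the same: the most accurate checkpoint wins.
print(best_checkpoint(1.0))   # monochrome-caformer-110

# A missed monochrome image costs ten times more: a "safe" checkpoint wins.
print(best_checkpoint(10.0))  # monochrome-caformer_safe4-70
```

In each row, accuracy, false negative, and false positive sum to roughly 100%, which suggests both error rates are measured over all samples rather than per class.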

Deepdanbooru

deepdanbooru is a model for tagging anime images. Here, we provide a tag classification table, deepdanbooru_tags.csv, along with an ONNX model (from chinoll/deepdanbooru).

Note that because the deepdanbooru model itself is of poor quality and its dataset is relatively old, it is provided for testing purposes only and is not recommended as the main classification model. We recommend using the wd14 model instead, see: