license: apache-2.0

Anime Classifiers

Training/inference code | Live Demo

These are models that predict whether a concept is present in an image. Performance on high-resolution images is limited, especially when detecting subtle image effects such as noise, because CLIP operates at a fairly low input resolution (336x336 or 224x224).

To combat this, tiling is used at inference time. The input image is first downscaled to 1536 (shortest edge - See TF.functional.resize), then 5 separate 512x512 areas are selected (4 corners + center - See TF.functional.five_crop). This helps as the downscale factor isn't nearly as drastic as passing the entire image to CLIP. As a bonus, it also avoids the issues with odd aspect ratios requiring cropping or letterboxing to work.
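The tile geometry described above can be sketched in plain Python. This is only an illustration of the arithmetic, not the project's code: the function names are made up for this example, and the rounding is assumed to match what torchvision's resize and five_crop do.

```python
def resize_shortest_edge(width, height, target=1536):
    """Return (width, height) after scaling the shortest edge to `target`."""
    if width <= height:
        return target, int(target * height / width)
    return int(target * width / height), target


def five_crop_boxes(width, height, tile=512):
    """Return (left, top, right, bottom) boxes: four corners, then center."""
    if tile > width or tile > height:
        raise ValueError("tile is larger than the image")
    left = int(round((width - tile) / 2.0))
    top = int(round((height - tile) / 2.0))
    return [
        (0, 0, tile, tile),                            # top-left
        (width - tile, 0, width, tile),                # top-right
        (0, height - tile, tile, height),              # bottom-left
        (width - tile, height - tile, width, height),  # bottom-right
        (left, top, left + tile, top + tile),          # center
    ]


# Example: a 4000x3000 input is resized to 2048x1536 before cropping.
w, h = resize_shortest_edge(4000, 3000)
boxes = five_crop_boxes(w, h)
```

For that 4000x3000 example, each 512x512 tile is taken from an image at roughly half the original scale, whereas feeding the whole image to CLIP would shrink it by more than a factor of ten.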

[Figure: Tiling]

As for the training, it will be detailed in the sections below for the individual classifiers. At first, specialized models will be trained to a relatively high accuracy, building up a high quality but specific dataset in the process.

Then, these models will be used to split/sort each other's datasets. The code will need to be updated to support one image being part of more than one class, but the final result should be a clean dataset where each target aspect acts as a "tag" rather than a class.
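As a rough illustration of the "tag rather than class" idea, the sketch below merges the outputs of several single-concept classifiers into one tag set per image. Everything here is hypothetical: the function name, the tag names, the scores and the 0.5 threshold are placeholders, since this step is still planned.

```python
def merge_into_tags(scores_by_classifier, threshold=0.5):
    """Turn per-classifier positive scores into one tag set per image.

    scores_by_classifier maps a tag name to {image_id: score in [0, 1]}.
    The threshold is a placeholder, not a value taken from the project.
    """
    tags = {}
    for tag, scores in scores_by_classifier.items():
        for image_id, score in scores.items():
            # Every scored image gets an entry, even if no tag applies.
            tags.setdefault(image_id, set())
            if score >= threshold:
                tags[image_id].add(tag)
    return tags


# Made-up scores for two classifiers over three images.
scores = {
    "chromatic_aberration": {"a.png": 0.93, "b.png": 0.04},
    "noise": {"a.png": 0.81, "b.png": 0.10, "c.png": 0.66},
}
tags = merge_into_tags(scores)
```

Here `a.png` ends up with both tags, which is exactly the case a single-class dataset layout cannot represent.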

Architecture

The base model itself is fairly simple. It takes embeddings from a CLIP model (in this case, openai/clip-vit-large-patch14) and expands them to 1024 dimensions. From there, a single block with residuals is followed by a few linear layers which converge down to the final output.

For the classifier models, the final output goes through nn.Softmax.
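To make the data flow concrete, here is a minimal pure-Python sketch of a forward pass with that shape: an up-projection, one residual block, a head that narrows to the output, then a softmax. It is not the actual model. The ReLU activation, the form of the residual block, the head widths and all function names are assumptions, and the demo uses tiny dimensions so it runs instantly.

```python
import math
import random


def linear(x, weight, bias):
    """Compute W x + b for one vector; weight is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_linear(rng, n_in, n_out):
    """Random weights (n_out rows of n_in values) and a zero bias."""
    scale = 1.0 / math.sqrt(n_in)
    weight = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


def build_classifier(rng, embed_dim, hidden_dim, head_dims, num_classes):
    dims = [hidden_dim] + list(head_dims) + [num_classes]
    return {
        "up": init_linear(rng, embed_dim, hidden_dim),   # expand the embedding
        "res": init_linear(rng, hidden_dim, hidden_dim),  # residual block
        "head": [init_linear(rng, a, b) for a, b in zip(dims, dims[1:])],
    }


def forward(params, embedding):
    h = linear(embedding, *params["up"])
    # Residual connection: add the block's output back onto its input.
    h = [a + b for a, b in zip(h, relu(linear(h, *params["res"])))]
    for i, layer in enumerate(params["head"]):
        h = linear(h, *layer)
        if i < len(params["head"]) - 1:
            h = relu(h)  # no activation after the last layer
    return softmax(h)


# Tiny demo sizes; the text above implies hidden_dim=1024 for the real model.
rng = random.Random(0)
params = build_classifier(rng, embed_dim=8, hidden_dim=16, head_dims=[8, 4], num_classes=2)
probs = forward(params, [0.1] * 8)
```

`probs` holds one probability per class and sums to 1. For the real model, `embed_dim` would be the CLIP embedding size (768 if the projected image embedding is used, which is an assumption here).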

Models

Future/planned

  • Unified (by joining the datasets of the other classifiers)
  • Compression (jpg/webp/gif/dithering/etc)
  • Noise

ChromaticAberration - Anime

Design goals

The goal was to detect chromatic aberration in images.

For some odd reason, chromatic aberration has become a popular post-processing effect to apply to images and drawings. While attempting to train an ESRGAN model, I noticed an odd halo around images and quickly figured out that this effect was the cause. This classifier aims to work as a base filter to remove such images from the dataset.

Issues

  • Seems to get confused by excessive HSV noise
  • Triggers even if the effect is only applied to the background
  • Sometimes triggers on rough linework/sketches (i.e. multiple semi-transparent lines overlapping)
  • Low accuracy on 3D/2.5D with possible false positives.

Training

The training settings can be found in the config/CCAnime-ChromaticAberration-v1.yaml file (7e-6 LR, cosine scheduler, 100K steps).
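For a feel of what those settings imply, the sketch below evaluates a plain cosine schedule with the stated peak LR and step count. The exact scheduler variant is not given here, so the absence of warmup and the decay to zero are assumptions; the config file is the authority.

```python
import math


def cosine_lr(step, base_lr=7e-6, total_steps=100_000, min_lr=0.0):
    """Cosine-annealed learning rate; no warmup, decays to min_lr (assumed)."""
    step = min(max(step, 0), total_steps)
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cos


for step in (0, 25_000, 50_000, 75_000, 100_000):
    print(step, f"{cosine_lr(step):.2e}")
```

Under these assumptions the learning rate starts at 7e-6, passes through half that value at the midpoint, and reaches zero at step 100K.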

[Figure: loss]

[Figure: loss-eval]

Final dataset score distribution for v1.16:

3215 images in dataset.
0_reg       -  395 ||||
0_reg_booru - 1805 ||||||||||||||||||||||
1_chroma    -  515 ||||||
1_synthetic -  500 ||||||

Class ratios:
00 - 2200 |||||||||||||||||||||||||||
01 - 1015 ||||||||||||
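The class ratios follow directly from the folder counts. The short check below assumes the leading digit of each folder name is the class label, which matches the totals listed above.

```python
counts = {"0_reg": 395, "0_reg_booru": 1805, "1_chroma": 515, "1_synthetic": 500}

# Group folder counts by the class label in front of the first underscore.
class_totals = {}
for folder, n in counts.items():
    label = folder.split("_", 1)[0]
    class_totals[label] = class_totals.get(label, 0) + n

total = sum(counts.values())
shares = {label: n / total for label, n in class_totals.items()}
```

This gives 2200 negatives and 1015 positives out of 3215 images, or roughly 68% to 32%.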

Version history:

  • v1.0 - Initial test model, dataset is fully synthetic (500 images). Effect added by shifting red/blue channel by a random amount using chaiNNer.
  • v1.1 - Added 300 images tagged "chromatic_aberration" from gelbooru. Added the first 1000 images from danbooru2021 as reg images.
  • v1.2 - Used the newly trained predictor to filter the existing datasets - found ~70 positives in the reg set and ~30 false positives in the target set.
  • v1.3-v1.16 - Repeatedly ran predictor against various datasets, adding false positives/negatives back into the dataset, sometimes running against the training set to filter out misclassified images as the predictor got better. Added/removed images were manually checked (My eyes hurt).
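The synthetic effect used for v1.0 (shifting the red/blue channels by a random amount) was produced with chaiNNer. The sketch below only illustrates the idea in plain Python on a nested-list image. The shift direction, the horizontal-only axis, the shift range and the function names are all assumptions.

```python
import random


def shift_channel(rows, channel, dx):
    """Shift one channel of an RGB image horizontally by dx pixels.

    rows is a list of rows, each a list of (r, g, b) tuples.
    Source positions outside the image are clamped to the nearest edge.
    """
    out = []
    for row in rows:
        width = len(row)
        new_row = []
        for x, pixel in enumerate(row):
            src = min(max(x - dx, 0), width - 1)
            px = list(pixel)
            px[channel] = row[src][channel]
            new_row.append(tuple(px))
        out.append(new_row)
    return out


def add_chromatic_aberration(rows, max_shift=3, rng=random):
    """Move red right and blue left by the same random amount (assumed scheme)."""
    dx = rng.randint(1, max_shift)
    shifted = shift_channel(rows, 0, dx)   # red channel moves right
    return shift_channel(shifted, 2, -dx)  # blue channel moves left


# One white pixel on a black 5x1 image; max_shift=1 makes the shift deterministic.
image = [[(0, 0, 0)] * 5]
image[0][2] = (255, 255, 255)
fringed = add_chromatic_aberration(image, max_shift=1)
```

The single white pixel splits into separate blue, green and red pixels, which is the colored fringing the classifier is trained to detect.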