---
license: apache-2.0
---

# Anime Classifiers

[Training/inference code](https://github.com/city96/CityClassifiers) | [Live Demo](https://huggingface.co/spaces/city96/AnimeClassifiers-demo)

These models predict whether a given concept is present in an image. Performance on high-resolution images isn't very good, especially when detecting subtle image effects such as noise, since CLIP uses a fairly low input resolution (224x224 or 336x336, depending on the variant).

To combat this, tiling is used at inference time. The input image is first downscaled so its shortest edge is 1536 pixels (see `TF.functional.resize`), then five separate 512x512 areas are selected (the four corners plus the center - see `TF.functional.five_crop`). This helps because the downscale factor isn't nearly as drastic as when passing the entire image to CLIP at once. As a bonus, it also avoids the issues caused by odd aspect ratios, which would otherwise require cropping or letterboxing.

![Tiling](https://github.com/city96/CityClassifiers/assets/125218114/66a30048-93ce-4c00-befc-0d986c84ec9f)
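
As a rough illustration, the tiling step could be scripted like this with torchvision and Hugging Face `transformers` - a minimal sketch, assuming a `CLIPModel`/`CLIPProcessor` pair; the actual inference code lives in the linked repo:

```python
import torch
import torchvision.transforms.functional as TF
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

def tile_embeddings(image: Image.Image) -> torch.Tensor:
    # Downscale so the shortest edge is 1536px, keeping the aspect ratio.
    image = TF.resize(image, 1536)
    # Take the four corners plus the center as 512x512 tiles.
    tiles = TF.five_crop(image, (512, 512))
    inputs = processor(images=list(tiles), return_tensors="pt")
    with torch.no_grad():
        return model.get_image_features(**inputs)  # one embedding per tile

# emb = tile_embeddings(Image.open("input.png")); per-tile predictions can then be averaged
```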

As for training, it is detailed in the sections below for each individual classifier. At first, specialized models will be trained to a relatively high accuracy, building up a high-quality but specific dataset in the process.

Then, these models will be used to split/sort each other's datasets. The code will need to be updated to support one image being part of more than one class, but the final result should be a clean dataset where each target aspect acts as a "tag" rather than a class.

## Architecture

The base model itself is fairly simple. It takes embeddings from a CLIP model (in this case, `openai/clip-vit-large-patch14`) and expands them to 1024 dimensions. From there, a single block with residuals is followed by a few linear layers that converge down to the final output.

For the classifier models, the final output goes through `nn.Softmax`.
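
A minimal sketch of what that head might look like in PyTorch - the layer sizes, activation choice, and exact block layout here are assumptions; the real definition is in the linked repo:

```python
import torch
import torch.nn as nn

class AnimeClassifier(nn.Module):
    """Hypothetical sketch of the described architecture."""
    def __init__(self, clip_dim: int = 768, hidden: int = 1024, num_classes: int = 2):
        super().__init__()
        self.up = nn.Linear(clip_dim, hidden)  # expand the CLIP embedding to 1024 dims
        self.block = nn.Sequential(            # single block, used with a residual connection
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
        )
        self.down = nn.Sequential(             # converge down to the final output
            nn.Linear(hidden, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
            nn.Softmax(dim=-1),                # classifier variants only
        )

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        x = self.up(emb)
        x = x + self.block(x)                  # residual connection
        return self.down(x)
```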

# Models

## Future/planned

- Unified (by joining the datasets of the other classifiers)
- Compression (jpg/webp/gif/dithering/etc.)
- Noise

## ChromaticAberration - Anime

### Design goals

The goal was to detect [chromatic aberration](https://en.wikipedia.org/wiki/Chromatic_aberration?useskin=vector) in images.

For some odd reason, this has become a popular post-processing effect to apply to images and drawings. While attempting to train an ESRGAN model, I noticed an odd halo around images and quickly figured out that this effect was the cause. This classifier aims to work as a base filter to remove such images from the dataset.

### Issues

- Seems to get confused by excessive HSV noise
- Triggers even if the effect is only applied to the background
- Sometimes triggers on rough linework/sketches (i.e. multiple semi-transparent lines overlapping)
- Low accuracy on 3D/2.5D images, with possible false positives

### Training

The training settings can be found in the `config/CCAnime-ChromaticAberration-v1.yaml` file (7e-6 learning rate, cosine scheduler, 100K steps).
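
As a minimal sketch of what those settings correspond to in PyTorch (the optimizer, batch size, and dummy data below are assumptions; the real loop is in the linked repo):

```python
import torch
import torch.nn.functional as F

model = AnimeClassifier(num_classes=2)  # hypothetical sketch from the Architecture section
optimizer = torch.optim.AdamW(model.parameters(), lr=7e-6)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100_000)

for step in range(100_000):
    emb = torch.randn(32, 768)            # stand-in for precomputed CLIP embeddings
    labels = torch.randint(0, 2, (32,))   # stand-in for class labels
    probs = model(emb)
    # The sketched head already ends in Softmax, so train with NLL on log-probabilities.
    loss = F.nll_loss(probs.clamp_min(1e-8).log(), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```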

![loss](https://github.com/city96/CityClassifiers/assets/125218114/475f1241-2b4e-4fc9-bbcd-261b85b8b491)

![loss-eval](https://github.com/city96/CityClassifiers/assets/125218114/88d6f090-aa6f-42ad-9fd0-8c5d267fce5e)

Final dataset score distribution for v1.16:

```
3215 images in dataset.
0_reg - 395 ||||
0_reg_booru - 1805 ||||||||||||||||||||||
1_chroma - 515 ||||||
1_synthetic - 500 ||||||

Class ratios:
00 - 2200 |||||||||||||||||||||||||||
01 - 1015 ||||||||||||
```

Version history:

- v1.0 - Initial test model; the dataset is fully synthetic (500 images). The effect was added by shifting the red/blue channels by a random amount using chaiNNer (a rough scripted equivalent is sketched after this list).
- v1.1 - Added 300 images tagged "chromatic_aberration" from gelbooru. Added the first 1000 images from danbooru2021 as reg images.
- v1.2 - Used the newly trained predictor to filter the existing datasets - found ~70 positives in the reg set and ~30 false positives in the target set.
- v1.3-v1.16 - Repeatedly ran the predictor against various datasets, adding false positives/negatives back into the dataset, sometimes running it against the training set to filter out misclassified images as the predictor improved. Added/removed images were manually checked (my eyes hurt).
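
For reference, a random red/blue channel shift along the lines of the v1.0 synthetic data could look like this - a hypothetical sketch, not the actual chaiNNer pipeline; the shift range and wrap-around behaviour are assumptions:

```python
import numpy as np
from PIL import Image

def add_chromatic_aberration(img: Image.Image, max_shift: int = 4) -> Image.Image:
    """Fake chromatic aberration by offsetting the red/blue channels."""
    r, g, b = img.convert("RGB").split()
    dx = np.random.randint(1, max_shift + 1)  # random shift amount in pixels
    # Shift red left and blue right along the x axis (wrap-around via np.roll).
    r = Image.fromarray(np.roll(np.array(r), -dx, axis=1))
    b = Image.fromarray(np.roll(np.array(b), dx, axis=1))
    return Image.merge("RGB", (r, g, b))

# e.g. add_chromatic_aberration(Image.open("clean.png")).save("1_synthetic/sample.png")
```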