4th place solution

by Yassine

My solution was quite simple: two EfficientNet-B2 models (from timm, pretrained on ImageNet), both trained for 100 epochs on full-resolution images (no resizing), with only 90° rotations and flips as augmentations (D4 augmentation).

  • Vanilla EFN B2: public = 0.03776871233721767, private = 0.043104877797785726
  • EFN B2 with stride in the conv stem set to 1: public = 0.020664673591721437, private = 0.02209126260415937
  • Average: public = 0.013067678778629624, private = 0.015145883351528111
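
For reference, a minimal sketch of what such a D4 augmentation pipeline could look like (assuming albumentations; the post does not say which augmentation library was used):

import albumentations as A

# D4 augmentation: the 8 symmetries of a square, obtained from random
# 0/90/180/270-degree rotations combined with a horizontal flip.
d4_transform = A.Compose([
    A.RandomRotate90(p=1.0),
    A.HorizontalFlip(p=0.5),
])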

Thank you for sharing your approach. Can you share a snippet of code (or a link) showing how to set the stride for "EFN B2 with stride in the conv stem set to 1"?

Thanks for sharing your approach!

I also have a question about the EFN B2 with the conv stem stride set to 1: what was your intuition for doing this? What made you try it, and do you have an idea why it gives such a boost over the vanilla model?

It makes the network focus more on fine-grained details such as textures and high-frequency content. AI-generated images likely have a very different noise component from natural images, which is why this gives a boost in performance.
It's a common trick in image forensics using deep learning (first introduced in image steganalysis).
You can achieve it simply by doing this:

import timm

model = timm.create_model('efficientnet_b2', pretrained=True)  # ImageNet-pretrained; other kwargs elided in the original post
model.conv_stem.stride = (1, 1)  # stem conv stride 2 -> 1, so the input is not downsampled
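
A quick way to confirm the change takes effect (a sketch, not from the original post; the 256x256 input size is just an example): with the stem stride at 1, the backbone's output feature maps are twice as large in each spatial dimension, so memory and compute go up accordingly.

import torch

with torch.no_grad():
    feats = model.forward_features(torch.zeros(1, 3, 256, 256))
print(feats.shape)  # spatial size is 2x what the default stride-2 stem would give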

Note: there might be better backbones than EfficientNet (e.g. EfficientNetV2, ConvNeXt, etc.). I didn't have time to try them out, but I'm confident that just swapping them in would improve performance.
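
As a rough sketch of such a swap (assuming timm: EfficientNetV2 models there also expose a conv_stem, so the same stride trick applies, whereas ConvNeXt uses a patchify stem and would need a different modification):

model_v2 = timm.create_model('tf_efficientnetv2_s', pretrained=True)  # hypothetical choice of variant
model_v2.conv_stem.stride = (1, 1)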

Competition feedback: the competition was quite short; I only had time to make 4 submissions and couldn't squeeze in more experiments :/

Competitions org: Thank you for sharing 🤗
