4th place solution

by Yassine

My solution was quite simple: two EfficientNet-B2 models (from timm, pretrained on ImageNet), both trained for 100 epochs on full-resolution images (no resizing), with only 90° rotations and flips as augmentations (D4 augmentation).

  • Vanilla EFN B2: public = 0.03776871233721767, private = 0.043104877797785726
  • EFN B2 with stride in the conv stem set to 1: public = 0.020664673591721437, private = 0.02209126260415937
  • Average: public = 0.013067678778629624, private = 0.015145883351528111
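
For reference, a minimal sketch of what such a D4 augmentation pipeline could look like (assuming albumentations; the post does not say which augmentation library was used):

import albumentations as A

# D4 augmentation: the 8 symmetries of a square, obtained from random
# 0/90/180/270-degree rotations combined with a horizontal flip.
d4_transform = A.Compose([
    A.RandomRotate90(p=1.0),
    A.HorizontalFlip(p=0.5),
])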

Thank you for sharing your approach. Can you share a snippet of code (or a link) showing how to set the stride for "EFN B2 with stride in the conv stem set to 1"?

Thanks for sharing your approach!

I also have a question about the EFN B2 with the conv stem stride set to 1: what was your intuition for doing this? What made you try it, and do you have an idea why it gives such a boost over the vanilla model?

It makes the network focus more on fine-grained details such as textures and high-frequency content. AI-generated images likely have a very different noise component from natural images, which is why this gives a boost in performance.
It's a common trick in image forensics using deep learning (first introduced in image steganalysis).
You can achieve it simply by doing this:

import timm

model = timm.create_model('efficientnet_b2', pretrained=True)  # ImageNet-pretrained; other kwargs elided in the original post
model.conv_stem.stride = (1, 1)  # stem conv stride 2 -> 1, so the input is not downsampled
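
A quick way to confirm the change takes effect (a sketch, not from the original post; the 256x256 input size is just an example): with the stem stride at 1, the backbone's output feature maps are twice as large in each spatial dimension, so memory and compute go up accordingly.

import torch

with torch.no_grad():
    feats = model.forward_features(torch.zeros(1, 3, 256, 256))
print(feats.shape)  # spatial size is 2x what the default stride-2 stem would give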

Note: there might be better backbones than EfficientNet (e.g. EfficientNetV2, ConvNeXt, etc.). I didn't have time to try them out, but I'm confident that just swapping them in would improve performance.
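
As a rough sketch of such a swap (assuming timm: EfficientNetV2 models there also expose a conv_stem, so the same stride trick applies, whereas ConvNeXt uses a patchify stem and would need a different modification):

model_v2 = timm.create_model('tf_efficientnetv2_s', pretrained=True)  # hypothetical choice of variant
model_v2.conv_stem.stride = (1, 1)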

Competition feedback: the competition was quite short; I only had time to make 4 submissions and couldn't squeeze in more experiments :/

Competitions org: Thank you for sharing 🤗
