Deepfake Detection: Hybrid ViT + DCT (CIFAKE)
Trained checkpoints for a deepfake image detection project using a hybrid Vision Transformer + DCT frequency-domain architecture.
Models
| Checkpoint | Architecture | Test Acc | Test AUC-ROC |
|---|---|---|---|
| `baseline_resnet50_best.pt` | ResNet-50 (fine-tuned) | - | - |
| `baseline_efficientnet_b4_best.pt` | EfficientNet-B4 (fine-tuned) | - | - |
| `dct_only_main_best.pt` | DCT-only CNN | - | - |
| `hybrid_main_best.pt` | Hybrid ViT-B/16 + DCT | - | - |
Dataset
CIFAKE: Real and AI-Generated Synthetic Images, comprising 60K real images (from CIFAR-10) and 60K images generated with Stable Diffusion.
Usage
import torch
from models.baseline_cnn import ResNet50Classifier
model = ResNet50Classifier(pretrained=False)
model.load_state_dict(torch.load("baseline_resnet50_best.pt", map_location="cpu"))
model.eval()
Architecture
- ViT branch: `vit_base_patch16_224` (timm), CLS token → 768-dim
- DCT branch: Block-wise 2D DCT on 8×8 tiles → small CNN → 256-dim
- Fusion: concat(1024) → LayerNorm → Linear(512) → GELU → Dropout(0.3) → Linear(1)
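
The block-wise DCT step in the DCT branch can be illustrated with a minimal NumPy sketch. This is not the project's code. The function names `dct_matrix` and `blockwise_dct`, the orthonormal DCT-II scaling, and the single-channel 2D input are all assumptions made for illustration; the actual preprocessing (channel handling, normalization, any log-scaling of coefficients) may differ.

```python
import numpy as np

TILE = 8  # tile size from the architecture list above


def dct_matrix(n: int = TILE) -> np.ndarray:
    """Orthonormal DCT-II basis; row k holds the k-th cosine basis vector.

    The orthonormal scaling is an assumption of this sketch.
    """
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    basis[0, :] = np.sqrt(1.0 / n)   # the DC row uses a different scale
    return basis


def blockwise_dct(image: np.ndarray, tile: int = TILE) -> np.ndarray:
    """Apply a 2D DCT independently to each non-overlapping tile.

    image: 2D array (H, W) with H and W divisible by `tile`.
    Returns an array of the same shape holding the per-tile coefficients.
    """
    h, w = image.shape
    if h % tile or w % tile:
        raise ValueError("image sides must be multiples of the tile size")
    c = dct_matrix(tile)
    # Split into tiles: (H/tile, tile, W/tile, tile) -> (H/tile, W/tile, tile, tile)
    blocks = image.reshape(h // tile, tile, w // tile, tile).transpose(0, 2, 1, 3)
    # Separable 2D DCT per tile: C @ X @ C.T (matmul broadcasts over the tile grid)
    coeffs = c @ blocks @ c.T
    # Put each tile of coefficients back at its original position.
    return coeffs.transpose(0, 2, 1, 3).reshape(h, w)
```

For example, `blockwise_dct(np.random.rand(224, 224))` returns a 224×224 coefficient map made of 28×28 tiles, which a small CNN could then consume. In each tile the top-left entry is the DC term and higher indices correspond to higher spatial frequencies.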