Deepfake Detection — Hybrid ViT + DCT (CIFAKE)

Trained checkpoints for a deepfake image detection project using a hybrid Vision Transformer + DCT frequency-domain architecture.

Models

Checkpoint	Architecture	Test Acc	Test AUC-ROC
`baseline_resnet50_best.pt`	ResNet-50 (fine-tuned)	—	—
`baseline_efficientnet_b4_best.pt`	EfficientNet-B4 (fine-tuned)	—	—
`dct_only_main_best.pt`	DCT-only CNN	—	—
`hybrid_main_best.pt`	Hybrid ViT-B/16 + DCT	—	—

Dataset

CIFAKE: Real and AI-Generated Synthetic Images. 60K real images (from CIFAR-10) + 60K synthetic images generated with Stable Diffusion.

Usage

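import torch
# Requires the project's `models` package on the import path.
from models.baseline_cnn import ResNet50Classifier

# Build the architecture only; the checkpoint supplies the trained parameters.
model = ResNet50Classifier(pretrained=False)
# map_location="cpu" lets the checkpoint load on machines without a GPU.
state_dict = torch.load("baseline_resnet50_best.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # inference mode: disables dropout, uses BatchNorm running statistics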

Architecture

ViT branch: vit_base_patch16_224 (timm), CLS token → 768-dim
DCT branch: Block-wise 2D DCT on 8×8 tiles → small CNN → 256-dim
Fusion: concat(1024) → LayerNorm → Linear(512) → GELU → Dropout(0.3) → Linear(1)
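
The three components above can be sketched in PyTorch as follows. This is a minimal illustration, not the project's actual code. The layer sizes in the fusion head follow the description above, but everything else is an assumption made for the example: the names `dct_matrix`, `blockwise_dct`, `DCTBranch` and `FusionHead` are hypothetical, the layout of the "small CNN" is invented, and the log-magnitude scaling of the DCT coefficients is a common choice that the description does not confirm. The ViT branch is replaced by a random 768-dim tensor so the sketch runs without timm.

```python
import math

import torch
import torch.nn as nn


def dct_matrix(n: int = 8) -> torch.Tensor:
    """Orthonormal DCT-II basis matrix of shape (n, n)."""
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)  # frequency index (rows)
    i = torch.arange(n, dtype=torch.float32).unsqueeze(0)  # sample index (columns)
    mat = torch.cos(math.pi * (2 * i + 1) * k / (2 * n)) * math.sqrt(2.0 / n)
    mat[0, :] = math.sqrt(1.0 / n)  # DC row uses a different normalisation
    return mat


def blockwise_dct(x: torch.Tensor, block: int = 8) -> torch.Tensor:
    """2D DCT applied independently to each non-overlapping block x block tile.

    x has shape (B, C, H, W) with H and W divisible by `block`;
    the output has the same shape, with each tile replaced by its coefficients.
    """
    b, c, h, w = x.shape
    d = dct_matrix(block).to(x)  # match dtype and device of the input
    # (B, C, H/block, block, W/block, block) -> (B, C, H/block, W/block, block, block)
    tiles = x.reshape(b, c, h // block, block, w // block, block)
    tiles = tiles.permute(0, 1, 2, 4, 3, 5)
    coeffs = d @ tiles @ d.T  # separable 2D DCT of every tile: D X D^T
    return coeffs.permute(0, 1, 2, 4, 3, 5).reshape(b, c, h, w)


class DCTBranch(nn.Module):
    """Block-wise DCT followed by a small CNN (hypothetical layout) -> 256-dim."""

    def __init__(self, out_dim: int = 256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        coeffs = blockwise_dct(x)
        # Assumption: log-magnitude compresses the wide range of DCT coefficients.
        return self.cnn(torch.log1p(coeffs.abs()))


class FusionHead(nn.Module):
    """concat(1024) -> LayerNorm -> Linear(512) -> GELU -> Dropout(0.3) -> Linear(1)."""

    def __init__(self, vit_dim: int = 768, dct_dim: int = 256,
                 hidden: int = 512, p_drop: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(vit_dim + dct_dim),
            nn.Linear(vit_dim + dct_dim, hidden),
            nn.GELU(),
            nn.Dropout(p_drop),
            nn.Linear(hidden, 1),
        )

    def forward(self, vit_feat: torch.Tensor, dct_feat: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([vit_feat, dct_feat], dim=1))


# Demo with random data. The 768-dim tensor stands in for the ViT CLS feature.
images = torch.rand(2, 3, 224, 224)
vit_feat = torch.randn(2, 768)
dct_branch, head = DCTBranch().eval(), FusionHead().eval()
with torch.no_grad():
    logits = head(vit_feat, dct_branch(images))  # shape (2, 1)
    probs = torch.sigmoid(logits)                # single-logit binary output
```

Since the head ends in `Linear(1)`, the output is a single logit per image, and a sigmoid turns it into a probability. Which class that probability refers to (real or generated) depends on the label encoding used in training, which is not stated here.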
