AI Detection Model
Model Architecture and Training
Three separate models were initially trained:
- Midjourney vs. Real Images
- Stable Diffusion vs. Real Images
- Stable Diffusion Fine-tunings vs. Real Images
Data preparation process:
- Used Google's Open Image Dataset for real images
- Described real images using BLIP (Bootstrapping Language-Image Pre-training)
- Generated Stable Diffusion images using BLIP descriptions
- Found similar Midjourney images based on BLIP descriptions
This approach kept the real and AI-generated images as similar in content as possible, so that they differ mainly in origin rather than in subject matter.
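The pairing idea can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `build_matched_pairs`, `LabeledImage`, and the `describe` and `generate` callables are hypothetical names standing in for a BLIP captioning step and a Stable Diffusion generation step, and the demo uses string stand-ins so it runs without any model weights.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class LabeledImage:
    image: object   # placeholder for image data or a file path
    caption: str    # description shared by both images of a matched pair
    label: int      # 0 = real, 1 = AI-generated


def build_matched_pairs(
    real_images: Iterable[object],
    describe: Callable[[object], str],
    generate: Callable[[str], object],
) -> List[LabeledImage]:
    """Pair each real image with a synthetic image made from its caption."""
    dataset: List[LabeledImage] = []
    for real in real_images:
        caption = describe(real)   # hypothetical hook, e.g. a BLIP captioner
        fake = generate(caption)   # hypothetical hook, e.g. a diffusion model
        dataset.append(LabeledImage(real, caption, label=0))
        dataset.append(LabeledImage(fake, caption, label=1))
    return dataset


if __name__ == "__main__":
    # Stand-in callables: no real captioning or generation happens here.
    demo = build_matched_pairs(
        ["dog.jpg", "street.jpg"],
        describe=lambda path: f"a photo of {path.split('.')[0]}",
        generate=lambda caption: f"generated<{caption}>",
    )
    for item in demo:
        print(item.label, item.caption, item.image)
```

Because both images in a pair share one caption, a classifier trained on such data has less opportunity to separate the classes by subject matter alone.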
The three models were then distilled into a single small ViT model with 11.8 million parameters, combining their learned features for more efficient detection.
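The exact distillation recipe is not described here. As one common way to distill several teachers into one student, the sketch below averages the teachers' temperature-softened probabilities and measures the KL divergence to the student's prediction. The function names, the two-class logits, and the temperature of 2.0 are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_target(
    teacher_logits: Sequence[Sequence[float]], temperature: float = 2.0
) -> List[float]:
    """Average the softened class probabilities of all teachers."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    n_classes = len(probs[0])
    return [sum(p[i] for p in probs) / len(probs) for i in range(n_classes)]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[Sequence[float]],
    temperature: float = 2.0,
) -> float:
    """KL divergence from the averaged teacher target to the student."""
    target = ensemble_soft_target(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(t * math.log(t / s) for t, s in zip(target, student))


if __name__ == "__main__":
    # Made-up logits for classes [real, AI-generated] from three teachers.
    teachers = [[0.2, 1.5], [0.1, 2.0], [0.4, 1.1]]
    print(distillation_loss([0.3, 1.4], teachers))
```

In a real training loop this term would typically be computed per batch in a tensor library and often mixed with a hard-label loss; the pure-Python version only shows the arithmetic.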
Data Sources
- Google's Open Image Dataset: link
- Ivan Sivkov's Midjourney Dataset: link
- TANREI(NAMA)'s Stable Diffusion Prompts Dataset: link
Performance
Validation Set: 74% accuracy
- Held out from training data to assess generalization
Custom Real-World Set: 72% accuracy
- Composed of self-captured images and online-sourced images
- Designed to be more representative of internet-based images
Comparative Analysis:
- Outperformed other popular AI detection models by 5 percentage points on both sets
- Other models achieved 69% and 67% on the validation and real-world sets, respectively
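For readers reproducing this kind of comparison, the sketch below shows how accuracy and a percentage-point gap between two models can be computed. The helper names and the toy labels are illustrative assumptions, not the evaluation code or data behind the figures above.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def percentage_point_gap(acc_a: float, acc_b: float) -> float:
    """Difference between two accuracies, expressed in percentage points."""
    return (acc_a - acc_b) * 100


if __name__ == "__main__":
    # Toy data: 1 = AI-generated, 0 = real.
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    model_a = [1, 0, 1, 0, 0, 0, 1, 1]
    model_b = [1, 1, 0, 0, 0, 0, 1, 1]
    gap = percentage_point_gap(accuracy(model_a, labels), accuracy(model_b, labels))
    print(f"model A leads model B by {gap:.1f} percentage points")
```

A percentage-point gap is an absolute difference between two accuracies, which is different from a relative (percent) improvement.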
Key Insights
- Strong generalization on validation data (74% accuracy)
- Good adaptability to diverse, real-world images (72% accuracy)
- Consistent outperformance of other popular models
- 2-point accuracy drop from the validation set (74%) to the real-world set (72%) indicates room for improvement
- Comprehensive training on multiple AI generation techniques contributes to model versatility
- Focus on subtle differences in image generation rather than content disparities
Future Directions
- Expand dataset with more diverse, real-world examples to bridge the performance gap
- Improve generalization to internet-sourced images
- Conduct error analysis on misclassified samples to identify patterns
- Integrate new AI image generation techniques as they emerge
- Consider fine-tuning for specific domains where detection accuracy is critical
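The error analysis mentioned in the list above could start from something like the following sketch, which groups evaluation records by a tag and reports the error rate per group. The record layout and the tags ("real", "midjourney", "stable_diffusion") are hypothetical; any available metadata such as image source or resolution could serve as the grouping key.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (group tag, true label, predicted label)
Record = Tuple[str, int, int]


def error_rate_by_group(records: Iterable[Record]) -> Dict[str, float]:
    """Share of misclassified samples within each group."""
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for group, true_label, predicted in records:
        totals[group] += 1
        if true_label != predicted:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


if __name__ == "__main__":
    # Toy records: 1 = AI-generated, 0 = real.
    sample = [
        ("real", 0, 0),
        ("real", 0, 1),
        ("midjourney", 1, 1),
        ("midjourney", 1, 1),
        ("stable_diffusion", 1, 0),
        ("stable_diffusion", 1, 1),
    ]
    rates = error_rate_by_group(sample)
    # List the groups with the highest error rate first.
    for group, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
        print(f"{group}: {rate:.0%} misclassified")
```

Groups with unusually high error rates are natural candidates for the dataset expansion described above.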
Evaluation results
- Accuracy (self-reported): 0.740