FLODA: FLorence-2 Optimized for Deepfake Assessment

Model Description

FLODA (FLorence-2 Optimized for Deepfake Assessment) is a deepfake detection model built on a Vision-Language Model (VLM). It integrates image captioning and authenticity assessment into a single end-to-end architecture, with the goal of outperforming existing deepfake detection models.

Key Features

  • Utilizes Florence-2 as the base VLM for both caption generation and deepfake detection
  • Reframes deepfake detection as a Visual Question Answering (VQA) task
  • Incorporates image caption information for enhanced contextual understanding
  • Employs rsLoRA (rank-stabilized Low-Rank Adaptation) for efficient fine-tuning
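  • Generalizes to image generators and manipulations not seen during training (evaluated on 16 datasets)
  • Remains robust under adversarial, backdoor, and data poisoning attacks (100% accuracy on all attacked datasets)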

Model Architecture

FLODA is based on the Florence-2 model and consists of two main components:

  1. Vision Encoder: Uses DaViT (Dual Attention Vision Transformer)
  2. Multi-modality Encoder-Decoder: Based on a standard transformer architecture

The model is fine-tuned using rsLoRA, with the following configuration:

  • Rank (r): 8
  • Alpha (α): 8
  • Dropout: 0.05
  • Target Modules: q_proj, k_proj, v_proj, out_proj, lm_head
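rsLoRA differs from standard LoRA only in how the low-rank update BA is scaled. LoRA multiplies it by α/r, while rsLoRA uses α/√r, which keeps the update's magnitude stable as the rank grows. The dependency-free sketch below computes both factors for the configuration above. It is illustrative only. The card lists PEFT 0.12.0, so the adapter was presumably configured with something like `LoraConfig(r=8, lora_alpha=8, lora_dropout=0.05, target_modules=[...], use_rslora=True)`. That is an assumption, because the authors' training script is not shown here.

```python
import math

# Configuration values from the list above
RANK = 8
ALPHA = 8
DROPOUT = 0.05  # listed for completeness; dropout does not affect the scaling factor
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "out_proj", "lm_head"]


def adapter_scaling(alpha: float, rank: int, rank_stabilized: bool = True) -> float:
    """Return the factor that multiplies the low-rank update B @ A."""
    if rank_stabilized:
        return alpha / math.sqrt(rank)  # rsLoRA: alpha / sqrt(r)
    return alpha / rank  # standard LoRA: alpha / r


if __name__ == "__main__":
    print(f"LoRA scaling:   {adapter_scaling(ALPHA, RANK, rank_stabilized=False):.4f}")
    print(f"rsLoRA scaling: {adapter_scaling(ALPHA, RANK):.4f}")
```

With r = α = 8, standard LoRA would scale the update by 1.0. rsLoRA scales it by √8 ≈ 2.83 instead.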

Performance

FLODA achieves state-of-the-art performance in deepfake detection:

  • Average accuracy across all datasets: 97.14%
  • Strong performance on both real and fake image datasets
  • 100% accuracy on several fake datasets and all attacked datasets

Usage

from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image
import torch

# Load the model and processor (falls back to CPU when no GPU is available)
model_path = "path/to/floda/model"
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).to(device).eval()
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

def detect_deepfake(image_path):
    image = Image.open(image_path).convert("RGB")
    # The task token is prepended to the question, framing detection as VQA
    task_prompt = "<DEEPFAKE_DETECTION>"
    text_input = "Is this photo real?"

    inputs = processor(text=task_prompt + text_input, images=image, return_tensors="pt").to(device)

    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            num_beams=3
        )

    # Florence-2 convention: decode with special tokens, then let the processor parse the answer
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    result = processor.post_process_generation(generated_text, task=task_prompt, image_size=(image.width, image.height))[task_prompt]

    # A "yes" answer means the model judges the image to be real
    return "Real" if result.strip().lower() == "yes" else "Fake"

# Example usage
result = detect_deepfake("path/to/image.jpg")
print(f"The image is: {result}")
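The `detect_deepfake` function above compares the decoded answer against "yes" and treats everything else as Fake. Generations might differ only in case, surrounding whitespace, or trailing punctuation (e.g. "Yes."). For those cases, a slightly more tolerant mapping can help. The helper below, `answer_to_label`, is a hypothetical addition and not part of FLODA. It keeps the same decision rule.

```python
import string


def answer_to_label(answer: str) -> str:
    """Map FLODA's free-text answer to "Real" or "Fake".

    Hypothetical helper: strips surrounding whitespace and punctuation and
    ignores case, then applies the same rule as detect_deepfake
    ("yes" -> Real, anything else -> Fake).
    """
    normalized = answer.strip(string.punctuation + string.whitespace).lower()
    return "Real" if normalized == "yes" else "Fake"


if __name__ == "__main__":
    for raw in ["yes", " Yes.", "YES!", "no", "No.", ""]:
        print(repr(raw), "->", answer_to_label(raw))
```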

Training Data

FLODA was trained on a dataset including:

  • Real images: MS COCO
  • Fake images: generated by SD2 (Stable Diffusion 2) and LaMa (an inpainting model)
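Because detection is framed as VQA, each training image can be paired with the same prompt used at inference time and a short text answer. The sketch below shows one way such samples might be assembled. It assumes the answer "yes" for real images and "no" for generated ones, which is consistent with the usage code's treatment of "yes" as Real. The `VQASample` class, the `build_samples` function, and the file paths are hypothetical placeholders, not the authors' data pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Prompt taken from the usage example; the yes/no answer vocabulary is an assumption
TASK_PROMPT = "<DEEPFAKE_DETECTION>"
QUESTION = "Is this photo real?"


@dataclass
class VQASample:
    image_path: str
    prompt: str
    answer: str  # "yes" = real, "no" = generated (assumed)


def build_samples(real_paths: Iterable[str], fake_paths: Iterable[str]) -> List[VQASample]:
    """Pair every image with the detection prompt and its yes/no target."""
    prompt = TASK_PROMPT + QUESTION
    samples = [VQASample(p, prompt, "yes") for p in real_paths]
    samples += [VQASample(p, prompt, "no") for p in fake_paths]
    return samples


if __name__ == "__main__":
    # Hypothetical file names standing in for MS COCO (real) and SD2/LaMa (fake) images
    demo = build_samples(["coco/000001.jpg"], ["sd2/000001.png", "lama/000001.png"])
    for s in demo:
        print(s)
```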

Evaluation Data

The model was evaluated on 16 datasets:

  • 2 real image datasets: MS COCO, Flickr30k
  • 14 fake image datasets generated by various models (e.g., SD2, SDXL, DeepFloyd IF, DALLE-2, SGXL)
  • Datasets covering stylized images, inpainting, resolution changes, and face swapping
  • Datasets subjected to adversarial, backdoor, and data poisoning attacks
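The benchmark contains both real and fake datasets, so per-dataset accuracy is the fraction of images assigned that dataset's ground-truth label: Real for MS COCO and Flickr30k, Fake for the generated sets. The card does not state how the 97.14% average is aggregated. The sketch below assumes an unweighted mean over datasets. The dataset names and predictions in the demo are toy values, not FLODA results.

```python
from statistics import mean
from typing import Dict, List


def dataset_accuracy(predictions: List[str], true_label: str) -> float:
    """Fraction of predictions ("Real"/"Fake") that match the dataset's label."""
    if not predictions:
        raise ValueError("dataset has no predictions")
    return sum(p == true_label for p in predictions) / len(predictions)


def average_accuracy(results: Dict[str, List[str]], labels: Dict[str, str]) -> float:
    """Unweighted mean of per-dataset accuracies (the aggregation is an assumption)."""
    return mean(dataset_accuracy(results[name], labels[name]) for name in results)


if __name__ == "__main__":
    # Toy predictions for illustration only; these are not FLODA results
    labels = {"toy_real": "Real", "toy_fake": "Fake"}
    results = {
        "toy_real": ["Real", "Real", "Fake", "Real"],  # 3 of 4 correct
        "toy_fake": ["Fake", "Fake"],                  # 2 of 2 correct
    }
    for name, preds in results.items():
        print(f"{name}: {dataset_accuracy(preds, labels[name]):.2%}")
    print(f"average: {average_accuracy(results, labels):.2%}")
```

An unweighted mean gives every dataset equal weight regardless of size. A pooled accuracy over all images would instead favor the larger datasets.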

Limitations

  • Accuracy on the ControlNet dataset (77.07%) is lower than that of some competing models
  • Effectiveness on recent or future image-generation techniques that are absent from the training and evaluation datasets is uncertain

Ethical Considerations

While FLODA shows promising deepfake detection results, users should consider:

  • The potential for false positives (real images flagged as fake) and false negatives (fake images accepted as real), whose consequences depend on the use case
  • The need for continuous updating as new image generation techniques emerge
  • Privacy considerations when processing user-submitted images

Model Card Authors

  • Youngho Bae (Hanyang University)
  • Gunhui Han (Yonsei University)
  • Seunghyeon Park (Yonsei University)

Model Card Contact

For inquiries about this model card or the FLODA model, please contact:

Youngho Bae (email: byh711@gmail.com)

Framework versions

  • PEFT 0.12.0