base_model: microsoft/Florence-2-base-ft
library_name: peft
license: apache-2.0
language:
  - en
pipeline_tag: visual-question-answering
metrics:
  - accuracy
tags:
  - deepfake detection

FLODA: FLorence-2 Optimized for Deepfake Assessment

Model Description

FLODA (FLorence-2 Optimized for Deepfake Assessment) is a deepfake detection model built on a Vision-Language Model (VLM). It integrates image captioning and authenticity assessment into a single end-to-end architecture, with the aim of outperforming existing deepfake detection models.

Key Features

  • Utilizes Florence-2 as the base VLM for both caption generation and deepfake detection
  • Reframes deepfake detection as a Visual Question Answering (VQA) task
  • Incorporates image caption information for enhanced contextual understanding
  • Employs rsLoRA (rank-stabilized Low-Rank Adaptation) for efficient fine-tuning
  • Demonstrates strong generalization across diverse scenarios
  • Shows robustness against adversarial attacks
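
To make the VQA framing concrete, here is a minimal sketch of how a labeled image could be turned into a question/answer pair. The task token, the question string, and the "yes" answer for real images mirror the usage example in this card. The `VQAExample` class, the `to_vqa_example` helper, the "no" answer for fake images, and the way the caption is appended to the question are assumptions for illustration, not the authors' documented data format.

```python
from dataclasses import dataclass
from typing import Optional

TASK_PROMPT = "<DEEPFAKE_DETECTION>"  # task token from this card's usage example
QUESTION = "Is this photo real?"      # question from this card's usage example


@dataclass
class VQAExample:
    """Hypothetical container for one training or evaluation sample."""
    image_path: str
    prompt: str
    answer: str  # "yes" for a real image, "no" (assumed) for a fake one


def to_vqa_example(image_path: str, is_real: bool, caption: Optional[str] = None) -> VQAExample:
    """Turn a binary real/fake label into a VQA question/answer pair.

    Assumption: when a caption is available, it is appended to the question
    as extra context. The actual FLODA prompt layout may differ.
    """
    prompt = TASK_PROMPT + QUESTION
    if caption:
        prompt += f" Caption: {caption}"
    return VQAExample(image_path, prompt, "yes" if is_real else "no")


example = to_vqa_example("path/to/image.jpg", is_real=False, caption="A dog on a beach.")
print(example.prompt)
print(example.answer)
```

Framing the label as a short text answer lets the same generative decoder that produces captions also produce the authenticity verdict.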

Model Architecture

FLODA is based on the Florence-2 model and consists of two main components:

  1. Vision Encoder: Uses DaViT (Dual Attention Vision Transformer)
  2. Multi-modality Encoder-Decoder: Based on a standard transformer architecture

The model is fine-tuned using rsLoRA, with the following configuration:

  • Rank (r): 8
  • Alpha (α): 8
  • Dropout: 0.05
  • Target Modules: q_proj, k_proj, v_proj, out_proj, lm_head
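
As a rough illustration of what this configuration implies: standard LoRA scales the adapter update by α/r, whereas rsLoRA scales it by α/√r. The sketch below computes both factors for the values listed above and collects the settings in a dictionary. Passing that dictionary to `peft.LoraConfig(**lora_kwargs)` is one plausible way to express the setup (PEFT exposes rsLoRA through `use_rslora=True`), but the authors' training script is not shown here, so the dictionary is an assumption.

```python
import math

# Values listed in this model card.
r = 8
alpha = 8
dropout = 0.05
target_modules = ["q_proj", "k_proj", "v_proj", "out_proj", "lm_head"]

# Assumed mapping onto PEFT's LoraConfig keyword arguments.
lora_kwargs = {
    "r": r,
    "lora_alpha": alpha,
    "lora_dropout": dropout,
    "target_modules": target_modules,
    "use_rslora": True,  # switches the scaling from alpha / r to alpha / sqrt(r)
}


def lora_scaling(alpha: float, r: int, use_rslora: bool) -> float:
    """Scaling factor applied to the low-rank update B @ A."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


print(f"standard LoRA scaling: {lora_scaling(alpha, r, use_rslora=False):.3f}")
print(f"rsLoRA scaling:        {lora_scaling(alpha, r, use_rslora=True):.3f}")
```

With r = 8 and α = 8, the standard factor is 1, while the rank-stabilized factor is 8/√8 ≈ 2.83.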

Performance

FLODA achieves state-of-the-art performance in deepfake detection:

  • Average accuracy across all datasets: 97.14%
  • Strong performance on both real and fake image datasets
  • 100% accuracy on several fake datasets and all attacked datasets

Usage

from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image
import torch

# Load the model and processor (falls back to CPU when no GPU is available)
device = "cuda" if torch.cuda.is_available() else "cpu"
model_path = "path/to/floda/model"
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).to(device).eval()
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

def detect_deepfake(image_path):
    """Return "Real" or "Fake" for the image at image_path."""
    image = Image.open(image_path).convert("RGB")
    task_prompt = "<DEEPFAKE_DETECTION>"
    text_input = "Is this photo real?"

    # Tokenize the prompt and preprocess the image
    inputs = processor(text=task_prompt + text_input, images=image, return_tensors="pt").to(device)

    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            num_beams=3
        )

    # Decode the generated tokens and extract the answer for this task
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    result = processor.post_process_generation(generated_text, task=task_prompt, image_size=(image.width, image.height))[task_prompt]

    # The model answers "yes" for real images; anything else is treated as fake
    return "Real" if result.strip().lower() == "yes" else "Fake"

# Example usage
result = detect_deepfake("path/to/image.jpg")
print(f"The image is: {result}")
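
Generated text can carry stray whitespace, casing differences, or leftover special tokens. The helper below is a hypothetical, more tolerant way to map the raw answer to a label and to flag anything that is neither "yes" nor "no". The name `parse_answer`, the set of stripped tokens, and the "Unknown" label are assumptions, not part of the FLODA code.

```python
import re

# Special tokens that may survive decoding (assumed set: <s>, </s>, <pad>).
SPECIAL_TOKENS = re.compile(r"</?s>|<pad>")


def parse_answer(raw: str) -> str:
    """Map raw generated text to "Real", "Fake", or "Unknown"."""
    cleaned = SPECIAL_TOKENS.sub("", raw).strip().lower().rstrip(".!")
    if cleaned == "yes":
        return "Real"
    if cleaned == "no":
        return "Fake"
    # Anything else is reported separately instead of being counted as fake.
    return "Unknown"


for raw in ["yes", " Yes</s>", "<s>No.</s><pad>", "maybe"]:
    print(repr(raw), "->", parse_answer(raw))
```

Returning a separate "Unknown" label keeps malformed generations from silently inflating the fake count during evaluation.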

Training Data

FLODA was trained on a dataset including:

  • Real images: MS COCO
  • Fake images: Generated by SD2 and LaMa

Evaluation Data

The model was evaluated on 16 datasets:

  • 2 real image datasets: MS COCO, Flickr30k
  • 14 fake image datasets generated by various models (e.g., SD2, SDXL, DeepFloyd IF, DALLE-2, SGXL)
  • Includes datasets with stylized images, inpainting, resolution changes, and face-swapping
  • Adversarial, backdoor, and data poisoning attack datasets

Limitations

  • Performance on the ControlNet dataset (77.07% accuracy) is lower than that of some competing models
  • The model's effectiveness on very recent or future AI-generated image techniques not included in the training or evaluation datasets is uncertain

Ethical Considerations

While FLODA shows promising results in deepfake detection, it's important to consider:

  • The potential for false positives or negatives, which could have significant implications depending on the use case
  • The need for continuous updating as new image generation techniques emerge
  • Privacy considerations when processing user-submitted images

Model Card Authors

  • Youngho Bae (Hanyang University)
  • Gunhui Han (Yonsei University)
  • Seunghyeon Park (Yonsei University)

Model Card Contact

For inquiries about this model card or the FLODA model, please contact:

Youngho Bae (email: byh711@gmail.com)

Framework versions

  • PEFT 0.12.0