	---
	base_model: microsoft/Florence-2-base-ft
	library_name: peft
	license: apache-2.0
	language:
	- en
	pipeline_tag: visual-question-answering
	metrics:
	- accuracy
	tags:
	- deepfake detection
	---

	# FLODA: FLorence-2 Optimized for Deepfake Assessment

	## Model Description

	FLODA (FLorence-2 Optimized for Deepfake Assessment) is a deepfake detection model built on a Vision-Language Model (VLM). It integrates image captioning and authenticity assessment into a single end-to-end architecture, with the aim of outperforming existing deepfake detectors.

	## Key Features

	- Utilizes Florence-2 as the base VLM for both caption generation and deepfake detection
	- Reframes deepfake detection as a Visual Question Answering (VQA) task
	- Incorporates image caption information for enhanced contextual understanding
	- Employs rsLoRA (rank-stabilized Low-Rank Adaptation) for efficient fine-tuning
	- Demonstrates strong generalization across diverse scenarios
	- Shows robustness against adversarial attacks

	## Model Architecture

	FLODA is based on the Florence-2 model and consists of two main components:

	1. Vision Encoder: Uses DaViT (Dual Attention Vision Transformer)
	2. Multi-modality Encoder-Decoder: Based on a standard transformer architecture

	The model is fine-tuned using rsLoRA, with the following configuration:

	- Rank (r): 8
	- Alpha (α): 8
	- Dropout: 0.05
	- Target Modules: q_proj, k_proj, v_proj, out_proj, lm_head
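
As a rough illustration of what the rank-stabilized variant changes, the sketch below compares the adapter scaling factor of standard LoRA (`alpha / r`) with that of rsLoRA (`alpha / sqrt(r)`) for the values listed above. The `lora_kwargs` dictionary is an assumption about how these settings would map onto PEFT's `LoraConfig` (which exposes a `use_rslora` flag); it is not taken from the FLODA training code, and `lora_scaling` is a hypothetical helper.

```python
import math

# Hyperparameters as listed in this model card. Mapping them onto PEFT's
# LoraConfig keyword names is an assumption made for illustration.
lora_kwargs = {
    "r": 8,
    "lora_alpha": 8,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "out_proj", "lm_head"],
    "use_rslora": True,  # PEFT flag that selects rank-stabilized scaling
}


def lora_scaling(alpha, r, use_rslora):
    """Return the factor applied to the low-rank update (hypothetical helper)."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


standard_scale = lora_scaling(lora_kwargs["lora_alpha"], lora_kwargs["r"], False)
rslora_scale = lora_scaling(lora_kwargs["lora_alpha"], lora_kwargs["r"], True)

print(f"standard LoRA scale: {standard_scale:.3f}")
print(f"rsLoRA scale:        {rslora_scale:.3f}")

# With PEFT installed, the same settings could be passed as
# peft.LoraConfig(**lora_kwargs); that call is not executed here.
```

With `r` equal to `alpha`, standard LoRA scales the update by 1, while rsLoRA scales it by `sqrt(r)`, so the two settings are not interchangeable at the same `alpha`.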

	## Performance

	FLODA reports state-of-the-art deepfake detection results on its evaluation datasets:

	- Average accuracy across all datasets: 97.14%
	- Strong performance on both real and fake image datasets
	- 100% accuracy on several fake datasets and all attacked datasets
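
The card does not say how the average is computed. The sketch below assumes an unweighted mean of per-dataset accuracies and contrasts it with accuracy pooled over all images. The dataset names and counts are made-up placeholders, not FLODA results.

```python
from statistics import mean

# Toy (correct, total) counts per dataset. These are made-up placeholders,
# NOT FLODA results, and the dataset names are hypothetical.
toy_counts = {
    "real_a": (95, 100),
    "fake_b": (50, 50),
    "fake_c": (700, 1000),
}

# Accuracy of each dataset on its own.
per_dataset = {name: correct / total for name, (correct, total) in toy_counts.items()}

# Unweighted mean over datasets: every dataset counts equally.
macro_avg = mean(per_dataset.values())

# Pooled accuracy: every image counts equally, so large datasets dominate.
pooled = sum(c for c, _ in toy_counts.values()) / sum(n for _, n in toy_counts.values())

print(f"macro average: {macro_avg:.4f}")
print(f"pooled:        {pooled:.4f}")
```

The two figures can differ noticeably when dataset sizes are uneven, which is worth keeping in mind when comparing a single average against other detectors.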

	## Usage

	```python
	from transformers import AutoProcessor, AutoModelForCausalLM
	from PIL import Image
	import torch

	# Load the model and processor
	model_path = "path/to/floda/model"
	model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).to("cuda").eval()
	processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

	def detect_deepfake(image_path):
	    # Load the image and build the VQA-style prompt
	    image = Image.open(image_path).convert("RGB")
	    task_prompt = "<DEEPFAKE_DETECTION>"
	    text_input = "Is this photo real?"

	    inputs = processor(text=task_prompt + text_input, images=image, return_tensors="pt").to("cuda")

	    # Generate the answer without tracking gradients
	    with torch.no_grad():
	        generated_ids = model.generate(
	            input_ids=inputs["input_ids"],
	            pixel_values=inputs["pixel_values"],
	            max_new_tokens=1024,
	            num_beams=3,
	        )

	    # Decode the tokens and extract the answer for this task prompt
	    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
	    result = processor.post_process_generation(generated_text, task=task_prompt, image_size=(image.width, image.height))[task_prompt]

	    # "yes" means the photo is judged real; any other answer is treated as fake
	    return "Real" if result.strip().lower() == "yes" else "Fake"

	# Example usage
	result = detect_deepfake("path/to/image.jpg")
	print(f"The image is: {result}")
	```
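
The snippet above labels every answer other than "yes" as fake, including empty or malformed generations. A stricter mapping might look like the sketch below. It assumes the model answers "yes" or "no"; the "no" case is inferred from the usage code rather than stated in this card. `parse_answer` is a hypothetical helper, not part of the FLODA release.

```python
def parse_answer(raw_answer):
    """Map a raw model answer to 'Real', 'Fake', or 'Unknown'.

    Assumes the model replies "yes" for real photos and "no" for fakes.
    """
    # Drop surrounding whitespace and trailing punctuation, ignore case.
    cleaned = raw_answer.strip().strip(".!").lower()
    if cleaned == "yes":
        return "Real"
    if cleaned == "no":
        return "Fake"
    # Anything else (empty string, leftover tokens) is flagged for review.
    return "Unknown"


for answer in ["yes", " No. ", ""]:
    print(repr(answer), "->", parse_answer(answer))
```

Returning "Unknown" lets a caller send ambiguous outputs to manual review instead of counting them as detections.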

	## Training Data

	FLODA was trained on a dataset including:
	- Real images: MS COCO
	- Fake images: Generated by SD2 and LaMa

	## Evaluation Data

	The model was evaluated on 16 datasets:
	- 2 real image datasets: MS COCO, Flickr30k
	- 14 fake image datasets generated by various models (e.g., SD2, SDXL, DeepFloyd IF, DALLE-2, SGXL)
	- Includes datasets with stylized images, inpainting, resolution changes, and face-swapping
	- Adversarial, backdoor, and data poisoning attack datasets

	## Limitations

	- Accuracy on the ControlNet dataset (77.07%) is lower than that of some competing models
	- The model's effectiveness on image-generation techniques that are newer than, or absent from, the training and evaluation datasets is uncertain

	## Ethical Considerations

	While FLODA shows promising results in deepfake detection, it's important to consider:
	- The potential for false positives or negatives, which could have significant implications depending on the use case
	- The need for continuous updating as new image generation techniques emerge
	- Privacy considerations when processing user-submitted images

	## Model Card Authors

	- Youngho Bae (Hanyang University)
	- Gunhui Han (Yonsei University)
	- Seunghyeon Park (Yonsei University)

	## Model Card Contact

	For inquiries about this model card or the FLODA model, please contact:

	Youngho Bae
	Email: byh711@gmail.com

	### Framework versions

	- PEFT 0.12.0