
LLaVA-Deepfake Model

Overview

The LLaVA-Deepfake model is a fine-tuned version of LLaVA-v1.5-13B, designed for detecting and analyzing deepfake images. This multimodal large language model (MLLM) not only identifies whether an image is a deepfake but also explains which areas appear manipulated, pointing out features such as irregularities in the eyes, mouth, or overall facial texture. By combining LLaVA's vision and language reasoning, the model is intended as a tool for forensic deepfake analysis.
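For example, once the demo is running you can upload a face image and ask a question such as: "Is this image a deepfake? If so, which regions appear manipulated and why?" Based on the capabilities described above, the model should answer with a verdict and a description of the suspicious regions.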


Installation

Follow these steps to set up and run the LLaVA-Deepfake model:

Step 1: Clone the Repository

Start by cloning the model repository:

git clone https://huggingface.co/pou876/llava-deepfake-model
cd llava-deepfake-model

Step 2: Create a Python Environment

Set up a dedicated Python environment for running the model:

conda create -n llava_deepfake python=3.10 -y
conda activate llava_deepfake
pip install --upgrade pip
pip install -r requirements.txt
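The serving commands below assume that requirements.txt pulls in the LLaVA package, which provides the llava.serve modules. A quick sanity check after installation:

python -c "import llava"

If the import fails, the LLaVA codebase is not on your Python path and the commands in the next section will not work.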

Running the Model

Step 1: Start the Controller

The controller coordinates communication between the model worker and the web interface:

python -m llava.serve.controller --host 0.0.0.0 --port 10000
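Keep the controller running in its own terminal; the worker and web server started in the next steps connect to it on port 10000.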

Step 2: Start the Model Worker

The worker loads the deepfake detection model and processes inference requests:

python -m llava.serve.model_worker --host 0.0.0.0 \
    --controller http://localhost:10000 --port 40000 \
    --worker http://localhost:40000 \
    --model-path ./llava-deepfake-model --load-4bit
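Run the worker in a separate terminal and wait until it reports that the model has loaded. To confirm that the worker has registered, the upstream LLaVA controller normally exposes a POST /list_models endpoint (this is an assumption about the upstream LLaVA serving code, not something specific to this repository):

curl -X POST http://localhost:10000/list_models

The response should include an entry for this model. The --load-4bit flag loads the 13B weights in 4-bit precision to reduce GPU memory usage; omit it if you have enough VRAM to run the model in full precision.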

Step 3: Start the Gradio Web Server

The Gradio web server provides a user-friendly interface for interacting with the model:

python -m llava.serve.gradio_web_server \
    --controller http://localhost:10000 --model-list-mode reload --share

Once the web server is running, a local URL (e.g., http://127.0.0.1:7860) will be printed, along with a temporary public link created by the --share flag. Open either link in your browser to start using the interface.
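If you prefer to test the model without the web interface, the upstream LLaVA code also ships a simple command-line client. The flags below follow LLaVA-v1.5's CLI and the image path is a placeholder; treat this as a sketch rather than part of this repository:

python -m llava.serve.cli \
    --model-path ./llava-deepfake-model \
    --image-file /path/to/suspect_image.jpg \
    --load-4bit

This loads the model directly, without the controller or worker, and lets you chat about the given image from the terminal.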