DAViD: Checkpoints (ViFi-CLIP encoder + classification head)

Model weights for DAViD, a deepfake & AI-generated video/image detector.

This repo hosts two checkpoints:

| File | Size | Description |
|---|---|---|
| k400_clip_complete_finetuned_30_epochs.pth | ~1.6 GB | ViFi-CLIP (ViT-B/16) image encoder, fine-tuned on Kinetics-400 for 30 epochs |
| best_detector_model.pt | ~3 MB | MLP classification head (dense → dense1 → dense2), trained on the DAViD dataset + CDDB |

How they fit together

  1. Encoder: a ViFi-CLIP (ViT-B/16) visual backbone fine-tuned on Kinetics-400. Each frame (or image) is encoded into a 512-dim embedding.
  2. Classification head: a lightweight MLP that maps the (averaged) 512-dim embedding to 3 classes: real, deepfake, ai_gen. It was trained on a mix of the DAViD video dataset and CDDB (an image-based deepfake benchmark), so it supports both video and single-image input.
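
The averaging step in item 2 is what lets one head serve both videos and single images. The sketch below illustrates it in plain Python on toy 4-dim vectors rather than real 512-dim embeddings. `mean_pool` is a hypothetical helper written for this illustration, not a function from the DAViD code.

```python
def mean_pool(frame_embeddings):
    """Average per-frame embeddings (a list of equal-length float lists) into one vector."""
    if not frame_embeddings:
        raise ValueError("need at least one embedding")
    dim = len(frame_embeddings[0])
    if any(len(e) != dim for e in frame_embeddings):
        raise ValueError("all embeddings must have the same length")
    n = len(frame_embeddings)
    return [sum(e[i] for e in frame_embeddings) / n for i in range(dim)]


# Toy 4-dim vectors stand in for the real 512-dim embeddings.
video = [[1.0, 0.0, 2.0, 4.0], [3.0, 0.0, 2.0, 0.0]]  # two frames
image = [[1.0, 0.0, 2.0, 4.0]]                        # a single image is a one-frame "video"

pooled_video = mean_pool(video)  # [2.0, 0.0, 2.0, 2.0]
pooled_image = mean_pool(image)  # equal to the single input embedding
```

Either way the head receives one vector of the same size, so a single image needs no separate code path.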

Why these live on the Hub

The DAViD Space downloads these at Docker build time. They were previously hosted on Google Drive, but Drive throttles datacenter IPs, which broke the Space build. The HF Hub serves them reliably to HF's own build infrastructure.

Usage

1. Get the model code from GitHub

The model definitions (model.py, encoder.py, and the clip/ package) are not in this weights repo; they live in the training repo aitf-its-tim3-dfk/david (branch feat-cddb). Clone it first and run from inside it:

git clone -b feat-cddb https://github.com/aitf-its-tim3-dfk/david
cd david
pip install -r requirements.txt

This is what makes from model import ... and from encoder import ... below work.
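
If running from inside the clone is inconvenient (for example in a notebook), one possible alternative is to put the clone directory on `sys.path` before importing. This is a minimal sketch; `add_repo_to_path` is a hypothetical helper, and the `"david"` path in the usage note assumes the clone location from the commands above.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Prepend the cloned repo directory to sys.path (once) and return the resolved path."""
    repo = Path(repo_dir).expanduser().resolve()
    if not repo.is_dir():
        raise FileNotFoundError(f"repo directory not found: {repo}")
    repo_str = str(repo)
    if repo_str not in sys.path:
        # Insert at the front so the repo's model.py / encoder.py take precedence.
        sys.path.insert(0, repo_str)
    return repo_str
```

After calling `add_repo_to_path("david")`, the `from model import ...` and `from encoder import ...` lines should resolve against the clone, provided the requirements are installed in the same environment.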

2. Download the checkpoints (no auth needed; public repo)

from huggingface_hub import hf_hub_download

REPO = "aitf-its-tim3-dfk/david-encoder"
encoder_ckpt    = hf_hub_download(REPO, "k400_clip_complete_finetuned_30_epochs.pth")
classifier_ckpt = hf_hub_download(REPO, "best_detector_model.pt")
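
Since a throttled or truncated download is what broke the earlier Drive-based build, a size check before loading can catch a bad file early. The sketch below uses only the standard library. The thresholds are hypothetical lower bounds picked well below the approximate sizes listed in the table, and `check_checkpoint` is an illustrative helper, not part of `huggingface_hub`.

```python
from pathlib import Path

# Hypothetical lower bounds, chosen well below the approximate sizes in the
# table (~1.6 GB and ~3 MB); adjust them if the files change.
MIN_BYTES = {
    "k400_clip_complete_finetuned_30_epochs.pth": 1_000_000_000,
    "best_detector_model.pt": 1_000_000,
}


def check_checkpoint(path, min_bytes):
    """Return the file size in bytes, or raise if the file is missing or too small."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"checkpoint not found: {p}")
    size = p.stat().st_size
    if size < min_bytes:
        raise ValueError(f"{p.name} is {size} bytes, expected at least {min_bytes}")
    return size
```

For example, `check_checkpoint(encoder_ckpt, MIN_BYTES["k400_clip_complete_finetuned_30_epochs.pth"])` would raise before any loading is attempted if the encoder file is incomplete.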

3. Load and run

import torch
from encoder import load_feature_extractor   # from the cloned GitHub repo
from model import ClassificationHead          # from the cloned GitHub repo

feature_extractor = load_feature_extractor(
    arch="ViT-B/16",
    class_names=("real", "deepfake", "ai_gen"),
    checkpoint_path=encoder_ckpt,
).eval()

classifier = ClassificationHead(input_dim=512, num_classes=3)
classifier.load_state_dict(torch.load(classifier_ckpt, map_location="cpu", weights_only=False))
classifier.eval()

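# Inference sketch: `frames` is a batch of N preprocessed frames (N = 1 for a single image).
# with torch.no_grad():
#     feats = feature_extractor.image_encoder(frames)        # (N, 512)
#     logits = classifier(feats.mean(dim=0, keepdim=True))   # (1, 3)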
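
To turn the `(1, 3)` logits into a readable prediction, a softmax followed by an argmax is one common choice. The sketch below works on a plain list of floats (for instance `logits[0].tolist()`). It assumes the head outputs raw logits and that the class order matches the `class_names` tuple above; neither is confirmed by this card. `softmax` and `top_label` are illustrative helpers.

```python
import math

# Assumed to match the class_names order passed to load_feature_extractor.
CLASS_NAMES = ("real", "deepfake", "ai_gen")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits):
    """Return (class_name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return CLASS_NAMES[best], probs[best]
```

With the tensors from the snippet above, `top_label(logits[0].tolist())` would give the predicted class name together with its softmax probability.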

Training

  • Encoder: CLIP ViT-B/16 (ViFi-CLIP), fine-tuned on Kinetics-400, 30 epochs, output dim 512.
  • Classification head: MLP trained on DAViD video dataset + CDDB images (branch feat-cddb).

License

Set the appropriate license for these weights (currently other). The CLIP backbone, Kinetics-400, and CDDB carry their own upstream licenses/terms.
