dvf_trained_transferred_aegis

R(2+1)D + mixed-domain supervised-contrastive model for end-to-end AI-generated video detection. A DVF-trained R(2+1)D backbone is transferred to the AEGIS domain while retaining performance on DVF and GenVideo, then evaluated with a projection-space prototype protocol.

Backbone: torchvision r2plus1d_18, trained from scratch (not Kinetics-initialised).
Projection head: Linear(512 → 512) → ReLU → Linear(512 → 128), L2-normalised output.
Classifier head: Linear(512 → 2) (supervised-baseline head, carried in the checkpoint; not used by the prototype protocol).
Lineage: DVF-trained backbone → 3-domain mixed SupCon (AEGIS + DVF + GenVideo) → projection-space prototype inference.

Files

File	Role
`supcon/final_best.pt`	Primary artifact — mixed-domain SupCon checkpoint (`run6`, best epoch 16).
`base/best_model.pt`	DVF-trained base checkpoint the backbone was transferred from (supervised baseline).
`config.json`	Architecture, preprocessing, training and eval-protocol metadata.
`load_model.py`	Minimal CPU loader + load-only smoke test.

Intended use / out of scope

Research artifact for studying cross-dataset transfer and retention in AI-generated video detection. It is evaluated under a prototype protocol (fixed real/fake prototypes built in projection space from a small labeled support bank); it is not a deployed, calibrated binary classifier. It detects end-to-end AI-generated video and is not a face-swap deepfake detector. It is neither intended nor validated for content moderation, legal, or forensic decision-making.

How to load

The checkpoint is a dict with keys epoch, model_state_dict, optimizer_state_dict, best_selection_score, and args; the weights live under model_state_dict. load_model.py builds the exact module and loads the state dict on CPU with strict key matching.

from load_model import load_model, extract_projected_embedding

model = load_model("supcon/final_best.pt", map_location="cpu")  # eval mode
# Input clips: (B, 3, T=24, 224, 224), RGB, pixels in [0,1], NO Kinetics mean/std.
# emb = extract_projected_embedding(model, clips)   # L2-normalised 128-d

python load_model.py supcon/final_best.pt
# -> OK, 31629471 params
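
The parameter count printed by the smoke test can be cross-checked by hand from the architecture listed at the top of this card. The sketch below is plain arithmetic and does not load the checkpoint. It assumes the torchvision-documented total of 31,505,325 parameters for r2plus1d_18 with its stock 400-class fc layer, and assumes every Linear layer carries a bias; the helper name linear_params is introduced here for illustration only.

```python
# Cross-check of the reported parameter count (pure arithmetic, no torch needed).

def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

# Assumption: torchvision's documented total for r2plus1d_18 with its 400-class fc.
R2PLUS1D_18_TOTAL = 31_505_325
backbone = R2PLUS1D_18_TOTAL - linear_params(512, 400)  # remove the stock fc layer

# Projection head: Linear(512 -> 512) -> ReLU -> Linear(512 -> 128); ReLU has no parameters.
proj_head = linear_params(512, 512) + linear_params(512, 128)

# Classifier head: Linear(512 -> 2).
classifier = linear_params(512, 2)

total = backbone + proj_head + classifier
print(total)  # 31629471
```

Under these assumptions the sum matches the count reported by load_model.py, which suggests that the checkpoint holds exactly the backbone, the projection head, and the 2-way classifier head.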

Note on base/best_model.pt: the base uses a Dropout(0.4) → Linear(512 → 2) head (fc.0/fc.1) and has no proj_head. load_model.py targets the SupCon checkpoint; the base is included only to document the transfer lineage.

Training data

Three datasets are used. Downstream users must independently comply with each dataset's terms and the terms of the underlying generators whose outputs appear in the data (e.g. Sora, KLing, Pika). This obligation is part of why the weights are released under a non-commercial license.

DVF (Diffusion Video Forensics) — from MM-Det (NeurIPS 2024). Paper: arXiv:2410.23623 · Code + dataset: github.com/SparkleXFantasy/MM-Det · HF: sparklexfantasy/DVF
GenVideo / GenVideo-100K — from DeMamba (the GenVideo-100K lightweight version was used). Paper: arXiv:2405.19707 · Code + dataset: github.com/chenhaoxing/DeMamba
AEGIS — Authenticity Evaluation Benchmark for AI-Generated Video Sequences (ACM MM 2025). Paper: arXiv:2508.10771 · HF: Clarifiedfish/AEGIS · ACM DL. Note: the AEGIS HF page does not expose a license tag; confirm its terms from the paper.

Per-domain split sizes used for this run (records): AEGIS total 436 (train 50 / val 50 / test 336); DVF total 1004 (200 / 200 / 604); GenVideo total 2971 (200 / 200 / 2571).

Training procedure

Base: R(2+1)D r2plus1d_18 trained from scratch as a supervised baseline on DVF (cross-entropy, class weights 1.0 / 1.5, label smoothing 0.2, Dropout(0.4) → Linear head).
Mixed-domain SupCon transfer (run6): initialise from the DVF backbone and fine-tune with supervised contrastive loss across all three domains.

Hyperparameter	Value
Loss	SupConLoss, temperature `0.07`
Optimizer	AdamW, lr `5e-5`, weight decay `1e-5`
Epochs	20 (best epoch 16; selected on validation)
Clip / fps / size	24 frames @ 24 fps, 224×224, pixels in `[0,1]`, no Kinetics norm
Batch size	24
Unfreeze policy	`layer4_all` + projection head
Projection	hidden 512, out 128
Domain loss weights	AEGIS 0.5 / DVF 0.3 / GenVideo 0.2
Batch sampling	generator-aware quota sampling
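
For readers unfamiliar with the loss in the table, the following is a minimal dependency-free sketch of a supervised contrastive loss at temperature 0.07. It is not the training code from this repository. The weighted sum in weighted_domain_loss is an assumption about how the domain loss weights are applied, since the card lists the weights but not the combination rule; both function names are hypothetical.

```python
import math

def supcon_loss(embeddings, labels, temperature=0.07):
    """Supervised contrastive loss over L2-normalised embeddings (lists of floats)."""
    n = len(embeddings)
    losses = []
    for i in range(n):
        # Temperature-scaled similarities from anchor i to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(embeddings[i], embeddings[j])) / temperature
            for j in range(n) if j != i
        }
        positives = [j for j in logits if labels[j] == labels[i]]
        if not positives:
            continue  # anchors without a same-class partner contribute nothing
        peak = max(logits.values())  # shift for a numerically stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        losses.append(-sum(logits[p] - log_denom for p in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0

# Assumption: per-domain losses are combined as a weighted sum with the table's weights.
DOMAIN_WEIGHTS = {"AEGIS": 0.5, "DVF": 0.3, "GenVideo": 0.2}

def weighted_domain_loss(per_domain_loss):
    """Combine a {domain: loss} mapping using DOMAIN_WEIGHTS."""
    return sum(DOMAIN_WEIGHTS[d] * loss for d, loss in per_domain_loss.items())

# Toy check: embeddings that cluster by label give a much smaller loss.
z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(supcon_loss(z, [0, 0, 1, 1]) < supcon_loss(z, [0, 1, 0, 1]))  # True
```

The low temperature sharpens the softmax over similarities, so the loss is dominated by the hardest negatives for each anchor.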

Evaluation

Protocol (author-reported). Fixed real and fake prototypes are each built as the L2-normalised mean of a labeled support bank in projection space (10 support clips per class per domain, i.e. 10 real + 10 fake each for AEGIS, DVF, and GenVideo). Each query clip is scored as score = sim_fake − sim_real and labeled fake when score ≥ 0.0. The results were produced with build_prototypes_from_support_jsonl.py followed by prototype_inference_from_saved_prototypes.py.
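
The scoring rule above can be sketched in a few lines. This is an illustration of the described protocol, not the contents of the two scripts named above: it assumes the similarity is cosine similarity, uses toy 2-d vectors in place of the 128-d projected embeddings, and all function names are hypothetical.

```python
import math

def l2_normalise(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_prototype(support_embeddings):
    """L2-normalised mean of the support embeddings of one class."""
    dim = len(support_embeddings[0])
    count = len(support_embeddings)
    mean = [sum(e[d] for e in support_embeddings) / count for d in range(dim)]
    return l2_normalise(mean)

def score_query(query, proto_real, proto_fake, threshold=0.0):
    """Return (score, label) with score = sim_fake - sim_real.

    For unit vectors the dot product equals the cosine similarity.
    """
    q = l2_normalise(query)
    score = dot(q, proto_fake) - dot(q, proto_real)
    return score, ("fake" if score >= threshold else "real")

# Toy support bank (the card uses 10 clips per class per domain).
real_support = [l2_normalise(v) for v in ([1.0, 0.1], [0.9, 0.2])]
fake_support = [l2_normalise(v) for v in ([0.1, 1.0], [0.2, 0.9])]
proto_real = build_prototype(real_support)
proto_fake = build_prototype(fake_support)

score, label = score_query([0.3, 0.8], proto_real, proto_fake)
print(label)  # fake
```

Because the prototypes are fixed once the support bank is chosen, inference reduces to two dot products per query clip.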

Dataset	AUROC	EER	Accuracy
AEGIS (target)	0.808	0.268	0.730
DVF (retain)	0.847	0.225	0.767
GenVideo (retain)	0.817	0.255	0.748

These are author-reported numbers from the fixed-prototype protocol above; they were not re-run in the environment that prepared this card (the support/query manifests and cached clips are not bundled). A separate in-training 20-trial averaged eval exists and is consistent in ranking.
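
For anyone re-running the protocol on their own manifests, the three reported metrics can be computed from per-clip scores as sketched below. This is a plain-Python illustration with toy scores, not the evaluation code behind the table; labels use 1 for fake and 0 for real, and the function names are hypothetical.

```python
def auroc(scores, labels):
    """Probability that a random fake clip outscores a random real one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def equal_error_rate(scores, labels):
    """Sweep thresholds and return the mean of FPR and FNR where they are closest."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, best_eer = None, None
    for t in sorted(set(scores)) + [float("inf")]:
        fnr = sum(p < t for p in pos) / len(pos)   # fakes predicted real
        fpr = sum(n >= t for n in neg) / len(neg)  # reals predicted fake
        gap = abs(fpr - fnr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (fpr + fnr) / 2
    return best_eer

def accuracy_at(scores, labels, threshold=0.0):
    """Accuracy of the fixed rule: predict fake when score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Toy scores (score = sim_fake - sim_real); not data from this model.
toy_scores = [-0.4, -0.1, 0.2, -0.2, 0.3, 0.5]
toy_labels = [0, 0, 0, 1, 1, 1]
print(round(auroc(toy_scores, toy_labels), 3))             # 0.778
print(round(equal_error_rate(toy_scores, toy_labels), 3))  # 0.333
print(round(accuracy_at(toy_scores, toy_labels), 3))       # 0.667
```

AUROC and EER are threshold-free, whereas accuracy depends on the fixed 0.0 threshold, so the accuracy column is the most sensitive to how well the prototypes are centred.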

Limitations

Dataset scope. Trained/evaluated on DVF, GenVideo, and AEGIS only; generalisation to unseen generators or post-processing is not guaranteed.
Prototype-protocol caveat. Support clips are drawn from labeled data of the same datasets, so this measures representation/adaptation quality, not unconditional deployment performance. The fixed score ≥ 0 threshold is not calibrated per deployment.
Projection-space separability. Real and fake prototypes have relatively high cosine similarity, which caps separation in the projection space.
Scope of detection. End-to-end AI-generated video only; not face-swap deepfakes.
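
The projection-space separability limitation can be made concrete. The derivation below follows from the scoring rule in the evaluation protocol, assuming the similarity is cosine similarity and that the query embedding z and both prototypes have unit norm. The symbols z, p_fake, p_real, and θ are introduced here for illustration; the bound is not a figure reported by the author.

```latex
\begin{aligned}
\mathrm{score}(z) &= z^\top p_{\mathrm{fake}} - z^\top p_{\mathrm{real}}
                   = z^\top \left(p_{\mathrm{fake}} - p_{\mathrm{real}}\right), \\
\lvert \mathrm{score}(z) \rvert
  &\le \lVert z \rVert \, \lVert p_{\mathrm{fake}} - p_{\mathrm{real}} \rVert
   = \sqrt{2 - 2\cos\theta},
  \qquad \cos\theta = p_{\mathrm{fake}}^\top p_{\mathrm{real}}.
\end{aligned}
```

As a purely illustrative value, prototypes with cosine similarity 0.9 would confine every score to roughly ±0.447, so the closer the two prototypes are, the narrower the band in which real and fake clips must be told apart.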

Citation

Thesis (current source); a TVC paper will be added when published.

@misc{dvf_trained_transferred_aegis,
  title  = {TODO},
  author = {TODO},
  year   = {2026},
  note   = {TODO: thesis / TVC paper}
}

License

Weights released under CC BY-NC 4.0 (non-commercial). The training data includes proprietary generator outputs (Sora / KLing / Pika) and web-scraped real video, so a permissive commercial license is not appropriate. Use of this model must also respect the licenses and terms of the underlying datasets and generators.


Papers

AEGIS: Authenticity Evaluation Benchmark for AI-Generated Video Sequences. arXiv:2508.10771, published Aug 14, 2025.

On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection. arXiv:2410.23623, published Oct 31, 2024.

DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark. arXiv:2405.19707, published May 30, 2024.