OmniVAE Release Assets
This Hugging Face repository contains the release checkpoints, text encoders, evaluation data, and metric weights for the OmniVAE GitHub repository.
The public generation weights in this release are T2AV-only. VAE weights are included because they are required by the T2AV pipeline.
The source code is intentionally kept out of this repository. Use this bundle
with the GitHub code by setting OMNIVAE_RELEASE_ROOT to the local download
path.
Download
pip install -U huggingface_hub
huggingface-cli download zhanjun/OmniVAE \
  --repo-type model \
  --local-dir /path/to/omnivae_release \
  --local-dir-use-symlinks False
git clone https://github.com/JunZhan2000/OmniVAE.git
cd OmniVAE
export OMNIVAE_RELEASE_ROOT=/path/to/omnivae_release
export OPEN_SOURCE_ROOT="${OMNIVAE_RELEASE_ROOT}"
Equivalently:
source scripts/setup_release_root.sh /path/to/omnivae_release
Layout
omnivae_release/
  manifest.json
  models/
    text_encoder/    frozen Qwen3.5 text encoder used by T2AV inference
    vae/             OmniVAE checkpoints
    dit/t2av/        T2AV generation checkpoints
  eval/
    data/t2av/       validation manifests and small reference data
    models/t2av/     metric weights for T2AV evaluation
All config and metadata paths are relative to the repository root, so the code can load this directory from any local path.
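Before running anything against the bundle, it can help to confirm that the download is complete. The following is a minimal sketch, not part of the OmniVAE scripts: the function name check_release_layout is hypothetical, and it only checks the top-level entries shown in the layout above, not the contents of individual checkpoints.

```shell
# Hypothetical helper (not shipped with OmniVAE): verify that the top-level
# entries from the layout above exist under a release root.
# Returns 0 if everything is present, 1 if something is missing,
# and 2 if no release root was given.
check_release_layout() {
  local root="${1:-${OMNIVAE_RELEASE_ROOT:-}}"
  local missing=0
  local rel

  if [ -z "${root}" ]; then
    echo "release root not given and OMNIVAE_RELEASE_ROOT is unset" >&2
    return 2
  fi

  # manifest.json is a file; every other entry is expected to be a directory.
  if [ ! -f "${root}/manifest.json" ]; then
    echo "missing file: manifest.json" >&2
    missing=1
  fi

  for rel in models/text_encoder models/vae models/dit/t2av \
             eval/data/t2av eval/models/t2av; do
    if [ ! -d "${root}/${rel}" ]; then
      echo "missing directory: ${rel}" >&2
      missing=1
    fi
  done

  return "${missing}"
}
```

For example, after sourcing the function: check_release_layout /path/to/omnivae_release && echo "layout looks complete". With no argument it falls back to OMNIVAE_RELEASE_ROOT.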
Text Encoders
The release includes the frozen Qwen3.5 text encoder directory used by the validated T2AV configs:
models/text_encoder/Qwen3.5-0.8B-Base
It is a third-party dependency weight, not an additional OmniVAE checkpoint. It is bundled so that inference and evaluation run with the exact settings used for the released results.
Checkpoints
VAE
models/vae/video_only/recon
models/vae/video_only/recon_distill
models/vae/audio_only/recon
models/vae/audio_only/recon_distill
models/vae/audio_video/recon_avclip
models/vae/audio_video/recon_distill_avclip
models/vae/audio_only/recon_avclip_ft_decoder
models/vae/audio_only/recon_distill_avclip_ft_decoder
For T2AV AVCLIP experiment families, the audio branch uses the
*_ft_decoder checkpoint by default.
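The default described above can be written down as a small lookup. This is an illustrative sketch only: the function audio_vae_dir is hypothetical, and it assumes that a T2AV package name such as t2av_recon_avclip pairs with the VAE variant of the same suffix, which this release description does not state explicitly. Only the rule that AVCLIP families use the *_ft_decoder audio checkpoint comes from the text above.

```shell
# Hypothetical helper: print the audio VAE checkpoint directory (relative to
# the release root) for a T2AV package name.
# ASSUMPTION: package "t2av_<variant>" pairs with VAE variant "<variant>".
audio_vae_dir() {
  local variant="${1#t2av_}"   # strip the leading "t2av_" if present
  case "${variant}" in
    recon_avclip|recon_distill_avclip)
      # AVCLIP families: the audio branch defaults to the *_ft_decoder checkpoint.
      echo "models/vae/audio_only/${variant}_ft_decoder"
      ;;
    recon|recon_distill)
      echo "models/vae/audio_only/${variant}"
      ;;
    *)
      echo "unknown T2AV package: $1" >&2
      return 1
      ;;
  esac
}
```

For example, audio_vae_dir t2av_recon_avclip prints models/vae/audio_only/recon_avclip_ft_decoder, which can then be joined with OMNIVAE_RELEASE_ROOT. The configs in the GitHub repository remain the authoritative source for which checkpoint each experiment actually loads.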
Generation DiT
The public bundle contains T2AV generation packages only:
t2av: t2av_recon, t2av_recon_distill, t2av_recon_avclip, t2av_recon_distill_avclip
Each package is inference-only and excludes optimizer, scheduler, dataloader, random state, and trainer state snapshots.
Evaluation Assets
eval/data/t2av/versebench_minimal: VerseBench validation manifest used by the release smoke and comparison scripts.
eval/models/t2av: T2AV metric weights for the bundled my_eval pipeline.
eval/models/vae/fvd/i3d_torchscript.pt: rFVD metric weight for VAE video reconstruction evaluation.
Quick Validation
cd /path/to/OmniVAE
export OMNIVAE_RELEASE_ROOT=/path/to/omnivae_release
# T2AV release validation on VerseBench set3-large.
bash scripts/release_launchers/run_t2av_release_compare_distributed.sh
See the GitHub repository README.md and docs/ for full installation,
training, inference, and evaluation commands.
License
The OmniVAE release assets are provided under the Apache-2.0 license. Some metric packages and third-party components retain their original licenses; see the corresponding source-code subdirectories in the GitHub repository.