Real-World AVSE Challenge (ISCSLP 2026): Baseline (track2)

AV-ConvTasNet baseline checkpoint for the Real-World AVSE Challenge, saved with PyTorchModelHubMixin (config.json + model.safetensors). The trained video encoder is bundled in the weights, so no separate lip-reading backbone is needed.

Usage

from look2hear.models import AV_ConvTasNet
model = AV_ConvTasNet.from_pretrained("JusperLee/Real-World-AVSE-Baseline-Track2").eval()

Or directly in the challenge evaluation:

python eval_real.py --ckpt JusperLee/Real-World-AVSE-Baseline-Track2 \
  --track track2 --split dev --metrics all --mode both --save_dir enhanced_out

This repo is gated: log in (huggingface-cli login or set HF_TOKEN) and be granted access before downloading.
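The login step can also be done from Python. The following is a minimal sketch, not part of the baseline code: it assumes huggingface_hub is installed and that access to the repo has already been granted. The helper names resolve_token and fetch_checkpoint, and the constant CHECKPOINT_FILES, are hypothetical and introduced only for illustration; the two file names come from the description above.

```python
import os

REPO_ID = "JusperLee/Real-World-AVSE-Baseline-Track2"
# The two files this card says PyTorchModelHubMixin saved.
CHECKPOINT_FILES = ("config.json", "model.safetensors")


def resolve_token(env=None):
    """Return the HF_TOKEN value from the environment, or None if unset or blank."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    return token or None


def fetch_checkpoint(repo_id=REPO_ID, token=None):
    """Download the checkpoint files into the local Hugging Face cache; return their paths."""
    # Imported lazily so resolve_token() works even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    # token=None lets huggingface_hub fall back to credentials stored by `huggingface-cli login`.
    return [
        hf_hub_download(repo_id=repo_id, filename=name, token=token)
        for name in CHECKPOINT_FILES
    ]
```

Calling fetch_checkpoint(token=resolve_token()) should return the cached paths of both files once access is granted; without access, huggingface_hub raises an error instead of downloading.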

See the baseline repository for the model definition and full pipeline.

Model size: 25M params (Safetensors, tensor type F32)