license: apache-2.0

This repository contains the AV_MossFormer2_TSE_16K model weights for 16 kHz audio-visual target speaker extraction, as used in the ClearerVoice-Studio repo.

The model was trained on large-scale open-source datasets.

It extracts each speaker's voice from a multi-speaker video using facial recognition.
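A minimal usage sketch follows, assuming the `clearvoice` package from ClearerVoice-Studio is installed and exposes a `ClearVoice` class with the call pattern shown; please verify this against the repo before relying on it. The helper names `find_videos` and `extract_speakers`, the directory arguments, and the list of video suffixes are hypothetical and exist only for illustration.

```python
from pathlib import Path

# Container formats treated as video input here.
# This set is an assumption for illustration, not a list from the model card.
VIDEO_SUFFIXES = {".mp4", ".avi", ".mov"}


def find_videos(input_dir):
    """Return the video files directly inside input_dir, sorted by path."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_SUFFIXES
    )


def extract_speakers(input_dir, output_dir):
    """Run target speaker extraction on the videos in input_dir.

    Assumes the `clearvoice` package is installed and that ClearVoice
    accepts the arguments below; check the ClearerVoice-Studio repo.
    """
    # Imported lazily so find_videos works without the optional dependency.
    from clearvoice import ClearVoice

    videos = find_videos(input_dir)
    if not videos:
        raise FileNotFoundError(f"no video files found in {input_dir}")

    model = ClearVoice(
        task="target_speaker_extraction",
        model_names=["AV_MossFormer2_TSE_16K"],
    )
    # online_write=True is assumed to make the library write results
    # to output_path itself rather than return them.
    model(input_path=str(input_dir), online_write=True, output_path=str(output_dir))
    return videos
```

Calling `extract_speakers("input_videos", "output_videos")` would then process the folder in one pass; both folder names are placeholders.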