---
license: apache-2.0
---

This repository hosts the AV_MossFormer2_TSE_16K model weights for 16 kHz audio-visual target speaker extraction, used by the [ClearerVoice-Studio](https://github.com/modelscope/ClearerVoice-Studio/tree/main) repo.

The model is trained on large-scale open-source datasets.

Given a multi-speaker video, it extracts each speaker's voice, using facial recognition to identify the target speakers.
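
The following is a minimal sketch of how the weights might be used through ClearerVoice-Studio, not a definitive recipe. It assumes the `clearvoice` Python package from that repo is installed and that its `ClearVoice` class accepts `task` and `model_names` and is called with `input_path`, `online_write`, and `output_path`; check the ClearerVoice-Studio documentation for the current interface. The helper names `build_call_kwargs` and `extract_speakers` are hypothetical and exist only for illustration.

```python
TASK = "target_speaker_extraction"
MODEL_NAME = "AV_MossFormer2_TSE_16K"


def build_call_kwargs(input_path: str, output_path: str) -> dict:
    """Collect the keyword arguments for the (assumed) ClearVoice call."""
    return {
        "input_path": input_path,    # folder of multi-speaker videos
        "online_write": True,        # assumed flag: write results to disk directly
        "output_path": output_path,  # folder for the extracted outputs
    }


def extract_speakers(input_path: str, output_path: str) -> None:
    """Run target speaker extraction on the videos under input_path."""
    # Imported lazily so build_call_kwargs works without the package installed.
    # Package, class, and argument names are assumptions based on ClearerVoice-Studio.
    from clearvoice import ClearVoice

    model = ClearVoice(task=TASK, model_names=[MODEL_NAME])
    model(**build_call_kwargs(input_path, output_path))
```

Under these assumptions, calling `extract_speakers("samples/input_videos", "samples/output_videos")` (both paths are placeholders) would process the videos in the first folder and write the extracted results to the second.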