vumichien committed
Commit 3d63ba1
1 Parent(s): 4cb31e8

Update README.md

  1. README.md +6 -0
README.md CHANGED
@@ -15,6 +15,12 @@ tags:
 
  These are model weights originally provided by the authors of the paper [Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction](https://arxiv.org/pdf/2201.02184.pdf).
 
+ <figure>
+ <img src="https://huggingface.co/vumichien/AV-HuBERT/blob/main/HuBert.png" alt="Audio-visual HuBERT">
+ <figcaption>Audio-visual HuBERT
+ </figcaption>
+ </figure>
+
  Video recordings of speech contain correlated audio and visual information, providing a strong signal for speech representation learning from the speaker’s lip
  movements and the produced sound.