vumichien committed on
Commit
833e23b
1 Parent(s): 9139449

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -16,8 +16,9 @@ tags:
  These are model weights originally provided by the authors of the paper [Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction](https://arxiv.org/pdf/2201.02184.pdf).

  Video recordings of speech contain correlated audio and visual information, providing a strong signal for speech representation learning from the speaker’s lip
- movements and the produced sound. Audio-Visual Hidden Unit BERT (AV-HuBERT), a self-supervised representation learning framework for
- audio-visual speech, which masks multi-stream video input and predicts automatically discovered and iteratively refined multimodal hidden units. AV-HuBERT
+ movements and the produced sound.
+
+ Audio-Visual Hidden Unit BERT (AV-HuBERT), a self-supervised representation learning framework for audio-visual speech, which masks multi-stream video input and predicts automatically discovered and iteratively refined multimodal hidden units. AV-HuBERT
  learns powerful audio-visual speech representation benefiting both lip-reading and automatic speech recognition.

  ## Datasets