johntsi committed on
Commit 9133100
1 Parent(s): c312f6d

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -245,6 +245,8 @@ tags:
 
 ZeroSwot is a state-of-the-art zero-shot end-to-end Speech Translation system.
 
+<div align=center><img src="resources/intro.png" height="75%" width="75%"/></div>
+
 The model is created by adapting a wav2vec2.0-based encoder to the embedding space of NLLB, using a novel subword compression module and Optimal Transport, while only utilizing ASR data. It thus enables **Zero-shot E2E Speech Translation to all 200 languages supported by NLLB**.
 
 For more details please refer to our [paper](https://arxiv.org/abs/2402.10422) and the [original repo](https://github.com/mt-upc/ZeroSwot) built on fairseq.
@@ -253,7 +255,7 @@ For more details please refer to our [paper](https://arxiv.org/abs/2402.10422) a
 
 The compression module is a lightweight transformer that takes as input the hidden states of wav2vec2.0 and the corresponding CTC predictions, compresses them into subword-like embeddings similar to those expected by NLLB, and aligns them using Optimal Transport. For inference, we simply pass the output of the speech encoder to the NLLB encoder.
 
-<div align=center><img src="resources/methodology.png" height="100%" width="100%"/></div>
+<div align=center><img src="resources/methodology.png" height="120%" width="120%"/></div>
 
 ## Version
 
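
The inference flow the README describes (speech encoder output fed straight into the NLLB encoder) can be sketched with the Hugging Face `transformers` API. This is a minimal illustration under stated assumptions, not the model card's actual usage snippet: the random `compressed` tensor is a hypothetical stand-in for the real ZeroSwot speech encoder output, and the NLLB checkpoint and target language code (`deu_Latn`) are choices made here for the example.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Public NLLB checkpoint (assumption: any NLLB-200 variant works the same way).
nllb_name = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(nllb_name)
nllb = AutoModelForSeq2SeqLM.from_pretrained(nllb_name)

# Hypothetical stand-in for ZeroSwot's speech encoder output: a batch of
# subword-like embeddings (batch, num_subwords, d_model) that the compression
# module + Optimal Transport have already aligned to NLLB's embedding space.
compressed = torch.randn(1, 20, nllb.config.d_model)

with torch.no_grad():
    # "Pass the output of the speech encoder to the NLLB encoder":
    # feed the embeddings to the text encoder via inputs_embeds.
    enc_out = nllb.model.encoder(inputs_embeds=compressed)
    # Decode into any of NLLB's 200 languages by forcing the target
    # language token (German here) as the first decoder token.
    generated = nllb.generate(
        encoder_outputs=enc_out,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("deu_Latn"),
        max_new_tokens=32,
    )

print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```

In the released model the compressed embeddings come from the adapted wav2vec2.0 encoder rather than a random tensor; see the original repo for the actual checkpoints and loading code.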