alexmourachko committed
Commit a0f0e17
1 Parent(s): 2209e43

update readme to match github

Files changed (1)
  1. README.md +49 -47
README.md CHANGED
@@ -3,26 +3,30 @@ license: cc-by-nc-4.0
  ---
 
  # SONAR
- [[Paper]]()
  [[Demo]](#usage)
 
- We introduce SONAR, a new multilingual and multimodal fixed-size sentence embedding space. Our single **text encoder, covering 200 languages**, substantially outperforms existing sentence embeddings such as LASER3 and LabSE on the xsim and xsim++ multilingual similarity search tasks.
 
- Speech segments can be embedded in the same \sonar embedding space using language-specific speech encoders trained in a teacher-student setting on speech transcription data. Our encoders outperform existing speech encoders on similarity search tasks.
- We also provide a **text decoder for 200 languages**, which allows us to perform text-to-text and speech-to-text machine translation, including for zero-shot language and modality combinations.
 
- Our text-to-text results are competitive compared to the state-of-the-art NLLB~1B model, despite the fixed-size bottleneck representation. Our zero-shot speech-to-text translation results compare favorably with strong supervised baselines such as Whisper.
 
-
- Model inference support thanks [Fairseq2](https://github.com/facebookresearch/fairseq2)
 
 
  ## Installing
-
- See our github [repo](https://reimagined-broccoli-941276ee.pages.github.io/nightly/installation/from_source_conda)
 
  ## Usage
- Compute text sentence embeddings:
  ```python
  from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline
  t2vec_model = TextToEmbeddingModelPipeline(encoder="text_sonar_basic_encoder",
@@ -32,7 +36,7 @@ t2vec_model.predict(sentences, source_lang="eng_Latn").shape
  # torch.Size([2, 1024])
  ```
 
- Translate with SONAR
  ```python
  from sonar.inference_pipelines.text import TextToTextModelPipeline
  t2t_model = TextToTextModelPipeline(encoder="text_sonar_basic_encoder",
@@ -44,50 +48,47 @@ t2t_model.predict(sentences, source_lang="eng_Latn", target_lang="fra_Latn")
  # ['Mon nom est SONAR.', "Je peux intégrer les phrases dans l'espace vectoriel."]
  ```
 
- Compute speech sentence embeddings:
  ```python
- import torch
- from sonar.inference_pipelines.speech import SpeechToEmbeddingPipeline, SpeechInferenceParams
-
- speech_embedding_dp_builder = SpeechToEmbeddingPipeline.load_from_name("sonar_speech_encoder_eng")
 
- speech_ctx = SpeechInferenceParams(
-     data_file="..../test_fleurs_fra-eng.tsv",
-     audio_root_dir=".../audio_zips",
-     audio_path_index=2,
-     batch_size=4,
- )
 
- speech_embedding_dp = speech_embedding_dp_builder.build_pipeline(speech_ctx)
- with torch.inference_mode():
-     speech_emb = next(iter(speech_embedding_dp))
- speech_emb["audio"]["data"].sentence_embeddings
  ```
 
-
- Speech-to-text with SONAR
  ```python
- import torch
- from sonar.inference_pipelines import SpeechToTextPipeline, SpeechInferenceParams
-
- speech_to_text_dp_builder = SpeechToTextPipeline.load_from_name(encoder_name="sonar_speech_encoder_eng",
-                                                                 decoder_name="text_sonar_basic_decoder")
-
- speech_ctx = SpeechInferenceParams(
-     data_file=".../test_fleurs_fra-eng.tsv",
-     audio_root_dir=".../audio_zips",
-     audio_path_index=2,
-     target_lang='fra_Latn',
-     batch_size=4,
- )
- speech_to_text_dp = speech_to_text_dp_builder.build_pipeline(speech_ctx)
- with torch.inference_mode():
-     speech_text_translation = next(iter(speech_to_text_dp))
- speech_text_translation
  ```
 
- Predicting [cross-lingual semantic similarity](https://github.com/facebookresearch/fairseq/tree/nllb/examples/nllb/human_XSTS_eval)
- with BLASER-2 models
  ```Python
  import torch
  from sonar.models.blaser.loader import load_blaser_model
@@ -102,6 +103,7 @@ print(blaser_qe(src=emb, mt=emb).item()) # 4.9819
  ```
 
  See more complete demo notebooks:
  * [sonar text2text similarity and translation](examples/sonar_text_demo.ipynb)
  * [sonar speech2text and other data pipeline examples](examples/inference_pipelines.ipynb)

  ---
 
  # SONAR
+ [[Paper]](https://fb.workplace.com/groups/831302610278251/permalink/9713798772028546) (TODO: change to external link once published)
  [[Demo]](#usage)
 
+ We introduce SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders. It substantially outperforms existing sentence embeddings such as LASER3 and LabSE on the xsim and xsim++ multilingual similarity search tasks.
 
+ Speech segments can be embedded in the same SONAR embedding space using language-specific speech encoders trained in a teacher-student setting on speech transcription data. We also provide a single text decoder, which allows us to perform text-to-text and speech-to-text machine translation, including for zero-shot language and modality combinations.
 
+ *SONAR* stands for **S**entence-level multim**O**dal and la**N**guage-**A**gnostic **R**epresentations.
 
+ The full list of supported languages (along with download links) can be found [below](#supported-languages-and-download-links).
 
 
  ## Installing
+ SONAR depends mainly on [Fairseq2](https://github.com/fairinternal/fairseq2) and can be installed as follows (tested with `python=3.8`):
+ ```bash
+ pip install --upgrade pip
+ pip config set global.extra-index-url https://test.pypi.org/simple/
+ pip install -e .
+ ```
 
  ## Usage
+ fairseq2 will automatically download models into your `$TORCH_HOME/hub` directory when you run the commands below.
+
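Since the snippets below trigger automatic checkpoint downloads, it can be useful to control where they land. A minimal sketch, assuming fairseq2 resolves the cache directory through `torch.hub` (which reads the `TORCH_HOME` environment variable); the path used here is only an illustration:

```python
import os

# Assumption: fairseq2 resolves the download cache via torch.hub, which reads TORCH_HOME.
# "/data/sonar_cache" is an illustrative path, not a project default.
os.environ["TORCH_HOME"] = "/data/sonar_cache"  # set before any model is loaded

import torch.hub
print(torch.hub.get_dir())  # /data/sonar_cache/hub
```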
+ ### Compute text sentence embeddings with SONAR
  ```python
  from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline
  t2vec_model = TextToEmbeddingModelPipeline(encoder="text_sonar_basic_encoder",
 
  # torch.Size([2, 1024])
  ```
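The diff window above cuts off the middle of this snippet. A self-contained sketch of the same pipeline follows; the tokenizer card name and the example sentences are assumptions added for illustration, since only the encoder name, the `predict()` call, and the output shape appear in the diff.

```python
from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline

# Assumption: the tokenizer card name and the sentences are illustrative;
# the diff only shows the encoder name, the predict() call, and the output shape.
t2vec_model = TextToEmbeddingModelPipeline(encoder="text_sonar_basic_encoder",
                                           tokenizer="text_sonar_basic_encoder")

sentences = ["My name is SONAR.", "I can embed the sentences into vectorial space."]
print(t2vec_model.predict(sentences, source_lang="eng_Latn").shape)
# torch.Size([2, 1024])
```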
 
+ ### Translate text with SONAR
  ```python
  from sonar.inference_pipelines.text import TextToTextModelPipeline
  t2t_model = TextToTextModelPipeline(encoder="text_sonar_basic_encoder",
 
  # ['Mon nom est SONAR.', "Je peux intégrer les phrases dans l'espace vectoriel."]
  ```
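The translation snippet is truncated by the diff window as well. A hedged reconstruction is sketched below: the decoder and tokenizer card names passed to the constructor are assumptions (the decoder name is borrowed from the speech-to-text section further down), while the `predict()` call and the French output are taken from the diff.

```python
from sonar.inference_pipelines.text import TextToTextModelPipeline

# Assumption: the decoder and tokenizer card names are illustrative; only the
# encoder name, the predict() call, and the French output appear in the diff.
t2t_model = TextToTextModelPipeline(encoder="text_sonar_basic_encoder",
                                    decoder="text_sonar_basic_decoder",
                                    tokenizer="text_sonar_basic_encoder")

sentences = ["My name is SONAR.", "I can embed the sentences into vectorial space."]
print(t2t_model.predict(sentences, source_lang="eng_Latn", target_lang="fra_Latn"))
# ['Mon nom est SONAR.', "Je peux intégrer les phrases dans l'espace vectoriel."]
```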
 
+ ### Compute speech sentence embeddings with SONAR
  ```python
+ from sonar.inference_pipelines.speech import SpeechToEmbeddingModelPipeline
+ s2vec_model = SpeechToEmbeddingModelPipeline(encoder="sonar_speech_encoder_eng")
 
+ s2vec_model.predict(["./tests/integration_tests/data/audio_files/audio_1.wav",
+                      "./tests/integration_tests/data/audio_files/audio_2.wav"]).shape
+ # torch.Size([2, 1024])
+ import torchaudio
+ inp, sr = torchaudio.load("./tests/integration_tests/data/audio_files/audio_1.wav")
+ assert sr == 16000, "Sample rate should be 16kHz"
 
+ s2vec_model.predict([inp]).shape
+ # torch.Size([1, 1024])
  ```
 
+ ### Speech-to-text translation with SONAR
  ```python
+ from sonar.inference_pipelines.speech import SpeechToTextModelPipeline
+
+ s2t_model = SpeechToTextModelPipeline(encoder="sonar_speech_encoder_eng",
+                                       decoder="text_sonar_basic_decoder",
+                                       tokenizer="text_sonar_basic_decoder")
+
+ import torchaudio
+ inp, sr = torchaudio.load("./tests/integration_tests/data/audio_files/audio_1.wav")
+ assert sr == 16000, "Sample rate should be 16kHz"
+
+ # passing loaded audio files
+ s2t_model.predict([inp], target_lang="eng_Latn")
+ # ['Television reports show white smoke coming from the plant.']
+
+ # passing multiple wav files
+ s2t_model.predict(["./tests/integration_tests/data/audio_files/audio_1.wav",
+                    "./tests/integration_tests/data/audio_files/audio_2.wav"], target_lang="eng_Latn")
+ # ['Television reports show white smoke coming from the plant.',
+ # 'These couples may choose to make an adoption plan for their baby.']
  ```
 
+
+ ### Predicting [cross-lingual semantic similarity](https://github.com/facebookresearch/fairseq/tree/nllb/examples/nllb/human_XSTS_eval) with BLASER 2 models
  ```Python
  import torch
  from sonar.models.blaser.loader import load_blaser_model
 
  ```
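The BLASER 2 snippet is also truncated by the hunk boundary: only the imports and, via the hunk header above, the final score line survive. A minimal end-to-end sketch of the reference-free (QE) variant, assuming a checkpoint named `blaser_2_0_qe` and 1024-dimensional SONAR embeddings:

```python
import torch
from sonar.models.blaser.loader import load_blaser_model

# Assumption: "blaser_2_0_qe" is the reference-free (QE) BLASER 2 checkpoint name;
# the diff only shows this import and the final print with its expected score.
blaser_qe = load_blaser_model("blaser_2_0_qe").eval()

# BLASER 2 scores pairs of SONAR embeddings; a dummy 1024-dim vector stands in
# for real source/translation embeddings produced by the encoders above.
emb = torch.ones([1, 1024])
print(blaser_qe(src=emb, mt=emb).item())  # 4.9819
```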
 
  See more complete demo notebooks:
+
  * [sonar text2text similarity and translation](examples/sonar_text_demo.ipynb)
  * [sonar speech2text and other data pipeline examples](examples/inference_pipelines.ipynb)