johntsi/ZeroSwot-Medium_asr-cv_en-to-200

Automatic Speech Recognition

zero_swot_encoder

feature-extraction

speech translation

johntsi committed on Jun 25

Commit eafabee (1 parent: 487e674)

Update README.md

Files changed (1)

README.md +2 -2

@@ -251,7 +251,7 @@ For more details please refer to our [paper](https://arxiv.org/abs/2402.10422) a
 This version of ZeroSwot is trained with ASR data from CommonVoice, and adapting [wav2vec2.0-large](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self) to the [nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) model.
-<div align=center><img src="methodology.png" height="100%" width="100%"/></div>
+<div align=center><img src="resources/methodology.png" height="100%" width="100%"/></div>
 ## Usage
@@ -284,7 +284,7 @@ nllb_model.eval()
 nllb_model.to("cuda")
 # Load sample .wav
-audio = load_and_resample_audio("sample.wav")
+audio = load_and_resample_audio("resources/sample.wav")
 input_values = processor(audio, sampling_rate=16000, return_tensors="pt").cuda()
 # translation to German