---
language: "en"
thumbnail:
tags:
- speech-translation
- CTC
- Attention
- Transformer
- pytorch
- speechbrain
metrics:
- BLEU
---

# Conformer Encoder/Decoder for Speech Translation

This model was trained with [SpeechBrain](https://speechbrain.github.io) and is based on the Fisher Callhome recipe.
The performance of the model is the following:

| Release | CoVoSTv2 JA->EN Test BLEU | Custom Dataset Validation BLEU | Custom Dataset Test BLEU | GPUs |
|:-------------:|:--------------:|:--------------:|:--------------:|:--------:|
| 01-13-21 | 9.73 | 8.38 | 12.01 | 1xRTX 3090 |


This model was trained on subtitled audio downloaded from YouTube, and was not fine-tuned on the CoVoSTv2 training set.
When calculating the BLEU score for CoVoSTv2, the utterances were first preprocessed with the same pipeline used for the model's original training data: all punctuation except apostrophes is removed and text is lowercased, similar to the preprocessing of the Fisher Callhome dataset in the SpeechBrain recipe.
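As an illustration, the text normalization described above can be sketched as follows (the helper name is hypothetical; the exact SpeechBrain preprocessing may differ in detail):

```python
import re

def normalize_for_bleu(text):
    # Lowercase, strip all punctuation except apostrophes, and collapse
    # whitespace, mirroring the Fisher Callhome-style preprocessing.
    text = text.lower()
    text = re.sub(r"[^\w\s']", "", text)
    return re.sub(r"\s+", " ", text).strip()

print(normalize_for_bleu("Hello, World! It's fine."))  # hello world it's fine
```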
## Pipeline description

The system is trained with recordings sampled at 16 kHz (single channel).
The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *transcribe_file* if needed.
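Conceptually, that normalization amounts to a mono mixdown plus resampling to 16 kHz. A rough pure-PyTorch sketch (the helper name and the linear-interpolation resampler are illustrative; SpeechBrain performs its own resampling internally):

```python
import torch

def normalize_audio(waveform, sample_rate, target_rate=16000):
    # Mix stereo down to mono by averaging channels.
    if waveform.dim() == 2 and waveform.size(0) > 1:
        waveform = waveform.mean(dim=0, keepdim=True)
    # Resample to the model's expected rate via linear interpolation.
    if sample_rate != target_rate:
        new_len = int(waveform.size(-1) * target_rate / sample_rate)
        waveform = torch.nn.functional.interpolate(
            waveform.unsqueeze(0), size=new_len,
            mode="linear", align_corners=False,
        ).squeeze(0)
    return waveform
```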

## Install SpeechBrain

First of all, install SpeechBrain with the following command:

```
pip install speechbrain
```

### Transcribing your own audio files (spoken Japanese to written English)

```python
from speechbrain.pretrained import EncoderDecoderASR

st_model = EncoderDecoderASR.from_hparams(source="bob80333/speechbrain_ja2en_st_63M_yt600h")
st_model.transcribe_file("your_file_here.wav")
```
### Inference on GPU

To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
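For example (this requires a CUDA-capable GPU and downloads the model weights on first use):

```python
from speechbrain.pretrained import EncoderDecoderASR

# Load the model onto the GPU via run_opts, then transcribe as before.
st_model = EncoderDecoderASR.from_hparams(
    source="bob80333/speechbrain_ja2en_st_63M_yt600h",
    run_opts={"device": "cuda"},
)
st_model.transcribe_file("your_file_here.wav")
```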

### Limitations

The model is likely to get caught in repetitions. It is not very good at translation, which is reflected in its low BLEU scores.
The outputs of this model are unlikely to be correct; do not rely on it for any serious purpose.
This model was trained on data from YouTube, and has inherited whatever biases can be found in YouTube audio and subtitles.
The creator of this model doesn't actually know Japanese.