---
language: "cs"
tags:
- Czech
- KKY
- FAV
license: "cc-by-nc-sa-4.0"
---

# wav2vec2-base-cs-50k
This is a monolingual Czech Wav2Vec 2.0 base model pre-trained on 50 thousand hours of Czech speech.

This model does not have a tokenizer because it was pre-trained on audio alone. To use it for speech recognition, a tokenizer must be created and the model must be fine-tuned on labeled text data.

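As a rough illustration of the first step, a character-level CTC vocabulary can be built from training transcripts and written to a `vocab.json` file. The alphabet below is a hypothetical example, not the paper's actual fine-tuning recipe:

```python
import json
import os
import tempfile

# Hypothetical character vocabulary for Czech CTC fine-tuning;
# a real vocabulary would be derived from the labeled training transcripts.
vocab = {"[PAD]": 0, "[UNK]": 1, "|": 2}  # "|" acts as the word delimiter
for i, ch in enumerate("abcdefghijklmnopqrstuvwxyzáčďéěíňóřšťúůýž", start=3):
    vocab[ch] = i

# Write the vocabulary where a tokenizer can load it
vocab_dir = tempfile.mkdtemp()
with open(os.path.join(vocab_dir, "vocab.json"), "w", encoding="utf-8") as f:
    json.dump(vocab, f, ensure_ascii=False)

# The tokenizer and CTC head could then be created along these lines:
# from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2ForCTC
# tokenizer = Wav2Vec2CTCTokenizer(
#     os.path.join(vocab_dir, "vocab.json"),
#     unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|",
# )
# model = Wav2Vec2ForCTC.from_pretrained(
#     "fav-kky/wav2vec2-base-cs-50k",
#     vocab_size=len(vocab), pad_token_id=vocab["[PAD]"],
# )
```
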
## Speech recognition results
After fine-tuning, the model achieved the following results on public datasets:
- Czech portion of CommonVoice v16.0: **WER = 11.36%**

See our paper for details.

## Paper
The preprint of our paper (accepted to INTERSPEECH 2024) is available at [tbd]

### All models released within the paper
- https://huggingface.co/fav-kky/wav2vec2-base-cs-50k (monolingual Czech)
- https://huggingface.co/fav-kky/wav2vec2-base-de-50k (monolingual German)
- https://huggingface.co/fav-kky/wav2vec2-base-cs-en-100k (bilingual Czech+English)
- https://huggingface.co/fav-kky/wav2vec2-base-cs-de-100k (bilingual Czech+German)
- https://huggingface.co/fav-kky/wav2vec2-base-en-de-100k (bilingual English+German)
- https://huggingface.co/fav-kky/wav2vec2-base-cs-en-de-150k (trilingual Czech+English+German)

## Citation
If you find this model useful, please cite our paper:
```
tbd
```

## Usage
Inputs must be 16 kHz mono audio files.

This model can be used, for example, to extract per-frame contextual embeddings from audio:
```python
from transformers import Wav2Vec2Model, Wav2Vec2FeatureExtractor
import torchaudio

# Load the feature extractor and the pre-trained model
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("fav-kky/wav2vec2-base-cs-50k")
model = Wav2Vec2Model.from_pretrained("fav-kky/wav2vec2-base-cs-50k")

# Load a 16 kHz mono audio file as a waveform tensor
speech_array, sampling_rate = torchaudio.load("/path/to/audio/file.wav")

# Normalize the raw waveform and pack it as a batch of size 1
inputs = feature_extractor(
    speech_array,
    sampling_rate=16_000,
    return_tensors="pt"
)["input_values"][0]

# Forward pass; last_hidden_state holds one embedding per ~20 ms frame
output = model(inputs)
embeddings = output.last_hidden_state.detach().numpy()[0]
```
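If a single utterance-level vector is needed, the per-frame embeddings can be averaged over time. Mean pooling is a common heuristic, not something prescribed by this model card; the sketch below uses dummy data in place of real model output:

```python
import numpy as np

def mean_pool(embeddings: np.ndarray) -> np.ndarray:
    """Average per-frame embeddings of shape (frames, dim) into one (dim,) vector."""
    return embeddings.mean(axis=0)

# Dummy stand-in for `embeddings` from the snippet above:
# 100 frames, 768 dimensions (the wav2vec 2.0 base hidden size)
frames = np.random.rand(100, 768).astype(np.float32)
utterance_vector = mean_pool(frames)
assert utterance_vector.shape == (768,)
```
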

## Related works