viktor-enzell committed on
Commit
efbaccd
1 Parent(s): 3cafa40

Update README.md

Files changed (1)
  1. README.md +22 -7
README.md CHANGED
@@ -1,10 +1,5 @@
1
  ---
2
  language: sv
3
- datasets:
4
- - common_voice
5
- - NST Swedish ASR Database
6
- - P4
7
- - The Swedish Culturomics Gigaword Corpus
8
  metrics:
9
  - wer
10
  tags:
@@ -14,6 +9,11 @@ tags:
14
  - hf-asr-leaderboard
15
  - sv
16
  license: cc0-1.0
17
  model-index:
18
  - name: Wav2vec 2.0 large VoxRex Swedish (C) with 4-gram
19
  results:
@@ -37,7 +37,22 @@ Training of the acoustic model is the work of KBLab. See [VoxRex-C](https://hugg
37
  VoxRex-C is extended with a 4-gram language model estimated from a subset extracted from [The Swedish Culturomics Gigaword Corpus](https://spraakbanken.gu.se/resurser/gigaword) from Språkbanken. The subset contains 40M words from the social media genre between 2010 and 2015.
38
 
39
  ## How to use
40
- Example of transcribing 1% of the Common Voice test split, using GPU if available. The model expects 16kHz audio.
41
 
42
  ```python
43
  from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM
@@ -45,7 +60,7 @@ from datasets import load_dataset
45
  import torch
46
  import torchaudio.functional as F
47
 
48
- # Import model and processor
49
  model_name = 'viktor-enzell/wav2vec2-large-voxrex-swedish-4gram'
50
  device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
51
  model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device);
1
  ---
2
  language: sv
3
  metrics:
4
  - wer
5
  tags:
9
  - hf-asr-leaderboard
10
  - sv
11
  license: cc0-1.0
12
+ datasets:
13
+ - common_voice
14
+ - NST_Swedish_ASR_Database
15
+ - P4
16
+ - The_Swedish_Culturomics_Gigaword_Corpus
17
  model-index:
18
  - name: Wav2vec 2.0 large VoxRex Swedish (C) with 4-gram
19
  results:
37
  VoxRex-C is extended with a 4-gram language model estimated from a subset extracted from [The Swedish Culturomics Gigaword Corpus](https://spraakbanken.gu.se/resurser/gigaword) from Språkbanken. The subset contains 40M words from the social media genre between 2010 and 2015.
38
 
39
  ## How to use
40
+ #### Simple usage example with pipeline
41
+ ```python
42
+ import torch
43
+ from transformers import pipeline
44
+
45
+ # Load the model. Using GPU if available
46
+ model_name = 'viktor-enzell/wav2vec2-large-voxrex-swedish-4gram'
47
+ device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
48
+ pipe = pipeline(model=model_name).to(device)
49
+
50
+ # Run inference on an audio file
51
+ output = pipe('path/to/audio.mp3')['text']
52
+ ```
53
+
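Review note on the pipeline example added above: as far as I can tell, `transformers` pipeline objects do not expose a `.to()` method, so `pipeline(model=model_name).to(device)` is likely to raise an `AttributeError`. The device is normally passed to `pipeline(...)` through its `device` argument instead. The following is a minimal sketch of that variant, not the author's code. The helper names `pick_device_index` and `build_asr_pipeline` are hypothetical, and the integer convention (`0` for the first GPU, `-1` for CPU) is an assumption chosen because it should also work with older `transformers` releases.

```python
def pick_device_index(cuda_available: bool) -> int:
    """Map CUDA availability to a pipeline device index (assumed: 0 = first GPU, -1 = CPU)."""
    return 0 if cuda_available else -1


def build_asr_pipeline(model_name: str = 'viktor-enzell/wav2vec2-large-voxrex-swedish-4gram'):
    """Build a speech recognition pipeline, placing the model on GPU if one is available."""
    # Imported lazily so pick_device_index can be used without torch/transformers installed.
    import torch
    from transformers import pipeline

    device = pick_device_index(torch.cuda.is_available())
    # The device is given to pipeline() directly rather than calling .to() on the result.
    return pipeline('automatic-speech-recognition', model=model_name, device=device)


# Example usage (downloads the model on first call; the audio path is a placeholder):
# pipe = build_asr_pipeline()
# output = pipe('path/to/audio.mp3')['text']
```

Passing the device at construction time lets the pipeline move both the model and its input tensors, which a manual `.to()` on the model alone would not cover.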
54
+ #### More verbose usage example with audio pre-processing
55
+ Example of transcribing 1% of the Common Voice test split. The model expects 16kHz audio, so audio with another sampling rate is resampled to 16kHz.
56
 
57
  ```python
58
  from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM
60
  import torch
61
  import torchaudio.functional as F
62
 
63
+ # Import model and processor. Using GPU if available
64
  model_name = 'viktor-enzell/wav2vec2-large-voxrex-swedish-4gram'
65
  device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
66
  model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device);