viktor-enzell/wav2vec2-large-voxrex-swedish-4gram

viktor-enzell committed on Nov 19, 2022

Commit efbaccd • 1 Parent(s): 3cafa40

Update README.md

Files changed (1)

README.md +22 -7


@@ -1,10 +1,5 @@
 ---
 language: sv
-datasets:
-- common_voice
-- NST Swedish ASR Database
-- P4
-- The Swedish Culturomics Gigaword Corpus
 metrics:
 - wer
 tags:
@@ -14,6 +9,11 @@ tags:
 - hf-asr-leaderboard
 - sv
 license: cc0-1.0
 model-index:
 - name: Wav2vec 2.0 large VoxRex Swedish (C) with 4-gram
   results:
@@ -37,7 +37,22 @@ Training of the acoustic model is the work of KBLab. See [VoxRex-C](https://hugg
 VoxRex-C is extended with a 4-gram language model estimated from a subset extracted from [The Swedish Culturomics Gigaword Corpus](https://spraakbanken.gu.se/resurser/gigaword) from Språkbanken. The subset contains 40M words from the social media genre between 2010 and 2015.
 ## How to use
-Example of transcribing 1% of the Common Voice test split, using GPU if available. The model expects 16kHz audio.
 ```python
 from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM
@@ -45,7 +60,7 @@ from datasets import load_dataset
 import torch
 import torchaudio.functional as F
-# Import model and processor
 model_name = 'viktor-enzell/wav2vec2-large-voxrex-swedish-4gram'
 device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
 model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device);

 ---
 language: sv
 metrics:
 - wer
 tags:
 - hf-asr-leaderboard
 - sv
 license: cc0-1.0
+datasets:
+- common_voice
+- NST_Swedish_ASR_Database
+- P4
+- The_Swedish_Culturomics_Gigaword_Corpus
 model-index:
 - name: Wav2vec 2.0 large VoxRex Swedish (C) with 4-gram
   results:
 VoxRex-C is extended with a 4-gram language model estimated from a subset extracted from [The Swedish Culturomics Gigaword Corpus](https://spraakbanken.gu.se/resurser/gigaword) from Språkbanken. The subset contains 40M words from the social media genre between 2010 and 2015.
 ## How to use
+#### Simple usage example with pipeline
+```python
+import torch
+from transformers import pipeline
+# Load the model. Using GPU if available
+model_name = 'viktor-enzell/wav2vec2-large-voxrex-swedish-4gram'
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+pipe = pipeline(model=model_name, device=device)
+# Run inference on an audio file
+output = pipe('path/to/audio.mp3')['text']
+```
+#### More verbose usage example with audio pre-processing
+Example of transcribing 1% of the Common Voice test split. The model expects 16kHz audio, so audio with another sampling rate is resampled to 16kHz.
 ```python
 from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM
 import torch
 import torchaudio.functional as F
+# Import model and processor. Using GPU if available
 model_name = 'viktor-enzell/wav2vec2-large-voxrex-swedish-4gram'
 device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
 model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device);