pere committed
Commit 8c74e87
1 Parent(s): e1f27a9

updated template

Files changed (1)
  1. README.md +25 -28
README.md CHANGED
@@ -9,7 +9,7 @@ datasets:
 - NbAiLab/ncc_speech
 - NbAiLab/NST
 - NbAiLab/NPSC
-base_model: openai/whisper-tiny
+base_model: openai/whisper-small
 tags:
 - audio
 - asr
@@ -28,35 +28,32 @@ widget:
 ---
 
 
-# NB-Whisper Tiny (Release Candidate)
+# NB-Whisper Small
 
-**IMPORTANT:** These models are currently Release Candidates. We are in the final stages of testing. If everything proceeds smoothly, we plan to officially release the models later this month.
-
-Introducing the **_Norwegian NB-Whisper Tiny model_**, proudly developed by the National Library of Norway. NB-Whisper is a cutting-edge series of models designed for automatic speech recognition (ASR) and speech translation. These models are based on the work of [OpenAI's Whisper](https://arxiv.org/abs/2212.04356). Each model in the series has been trained for 250,000 steps, utilizing a diverse dataset of 8 million samples. These samples consist of aligned audio clips, each 30 seconds long, culminating in a staggering 66,000 hours of speech. For an in-depth understanding of our training methodology and dataset composition, keep an eye out for our upcoming article.
+Introducing the **_Norwegian NB-Whisper Small model_**, proudly developed by the National Library of Norway. NB-Whisper is a cutting-edge series of models designed for automatic speech recognition (ASR) and speech translation. These models are based on the work of [OpenAI's Whisper](https://arxiv.org/abs/2212.04356). Each model in the series has been trained for 250,000 steps, utilizing a diverse dataset of 8 million samples. These samples consist of aligned audio clips, each 30 seconds long, culminating in a staggering 66,000 hours of speech. For an in-depth understanding of our training methodology and dataset composition, keep an eye out for our upcoming article.
 
 | Model Size | Parameters | Model |
 |------------|------------|------------|
-| Tiny | 39M | [NB-Whisper Tiny](https://huggingface.co/NbAiLabBeta/nb-whisper-tiny) |
-| Base | 74M | [NB-Whisper Base](https://huggingface.co/NbAiLabBeta/nb-whisper-base) |
-| Small | 244M | [NB-Whisper Small](https://huggingface.co/NbAiLabBeta/nb-whisper-small) |
-| Medium | 769M | [NB-Whisper Medium](https://huggingface.co/NbAiLabBeta/nb-whisper-medium) |
-| Large | 1550M | [NB-Whisper Large](https://huggingface.co/NbAiLabBeta/nb-whisper-large) |
+| Tiny | 39M | [NB-Whisper Tiny](https://huggingface.co/NbAiLab/nb-whisper-tiny) |
+| Base | 74M | [NB-Whisper Base](https://huggingface.co/NbAiLab/nb-whisper-base) |
+| Small | 244M | [NB-Whisper Small](https://huggingface.co/NbAiLab/nb-whisper-small) |
+| Medium | 769M | [NB-Whisper Medium](https://huggingface.co/NbAiLab/nb-whisper-medium) |
+| Large | 1550M | [NB-Whisper Large](https://huggingface.co/NbAiLab/nb-whisper-large) |
 
 
 
-### Specialised Models
+### Verbatim Model
 While the main models are suitable for most transcription task, we demonstrate how easy it is to change the output of the main model. The following models are trained 250 additional steps from the main models above, and might be suitable for more targetted use cases:
 - **Verbatim version**: This lower-cased variant is more literal and suitable for tasks requiring detailed transcription, such as linguistic analysis.
-- **Semantic version**: This variant focuses less on verbatim accuracy but captures the essence of content, ideal for meeting minutes and subtitling.
 
 
-| Model Size | Parameters | Verbatim version | Semantic version |
-|------------|------------|------------|------------------|
-| Tiny | 39M | [Tiny - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-tiny-verbatim) | [Tiny - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-tiny-semantic) |
-| Base | 74M | [Base - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-base-verbatim) | [Base - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-base-semantic) |
-| Small | 244M | [Small - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-small-verbatim) | [Small - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-small-semantic) |
-| Medium | 769M | [Medium - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-medium-verbatim) | [Medium - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-medium-semantic) |
-| Large | 1550M | [Large - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-large-verbatim) | [Large - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-large-semantic) |
+| Model Size | Parameters | Semantic version |
+|------------|------------|------------------|
+| Tiny | 39M | [Tiny - semantic](https://huggingface.co/NbAiLab/nb-whisper-tiny-semantic) |
+| Base | 74M | [Base - semantic](https://huggingface.co/NbAiLab/nb-whisper-base-semantic) |
+| Small | 244M | [Small - semantic](https://huggingface.co/NbAiLab/nb-whisper-small-semantic) |
+| Medium | 769M | [Medium - semantic](https://huggingface.co/NbAiLab/nb-whisper-medium-semantic) |
+| Large | 1550M | [Large - semantic](https://huggingface.co/NbAiLab/nb-whisper-large-semantic) |
 
 
 ### Model Description
@@ -66,7 +63,7 @@ While the main models are suitable for most transcription task, we demonstrate h
 - **Model type:** `whisper`
 - **Language(s) (NLP):** Norwegian, Norwegian Bokmål, Norwegian Nynorsk, English
 - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
-- **Trained from model:** [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)
+- **Trained from model:** [openai/whisper-small](https://huggingface.co/openai/whisper-small)
 - **Code Repository:** https://github.com/NbAiLab/nb-whisper/
 - **Paper:** _Coming soon_
 - **Demo:** _See Spaces on this page_
@@ -75,7 +72,7 @@ While the main models are suitable for most transcription task, we demonstrate h
 ## How to Use the Models
 
 ### Online Demos
-You can try the models directly through the HuggingFace Inference API, accessible on the right side of this page. Be aware that initially, the model needs to load and will run on limited CPU capacity, which might be slow. To enhance your experience, we are temporarily hosting some models on TPUs for a few days, significantly boosting their performance. Explore these under the **Spaces** section on the [Main Page](https://huggingface.co/NbAiLabBeta/).
+You can try the models directly through the HuggingFace Inference API, accessible on the right side of this page. Be aware that initially, the model needs to load and will run on limited CPU capacity, which might be slow. To enhance your experience, we are temporarily hosting some models on TPUs for a few days, significantly boosting their performance. Explore these under the **Spaces** section on the [Main Page](https://huggingface.co/NbAiLab/).
 
 ### Local Setup with HuggingFace
 Alternatively, you can run the models locally. The Tiny, Base, and Small models are optimized for CPU execution. For the Medium and Large models, we recommend a system equipped with a GPU to ensure efficient processing. Setting up and using these models with HuggingFace's Transformers is straightforward, provided you have [Python](https://www.python.org/downloads/) installed on your machine. For practical demonstrations, refer to examples using this [sample mp3 file](https://github.com/NbAiLab/nb-whisper/raw/main/audio/king.mp3).
@@ -94,7 +91,7 @@ After this is done, you should be able to run this in Python:
 from transformers import pipeline
 
 # Load the model
-asr = pipeline("automatic-speech-recognition", "NbAiLabBeta/nb-whisper-tiny")
+asr = pipeline("automatic-speech-recognition", "NbAiLabBeta/nb-whisper-small")
 
 #transcribe
 asr("king.mp3", generate_kwargs={'task': 'transcribe', 'language': 'no'})
@@ -223,14 +220,14 @@ $ wget -N https://github.com/NbAiLab/nb-whisper/raw/main/audio/king.mp3
 $ ffmpeg -i king.mp3 -ar 16000 -ac 1 -c:a pcm_s16le king.wav
 
 # Lets download the two ggml-files from this site
-wget -N https://huggingface.co/NbAiLabBeta/nb-whisper-tiny/resolve/main/ggml-model.bin -O models/nb-tiny-ggml-model.bin
-wget -N https://huggingface.co/NbAiLabBeta/nb-whisper-tiny/resolve/main/ggml-model-q5_0.bin -O models/nb-tiny-ggml-model-q5_0.bin
+wget -N https://huggingface.co/NbAiLab/nb-whisper-small/resolve/main/ggml-model.bin -O models/nb-small-ggml-model.bin
+wget -N https://huggingface.co/NbAiLab/nb-whisper-small/resolve/main/ggml-model-q5_0.bin -O models/nb-small-ggml-model-q5_0.bin
 
 # And run it with the f16 default model
-$ ./main -l no -m models/nb-tiny-ggml-model.bin king.wav
+$ ./main -l no -m models/nb-small-ggml-model.bin king.wav
 
 # Or the quantized version
-$ ./main -l no -m models/nb-tiny-ggml-model-q5_0.bin king.wav
+$ ./main -l no -m models/nb-small-ggml-model-q5_0.bin king.wav
 ```
 
 ### WhisperX and Speaker Diarization
@@ -250,7 +247,7 @@ wget -N https://github.com/NbAiLab/nb-whisper/raw/main/audio/knuthamsun.mp3
 pip uninstall whisperx && pip install git+https://github.com/m-bain/whisperx.git@8540ff5985fceee764acbed94f656063d7f56540
 
 # Transcribe the test file. All transcripts will end up in the directory of the mp3-file
-whisperx knuthamsun.mp3 --model NbAiLabBeta/nb-whisper-tiny --language no --diarize
+whisperx knuthamsun.mp3 --model NbAiLabBeta/nb-whisper-small --language no --diarize
 
 ```
 
@@ -282,7 +279,7 @@ Using these models without adequate risk assessment and mitigation could be cons
 The model was trained using Jax/Flax and converted to PyTorch, Tensorflow, whisper.cpp, and ONXX formats. These are available under `Files and versions`. We welcome requests for conversion to other formats. All training code and scripts are released under the Apache License 2.0 in the GitHub repository [nb-whisper](https://github.com/NbAiLab/nb-whisper/).
 
 ## Citation & Contributors
-The NB-Whisper Tiny model is a product of the NoSTram project led by Per Egil Kummervold ([@pere](https://huggingface.co/pere)) at the National Library of Norway. Key contributors include Javier de la Rosa ([@versae](https://huggingface.co/versae)), Freddy Wetjen ([@freddyw](https://huggingface.co/freddyw)), and Rolv-Arild Braaten ([@Rolv-Arild](https://huggingface.co/Rolv-Arild)). NB AI-Lab, under the direction of Svein Arne Brygfjeld ([@Brygfjeld](https://huggingface.co/Brygfjeld)), supported the project's successful completion. A detailed paper on our process and findings is forthcoming.
+The NB-Whisper Small model is a product of the NoSTram project led by Per Egil Kummervold ([@pere](https://huggingface.co/pere)) at the National Library of Norway. Key contributors include Javier de la Rosa ([@versae](https://huggingface.co/versae)), Freddy Wetjen ([@freddyw](https://huggingface.co/freddyw)), and Rolv-Arild Braaten ([@Rolv-Arild](https://huggingface.co/Rolv-Arild)). NB AI-Lab, under the direction of Svein Arne Brygfjeld ([@Brygfjeld](https://huggingface.co/Brygfjeld)), supported the project's successful completion. A detailed paper on our process and findings is forthcoming.
 
 ## Disclaimer
 
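As a side note, the dataset figures the card quotes (8 million aligned 30-second clips, roughly 66,000 hours of speech) are internally consistent. A quick back-of-the-envelope check in plain Python, with both numbers taken from the card text:

```python
# Consistency check for the dataset figures quoted in the model card:
# 8 million aligned clips, each 30 seconds long, expressed in hours.
samples = 8_000_000      # training samples (from the card)
clip_seconds = 30        # length of each aligned clip (from the card)

total_hours = samples * clip_seconds / 3600
print(f"{total_hours:,.0f} hours")  # -> 66,667 hours, i.e. the card's "66,000 hours"
```

So the "staggering 66,000 hours" is a slight round-down of about 66,667 hours.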