pere committed on
Commit
6b9a19c
1 Parent(s): 317ccb3

updated template

Files changed (1)
  1. README.md +51 -16
README.md CHANGED
@@ -1,5 +1,5 @@
1
  ---
2
- license: cc-by-4.0
3
  language:
4
  - 'no'
5
  - nb
@@ -9,6 +9,7 @@ datasets:
9
  - NbAiLab/ncc_speech
10
  - NbAiLab/NST
11
  - NbAiLab/NPSC
12
  tags:
13
  - audio
14
  - asr
@@ -39,27 +40,38 @@ Introducing the **_Norwegian NB-Whisper Large model_**, proudly developed by the
39
  <source src="https://huggingface.co/NbAiLab/nb-whisper-small-beta/resolve/main/king.mp4" type="video/mp4">
40
  Your browser does not support the video tag.
41
  </video>
42
- <figcaption><a href="https://www.royalcourt.no/tale.html?tid=137662&sek=28409&scope=27248" target="_blank">Speech given by His Majesty The King of Norway at the garden party hosted by Their Majesties The King and Queen at the Palace Park on 1st of September 2016.</a>Transcribed using the Small model.</figcaption>
43
  </figure>
44
  </center>
45
 
46
 
47
  ## Model Details
48
 
49
- The NB-Whisper series offers models in five distinct sizes: Tiny, Base, Small, Medium, and Large, each designed to cater to different requirements. We generally recommend the Main models for most users, as they are balanced for common use cases. Additionally, there are two variants available for each size:
50
 
51
  - **Verbatim version**: This lower-cased variant is more literal and suitable for tasks requiring detailed transcription, such as linguistic analysis.
52
  - **Semantic version**: This variant focuses less on verbatim accuracy but captures the essence of content, ideal for meeting minutes and subtitling.
53
 
54
  All models are used in the same manner. Here are the available models:
55
 
56
- | Model Size | Parameters | Main Model | Verbatim version | Semantic version |
57
- |------------|------------|------------|------------------|------------------|
58
- | Tiny | 39M | [NB-Whisper Tiny](https://huggingface.co/NbAiLabBeta/nb-whisper-tiny) | [Tiny - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-tiny-verbatim) | [Tiny - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-tiny-semantic) |
59
- | Base | 74M | [NB-Whisper Base](https://huggingface.co/NbAiLabBeta/nb-whisper-base) | [Base - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-base-verbatim) | [Base - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-base-semantic) |
60
- | Small | 244M | [NB-Whisper Small](https://huggingface.co/NbAiLabBeta/nb-whisper-small) | [Small - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-small-verbatim) | [Small - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-small-semantic) |
61
- | Medium | 769M | [NB-Whisper Medium](https://huggingface.co/NbAiLabBeta/nb-whisper-medium) | [Medium - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-medium-verbatim) | [Medium - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-medium-semantic) |
62
- | Large | 1550M | [NB-Whisper Large](https://huggingface.co/NbAiLabBeta/nb-whisper-large) | [Large - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-large-verbatim) | [Large - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-large-semantic) |
63
 
64
 
65
  Please refer to the OpenAI Whisper model card for more details about the backbone model.
@@ -70,7 +82,7 @@ Please refer to the OpenAI Whisper model card for more details about the backbon
70
  - **Shared by:** [NB AI-Lab](https://ai.nb.no/)
71
  - **Model type:** `whisper`
72
  - **Language(s) (NLP):** Norwegian, Norwegian Bokmål, Norwegian Nynorsk, English
73
- - **License:** [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)
74
  - **Trained from model:** [openai/whisper-large](https://huggingface.co/openai/whisper-large)
75
  - **Code Repository:** https://github.com/NbAiLab/nb-whisper/
76
  - **Paper:** _Coming soon_
@@ -86,10 +98,10 @@ Alternatively, you can download the models for local usage. The Tiny, Base, and
86
 
87
  ```bash
88
  # Download the sample file
89
- > wget -N https://github.com/NbAiLab/nb-whisper/raw/main/audio/king.mp3
90
 
91
  # Install necessary libraries.
92
- > pip install transformers>=4.35.2
93
  ```
94
 
95
  After this is done, you should be able to run this in Python:
@@ -169,10 +181,12 @@ asr("king.mp3", chunk_length_s=30, return_timestamps=True, generate_kwargs={'tas
169
  </details>
170
 
171
  Some other cool features to look into:
172
  ```python
173
  # Transcribe to Nynorsk
174
  asr("king.mp3", chunk_length_s=30, generate_kwargs={'task': 'transcribe', 'language': 'nn'})
175
  ```
176
  <details>
177
  <summary>Expected output</summary>
178
 
@@ -221,7 +235,24 @@ asr("king.mp3", chunk_length_s=30, return_timestamps="word", generate_kwargs={'t
221
  ### Whisper CPP
222
  Whisper CPP is a C++ implementation of the Whisper model, offering the same functionalities with the added benefits of C++ efficiency and performance optimizations. This allows embedding any Whisper model into a binary file, facilitating the development of real applications. However, it requires some familiarity with compiling C++ programs. Their [homepage](https://github.com/ggerganov/whisper.cpp) provides examples of how to build applications, including real-time transcription.
223
 
224
- We have converted this model to the ggml-format model used by Whisper CPP binaries. The file can be downloaded [here](blob/main/ggml-model.bin).
225
 
226
  ### API
227
  Instructions for accessing the models via a simple API are included in the demos under Spaces. Note that these demos are temporary and will only be available for a few weeks.
@@ -243,10 +274,14 @@ The models, especially the smaller ones, may exhibit occasional hallucinations a
243
  Using these models without adequate risk assessment and mitigation could be considered irresponsible. They may contain biases or other undesirable distortions. Users who deploy these models or integrate them into systems or services are responsible for mitigating risks and complying with applicable AI regulations. The National Library of Norway, as the model owner, disclaims liability for any outcomes resulting from third-party use of these models.
244
 
245
  ### Software
246
- The model is trained using Jax/Flax and converted to Pytorch, Tensorflow, whisper.cpp, and ONXX formats. These are available under `Files and versions`. We welcome requests for conversion to other formats.
247
 
248
  ## Citation & Contributors
249
- The NB-Whisper Large model is a product of the NoSTram project led by Per Egil Kummervold (PEK) at the National Library of Norway. Key contributors include Javier de la Rosa (JdlR), Freddy Wetjen (FW), Rolv-Arild Braaten (RAB), and PEK. AiLab, under the direction of Svein Arne Brygfjeld, supported the project's successful completion. A detailed paper on our process and findings is forthcoming.
250
 
251
  ## Acknowledgements
252
 
 
1
  ---
2
+ license: apache-2.0
3
  language:
4
  - 'no'
5
  - nb
 
9
  - NbAiLab/ncc_speech
10
  - NbAiLab/NST
11
  - NbAiLab/NPSC
12
+ base_model: openai/whisper-large
13
  tags:
14
  - audio
15
  - asr
 
40
  <source src="https://huggingface.co/NbAiLab/nb-whisper-small-beta/resolve/main/king.mp4" type="video/mp4">
41
  Your browser does not support the video tag.
42
  </video>
43
+ <figcaption><a href="https://www.royalcourt.no/tale.html?tid=137662&sek=28409&scope=27248" target="_blank">Speech given by His Majesty The King of Norway at the garden party hosted by Their Majesties The King and Queen at the Palace Park on 1st of September 2016.</a> Transcribed using the Small model.</figcaption>
44
  </figure>
45
  </center>
46
 
47
 
48
  ## Model Details
49
 
50
+ The NB-Whisper series offers models in five distinct sizes: Tiny, Base, Small, Medium, and Large, each designed to cater to different requirements. These models are balanced for common use cases.
51
+
52
+
53
+ | Model Size | Parameters | Model |
54
+ |------------|------------|------------|
55
+ | Tiny | 39M | [NB-Whisper Tiny](https://huggingface.co/NbAiLabBeta/nb-whisper-tiny) |
56
+ | Base | 74M | [NB-Whisper Base](https://huggingface.co/NbAiLabBeta/nb-whisper-base) |
57
+ | Small | 244M | [NB-Whisper Small](https://huggingface.co/NbAiLabBeta/nb-whisper-small) |
58
+ | Medium | 769M | [NB-Whisper Medium](https://huggingface.co/NbAiLabBeta/nb-whisper-medium) |
59
+ | Large | 1550M | [NB-Whisper Large](https://huggingface.co/NbAiLabBeta/nb-whisper-large) |
60
+
61
+ Additionally, there are two variants available for each size:
62
 
63
  - **Verbatim version**: This lower-cased variant is more literal and suitable for tasks requiring detailed transcription, such as linguistic analysis.
64
  - **Semantic version**: This variant focuses less on verbatim accuracy but captures the essence of content, ideal for meeting minutes and subtitling.
65
 
66
  All models are used in the same manner. Here are the available models:
67
 
68
+ | Model Size | Parameters | Verbatim version | Semantic version |
69
+ |------------|------------|------------|------------------|
70
+ | Tiny | 39M | [Tiny - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-tiny-verbatim) | [Tiny - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-tiny-semantic) |
71
+ | Base | 74M | [Base - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-base-verbatim) | [Base - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-base-semantic) |
72
+ | Small | 244M | [Small - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-small-verbatim) | [Small - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-small-semantic) |
73
+ | Medium | 769M | [Medium - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-medium-verbatim) | [Medium - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-medium-semantic) |
74
+ | Large | 1550M | [Large - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-large-verbatim) | [Large - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-large-semantic) |
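
The repository IDs in the two tables above follow one naming pattern: `NbAiLabBeta/nb-whisper-<size>` for the main model, with `-verbatim` or `-semantic` appended for the variants. As a minimal sketch (the helper `nb_whisper_repo_id` and the `SIZES`/`VARIANTS` constants are hypothetical, not part of the NB-Whisper repository), the IDs can be assembled and validated like this:

```python
from typing import Optional

# Hypothetical constants mirroring the tables above (size -> parameter count).
SIZES = {
    "tiny": "39M",
    "base": "74M",
    "small": "244M",
    "medium": "769M",
    "large": "1550M",
}
# None selects the main model; the other values select a variant.
VARIANTS = (None, "verbatim", "semantic")


def nb_whisper_repo_id(size: str, variant: Optional[str] = None) -> str:
    """Build a repo ID of the form NbAiLabBeta/nb-whisper-<size>[-<variant>]."""
    size = size.lower()
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {list(SIZES)}")
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    # The main model has no suffix; variants append "-verbatim" or "-semantic".
    suffix = f"-{variant}" if variant else ""
    return f"NbAiLabBeta/nb-whisper-{size}{suffix}"
```

For example, `nb_whisper_repo_id("small", "semantic")` returns `"NbAiLabBeta/nb-whisper-small-semantic"`. Validating against a fixed set of sizes and variants catches typos before any download is attempted; the resulting string could then be passed as the `model` argument of a Transformers `pipeline`.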
75
 
76
 
77
  Please refer to the OpenAI Whisper model card for more details about the backbone model.
 
82
  - **Shared by:** [NB AI-Lab](https://ai.nb.no/)
83
  - **Model type:** `whisper`
84
  - **Language(s) (NLP):** Norwegian, Norwegian Bokmål, Norwegian Nynorsk, English
85
+ - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
86
  - **Trained from model:** [openai/whisper-large](https://huggingface.co/openai/whisper-large)
87
  - **Code Repository:** https://github.com/NbAiLab/nb-whisper/
88
  - **Paper:** _Coming soon_
 
98
 
99
  ```bash
100
  # Download the sample file
101
+ $ wget -N https://github.com/NbAiLab/nb-whisper/raw/main/audio/king.mp3
102
 
103
  # Install necessary libraries.
104
+ $ pip install "transformers>=4.35.2"
105
  ```
106
 
107
  After this is done, you should be able to run this in Python:
 
181
  </details>
182
 
183
  Some other cool features to look into:
184
+
185
  ```python
186
  # Transcribe to Nynorsk
187
  asr("king.mp3", chunk_length_s=30, generate_kwargs={'task': 'transcribe', 'language': 'nn'})
188
  ```
189
+
190
  <details>
191
  <summary>Expected output</summary>
192
 
 
235
  ### Whisper CPP
236
  Whisper CPP is a C++ implementation of the Whisper model, offering the same functionalities with the added benefits of C++ efficiency and performance optimizations. This allows embedding any Whisper model into a binary file, facilitating the development of real applications. However, it requires some familiarity with compiling C++ programs. Their [homepage](https://github.com/ggerganov/whisper.cpp) provides examples of how to build applications, including real-time transcription.
237
 
238
+ We have converted this model to the ggml format used by Whisper CPP binaries. The file can be downloaded [here](blob/main/ggml-model.bin), and a `q5_0` quantized version is also available [here](blob/main/ggml-model-q5_0.bin).
239
+
240
+ ```bash
241
+ # We can download and compile whisper.cpp
242
+ $ git clone --depth 1 https://github.com/ggerganov/whisper.cpp --branch v1.5.1
243
+ $ cd whisper.cpp/
244
+ $ make
245
+
246
+ # The whisper.cpp main example only reads WAV input, so convert the audio to 16 kHz mono 16-bit PCM WAV
247
+ $ wget -N https://github.com/NbAiLab/nb-whisper/raw/main/audio/king.mp3
248
+ $ ffmpeg -i king.mp3 -ar 16000 -ac 1 -c:a pcm_s16le king.wav
249
+
250
+ # And run it with the f16 default model
251
+ $ ./main -m /path/to/ggml-model.bin king.wav
252
+
253
+ # Or the quantized version
254
+ $ ./main -m /path/to/ggml-model-q5_0.bin king.wav
255
+ ```
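
Before running `./main`, it can help to confirm that the converted file has the properties the ffmpeg flags above request: `-ar 16000` (16 kHz), `-ac 1` (mono) and `-c:a pcm_s16le` (16-bit PCM), which works out to 16000 × 2 bytes = 32,000 bytes of audio per second. The sketch below uses only Python's standard `wave` module; the function name `is_whisper_cpp_ready` is hypothetical and not part of whisper.cpp or NB-Whisper.

```python
import wave


def is_whisper_cpp_ready(path: str) -> bool:
    """Return True if `path` is a 16 kHz, mono, 16-bit PCM WAV file.

    These are the properties requested by the ffmpeg flags
    -ar 16000 (sample rate), -ac 1 (mono) and -c:a pcm_s16le (16-bit PCM).
    """
    try:
        with wave.open(path, "rb") as wav:
            return (
                wav.getframerate() == 16000
                and wav.getnchannels() == 1
                and wav.getsampwidth() == 2  # 2 bytes = 16 bits per sample
                and wav.getcomptype() == "NONE"  # uncompressed PCM
            )
    except (wave.Error, EOFError):
        # Not a WAV file the standard library can parse (e.g. an MP3).
        return False
```

After the ffmpeg command has run, `is_whisper_cpp_ready("king.wav")` should return `True`, while the original `king.mp3` returns `False` because it has no RIFF/WAV header.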
256
 
257
  ### API
258
  Instructions for accessing the models via a simple API are included in the demos under Spaces. Note that these demos are temporary and will only be available for a few weeks.
 
274
  Using these models without adequate risk assessment and mitigation could be considered irresponsible. They may contain biases or other undesirable distortions. Users who deploy these models or integrate them into systems or services are responsible for mitigating risks and complying with applicable AI regulations. The National Library of Norway, as the model owner, disclaims liability for any outcomes resulting from third-party use of these models.
275
 
276
  ### Software
277
+ The model was trained using JAX/Flax and converted to PyTorch, TensorFlow, whisper.cpp, and ONNX formats. These are available under `Files and versions`. We welcome requests for conversion to other formats. All training code and scripts are released under the Apache License 2.0 in the GitHub repository [nb-whisper](https://github.com/NbAiLab/nb-whisper/).
278
 
279
  ## Citation & Contributors
280
+ The NB-Whisper Large model is a product of the NoSTram project led by Per Egil Kummervold ([@pere](https://huggingface.co/pere)) at the National Library of Norway. Key contributors include Javier de la Rosa ([@versae](https://huggingface.co/versae)), Freddy Wetjen ([@freddyw](https://huggingface.co/freddyw)), and Rolv-Arild Braaten ([@Rolv-Arild](https://huggingface.co/Rolv-Arild)). NB AI-Lab, under the direction of Svein Arne Brygfjeld ([@Brygfjeld](https://huggingface.co/Brygfjeld)), supported the project's successful completion. A detailed paper on our process and findings is forthcoming.
281
+
282
+ ## Disclaimer
283
+
284
+ The models published in this repository are intended for general-purpose use and are available to third parties. These models may contain biases and/or other undesirable distortions. When third parties deploy or provide systems and/or services to other parties using any of these models (or using systems based on these models), or become users of the models, they should note that it is their responsibility to mitigate the risks arising from their use and, in any event, to comply with applicable regulations, including regulations regarding the use of artificial intelligence. In no event shall the owner of the models (The National Library of Norway) be liable for any results arising from the use made by third parties of these models.
285
 
286
  ## Acknowledgements
287