Commit 2d7415b ("Update README.md") committed by pere. Parent: 8eda402. Files changed: README.md (+246, -85).
---
license: cc-by-4.0
language:
- 'no'
- nb
- nn
- en
datasets:
- NbAiLab/ncc_speech
- NbAiLab/NST
- NbAiLab/NPSC
tags:
- audio
- asr
- automatic-speech-recognition
- hf-asr-leaderboard
metrics:
- wer
- cer
library_name: transformers
pipeline_tag: automatic-speech-recognition
widget:
- src: https://datasets-server.huggingface.co/assets/google/fleurs/--/nb_no/train/1/audio/audio.mp3
  example_title: FLEURS sample 1
- src: https://datasets-server.huggingface.co/assets/google/fleurs/--/nb_no/train/4/audio/audio.mp3
  example_title: FLEURS sample 2
---
 
# NB-Whisper Large (Release Candidate)

**IMPORTANT:** These models are currently Release Candidates. We are in the final stages of testing. If everything proceeds smoothly, we plan to officially release the models later this month.

Introducing the **_Norwegian NB-Whisper Large model_**, proudly developed by the National Library of Norway. NB-Whisper is a series of models for automatic speech recognition (ASR) and speech translation, based on [OpenAI's Whisper](https://arxiv.org/abs/2212.04356). Each model in the series has been trained for 250,000 steps on a diverse dataset of 8 million samples: aligned audio clips, each 30 seconds long, amounting to 66,000 hours of speech. For an in-depth understanding of our training methodology and dataset composition, keep an eye out for our upcoming article.

<center>
<figure>
<video controls>
<source src="https://huggingface.co/NbAiLab/nb-whisper-small-beta/resolve/main/king.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
<figcaption><a href="https://www.royalcourt.no/tale.html?tid=137662&sek=28409&scope=27248" target="_blank">Speech given by His Majesty The King of Norway at the garden party hosted by Their Majesties The King and Queen at the Palace Park on 1st of September 2016.</a> Transcribed using the Small model.</figcaption>
</figure>
</center>

## Model Details

The NB-Whisper series offers models in five distinct sizes: Tiny, Base, Small, Medium, and Large, each designed to cater to different requirements. We generally recommend the Main models for most users, as they are balanced for common use cases. Additionally, two variants are available for each size:

- **Verbatim version**: This lower-cased variant is more literal and suitable for tasks requiring detailed transcription, such as linguistic analysis.
- **Semantic version**: This variant focuses less on verbatim accuracy but captures the essence of the content, making it ideal for meeting minutes and subtitling.

All models are used in the same manner. Here are the available models:

| Model Size | Parameters | Main Model | Verbatim version | Semantic version |
|------------|------------|------------|------------------|------------------|
| Tiny | 39M | [NB-Whisper Tiny](https://huggingface.co/NbAiLabBeta/nb-whisper-tiny) | [Tiny - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-tiny-verbatim) | [Tiny - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-tiny-semantic) |
| Base | 74M | [NB-Whisper Base](https://huggingface.co/NbAiLabBeta/nb-whisper-base) | [Base - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-base-verbatim) | [Base - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-base-semantic) |
| Small | 244M | [NB-Whisper Small](https://huggingface.co/NbAiLabBeta/nb-whisper-small) | [Small - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-small-verbatim) | [Small - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-small-semantic) |
| Medium | 769M | [NB-Whisper Medium](https://huggingface.co/NbAiLabBeta/nb-whisper-medium) | [Medium - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-medium-verbatim) | [Medium - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-medium-semantic) |
| Large | 1550M | [NB-Whisper Large](https://huggingface.co/NbAiLabBeta/nb-whisper-large) | [Large - verbatim](https://huggingface.co/NbAiLabBeta/nb-whisper-large-verbatim) | [Large - semantic](https://huggingface.co/NbAiLabBeta/nb-whisper-large-semantic) |
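
For a rough sense of hardware requirements, note that fp16 weights take about two bytes per parameter. The sketch below is an illustrative back-of-the-envelope estimate only (weights alone; activations, the decoding cache, and framework overhead add more):

```python
# Illustrative estimate of fp16 weight memory per model size.
# Two bytes per parameter; real memory usage is higher in practice.
SIZES = {"Tiny": 39e6, "Base": 74e6, "Small": 244e6, "Medium": 769e6, "Large": 1550e6}

def fp16_weight_gib(params: float) -> float:
    """Approximate size of fp16 weights in GiB (2 bytes per parameter)."""
    return params * 2 / 2**30

for name, params in SIZES.items():
    print(f"{name}: ~{fp16_weight_gib(params):.2f} GiB")
```

This is one reason the Tiny, Base, and Small models run comfortably on CPU, while Medium and Large benefit from a GPU.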

Please refer to the OpenAI Whisper model card for more details about the backbone model.

### Model Description

- **Developed by:** [NB AI-Lab](https://ai.nb.no/)
- **Shared by:** [NB AI-Lab](https://ai.nb.no/)
- **Model type:** `whisper`
- **Language(s) (NLP):** Norwegian, Norwegian Bokmål, Norwegian Nynorsk, English
- **License:** [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)
- **Trained from model:** [openai/whisper-large](https://huggingface.co/openai/whisper-large)
- **Code Repository:** https://github.com/NbAiLab/nb-whisper/
- **Paper:** _Coming soon_
- **Demo:** _See Spaces on this page_

## How to Use the Models

### Online Demos
You can try the models directly through the HuggingFace Inference API, accessible on the right side of this page. Be aware that the model first needs to load and then runs on limited CPU capacity, which might be slow. To enhance your experience, we are temporarily hosting some models on TPUs for a few days, significantly boosting their performance. Explore these under the **Spaces** section on the [Main Page](https://huggingface.co/NbAiLabBeta/).

### Local Setup with HuggingFace
Alternatively, you can download the models for local use. The Tiny, Base, and Small models are optimized for CPU execution; for the Medium and Large models, we recommend a system equipped with a GPU to ensure efficient processing. Setting up and using these models with HuggingFace's Transformers library is straightforward, provided you have [Python](https://www.python.org/downloads/) installed on your machine. For practical demonstrations, refer to the examples below using this [sample mp3 file](https://github.com/NbAiLab/nb-whisper/raw/main/audio/king.mp3).

```bash
# Download the sample file
wget -N https://github.com/NbAiLab/nb-whisper/raw/main/audio/king.mp3

# Install the necessary libraries (quote the requirement so the shell
# does not interpret '>' as a redirect)
pip install "transformers>=4.35.2"
```

After this is done, you should be able to run this in Python:

```python
from transformers import pipeline

# Load the model
asr = pipeline("automatic-speech-recognition", "NbAiLabBeta/nb-whisper-large")

# Transcribe
asr("king.mp3", generate_kwargs={'task': 'transcribe', 'language': 'no'})
```

<details>
<summary>Expected output</summary>

```json
{'text': ' Nordmenn er nordlendinger, trøndere, sørlendinger og folk fra alle andre regioner. Nordmenn er også innvandret fra Afghanistan, Pakistan, Polen, Sverige, Somalia og Syria. Det er ikke alltid så lett å si hvor vi er fra, hvilken nasjonalitet vi er fra. Hvilken nasjonalitet vi er fra. Hvilken nasjonalitet vi er fra. Hvilken nasjonalitet vi er fra. Hvilken nasjonalitet vi er fra. Hvilken nasjonalitet vi er fra. Hvilken nasjonalitet vi er fra.'}
```
</details>

Examining the output, we see that there are multiple repetitions at the end. This is because the default input length is 30 seconds, while the audio clip is 1 minute 25 seconds long. By passing the `chunk_length_s` argument, we can transcribe longer files.

```python
asr("king.mp3", chunk_length_s=30, generate_kwargs={'task': 'transcribe', 'language': 'no'})
```
<details>
<summary>Expected output</summary>

```json
{'text': ' Nordmenn er nordlendinger, trøndere, sørlendinger og folk fra alle andre regioner. Nordmenn er også innvandret fra Afghanistan, Pakistan, Polen, Sverige, Somalia og Syria. Det er ikke alltid så lett å si hvor vi er fra, hvilken nasjonalitet vi er fra. Hvilken nasjonalitet vi er fra. Hvilken nasjonalitet vi er fra. Hvilken nasjonalitet vi er fra. Hvilken nasjonalitet vi er fra. Hvilken nasjonalitet vi er fra, hvilken nasjonalitet vi tilhører. Det vi kaller hjem, er der hjertet vårt er, og det kan ikke alltid plasseres innenfor landegrenser. Nordmenn er jenter som er glad i jenter, gutter som er glad i gutter, og jenter og gutter som er glad i hverandre. Nordmenn trommer på Gud, Allah, Altet og ingenting. Nordmenn liker Grieg, Kygo, Helbilis og Kari Bremnes. Med andre ord, Norge er dere. Norge er oss. Mitt største håp for Norge er at vi skal klare å ta vare på hverandre, at vi skal bygge dette landet videre på tillit, fellesskap og raushet.'}
```
</details>

Here the output looks a lot better. We can also ask the model to output timestamps:

```python
asr("king.mp3", chunk_length_s=30, return_timestamps=True, generate_kwargs={'task': 'transcribe', 'language': 'no'})
```
<details>
<summary>Expected output</summary>

```json
{'text': ' Nordmenn er nordlendinger, trøndere, sørlendinger og folk fra alle andre regioner. Nordmenn er også innvandret fra Afghanistan, Pakistan, Polen, Sverige, Somalia og Syria. Det er ikke alltid så lett å si hvor vi er fra, hvilken nasjonalitet vi er fra. Hvilken nasjonalitet vi er fra. hvilken nasjonalitet vi tilhører. Det vi kaller hjem, er der hjertet vårt er, og det kan ikke alltid plasseres innenfor landegrenser. Nordmenn er jenter som er glad i jenter, gutter som er glad i gutter, og jenter og gutter som er glad i hverandre. Nordmenn trommer på Gud, Allah, Altet og ingenting. Nordmenn liker Grieg, Kygo, Helbiles og Kari Bremnes. Med andre ord, Norge er dere. Norge er oss. Mitt største håp for Norge er at vi skal klare å ta vare på hverandre, at vi skal bygge dette landet videre på tillit, fellesskap og raushet.',
 'chunks': [{'timestamp': (0.0, 5.46),
   'text': ' Nordmenn er nordlendinger, trøndere, sørlendinger'},
  {'timestamp': (5.52, 8.68), 'text': ' og folk fra alle andre regioner.'},
  {'timestamp': (8.68, 16.64),
   'text': ' Nordmenn er også innvandret fra Afghanistan, Pakistan, Polen, Sverige, Somalia og Syria.'},
  {'timestamp': (16.64, 13.3),
   'text': ' Det er ikke alltid så lett å si hvor vi er fra, hvilken nasjonalitet vi er fra.'},
  {'timestamp': (13.32, 30.28),
   'text': ' Hvilken nasjonalitet vi er fra. hvilken nasjonalitet vi tilhører.'},
  {'timestamp': (32.52, 39.16),
   'text': ' Det vi kaller hjem, er der hjertet vårt er, og det kan ikke alltid plasseres'},
  {'timestamp': (39.16, 42.0), 'text': ' innenfor landegrenser.'},
  {'timestamp': (42.0, 46.74),
   'text': ' Nordmenn er jenter som er glad i jenter, gutter som er glad i gutter,'},
  {'timestamp': (46.74, 51.12),
   'text': ' og jenter og gutter som er glad i hverandre.'},
  {'timestamp': (51.16, 57.42),
   'text': ' Nordmenn trommer på Gud, Allah, Altet og ingenting.'},
  {'timestamp': (57.42, 64.3),
   'text': ' Nordmenn liker Grieg, Kygo, Helbiles og Kari Bremnes.'},
  {'timestamp': (64.34, 71.24),
   'text': ' Med andre ord, Norge er dere. Norge er oss.'},
  {'timestamp': (71.24, 78.04),
   'text': ' Mitt største håp for Norge er at vi skal klare å ta vare på hverandre,'},
  {'timestamp': (78.12, 84.68),
   'text': ' at vi skal bygge dette landet videre på tillit, fellesskap og raushet.'}]}
```
</details>

Some other cool features to look into:
```python
# Transcribe to Nynorsk
asr("king.mp3", chunk_length_s=30, generate_kwargs={'task': 'transcribe', 'language': 'nn'})
```
<details>
<summary>Expected output</summary>

```json
{
  "text": "Nordmenn er nordlendingar, trøndarar, sørlendingar og folk frå alle andre regionar. Nordmenn er også innvandra frå Afghanistan, Pakistan, Polen, Sverige, Somalia og Syria. Det er ikkje alltid så lett å seie kvar vi er frå, kva nasjonalitet vi tilhøyrer. Det vi kallar heim, er der hjartet vårt er, og det kan ikkje alltid plasserast innanfor landegrenser. Nordmenn er jenter som er glad i jenter, gutar som erade i gutar, og jenter og gutar som er glade i kvarandre. Nordmenn trommar på Gud, Allah, Altet og ingenting. Nordmenn liker Grieg, Kygo, Helbiles og Kari Bremnes. Med andre ord, Noreg er dere! Noreg er oss. Mitt største håp for Noreg er at vi skal klare å ta vare på kvarandre, at vi skal byggje dette landet vidare på tillit, fellesskap og raushet."
}
```
</details>

```python
# Transcribe to English
asr("king.mp3", chunk_length_s=30, generate_kwargs={'task': 'transcribe', 'language': 'en'})
```
<details>
<summary>Expected output</summary>

```json
{
  "text": "Norwegians are Norwegians, trønders, southerners and people from all other regions. Norwegians are also invaded from Afghanistan, Pakistan, Poland, Sweden, Somalia and Suria. It is not always so easy to say where we are from, what nationality we belong to. What we call home is where our heart is, and it cannot always be placed within national borders. Norwegians are girls who like girls, boys who like boys, and girls and boys who like each other. Norwegians thrump on God, Allah, Altet and nothing. Norwegians like Grieg, Kygo, Helbilis and Kari Bremnes. In other words, Norway is you. Norway is us. My biggest hope for Norway is that we should be able to take care of each other, that we should build this country on trust, community and generosity."
}
```
</details>

```python
# Return word-level timestamps
asr("king.mp3", chunk_length_s=30, return_timestamps="word", generate_kwargs={'task': 'transcribe', 'language': 'no'})
```

<details>
<summary>Expected output</summary>

```json
{
  "text": "Nordmenn er nordlendinger, trøndere, sørlendinger og folk fra alle andre regioner. Nordmenn er også innvandret fra Afghanistan, Pakistan, Polen, Sverige, Somalia og Syria. Det er ikke alltid så lett å si hvor vi er fra, hvilken nasjonalitet vi tilhører. Det vi kaller hjem, er der hjertet vårt er, og det kan ikke alltid plasseres innenfor landegrenser. Nordmenn er jenter som er glad i jenter, gutter som er glad i gutter, og jenter og gutter som er glad i hverandre. Nordmenn trommer på Gud, Allah, Altet og ingenting. Nordmenn liker Grieg, Kygo, Helbilis og Kari Bremnes. Med andre ord, Norge er dere. Norge er oss. Mitt største håp for Norge er at vi skal klare å ta vare på hverandre, at vi skal bygge dette landet videre på tillit, fellesskap og raushet.",
  "chunks": [
    {"text": "Nordmenn", "timestamp": [0.72, 1.42]},
    {"text": "er", "timestamp": [1.42, 1.74]},
    // ... more chunks ...
    {"text": "raushet.", "timestamp": [83.1, 84.88]}
  ]
}
```
</details>
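
Timestamped output like this can be turned into subtitle files. As an illustrative sketch (not part of the model's official tooling), here is how a `chunks` list returned with `return_timestamps=True` could be converted to SRT format:

```python
def to_srt(chunks):
    """Convert pipeline 'chunks' (dicts with 'timestamp' and 'text') to SRT subtitle text."""
    def fmt(seconds):
        # SRT uses HH:MM:SS,mmm
        ms = int(round(seconds * 1000))
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    entries = []
    for i, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        entries.append(f"{i}\n{fmt(start)} --> {fmt(end)}\n{chunk['text'].strip()}\n")
    return "\n".join(entries)

# Example with the first two chunks from the timestamped output above:
chunks = [
    {"timestamp": (0.0, 5.46), "text": " Nordmenn er nordlendinger, trøndere, sørlendinger"},
    {"timestamp": (5.52, 8.68), "text": " og folk fra alle andre regioner."},
]
print(to_srt(chunks))
```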

### Whisper CPP
Whisper CPP is a C++ implementation of the Whisper model, offering the same functionality with the added benefits of C++'s efficiency and performance optimizations. It allows you to embed any Whisper model into a binary file, facilitating the development of real applications. However, it requires some familiarity with compiling C++ programs. The project's [homepage](https://github.com/ggerganov/whisper.cpp) provides examples of how to build applications, including real-time transcription.

We have converted this model to the ggml format used by Whisper CPP binaries. The file can be downloaded [here](blob/main/ggml-model.bin).
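
Once whisper.cpp is built, transcription with the downloaded ggml file typically looks like the following. This is a sketch that assumes you have compiled whisper.cpp's `main` binary and have `ffmpeg` available; exact flag and binary names may vary between whisper.cpp versions:

```shell
# whisper.cpp expects 16 kHz mono WAV input; convert the sample file first
ffmpeg -i king.mp3 -ar 16000 -ac 1 king.wav

# Transcribe with the ggml model downloaded from this repository
# (-m: model file, -f: input audio, -l: language)
./main -m ggml-model.bin -f king.wav -l no
```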

### API
Instructions for accessing the models via a simple API are included in the demos under Spaces. Note that these demos are temporary and will only be available for a few weeks.

## Training Data
The training data originates from Språkbanken and the National Library of Norway's digital collection, including:

- NST Norwegian ASR Database (16 kHz) and its corresponding dataset
- Transcribed speeches from the Norwegian Parliament by Språkbanken
- TV broadcast (NRK) subtitles (NLN digital collection)
- Audiobooks (NLN digital collection)

## Downstream Use

The models, especially the smaller ones, may exhibit occasional hallucinations and may drop parts of the transcript. They are designed to convert spoken language into grammatically correct written sentences, which might not always be word-for-word transcriptions. We have made two extra model variants for users who want a different transcription style. We encourage users to try the models themselves to get a better understanding.
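
When comparing transcription styles on your own data, a word error rate (WER) computation can help quantify the differences. The sketch below is an illustrative implementation using the standard word-level edit-distance formulation, not the exact evaluation code used for this model:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return float(len(hyp))
    # Dynamic-programming edit distance over words
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i]
        for j, h in enumerate(hyp, start=1):
            row.append(min(prev_row[j] + 1,              # deletion
                           row[j - 1] + 1,               # insertion
                           prev_row[j - 1] + (r != h)))  # substitution
        prev_row = row
    return prev_row[-1] / len(ref)

print(wer("mitt største håp for norge", "mitt største håp for noreg"))  # 0.2
```

Note that verbatim-style references will penalize semantic-style output (and vice versa), so choose references that match the style you are evaluating.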

## Bias, Risks, and Limitations

Using these models without adequate risk assessment and mitigation could be considered irresponsible. They may contain biases or other undesirable distortions. Users who deploy these models or integrate them into systems or services are responsible for mitigating risks and complying with applicable AI regulations. The National Library of Norway, as the model owner, disclaims liability for any outcomes resulting from third-party use of these models.

### Software
The model was trained using Jax/Flax and converted to PyTorch, TensorFlow, whisper.cpp, and ONNX formats. These are available under `Files and versions`. We welcome requests for conversion to other formats.

## Citation & Contributors
The NB-Whisper Large model is a product of the NoSTram project led by Per Egil Kummervold (PEK) at the National Library of Norway. Key contributors include Javier de la Rosa (JdlR), Freddy Wetjen (FW), Rolv-Arild Braaten (RAB), and PEK. AiLab, under the direction of Svein Arne Brygfjeld, supported the project's successful completion. A detailed paper on our process and findings is forthcoming.

## Acknowledgements

Our gratitude extends to [Google TPU Research Cloud](https://sites.research.google/trc/about/) for training resources, Google Cloud for translation credits, and HuggingFace's Sanchit Gandhi for technical support. A special thank you to Per Erik Solberg at Språkbanken for the collaboration on the Stortinget corpus.

## Contact
For feedback, technical concerns, or collaboration inquiries, please contact <a rel="noopener nofollow" href="mailto:ailab@nb.no">ailab@nb.no</a>. If you plan to include this model in your research, contact us for the latest information on our upcoming paper for citation purposes.