pere committed on
Commit
8a7a8c7
1 Parent(s): 65e2485

Update README.md

Files changed (1)
  1. README.md +10 -1
README.md CHANGED
@@ -130,6 +130,15 @@ asr(
  # 'text': ' Først så kan vi ta og henge dem kjemme, og så får vi gjøre vårt valget når vi kommer dit.'}]}
  ```

  ## Environmental Impact

  <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
@@ -145,7 +154,7 @@ Carbon emissions estimated using the [Machine Learning Impact calculator](https:
  #### Software

- The model is trained using JAX/Flax. The final model is converted to PyTorch, whisper.cpp, and ONNX. Please tell us if you would like future models to be converted to other formats.

  ## Citation & Authors
  This model was developed within the scope of the _NoSTram_ project, led by _Per Egil Kummervold_. The JAX code and training scripts were crafted by _Javier de la Rosa_, _Freddy Wetjen_, _Rolv-Arild Braaten_, and _Per Egil Kummervold_. Dataset curation was carried out by _Freddy Wetjen_, _Rolv-Arild Braaten_, and _Per Egil Kummervold_. Documentation was composed by _Javier de la Rosa_ and _Per Egil Kummervold_. The AiLab is under the direction of _Svein Arne Brygfjeld_. Each author contributed to the development and deliberations on the optimal way to train a Norwegian ASR model using Whisper. The work on this model was conducted as part of our professional roles at the National Library of Norway.
 
  # 'text': ' Først så kan vi ta og henge dem kjemme, og så får vi gjøre vårt valget når vi kommer dit.'}]}
  ```

+ ## Training Data
+ The training data comes from Språkbanken and the digital collection at the National Library of Norway (NLN). It includes:
+
+ - NST Norwegian ASR Database (16 kHz) and its corresponding dataset
+ - Transcribed speeches from the Norwegian Parliament, produced by Språkbanken
+ - TV broadcast (NRK) subtitles (NLN digital collection)
+ - Audiobooks (NLN digital collection)
+
+
  ## Environmental Impact

  <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
 
  #### Software

+ The model is trained using JAX/Flax. The final model is converted to PyTorch, TensorFlow, whisper.cpp, and ONNX. Please tell us if you would like future models to be converted to other formats.

  ## Citation & Authors
  This model was developed within the scope of the _NoSTram_ project, led by _Per Egil Kummervold_. The JAX code and training scripts were crafted by _Javier de la Rosa_, _Freddy Wetjen_, _Rolv-Arild Braaten_, and _Per Egil Kummervold_. Dataset curation was carried out by _Freddy Wetjen_, _Rolv-Arild Braaten_, and _Per Egil Kummervold_. Documentation was composed by _Javier de la Rosa_ and _Per Egil Kummervold_. The AiLab is under the direction of _Svein Arne Brygfjeld_. Each author contributed to the development and deliberations on the optimal way to train a Norwegian ASR model using Whisper. The work on this model was conducted as part of our professional roles at the National Library of Norway.