Addedk committed
Commit 1549817
1 Parent(s): def578e

Add training data and summary of evaluation results.

Files changed (1)
  1. README.md +16 -1
README.md CHANGED
@@ -2,7 +2,7 @@
  license: apache-2.0
  ---
 
- # mBERT swedish distileld base model (cased)
+ # mBERT swedish distilled base model (cased)
 
  This model is a distilled version of [mBERT](https://huggingface.co/bert-base-multilingual-cased). It was distilled using Swedish data, the 2010-2015 portion of the [Swedish Culturomics Gigaword Corpus](https://spraakbanken.gu.se/en/resources/gigaword). The code for the distillation process can be found [here](https://github.com/AddedK/swedish-mbert-distillation/blob/main/azureML/pretrain_distillation.py). This was done as part of my Master's Thesis: *Task-agnostic knowledge distillation of mBERT to Swedish*.
 
@@ -14,3 +14,18 @@ This is a 6-layer version of mBERT, having been distilled using the [LightMBERT]
  ## Intended uses & limitations
  You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to
  be fine-tuned on a downstream task.
+
+
+ ## Training data
+
+ The data used for distillation was the 2010-2015 portion of the [Swedish Culturomics Gigaword Corpus](https://spraakbanken.gu.se/en/resources/gigaword).
+ The tokenized data had a file size of approximately 9 GB.
+
+ ## Evaluation results
+
+ When evaluated on the [SUCX 3.0](https://huggingface.co/datasets/KBLab/sucx3_ner) dataset, it achieved an average F1 score of 0.859, which is competitive with the score mBERT obtained, 0.866.
+
+ When evaluated on the [English WikiANN](https://huggingface.co/datasets/wikiann) dataset, it achieved an average F1 score of 0.826, which is competitive with the score mBERT obtained, 0.849.
+
+ Additional results and comparisons are presented in my Master's Thesis.
+
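The README added in this commit notes that the raw model can be used for masked language modeling. As a quick illustration only, here is a minimal fill-mask sketch using the `transformers` pipeline API; the repository id passed to `model=` is a placeholder assumption, not something stated in this commit, so substitute the actual checkpoint name.

```python
# Minimal sketch: fill-mask inference with the distilled Swedish mBERT checkpoint.
# The model id below is a hypothetical placeholder; replace it with the real repo id.
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="Addedk/mbert-swedish-distilled-base-cased",  # assumed repo id
)

# BERT-style models use the [MASK] token; example sentence in Swedish.
predictions = fill_mask("Stockholm är Sveriges [MASK].")
for p in predictions:
    print(p["token_str"], round(p["score"], 3))
```

For the downstream fine-tuning the card recommends (e.g. NER on SUCX 3.0 or WikiANN), the same checkpoint would instead be loaded with a task head such as `AutoModelForTokenClassification`.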