---
license: apache-2.0
---

# mBERT Swedish distilled base model (cased)

This model is a distilled version of [mBERT](https://huggingface.co/bert-base-multilingual-cased). It was distilled on Swedish data: the 2010-2015 portion of the [Swedish Culturomics Gigaword Corpus](https://spraakbanken.gu.se/en/resources/gigaword). The code for the distillation process can be found [here](https://github.com/AddedK/swedish-mbert-distillation/blob/main/azureML/pretrain_distillation.py). This work was done as part of my Master's Thesis: *Task-agnostic knowledge distillation of mBERT to Swedish*.


## Model description
This is a 6-layer version of mBERT, distilled with the [LightMBERT](https://arxiv.org/abs/2103.06418) distillation method but without freezing the embedding layer.


## Intended uses & limitations
You can use the raw model for either masked language modeling or next sentence prediction, but it is mainly intended to
be fine-tuned on a downstream task.
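
As a rough illustration of the masked language modeling use case, the sketch below queries the model through the `transformers` fill-mask pipeline. It assumes `transformers` and a backend such as PyTorch are installed. `MODEL_ID` is a hypothetical placeholder, not the real repository name, and the helper `top_predictions` and the example sentence are illustrative only.

```python
# Hypothetical placeholder: replace with the actual Hub repository ID of this model.
MODEL_ID = "your-username/mbert-swedish-distilled-cased"

# Example Swedish sentence; mBERT-style tokenizers use "[MASK]" as the mask token.
EXAMPLE_SENTENCE = "Stockholm är Sveriges [MASK]."


def top_predictions(text, model_id=MODEL_ID, k=5):
    """Return the k most likely fillers for the single [MASK] token in `text`."""
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    # For a single mask, the pipeline returns a list of dicts that include
    # the predicted token string and its probability.
    return [(p["token_str"], p["score"]) for p in fill_mask(text, top_k=k)]
```

Calling `top_predictions(EXAMPLE_SENTENCE)` downloads the weights on first use and returns a list of `(token, score)` pairs. For fine-tuning, the same repository ID could be passed to a task-specific class such as `AutoModelForTokenClassification.from_pretrained`.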


## Training data

The data used for distillation was the 2010-2015 portion of the [Swedish Culturomics Gigaword Corpus](https://spraakbanken.gu.se/en/resources/gigaword).
The tokenized data had a file size of approximately 9 GB.

## Evaluation results

When evaluated on the [SUCX 3.0](https://huggingface.co/datasets/KBLab/sucx3_ner) dataset, it achieved an average F1 score of 0.859, which is competitive with the 0.866 obtained by mBERT.

When evaluated on the [English WikiANN](https://huggingface.co/datasets/wikiann) dataset, it achieved an average F1 score of 0.826, which is competitive with the 0.849 obtained by mBERT.

Additional results and comparisons are presented in my Master's Thesis.
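
To make "competitive" concrete, the short sketch below computes the absolute F1 gap and the fraction of the teacher's F1 that the distilled model retains on each dataset. The scores are the ones reported above. The helper name `retention` and the dictionary layout are illustrative choices, not part of the original evaluation code.

```python
# Average F1 scores reported in this model card: (distilled model, mBERT teacher).
SCORES = {
    "SUCX 3.0": (0.859, 0.866),
    "English WikiANN": (0.826, 0.849),
}


def retention(student_f1, teacher_f1):
    """Fraction of the teacher's F1 score that the student retains."""
    return student_f1 / teacher_f1


for dataset, (student, teacher) in SCORES.items():
    gap = teacher - student
    print(f"{dataset}: gap = {gap:.3f} F1, retained = {retention(student, teacher):.1%}")
```

By this measure the distilled model keeps roughly 99.2% of mBERT's F1 on SUCX 3.0 and 97.3% on English WikiANN.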