Addedk committed
Commit def578e
1 Parent(s): e56ebdd

Add model description, links to LightMBERT, name of master's thesis, and intended uses & limitations

Files changed (1)
  1. README.md +10 -1
README.md CHANGED
@@ -4,4 +4,13 @@ license: apache-2.0
 
 # mBERT Swedish distilled base model (cased)
 
- This model is a distilled version of [mBERT](https://huggingface.co/bert-base-multilingual-cased). It was distilled using Swedish data, the 2010-2015 portion of the [Swedish Culturomics Gigaword Corpus](https://spraakbanken.gu.se/en/resources/gigaword). The code for the distillation process can be found [here](https://github.com/AddedK/swedish-mbert-distillation/blob/main/azureML/pretrain_distillation.py).
+ This model is a distilled version of [mBERT](https://huggingface.co/bert-base-multilingual-cased). It was distilled using Swedish data, the 2010-2015 portion of the [Swedish Culturomics Gigaword Corpus](https://spraakbanken.gu.se/en/resources/gigaword). The code for the distillation process can be found [here](https://github.com/AddedK/swedish-mbert-distillation/blob/main/azureML/pretrain_distillation.py). This was done as part of my Master's Thesis: *Task-agnostic knowledge distillation of mBERT to Swedish*.
+
+
+ ## Model description
+ This is a 6-layer version of mBERT, distilled with the [LightMBERT](https://arxiv.org/abs/2103.06418) method, but without freezing the embedding layer.
+
+
+ ## Intended uses & limitations
+ You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to
+ be fine-tuned on a downstream task.
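
To illustrate the masked language modeling use mentioned in the new "Intended uses & limitations" section, here is a minimal sketch using the Hugging Face `transformers` fill-mask pipeline. The model identifier below is a placeholder assumption (substitute this repository's actual id), and the Swedish example sentence is arbitrary.

```python
# Sketch only: load the distilled Swedish mBERT for fill-mask inference.
from transformers import pipeline

unmasker = pipeline(
    "fill-mask",
    model="Addedk/mbert-swedish-distilled-base-cased",  # placeholder id, not confirmed by this commit
)

# The model keeps mBERT's cased WordPiece vocabulary, so the standard [MASK] token applies.
print(unmasker("Stockholm är Sveriges [MASK]."))
```

For the fine-tuning use case, the checkpoint should load through any BERT-compatible head (for example `AutoModelForSequenceClassification.from_pretrained`) just as the original mBERT checkpoint does, since the architecture differs only in the number of layers.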