Update README.md
README.md CHANGED
@@ -7,7 +7,8 @@ tags:
## distilHerBERT

distilHerBERT-base is a BERT-based language model trained on the Polish subset of the [cc100](https://huggingface.co/datasets/cc100) dataset using Masked Language Modelling (MLM) and the [distillation procedure](https://arxiv.org/abs/1910.01108) from the [HerBERT](https://huggingface.co/allegro/herbert-base-cased) model, with dynamic masking of whole words.

We provide one of the models (S4) described in the report from the final project on the subject of (Deep) Natural Language Processing, carried out at MIMUW in 2021/2022: [Distillation_of_HerBERT](https://github.com/BartekKrzepkowski/DistilHerBERT-base_vol2/blob/master/report/Final_Report___Distillation_of_HerBERT.pdf).

+The model was trained with fp16 and data parallelism (ZeRO Stage 2), using the DeepSpeed deep learning optimization library.
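The added line above describes the training setup only at a high level. As a rough illustration, the sketch below shows one way fp16 and ZeRO Stage 2 can be enabled through the transformers DeepSpeed integration; the config values and `output_dir` are placeholder assumptions, not the settings actually used for S4.

```python
from transformers import TrainingArguments

# Illustrative DeepSpeed config: fp16 plus ZeRO Stage 2 (optimizer state and
# gradient partitioning). "auto" lets the transformers integration fill in the
# batch-size fields from TrainingArguments. These values are assumptions, not
# the configuration used to train this model.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

# Requires the deepspeed package; launch training with the deepspeed launcher.
training_args = TrainingArguments(
    output_dir="distilherbert-mlm",  # placeholder output directory
    fp16=True,
    deepspeed=ds_config,
)
```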
Model training and experiments were conducted with transformers version 4.20.1.
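Since the card pins transformers 4.20.1, a minimal usage sketch with that API may be helpful. The checkpoint identifier below is a placeholder assumption; substitute this model's actual Hub repo id or local path.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Placeholder identifier: replace with the Hub repo id or local path of this checkpoint.
model_id = "path/to/distilHerBERT-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Fill-mask example in Polish; the tokenizer's own mask token is used so the
# snippet does not hard-code a specific special token.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask(f"Stolicą Polski jest {tokenizer.mask_token}."))
```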