---
language: en
datasets:
- squad_v2
license: mit
thumbnail: https://thumb.tildacdn.com/tild3433-3637-4830-a533-353833613061/-/resize/720x/-/format/webp/germanquad.jpg
tags:
- exbert
---

## Overview
**Language model:** deepset/tinybert-6L-768D-squad2
**Language:** English
**Training data:** SQuAD 2.0 training set, augmented 20x, plus the original SQuAD 2.0 training set
**Eval data:** SQuAD 2.0 dev set
**Infrastructure:** 1x V100 GPU
**Published:** Dec 8th, 2021

## Details
- Haystack's intermediate layer and prediction layer distillation features were used for training (based on [TinyBERT](https://arxiv.org/pdf/1909.10351.pdf)). deepset/bert-base-uncased-squad2 was used as the teacher model.

## Hyperparameters
### Intermediate layer distillation
```
batch_size = 26
n_epochs = 5
max_seq_len = 384
learning_rate = 5e-5
lr_schedule = LinearWarmup
embeds_dropout_prob = 0.1
temperature = 1
distillation_loss_weight = 0.75
```
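
In the TinyBERT approach referenced above, intermediate layer distillation matches each student layer to a teacher layer with a mean-squared-error loss on the hidden states. The following is a minimal illustrative sketch, not deepset's actual implementation; the uniform layer mapping and the toy vector sizes are assumptions:

```python
def mse(a, b):
    """Mean squared error between two equally sized vectors."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def intermediate_layer_loss(student_hidden, teacher_hidden):
    """Sum of per-layer MSE losses between mapped student/teacher layers.

    student_hidden: list of per-layer hidden-state vectors (6 layers here)
    teacher_hidden: list of per-layer hidden-state vectors (12 layers here)
    Since the student is half as deep as the teacher (6L vs. 12L), every
    2nd teacher layer is mapped to one student layer.
    """
    step = len(teacher_hidden) // len(student_hidden)
    return sum(
        mse(s, teacher_hidden[(i + 1) * step - 1])
        for i, s in enumerate(student_hidden)
    )
```

A student whose mapped hidden states exactly match the teacher's incurs zero loss, and the loss grows as the representations drift apart, which is what drives the student to mimic the teacher's internal layers.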
### Prediction layer distillation
```
batch_size = 26
n_epochs = 5
max_seq_len = 384
learning_rate = 3e-5
lr_schedule = LinearWarmup
embeds_dropout_prob = 0.1
temperature = 1
distillation_loss_weight = 0.75
```
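
Prediction layer distillation trains the student's output distribution against the teacher's: both sets of logits are softened with the temperature, and the resulting distillation loss is blended with the ordinary cross-entropy on the gold label via `distillation_loss_weight` (0.75 above). A minimal sketch of that loss, not deepset's actual implementation:

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax (numerically stabilized)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_layer_loss(student_logits, teacher_logits, gold_index,
                          temperature=1.0, distillation_loss_weight=0.75):
    """Weighted blend of soft (teacher) and hard (gold label) losses."""
    student_probs = softmax(student_logits, temperature)
    teacher_probs = softmax(teacher_logits, temperature)
    # Cross-entropy of the student against the softened teacher distribution.
    distill_loss = -sum(t * math.log(s)
                        for t, s in zip(teacher_probs, student_probs))
    # Ordinary cross-entropy against the gold answer position.
    hard_loss = -math.log(softmax(student_logits)[gold_index])
    return (distillation_loss_weight * distill_loss
            + (1 - distillation_loss_weight) * hard_loss)
```

With `temperature = 1`, as in the settings above, the teacher's distribution is used unsoftened; raising the temperature would expose more of the teacher's "dark knowledge" about near-miss answers.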
## Performance
```
"exact": 71.87736882001179
"f1": 76.36111895973675
```
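
The model can be tried out with the Hugging Face `transformers` question-answering pipeline. A minimal sketch (the question/context strings are illustrative; the first call downloads the weights from the Hub):

```python
from transformers import pipeline

# Load the distilled model from the Hugging Face Hub.
qa = pipeline("question-answering", model="deepset/tinybert-6L-768D-squad2")

result = qa(
    question="Which teacher model was used?",
    context="deepset/bert-base-uncased-squad2 was used as the teacher model "
            "for distilling this smaller TinyBERT-style student.",
)
# result is a dict with "answer", "score", "start", and "end".
print(result["answer"], result["score"])
```

Because the model is trained on SQuAD 2.0, it can also predict that a question is unanswerable from the given context, in which case the pipeline returns an empty answer with a correspondingly low score.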

## Authors
- Timo Möller: `timo.moeller [at] deepset.ai`
- Julian Risch: `julian.risch [at] deepset.ai`
- Malte Pietsch: `malte.pietsch [at] deepset.ai`
- Michel Bartels: `michel.bartels [at] deepset.ai`

## About us
![deepset logo](https://workablehr.s3.amazonaws.com/uploads/account/logo/476306/logo)
We bring NLP to the industry via open source!
Our focus: industry-specific language models & large-scale QA systems.

Some of our work:
- [German BERT (aka "bert-base-german-cased")](https://deepset.ai/german-bert)
- [GermanQuAD and GermanDPR datasets and models (aka "gelectra-base-germanquad", "gbert-base-germandpr")](https://deepset.ai/germanquad)
- [FARM](https://github.com/deepset-ai/FARM)
- [Haystack](https://github.com/deepset-ai/haystack/)

Get in touch:
[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Slack](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)

By the way: [we're hiring!](http://www.deepset.ai/jobs)