Update README.md
README.md CHANGED

@@ -20,7 +20,7 @@ The training for the distilled model (student model) is designed to be the close
* MLMLoss: a Masked Language Modeling (MLM) task loss to train the student model on the original task of the teacher model;
* CosineLoss: and finally, a cosine embedding loss applied to the last hidden layers of the student and teacher models to guarantee collinearity between them.

- The final loss function is a combination of these three
+ The final loss function is a combination of these three loss functions. We use the following weighting:

*Loss = 0.5 DistilLoss + 0.2 MLMLoss + 0.3 CosineLoss*
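As a rough illustration, the weighted combination above could be computed as follows in PyTorch. This is a minimal sketch, not the repository's training code: the tensor shapes, variable names, and the softening temperature are all illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative stand-ins for model outputs (assumed shapes, not real data)
batch, seq_len, vocab, hidden = 2, 8, 100, 16
student_logits = torch.randn(batch, seq_len, vocab)
teacher_logits = torch.randn(batch, seq_len, vocab)
labels = torch.randint(0, vocab, (batch, seq_len))
student_hidden = torch.randn(batch * seq_len, hidden)
teacher_hidden = torch.randn(batch * seq_len, hidden)

temperature = 2.0  # assumed distillation temperature

# DistilLoss: KL divergence between softened student and teacher distributions
distil_loss = nn.KLDivLoss(reduction="batchmean")(
    torch.log_softmax(student_logits / temperature, dim=-1),
    torch.softmax(teacher_logits / temperature, dim=-1),
) * temperature**2

# MLMLoss: standard cross-entropy on the masked-token labels
mlm_loss = nn.CrossEntropyLoss()(student_logits.view(-1, vocab), labels.view(-1))

# CosineLoss: pull the student's last hidden states toward the teacher's
# (target of +1 asks the loss to maximize cosine similarity)
cosine_loss = nn.CosineEmbeddingLoss()(
    student_hidden, teacher_hidden, torch.ones(student_hidden.size(0))
)

# Weighting from the README
loss = 0.5 * distil_loss + 0.2 * mlm_loss + 0.3 * cosine_loss
```

During training, `loss.backward()` would then propagate gradients through the student model only, since the teacher's outputs are treated as fixed targets.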