Cyrile committed on
Commit
91e5b65
1 Parent(s): 8ae8db4

Update README.md

Files changed (1)
  1. README.md +0 -1
README.md CHANGED
@@ -21,7 +21,6 @@ The training for the distilled model (student model) is designed to be the close
  * CosineLoss: and finally a cosine embedding loss. This loss function is applied to the last hidden layers of the student and teacher models to guarantee collinearity between them.
 
  The final loss function is a combination of these three loss functions. We use the following weighting:
-
  *Loss = 0.5 DistilLoss + 0.2 MLMLoss + 0.3 CosineLoss*
 
  Dataset
 
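The weighted loss above can be sketched in PyTorch as follows. This is a minimal illustration, not the repository's actual training code: the function name, the temperature value, and the tensor shapes are assumptions; only the 0.5/0.2/0.3 weighting and the three loss terms come from the README.

```python
import torch
import torch.nn.functional as F

def combined_distillation_loss(student_logits, teacher_logits, labels,
                               student_hidden, teacher_hidden,
                               temperature=2.0):
    """Loss = 0.5 * DistilLoss + 0.2 * MLMLoss + 0.3 * CosineLoss (per the README)."""
    # DistilLoss: KL divergence between temperature-softened student and
    # teacher output distributions (a common choice; the exact form is assumed).
    distil = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # MLMLoss: standard masked-language-modelling cross-entropy on the
    # student logits against the (masked-token) labels.
    vocab = student_logits.size(-1)
    mlm = F.cross_entropy(student_logits.reshape(-1, vocab),
                          labels.reshape(-1), ignore_index=-100)

    # CosineLoss: cosine embedding loss pushing each student last-hidden-state
    # vector to be collinear with the teacher's (target = 1 means "similar").
    s = student_hidden.reshape(-1, student_hidden.size(-1))
    t = teacher_hidden.reshape(-1, teacher_hidden.size(-1))
    cosine = F.cosine_embedding_loss(s, t, torch.ones(s.size(0)))

    return 0.5 * distil + 0.2 * mlm + 0.3 * cosine
```

The three terms operate on different pieces of the forward pass (output distributions, masked-token labels, last hidden layers), so in a real training loop each would be fed from the corresponding model outputs.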