Cyrile committed on
Commit
91e5b65
1 Parent(s): 8ae8db4

Update README.md

Files changed (1)
  1. README.md +0 -1
README.md CHANGED
@@ -21,7 +21,6 @@ The training for the distilled model (student model) is designed to be the close
  * CosineLoss: and finally a cosine embedding loss. This loss function is applied to the last hidden layers of the student and teacher models to guarantee collinearity between them.
 
  The final loss function is a combination of these three loss functions. We use the following weighting:
-
  *Loss = 0.5 DistilLoss + 0.2 MLMLoss + 0.3 CosineLoss*
 
  Dataset
 
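The weighted loss above can be sketched in PyTorch as follows. This is a minimal illustration, not the repository's actual training code: the function name, the temperature value, and the tensor shapes are assumptions; only the 0.5/0.2/0.3 weighting and the three loss terms come from the README.

```python
import torch
import torch.nn.functional as F

def combined_distillation_loss(student_logits, teacher_logits, labels,
                               student_hidden, teacher_hidden,
                               temperature=2.0):
    """Loss = 0.5 * DistilLoss + 0.2 * MLMLoss + 0.3 * CosineLoss (per the README)."""
    # DistilLoss: KL divergence between temperature-softened student and
    # teacher output distributions (a common choice; the exact form is assumed).
    distil = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # MLMLoss: standard masked-language-modelling cross-entropy on the
    # student logits against the (masked-token) labels.
    vocab = student_logits.size(-1)
    mlm = F.cross_entropy(student_logits.reshape(-1, vocab),
                          labels.reshape(-1), ignore_index=-100)

    # CosineLoss: cosine embedding loss pushing each student last-hidden-state
    # vector to be collinear with the teacher's (target = 1 means "similar").
    s = student_hidden.reshape(-1, student_hidden.size(-1))
    t = teacher_hidden.reshape(-1, teacher_hidden.size(-1))
    cosine = F.cosine_embedding_loss(s, t, torch.ones(s.size(0)))

    return 0.5 * distil + 0.2 * mlm + 0.3 * cosine
```

The three terms operate on different pieces of the forward pass (output distributions, masked-token labels, last hidden layers), so in a real training loop each would be fed from the corresponding model outputs.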