Update README.md
README.md
CHANGED
@@ -22,7 +22,7 @@ The training for the distilled model (student model) is designed to be the close
 
 The final loss function is a combination of these three losses functions. We use the following ponderation:
 
-$$Loss = 0.5 \times DistilLoss + 0.3 \times CosineLoss
+$$Loss = 0.5 \times DistilLoss + 0.3 \times CosineLoss + 0.2 \times MLMLoss$$
 
 Dataset
 -------
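
For reference, the corrected formula is a plain weighted sum of three terms, and the weights 0.5, 0.3 and 0.2 add up to 1. The following is a minimal sketch of that weighting only, not the training code from this repository: the function name `combined_loss`, the `WEIGHTS` dictionary and the argument names are hypothetical, and the three inputs are assumed to be loss values that have already been computed.

```python
# Weights as stated in the updated README line.
# The names below are illustrative assumptions, not taken from the repository.
WEIGHTS = {"distil": 0.5, "cosine": 0.3, "mlm": 0.2}


def combined_loss(distil_loss, cosine_loss, mlm_loss, weights=WEIGHTS):
    """Return the weighted sum of the three loss terms."""
    return (
        weights["distil"] * distil_loss
        + weights["cosine"] * cosine_loss
        + weights["mlm"] * mlm_loss
    )


if __name__ == "__main__":
    # Placeholder values for illustration only, not real training losses.
    print(combined_loss(2.0, 1.0, 4.0))
```

Because the function only uses multiplication and addition, the same sketch should also work when the inputs are scalar tensors from a deep learning framework rather than Python floats.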