Update README.md
README.md (changed)
@@ -29,7 +29,7 @@ The parameter size for the model is 101M.
 The model is trained using BART noising techniques like sentence permutation, token deletion, and random token masking.
 <br>The noisy data is fed into the encoder of the transformer, and the denoising task/objective is fulfilled by the decoder of the transformer model.

-
+Cross-entropy loss is used for both the pre-training and fine-tuning of the model.

 The Loss for pre-training is as follows:
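The loss itself appears in the README as an image that this diff does not capture. For reference only, the standard BART denoising objective that token-level cross-entropy corresponds to can be written as follows; this is the textbook formulation, not necessarily the exact equation shown in the README:

\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_{<t},\, g(x)\right)

where x = (x_1, ..., x_T) is the original token sequence and g(.) is the noising function (sentence permutation, token deletion, random token masking) applied before the sequence is fed to the encoder.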
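A minimal runnable sketch of the pipeline described in the changed lines, assuming PyTorch. The mask id, noising probabilities, and the random logits standing in for decoder output are hypothetical illustrations, not code from this repository; sentence permutation is omitted for brevity:

```python
import random
import torch
import torch.nn.functional as F

MASK_ID = 0  # hypothetical mask-token id, not from this repository

def noise(tokens, p_delete=0.1, p_mask=0.15):
    """BART-style corruption: token deletion plus random token masking."""
    out = []
    for t in tokens:
        r = random.random()
        if r < p_delete:
            continue  # token deletion: drop the token entirely
        # random token masking: replace the token with the mask id
        out.append(MASK_ID if r < p_delete + p_mask else t)
    return out

# Toy denoising step: the decoder must reconstruct the clean sequence
# from the noisy one fed through the encoder.
clean = torch.tensor([[5, 8, 3, 9, 7]])   # (batch, seq_len)
vocab_size = 16
logits = torch.randn(1, 5, vocab_size)    # stand-in for decoder output

# Cross-entropy between predictions and the original tokens: per the
# README, the same objective is used for pre-training and fine-tuning.
loss = F.cross_entropy(logits.reshape(-1, vocab_size), clean.reshape(-1))
print(noise([5, 8, 3, 9, 7]), loss.item())
```

In the actual model, the logits would of course come from the 101M-parameter encoder-decoder rather than torch.randn.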