Update README.md
README.md CHANGED
@@ -92,7 +92,7 @@ The model was pre-trained continuously on a single A10G GPU in an AWS instance f
 #### Possible Future Directions:
 
 1. Use a decoder-only model for pre-training and summarization.
-<br>When the deleted spans are not very large, the model seems to learn to copy tokens from the encoder into the decoder's generation.
+<br>When the deleted spans are not very large, the model seems to learn to copy tokens from the encoder context into the decoder's generation through cross-attention.
 <br>This hurts the performance of the Abstractive Summarization task.
 <br>This failure mode is not present in a decoder-only model, since the model never sees the token it is asked to predict.
 
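For context, here is a minimal sketch of the kind of span-deletion corruption the changed passage refers to, assuming a T5/BART-style denoising setup. The function name, span sizes, and mask token are illustrative assumptions, not this repository's actual pre-training code.

```python
# Minimal sketch of a span-deletion objective (hypothetical, not the repo's code).
import random

def delete_spans(tokens, num_spans=2, span_len=3, mask_token="<mask>"):
    """Corrupt a sequence by deleting short spans.

    Returns (encoder_input, decoder_target). With short spans, most target
    tokens also appear verbatim in the encoder input, so an encoder-decoder
    model can satisfy the objective by copying through cross-attention
    rather than generating abstractively.
    """
    tokens = list(tokens)
    starts = sorted(random.sample(range(len(tokens) - span_len), num_spans))
    corrupted, cursor = [], 0
    for start in starts:
        if start < cursor:            # skip spans that overlap a previous one
            continue
        corrupted.extend(tokens[cursor:start])
        corrupted.append(mask_token)  # each deleted span collapses to one mask
        cursor = start + span_len
    corrupted.extend(tokens[cursor:])
    return corrupted, tokens

source = "the model was pre-trained on a single a10g gpu for several days".split()
enc_in, target = delete_spans(source)
print("encoder input :", " ".join(enc_in))
print("decoder target:", " ".join(target))
```

In a decoder-only setup, by contrast, the token to be predicted at each position is never part of that position's visible context, which is the asymmetry the changed line describes.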