unb-lamfo-nlp-mcti/NLP-ATS-MCTI

English

Summarization

igorgavi committed on Dec 12, 2022

Commit 534a51a, parent 824dfb9

Update README.md

Files changed (1)

README.md +17 -13

README.md CHANGED

@@ -39,19 +39,23 @@ and English.
 ## Model description
-This Automatic Text Summarization (ATS) model was developed to be applied to the Research Financing Products Portfolio (FPP)
-of the Brazilian Ministry of Science, Technology and Innovation. It was produced in parallel with the writing of a Systematic
-Literature Review paper, which discusses many summarization methods, datasets, and evaluators and also gives
-a brief overview of the nature of the task itself and the state of the art of its implementation.
-The input of the model can be either a single text or a CSV file containing multiple texts (in English), and its outputs are the summarized texts
-and their evaluation metrics. As an optional (although recommended) input, the model accepts gold-standard summaries for the texts,
-i.e., human-written (or extracted) summaries that are considered good representations of the texts' contents. Evaluators
-such as ROUGE, which in its many variants is the most widely used evaluator for this task, require gold-standard summaries as inputs. There are, however,
-evaluation methods that do not depend on the existence of a gold-standard summary (e.g., cosine similarity and Kullback-Leibler divergence),
-which is why an evaluation can be made even when only the text is given as input to the model.

+This Automatic Text Summarization (ATS) model was developed in Python to be applied to the Research Financing Products
+Portfolio (FPP) of the Brazilian Ministry of Science, Technology and Innovation. It was produced in parallel with the writing of a
+Systematic Literature Review paper, which discusses many summarization methods, datasets, and evaluators and also gives
+a brief overview of the nature of the task itself and the state of the art of its implementation.
+The input of the model can be a single text, a dataframe, or a CSV file containing multiple texts (in English), and its outputs
+are the summarized texts and their evaluation metrics. As an optional (although recommended) input, the model accepts gold-standard summaries
+for the texts, i.e., human-written (or extracted) summaries that are considered good representations of the texts' contents.
+Evaluators such as ROUGE, which in its many variants is the most widely used evaluator for this task, require gold-standard summaries as inputs. There are,
+however, evaluation methods that do not depend on the existence of a gold-standard summary (e.g., cosine similarity and Kullback-Leibler
+divergence), which is why an evaluation can be made even when only the text is given as input to the model.
+The text output is produced by a chosen ATS method, which can be extractive (built from the most relevant sentences of the source document)
+or abstractive (newly generated text that need not copy any sentence from the source). Abstractive summaries are produced by transformers; the ones in the
+model are the existing and widely used BART-Large CNN, Pegasus-XSUM and mT5 Multilingual XLSUM. The extractive methods are taken from
+the Sumy Python library and include SumyRandom, SumyLuhn, SumyLsa, SumyLexRank, SumyTextRank, SumySumBasic, SumyKL and SumyReduction. Each of the
+methods used for text summarization is described individually in the following sections.
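ROUGE scores a generated summary by its n-gram overlap with a gold-standard summary, which is why it cannot run without one. As a minimal sketch of the simplest variant, ROUGE-1, the snippet below counts clipped unigram overlap and reports precision, recall and F1. The helper names (`tokenize`, `rouge_1`) are hypothetical, and the lowercase, punctuation-stripping tokenization without stemming is an assumption; the model presumably relies on an established ROUGE implementation, whose scores may differ slightly.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphanumeric runs (simplifying assumption: no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge_1(candidate: str, reference: str) -> dict[str, float]:
    """ROUGE-1: clipped unigram overlap between a candidate and a gold-standard summary."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Counter '&' keeps min(count_in_candidate, count_in_reference) for each shared word.
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(rouge_1("the cat sat on the mat", "the cat is on the mat"))
```

In the example, the two sentences share five of their six unigrams ("the" twice, "cat", "on", "mat"), so precision, recall and F1 all come out at about 0.833.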
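Reference-free evaluators compare the summary directly with its source text. A minimal sketch of the two examples named above, assuming bag-of-words term frequencies: cosine similarity between the two count vectors (higher means closer) and the Kullback-Leibler divergence KL(P || Q) from the source word distribution P to the summary distribution Q (lower means closer). Because a summary omits most source words, Q would contain zeros and the divergence would be infinite, so the sketch applies additive smoothing through a hypothetical `alpha` parameter. The tokenization, the smoothing choice and the function names are assumptions, not necessarily what the model does.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphanumeric runs (simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(source: str, summary: str) -> float:
    """Cosine of the angle between the term-frequency vectors of source and summary."""
    a, b = Counter(tokenize(source)), Counter(tokenize(summary))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def kl_divergence(source: str, summary: str, alpha: float = 1.0) -> float:
    """KL(P_source || Q_summary) over the union vocabulary.

    Assumption: additive (Laplace) smoothing with alpha > 0 so that words
    missing from the summary do not make the divergence infinite.
    """
    p_counts, q_counts = Counter(tokenize(source)), Counter(tokenize(summary))
    vocab = p_counts.keys() | q_counts.keys()
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    kl = 0.0
    for w in vocab:
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        kl += p * math.log(p / q)
    return kl


if __name__ == "__main__":
    src = "the cat sat on the mat the cat slept"
    for summ in ("the cat sat", "dogs bark loudly"):
        print(summ, cosine_similarity(src, summ), kl_divergence(src, summ))
```

With this toy source, the on-topic summary "the cat sat" gets a higher cosine similarity and a lower divergence than the unrelated "dogs bark loudly", which is the behavior a reference-free evaluator relies on.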
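Extractive methods score the source sentences and copy the best ones into the summary. To illustrate the idea behind one of the listed methods, SumBasic, the sketch below weights each sentence by the average probability of its words, picks the best sentence, then squares the probabilities of the words it contains so that later picks favor content not yet covered. It is a simplified stand-alone re-implementation, not Sumy's code: the regex sentence splitter, the absence of stop-word removal and the function names are assumptions made for the example.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Naive splitter on '.', '!' or '?' followed by whitespace (assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Lowercase and keep alphanumeric runs; no stop-word removal (assumption)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def sumbasic_summary(text: str, n_sentences: int = 2) -> list[str]:
    """SumBasic-style extractive summary of at most n_sentences sentences."""
    sentences = split_sentences(text)
    counts = Counter(w for s in sentences for w in tokenize(s))
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    def weight(i: int) -> float:
        # Average current probability of the words in sentence i.
        words = tokenize(sentences[i])
        return sum(prob[w] for w in words) / len(words) if words else 0.0

    chosen: list[int] = []
    while len(chosen) < min(n_sentences, len(sentences)):
        best = max((i for i in range(len(sentences)) if i not in chosen), key=weight)
        chosen.append(best)
        for w in set(tokenize(sentences[best])):
            prob[w] **= 2  # down-weight words the summary already covers
    # Extractive output: keep the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    text = "Cats purr. Cats purr loudly. Dogs bark. Birds sing."
    print(sumbasic_summary(text, n_sentences=2))  # ['Cats purr.', 'Dogs bark.']
```

Here "Cats purr loudly." is skipped even though it initially scores second: once "Cats purr." is selected, the squared probabilities of "cats" and "purr" push its weight below that of the sentences with new content.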