Update README.md

README.md CHANGED
@@ -32,8 +32,8 @@ models were applied to improve the understanding of each sentence.

## According to the abstract,

+"The research results serve as a successful case of artificial intelligence in a federal government application".
+More details about the project, the model architecture, the training procedure and the classification process can be found in the article
["Using transfer learning to classify long unstructured texts with small amounts of labeled data"](https://www.scitepress.org/Link.aspx?doi=10.5220/0011527700003318).

## Model description

@@ -60,20 +60,55 @@ redundancies in the analysis of the inputs.

## Model variations

+Table 1 below presents the results of several implementations with different architectures, highlighting the
+accuracy, F1-score, recall and precision obtained in the training of each network.
+
+Table 1: Results of experiments
+| Model                  | Accuracy | F1-score | Recall | Precision |
+|------------------------|----------|----------|--------|-----------|
+| Keras Embedding + SNN  | 92.47    | 88.46    | 79.66  | 100.00    |
+| Keras Embedding + DNN  | 89.78    | 84.41    | 77.81  | 92.57     |
+| Keras Embedding + CNN  | 93.01    | 89.91    | 85.18  | 95.69     |
+| Keras Embedding + LSTM | 93.01    | 88.94    | 83.32  | 95.54     |
+| Word2Vec + SNN         | 89.25    | 83.82    | 74.15  | 97.10     |
+| Word2Vec + DNN         | 90.32    | 86.52    | 85.18  | 88.70     |
+| Word2Vec + CNN         | 92.47    | 88.42    | 80.85  | 98.72     |
+| Word2Vec + LSTM        | 89.78    | 84.36    | 75.36  | 95.81     |
+| Longformer + SNN       | 61.29    | 0        | 0      | 0         |
+| Longformer + DNN       | 91.93    | 87.62    | 80.37  | 97.62     |
+| Longformer + CNN       | 94.09    | 90.69    | 83.41  | 100.00    |
+| Longformer + LSTM      | 61.29    | 0        | 0      | 0         |
+
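For illustration, here is a minimal sketch of what one of the Table 1 variants, Keras Embedding + CNN, could look like. The vocabulary size, layer widths and the binary output are assumptions made for the sketch; the README does not state the exact configuration used in the experiments.

```python
# Hypothetical sketch of the "Keras Embedding + CNN" variant from Table 1.
# Vocabulary size, layer widths and the binary output are illustrative guesses.
from tensorflow.keras import layers, models

VOCAB_SIZE = 20_000  # assumed tokenizer vocabulary size

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),                    # trainable embedding table
    layers.Conv1D(64, kernel_size=5, activation="relu"),  # convolution over token windows
    layers.GlobalMaxPooling1D(),                          # collapse the sequence to one vector
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),                # binary relevance score
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

The SNN, DNN and LSTM variants would differ only in the layers placed between the embedding and the output.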
+Table 2 below shows the time required to train each epoch, the data validation execution time and the weight of
+the deep learning model associated with each implementation.
+
+Table 2: Training time, validation time and model weight per implementation
+| Model                  | Training time/epoch (s) | Validation time (s) | Weight (MB) |
+|------------------------|:-----------------------:|:-------------------:|:-----------:|
+| Keras Embedding + SNN  |           0.2           |         0.7         |     1.8     |
+| Keras Embedding + DNN  |           1.0           |         1.4         |     7.6     |
+| Keras Embedding + CNN  |           0.4           |         1.1         |     3.2     |
+| Keras Embedding + LSTM |           1.4           |         2.0         |     1.8     |
+| Word2Vec + SNN         |           1.4           |         1.2         |     9.6     |
+| Word2Vec + DNN         |           2.0           |         6.8         |     7.8     |
+| Word2Vec + CNN         |           1.9           |         3.4         |     4.7     |
+| Word2Vec + LSTM        |           2.6           |        14.3         |     1.2     |
+| Longformer + SNN       |          128.0          |         1.5         |    36.8     |
+| Longformer + DNN       |          81.0           |         8.4         |    12.7     |
+| Longformer + CNN       |          57.0           |         4.5         |     9.6     |
+| Longformer + LSTM      |          13.0           |         8.6         |     2.6     |
+
+In addition, it is possible to notice that the Longformer + SNN and Longformer + LSTM models were not able
+to learn. Perhaps these models need some adjustment; however, each training attempt took between 5 and 8 hours,
+which made further tuning unfeasible when other models were already showing promising results.
+
+With Longformer, the problems caused by the size of the model became more visible. First, it was necessary to
+actively deallocate unused chunks of memory right after use so that the next steps could be loaded. Then, it
+was necessary to use a CPU environment for training the networks, because the memory footprint of the model
+exceeded the 16 GB of video memory available on the P100 board provided by Colab during training. In this case,
+the high-RAM environment was used, which delivers 25 GB of memory for use with the CPU; this means a longer
+training time, since a GPU performs matrix operations faster than a CPU. These models were trained five times,
+with 100 training epochs each. The deallocation pattern is sketched below.
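To make the deallocation point concrete, here is a minimal sketch of releasing large intermediate tensors between Longformer batches. The checkpoint name, batch size and pooling choice are assumptions for the sketch, not the project's actual code.

```python
# Hypothetical sketch: free large activations right after use so the next
# chunk of documents fits in memory. Checkpoint and batching are assumptions.
import gc
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
model = AutoModel.from_pretrained("allenai/longformer-base-4096")

def embed_in_chunks(texts, chunk_size=8):
    embeddings = []
    for i in range(0, len(texts), chunk_size):
        batch = tokenizer(texts[i:i + chunk_size], padding=True,
                          truncation=True, max_length=4096, return_tensors="pt")
        with torch.no_grad():
            out = model(**batch)
        embeddings.append(out.last_hidden_state[:, 0, :].clone())  # keep pooled vectors only
        del out, batch   # drop references to the big tensors...
        gc.collect()     # ...and reclaim the memory before the next chunk
    return torch.cat(embeddings)
```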
112 |
|
113 |
## Intended uses
|
114 |
|
|
|
@@ -265,6 +300,21 @@ The results obtained surpassed those achieved in goal 6 and goal 9, with the bes
in the longformer + CNN model. We can also observe that the models that achieved the best results were those
that used the CNN network for deep learning.

+Motivated by the goal of increasing the accuracy obtained with the baseline implementation, a transfer learning
+strategy was implemented, under the assumption that the small amount of data available for training was
+insufficient for adequate embedding training. In this context, two approaches were considered (a sketch of the
+embedding pre-training follows the list):
+
+- Pre-training word embeddings using similar datasets for text classification;
+- Using transformers and attention mechanisms (Longformer) to create contextualized embeddings.
+
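A minimal sketch of the first approach, assuming a gensim Word2Vec model pre-trained on a similar corpus whose vectors then seed a Keras embedding layer; the corpus, dimensions and vocabulary here are placeholders, not the project's data.

```python
# Hypothetical sketch: pre-train Word2Vec on a similar corpus and build an
# embedding matrix to initialize a Keras Embedding layer. All names and
# sizes are placeholders.
import numpy as np
from gensim.models import Word2Vec

corpus = [["tokenized", "sentence", "one"], ["tokenized", "sentence", "two"]]
w2v = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1)

vocab = {"tokenized": 1, "sentence": 2, "one": 3, "two": 4}  # index 0 = padding
matrix = np.zeros((len(vocab) + 1, w2v.vector_size))
for word, idx in vocab.items():
    if word in w2v.wv:
        matrix[idx] = w2v.wv[word]
# matrix can now initialize a Keras Embedding layer (e.g. via a Constant initializer)
```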
+The Word2Vec and Longformer templates also need to be loaded, and their weights are as follows:
+
+Table 3: Weights of the templates using Word2Vec and Longformer
+| Template   | Weight  |
+|------------|:-------:|
+| Longformer | 10.9 GB |
+| Word2Vec   | 56.1 MB |
+
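For context, a loading sketch under assumed file names; the README does not specify the artifact formats behind the sizes in Table 3.

```python
# Hypothetical loading sketch; the file name and checkpoint are assumptions.
from gensim.models import Word2Vec
from transformers import AutoModel

w2v = Word2Vec.load("word2vec_sdg.model")  # assumed gensim-native file (~56 MB, Table 3)
longformer = AutoModel.from_pretrained("allenai/longformer-base-4096")
```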
In addition, it was possible to notice that the longformer + SNN and longformer + LSTM models were not able
to learn. Perhaps the models need some adjustments, but each training attempt took between 5 and 8 hours, which
made it impossible to attempt adjustments while other models were already showing promising results.