unb-lamfo-nlp-mcti/NLP-Classification-MCTI

MarcosDib committed on Dec 6, 2022

Commit d6b86d1 • 1 Parent(s): e8d1c96

Update README.md

Files changed (1)

README.md +15 -8

@@ -58,6 +58,13 @@ https://github.com/Marcosdib/S2Query/Classification_Architecture_model.png
 ## Model variations
 XXXX was originally released in base and large variations, for cased and uncased input text. The uncased models
 also strip out accent markers. Chinese and multilingual uncased and cased versions followed shortly after.
 Modified preprocessing with whole word masking replaced subpiece masking in a follow-up work, with the release of
@@ -65,15 +72,15 @@ two models.
 Another 24 smaller models were released afterward.
-The detailed release history can be found on the [google-research/bert readme](https://www.google.com) on github.
-| Model | #params | Language |
-|------------------------|--------------------------------|-------|
-| [`mcti-base-uncased`]| 110M   | English |
-| [`mcti-large-uncased`]| 340M    | English | sub
-| [`mcti-base-cased`]|        | 110M    | English |
-| [`mcti-large-cased`] | 110M    | Chinese |
-| [`-base-multilingual-cased`] | 110M | Multiple |
 ## Intended uses

 ## Model variations
+To improve on the accuracy of the baseline implementation, we adopted a transfer learning strategy, on the assumption
+that the small amount of data available for training was insufficient to train adequate embeddings.
+In this context, we considered two approaches:
+    i) pre-training word embeddings using similar datasets for text classification;
+    ii) using transformers and attention mechanisms (Longformer) to create contextualized embeddings.
 XXXX was originally released in base and large variations, for cased and uncased input text. The uncased models
 also strip out accent markers. Chinese and multilingual uncased and cased versions followed shortly after.
 Modified preprocessing with whole word masking replaced subpiece masking in a follow-up work, with the release of
 Another 24 smaller models were released afterward.
+The detailed release history can be found [here](https://huggingface.co/unb-lamfo-nlp-mcti).
+| Model                        | #params | Language |
+|------------------------------|---------|----------|
+| [`mcti-base-uncased`]        | 110M    | English  |
+| [`mcti-large-uncased`]       | 340M    | English  |
+| [`mcti-base-cased`]          | 110M    | English  |
+| [`mcti-large-cased`]         | 110M    | Chinese  |
+| [`-base-multilingual-cased`] | 110M    | Multiple |
 ## Intended uses