Update README.md
README.md CHANGED
@@ -62,8 +62,8 @@ With the motivation to increase accuracy obtained with baseline implementation,
 strategy under the assumption that the small amount of data available for training was insufficient for adequate embedding training.
 In this context, we considered two approaches:

-
-
+i) pre-training word embeddings using similar datasets for text classification;
+ii) using transformers and attention mechanisms (Longformer) to create contextualized embeddings.

 XXXX was originally released in base and large variations, for cased and uncased input text. The uncased models
 also strip out accent markers. Chinese and multilingual uncased and cased versions followed shortly after.
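Approach (ii) above produces one contextualized vector per document with a Longformer encoder. A minimal sketch of what that could look like with the Hugging Face `transformers` library follows; the `allenai/longformer-base-4096` checkpoint and the mean-pooling step are illustrative assumptions, not details taken from this change:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative public checkpoint; the model actually trained for this card may differ.
model_name = "allenai/longformer-base-4096"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

text = "Example document text to embed."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the final hidden states into one fixed-size contextualized embedding
# per document (pooling strategy is an assumption, not specified here).
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)  # (1, hidden_size)
```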
@@ -82,6 +82,14 @@ The detailed release history can be found on the [here](https://huggingface.co/u
 | [`mcti-large-cased`]         | 110M | Chinese  |
 | [`-base-multilingual-cased`] | 110M | Multiple |

+| Dataset            | Compatibility to base* |
+|--------------------|------------------------|
+| Labeled MCTI       | 100%                   |
+| Full MCTI          | 100%                   |
+| BBC News Articles  | 56.77%                 |
+| New unlabeled MCTI | 75.26%                 |
+
+
 ## Intended uses

 You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to
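A minimal sketch of the masked-language-modeling use with the `fill-mask` pipeline is shown below; `bert-base-uncased` is only a stand-in checkpoint id and should be replaced with one of the checkpoints listed in the table above:

```python
from transformers import pipeline

# Stand-in checkpoint id; substitute the mcti checkpoint you intend to use.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# Use the tokenizer's own mask token ([MASK] for BERT-style vocabularies, <mask> for others).
mask = unmasker.tokenizer.mask_token
print(unmasker(f"The goal of this project is to develop {mask} technologies."))
```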