readerbench/RoBERT-small


Mihai-Dan MAŞALA (25095) committed on Dec 4, 2020

Commit 0dc011a • 1 Parent(s): 5fb80fd

Update README

Files changed (1)

README.md +6 -6


@@ -54,7 +54,7 @@ outputs = model(**inputs)
 The model is trained on the following compilation of corpora. Note that we present the statistics after the cleaning process.
 | Corpus    | Words     | Sentences | Size (GB)|
-|-----------|-----------|-----------|----------|
 | Oscar     | 1.78B     | 87M       | 10.8     |
 | RoTex     | 240M      | 14M       | 1.5      |
 | RoWiki    | 50M       | 2M        | 0.3      |
@@ -68,7 +68,7 @@ The model is trained on the following compilation of corpora. Note that we prese
 We report Macro-averaged F1 score (in %)
 | Model            | Dev      | Test     |
-| -----------------|----------|----------|
 | multilingual-BERT| 68.96    | 69.57    |
 | XLM-R-base       | 71.26    | 71.71    |
 | BERT-base-ro     | 70.49    | 71.02    |
@@ -80,8 +80,8 @@ We report Macro-averaged F1 score (in %)
 We report results on [VarDial 2019](https://sites.google.com/view/vardial2019/campaign) Moldavian vs. Romanian Cross-dialect Topic identification Challenge, as Macro-averaged F1 score (in %).
-| Model             | Dialect Classification | MD to RO | RO to MD|
-|-------------------|------------------------|----------|----------|
 | 2-CNN + SVM       | 93.40                  | 65.09    | 75.21    |
 | Char+Word SVM     | 96.20                  | 69.08    | 81.93    |
 | BiGRU             | 93.30                  | **70.10**| 80.30    |
@@ -97,7 +97,7 @@ We report results on [VarDial 2019](https://sites.google.com/view/vardial2019/ca
 Challenge can be found [here](https://diacritics-challenge.speed.pub.ro/). We report results on the official test set, as accuracies in %.
 | Model                       | word level | char level |
-|-----------------------------|------------|------------|
 | BiLSTM                      | 99.42      | -          |
 | CharCNN                     | 98.40      | 99.65      |
 | CharCNN + multilingual-BERT | 99.72      | 99.94      |
@@ -114,7 +114,7 @@ Challenge can be found [here](https://diacritics-challenge.speed.pub.ro/). We re
 @inproceedings{RoBERT,
   title={RoBERT – A Romanian BERT Model},
   author={Masala, Mihai and Ruseti, Stefan and Dascalu, Mihai,
-  booktitle={Proceedings of the 28th International Conference on Computational Linguistics},
   year={2020}
 }
 ```

 The model is trained on the following compilation of corpora. Note that we present the statistics after the cleaning process.
 | Corpus    | Words     | Sentences | Size (GB)|
+|-----------|:---------:|:---------:|:--------:|
 | Oscar     | 1.78B     | 87M       | 10.8     |
 | RoTex     | 240M      | 14M       | 1.5      |
 | RoWiki    | 50M       | 2M        | 0.3      |
 We report Macro-averaged F1 score (in %)
 | Model            | Dev      | Test     |
+|------------------|:--------:|:--------:|
 | multilingual-BERT| 68.96    | 69.57    |
 | XLM-R-base       | 71.26    | 71.71    |
 | BERT-base-ro     | 70.49    | 71.02    |
 We report results on [VarDial 2019](https://sites.google.com/view/vardial2019/campaign) Moldavian vs. Romanian Cross-dialect Topic identification Challenge, as Macro-averaged F1 score (in %).
+| Model             | Dialect Classification | MD to RO | RO to MD |
+|-------------------|:----------------------:|:--------:|:--------:|
 | 2-CNN + SVM       | 93.40                  | 65.09    | 75.21    |
 | Char+Word SVM     | 96.20                  | 69.08    | 81.93    |
 | BiGRU             | 93.30                  | **70.10**| 80.30    |
 Challenge can be found [here](https://diacritics-challenge.speed.pub.ro/). We report results on the official test set, as accuracies in %.
 | Model                       | word level | char level |
+|-----------------------------|:----------:|:----------:|
 | BiLSTM                      | 99.42      | -          |
 | CharCNN                     | 98.40      | 99.65      |
 | CharCNN + multilingual-BERT | 99.72      | 99.94      |
 @inproceedings{RoBERT,
   title={RoBERT – A Romanian BERT Model},
   author={Masala, Mihai and Ruseti, Stefan and Dascalu, Mihai,
+  booktitle={Proceedings of the 28th International Conference on Computational Linguistics (COLING)},
   year={2020}
 }
 ```