Mihai-Dan MAŞALA (25095) committed on
Commit 0dc011a
1 Parent(s): 5fb80fd

Update README

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -54,7 +54,7 @@ outputs = model(**inputs)
  The model is trained on the following compilation of corpora. Note that we present the statistics after the cleaning process.
  
  | Corpus | Words | Sentences | Size (GB)|
- |-----------|-----------|-----------|----------|
+ |-----------|:---------:|:---------:|:--------:|
  | Oscar | 1.78B | 87M | 10.8 |
  | RoTex | 240M | 14M | 1.5 |
  | RoWiki | 50M | 2M | 0.3 |
@@ -68,7 +68,7 @@ The model is trained on the following compilation of corpora. Note that we prese
  We report Macro-averaged F1 score (in %)
  
  | Model | Dev | Test |
- | -----------------|----------|----------|
+ |------------------|:--------:|:--------:|
  | multilingual-BERT| 68.96 | 69.57 |
  | XLM-R-base | 71.26 | 71.71 |
  | BERT-base-ro | 70.49 | 71.02 |
@@ -80,8 +80,8 @@ We report Macro-averaged F1 score (in %)
  
  We report results on [VarDial 2019](https://sites.google.com/view/vardial2019/campaign) Moldavian vs. Romanian Cross-dialect Topic identification Challenge, as Macro-averaged F1 score (in %).
  
- | Model | Dialect Classification | MD to RO | RO to MD|
- |-------------------|------------------------|----------|----------|
+ | Model | Dialect Classification | MD to RO | RO to MD |
+ |-------------------|:----------------------:|:--------:|:--------:|
  | 2-CNN + SVM | 93.40 | 65.09 | 75.21 |
  | Char+Word SVM | 96.20 | 69.08 | 81.93 |
  | BiGRU | 93.30 | **70.10**| 80.30 |
@@ -97,7 +97,7 @@ We report results on [VarDial 2019](https://sites.google.com/view/vardial2019/ca
  Challenge can be found [here](https://diacritics-challenge.speed.pub.ro/). We report results on the official test set, as accuracies in %.
  
  | Model | word level | char level |
- |-----------------------------|------------|------------|
+ |-----------------------------|:----------:|:----------:|
  | BiLSTM | 99.42 | - |
  | CharCNN | 98.40 | 99.65 |
  | CharCNN + multilingual-BERT | 99.72 | 99.94 |
@@ -114,7 +114,7 @@ Challenge can be found [here](https://diacritics-challenge.speed.pub.ro/). We re
  @inproceedings{RoBERT,
  title={RoBERT – A Romanian BERT Model},
  author={Masala, Mihai and Ruseti, Stefan and Dascalu, Mihai,
- booktitle={Proceedings of the 28th International Conference on Computational Linguistics},
+ booktitle={Proceedings of the 28th International Conference on Computational Linguistics (COLING)},
  year={2020}
  }
  ```