eduagarcia committed
Commit 19c4080 (1 parent: 338b0a8)

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -101,7 +101,7 @@ RoBERTaLexPT-base is a Portuguese Masked Language Model pretrained from scratch
 
 ## Evaluation
 
-The model was evaluated on ["PortuLex" benchmark](eduagarcia/PortuLex_benchmark), a four-task benchmark designed to evaluate the quality and performance of language models in the Portuguese legal domain.
+The model was evaluated on ["PortuLex" benchmark](https://huggingface.co/eduagarcia/PortuLex_benchmark), a four-task benchmark designed to evaluate the quality and performance of language models in the Portuguese legal domain.
 
 Macro F1-Score (\%) for multiple models evaluated on PortuLex benchmark test splits:
 
@@ -120,9 +120,9 @@ Macro F1-Score (\%) for multiple models evaluated on PortuLex benchmark test spl
 | [Legal-RoBERTa-PT-large](https://huggingface.co/joelniklaus/legal-portuguese-roberta-large) | 87.96 | 88.32/84.83 | 79.57 | 81.98 | 84.02 |
 | **Ours** | | | | | |
 | RoBERTaTimbau-base (Reproduction of BERTimbau) | 89.68 | 87.53/85.74 | 78.82 | 82.03 | 84.29 |
-| RoBERTaLegalPT-base (Trained on LegalPT) | 90.59 | 85.45/84.40 | 79.92 | 82.84 | 84.57 |
+| [RoBERTaLegalPT-base](https://huggingface.co/eduagarcia/RoBERTaCrawlPT-base) (Trained on LegalPT) | 90.59 | 85.45/84.40 | 79.92 | 82.84 | 84.57 |
 | RoBERTaCrawlPT-base (Trained on CrawlPT) | 89.24 | 88.22/86.58 | 79.88 | 82.80 | 84.83 |
-| **RoBERTaLexPT-base** (Trained on CrawlPT + LegalPT) | **90.73** | **88.56**/86.03 | **80.40** | 83.22 | **85.41** |
+| **RoBERTaLexPT-base (this)** (Trained on CrawlPT + LegalPT) | **90.73** | **88.56**/86.03 | **80.40** | 83.22 | **85.41** |
 
 In summary, RoBERTaLexPT consistently achieves top legal NLP effectiveness despite its base size.
 With sufficient pre-training data, it can surpass larger models. The results highlight the importance of domain-diverse training data over sheer model scale.
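For reference, a minimal sketch of loading the model and the PortuLex benchmark referenced in the change above. The dataset id `eduagarcia/PortuLex_benchmark` comes from the link added in this commit; the model repo id `eduagarcia/RoBERTaLexPT-base` is an assumption inferred from the card's author and model name and is not stated in the diff.

```python
# Minimal sketch: load RoBERTaLexPT-base and inspect the PortuLex benchmark tasks.
# NOTE: MODEL_ID is an assumption inferred from the model card; verify the repo id on the Hub.
from transformers import AutoTokenizer, AutoModelForMaskedLM
from datasets import get_dataset_config_names, load_dataset

MODEL_ID = "eduagarcia/RoBERTaLexPT-base"       # assumed repo id (author + model name)
BENCHMARK_ID = "eduagarcia/PortuLex_benchmark"  # linked in the README change above

# Load the pretrained masked language model and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)

# PortuLex is a four-task benchmark; list its task configurations,
# then load one task's splits for fine-tuning or evaluation.
tasks = get_dataset_config_names(BENCHMARK_ID)
print("PortuLex tasks:", tasks)
dataset = load_dataset(BENCHMARK_ID, tasks[0])
print(dataset)
```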