gonzalez-agirre committed on
Commit
abbe4e5
1 Parent(s): 484c277

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -11,7 +11,7 @@ tags:
 
  - "masked-lm"
 
- - "RoBERTa-large-ca"
+ - "RoBERTa-large-ca-v2"
 
  - "CaText"
 
@@ -29,7 +29,7 @@ widget:
 
 ---
 
- # Catalan BERTa (roberta-large-ca) large model
+ # Catalan BERTa (roberta-large-ca-v2) large model
 
 ## Table of Contents
 <details>
@@ -53,13 +53,13 @@ widget:
 
 ## Model description
 
- The **roberta-large-ca** is a transformer-based masked language model for the Catalan language.
+ The **roberta-large-ca-v2** is a transformer-based masked language model for the Catalan language.
 It is based on the [RoBERTA](https://github.com/pytorch/fairseq/tree/master/examples/roberta) large model
 and has been trained on a medium-size corpus collected from publicly available corpora and crawlers.
 
 ## Intended Uses and Limitations
 
- **roberta-large-ca** model is ready-to-use only for masked language modeling to perform the Fill Mask task (try the inference API or read the next section).
+ **roberta-large-ca-v2** model is ready-to-use only for masked language modeling to perform the Fill Mask task (try the inference API or read the next section).
 However, it is intended to be fine-tuned on non-generative downstream tasks such as Question Answering, Text Classification, or Named Entity Recognition.
 
 ## How to Use
@@ -70,8 +70,8 @@ Here is how to use this model:
 from transformers import AutoModelForMaskedLM
 from transformers import AutoTokenizer, FillMaskPipeline
 from pprint import pprint
- tokenizer_hf = AutoTokenizer.from_pretrained('projecte-aina/roberta-large-ca')
- model = AutoModelForMaskedLM.from_pretrained('projecte-aina/roberta-large-ca')
+ tokenizer_hf = AutoTokenizer.from_pretrained('projecte-aina/roberta-large-ca-v2')
+ model = AutoModelForMaskedLM.from_pretrained('projecte-aina/roberta-large-ca-v2')
 model.eval()
 pipeline = FillMaskPipeline(model, tokenizer_hf)
 text = f"Em dic <mask>."
@@ -171,7 +171,7 @@ Here are the train/dev/test splits of the datasets:
 
 | Task | NER (F1) | POS (F1) | STS-ca (Comb) | TeCla (Acc.) | TEca (Acc.) | VilaQuAD (F1/EM) | ViquiQuAD (F1/EM) | CatalanQA (F1/EM) | XQuAD-ca <sup>1</sup> (F1/EM) |
 | ------------|:-------------:| -----:|:------|:------|:-------|:------|:----|:----|:----|
- | RoBERTa-large-ca | **89.82** | **99.02** | **83.41** | **75.46** | **83.61** | **89.34/75.50** | **89.20**/75.77 | **90.72/79.06** | **73.79**/55.34 |
+ | RoBERTa-large-ca-v2 | **89.82** | **99.02** | **83.41** | **75.46** | **83.61** | **89.34/75.50** | **89.20**/75.77 | **90.72/79.06** | **73.79**/55.34 |
 | RoBERTa-base-ca-v2 | 89.29 | 98.96 | 79.07 | 74.26 | 83.14 | 87.74/72.58 | 88.72/**75.91** | 89.50/76.63 | 73.64/**55.42** |
 | BERTa | 89.76 | 98.96 | 80.19 | 73.65 | 79.26 | 85.93/70.58 | 87.12/73.11 | 89.17/77.14 | 69.20/51.47 |
 | mBERT | 86.87 | 98.83 | 74.26 | 69.90 | 74.63 | 82.78/67.33 | 86.89/73.53 | 86.90/74.19 | 68.79/50.80 |
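For quick reference, below is a minimal, self-contained version of the updated "How to Use" snippet. This is only a sketch: the model id `projecte-aina/roberta-large-ca-v2` comes from the diff above, while the final two steps (running the pipeline and printing the results) are assumptions, since the hunk ends at `text = f"Em dic <mask>."`.

```python
# Sketch only: reproduces the updated fill-mask snippet end to end.
# The model id comes from the diff; running the pipeline and printing
# its output are assumed, not part of the commit.
from pprint import pprint

from transformers import AutoModelForMaskedLM, AutoTokenizer, FillMaskPipeline

model_id = "projecte-aina/roberta-large-ca-v2"
tokenizer_hf = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)
model.eval()

pipeline = FillMaskPipeline(model, tokenizer_hf)
text = "Em dic <mask>."

# Each prediction is a dict with 'sequence', 'score', 'token' and 'token_str'.
res_hf = pipeline(text)
pprint([(r["token_str"], round(r["score"], 3)) for r in res_hf])
```

By default the fill-mask pipeline returns the top 5 candidate tokens for the masked position, each with its score.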