bsc-temu committed on
Commit
b184bd5
1 Parent(s): 31126f3

Update README.md

Files changed (1)
  1. README.md +43 -2
README.md CHANGED
@@ -24,7 +24,7 @@ datasets:
24
 
25
  metrics:
26
 
27
- - "???"
28
 
29
  widget:
30
 
@@ -40,4 +40,45 @@ widget:
40
 
41
  ---
42
 
43
- # Catalan RoBERTa-base trained on Catalan Textual Corpus fine-tuned for Text Classification.
24
 
25
  metrics:
26
 
27
+ - "accuracy"
28
 
29
  widget:
30
 
 
40
 
41
  ---
42
 
43
+ # Catalan RoBERTa-base trained on Catalan Textual Corpus fine-tuned for Text Classification.
44
+
45
+ The **roberta-base-ca-cased-tc** is a Text Classification (TC) model for the Catalan language fine-tuned from the [BERTa](https://huggingface.co/PlanTL-GOB-ES/roberta-base-ca) model, a [RoBERTa](https://arxiv.org/abs/1907.11692) base model pre-trained on a medium-size corpus collected from publicly available corpora and crawlers (check the BERTa model card for more details).
46
+
47
+ ## Datasets
48
+ We used the Catalan text classification dataset [TeCla](https://huggingface.co/datasets/projecte-aina/viquiquad) for training and evaluation.
49
+
50
+ ## Evaluation and results
51
+ The table below reports the model's accuracy on the TeCla test set:
52
+
53
+ | Model | TeCla (accuracy) |
54
+ | ------------|:----|
55
+ | BERTa | **74.04** |
56
+ For more details, check the fine-tuning and evaluation scripts in the official [GitHub repository](https://github.com/projecte-aina/berta).
57
+
58
+ ## Citing
59
+ If you use any of these resources (datasets or models) in your work, please cite our latest paper:
60
+ ```bibtex
61
+ @inproceedings{armengol-estape-etal-2021-multilingual,
62
+ title = "Are Multilingual Models the Best Choice for Moderately Under-resourced Languages? {A} Comprehensive Assessment for {C}atalan",
63
+ author = "Armengol-Estap{\'e}, Jordi and
64
+ Carrino, Casimiro Pio and
65
+ Rodriguez-Penagos, Carlos and
66
+ de Gibert Bonet, Ona and
67
+ Armentano-Oller, Carme and
68
+ Gonzalez-Agirre, Aitor and
69
+ Melero, Maite and
70
+ Villegas, Marta",
71
+ booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
72
+ month = aug,
73
+ year = "2021",
74
+ address = "Online",
75
+ publisher = "Association for Computational Linguistics",
76
+ url = "https://aclanthology.org/2021.findings-acl.437",
77
+ doi = "10.18653/v1/2021.findings-acl.437",
78
+ pages = "4933--4946",
79
+ }
80
+ ```
81
+ ## Funding
82
+ TODO
83
+ ## Disclaimer
84
+ TODO
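
The `metrics` entry this commit sets to `"accuracy"`, and the 74.04 reported in the results table, both refer to the share of test examples whose predicted label matches the gold label, expressed as a percentage. The snippet below is a minimal sketch of that computation, not the project's evaluation script. The function name `accuracy` and the example labels are hypothetical and are not taken from TeCla.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return the percentage of predictions that equal their reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the predicted label equals the gold label.
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Hypothetical labels, for illustration only (not taken from TeCla).
gold = ["Societat", "Economia", "Cultura", "Economia"]
pred = ["Societat", "Economia", "Economia", "Economia"]
print(f"{accuracy(pred, gold):.2f}")  # 3 of 4 correct -> 75.00
```

Reporting the value as a percentage with two decimals matches the format used in the results table of the README.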