# Catalan BERTa-v2 (roberta-base-ca-v2) fine-tuned for Question Answering
## Table of Contents
- [Model Description](#model-description)
- [Intended Uses and Limitations](#intended-uses-and-limitations)
- [How to Use](#how-to-use)
- [Training](#training)
  - [Training Data](#training-data)
  - [Training Procedure](#training-procedure)
- [Evaluation](#evaluation)
  - [Variables and Metrics](#variables-and-metrics)
  - [Evaluation Results](#evaluation-results)
- [Licensing Information](#licensing-information)
- [Citation Information](#citation-information)
- [Funding](#funding)
- [Contributions](#contributions)

## Model Description

The **roberta-base-ca-v2-cased-qa** is a Question Answering (QA) model for the Catalan language, fine-tuned from the [roberta-base-ca-v2](https://huggingface.co/projecte-aina/roberta-base-ca-v2) model, a [RoBERTa](https://arxiv.org/abs/1907.11692) base model pre-trained on a medium-sized corpus collected from publicly available corpora and crawlers (see the roberta-base-ca-v2 model card for more details).

## Intended Uses and Limitations

The **roberta-base-ca-v2-cased-qa** model can be used for extractive question answering. It is limited by its training dataset and may not generalize well to all use cases.

## How to Use

Here is how to use this model:
```python
from transformers import pipeline

# Load the question-answering pipeline with this model
nlp = pipeline("question-answering", model="projecte-aina/roberta-base-ca-v2-cased-qa")

# Question: "When did Super3 start?"
text = "Quan va començar el Super3?"
# Context: a short Catalan passage about the Super3 children's TV universe
context = "El Super3 o Club Super3 és un univers infantil català creat a partir d'un programa emès per Televisió de Catalunya des del 1991. Està format per un canal de televisió, la revista Súpers!, la Festa dels Súpers i un club que té un milió i mig de socis."

qa_results = nlp(text, context)
print(qa_results)
```
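
For this example, the pipeline should return a dict with `score`, `start`, `end`, and `answer` keys, with the span "1991" extracted from the context as the answer.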

## Training

### Training Data

We used the Catalan QA dataset [CatalanQA](https://huggingface.co/datasets/projecte-aina/catalanqa) for training and evaluation, and the [XQuAD-ca](https://huggingface.co/datasets/projecte-aina/xquad-ca) test set for evaluation.
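
As a minimal sketch, both datasets can be pulled from the Hub with the `datasets` library; the IDs below are the dataset pages linked above, while the split names are assumptions:

```python
from datasets import load_dataset

# Sketch only: IDs match the linked Hub pages; split layout is an assumption.
catalanqa = load_dataset("projecte-aina/catalanqa")  # train/validation/test splits
xquad_ca = load_dataset("projecte-aina/xquad-ca")    # evaluation-only test set
print(catalanqa)
```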

### Training Procedure

The model was trained with a batch size of 16 and a learning rate of 5e-5 for 5 epochs. We then selected the best checkpoint using the downstream-task metric on the corresponding development set, and finally evaluated it on the test set.
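
A minimal sketch of an equivalent fine-tuning configuration with the `transformers` Trainer: the batch size, learning rate, and epoch count come from the paragraph above, while the per-epoch evaluation cadence and the `"f1"` metric key are assumptions.

```python
from transformers import TrainingArguments

# Sketch only: hyperparameters are from the model card; the evaluation/saving
# cadence and the metric name used for checkpoint selection are assumptions.
training_args = TrainingArguments(
    output_dir="roberta-base-ca-v2-cased-qa",
    per_device_train_batch_size=16,
    learning_rate=5e-5,
    num_train_epochs=5,
    evaluation_strategy="epoch",  # evaluate on the dev set after each epoch
    save_strategy="epoch",
    load_best_model_at_end=True,  # keep the best checkpoint...
    metric_for_best_model="f1",   # ...by downstream-task F1
)
```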

## Evaluation

### Variables and Metrics

This model was fine-tuned maximizing the F1 score.
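
For reference, here is a sketch of the SQuAD-style token-overlap F1 commonly used for extractive QA; the official scoring script also normalizes case, punctuation, and articles, which is omitted here:

```python
from collections import Counter

def qa_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    # Count tokens shared between prediction and gold answer
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```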

### Evaluation Results

We evaluated the _roberta-base-ca-v2-cased-qa_ on the CatalanQA and XQuAD-ca test sets against standard multilingual and monolingual baselines:

| Model | CatalanQA (F1/EM) | XQuAD-Ca (F1/EM) |
| ----- | ----------------- | ---------------- |

For more details, check the fine-tuning and evaluation scripts in the official [GitHub repository](https://github.com/projecte-aina/club).
 
134
+ ## Licensing Information
135
 
136
+ [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
137
+
138
+ ## Citation Information
139
  If you use any of these resources (datasets or models) in your work, please cite our latest paper:
```bibtex
@inproceedings{armengol-estape-etal-2021-multilingual,
    title = "Are Multilingual Models the Best Choice for Moderately Under-resourced Languages? {A} Comprehensive Assessment for {C}atalan",
    author = "Armengol-Estap{\'e}, Jordi and
      Carrino, Casimiro Pio and
      Rodriguez-Penagos, Carlos and
      de Gibert Bonet, Ona and
      Armentano-Oller, Carme and
      Gonzalez-Agirre, Aitor and
      Melero, Maite and
      Villegas, Marta",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.437",
    pages = "4933--4946",
}
```

## Funding

This work was funded by the [Departament de la Vicepresidència i de Polítiques Digitals i Territori de la Generalitat de Catalunya](https://politiquesdigitals.gencat.cat/en/inici/index.html) within the framework of [Projecte AINA](https://politiquesdigitals.gencat.cat/ca/economia/catalonia-ai/aina).

## Contributions

[N/A]