Commit e712d6c by bsc-temu
1 Parent(s): 2e91339
update readme
README.md
CHANGED
@@ -24,7 +24,7 @@ The publicly available corpora are:
 
 2. the [Catalan Open Subtitles](http://opus.nlpl.eu/download.php?f=OpenSubtitles/v2018/mono/OpenSubtitles.raw.ca.gz), a collection of translated movie subtitles
 
-3. the non-shuffled version of the Catalan part of the [OSCAR](https://traces1.inria.fr/oscar/) corpus
+3. the non-shuffled version of the Catalan part of the [OSCAR](https://traces1.inria.fr/oscar/) corpus \\\\cite{suarez2019asynchronous},
 a collection of monolingual corpora, filtered from [Common Crawl](https://commoncrawl.org/about/)
 
 4. The [CaWac](http://nlp.ffzg.hr/resources/corpora/cawac/) corpus, a web corpus of Catalan built from the .cat top-level-domain in late 2013
@@ -61,8 +61,6 @@ The training lasted a total of 48 hours with 16 NVIDIA V100 GPUs of 16GB DDRAM.
 The BERTa model has been fine-tuned on the downstream tasks of the Catalan Language Understanding Evaluation benchmark (CLUB),
 that has been created along with the model.
 
-_Note that the fine-tuning on downstream tasks have been performed with the HuggingFace [**Transformers**](https://github.com/huggingface/transformers) library
-
 It contains the following tasks and their related datasets:
 
 1. Part-of-Speech Tagging (POS)
@@ -100,6 +98,7 @@ Here are the train/dev/test splits of the datasets:
 | QA (ViquiQuAD) | 14,239 | 11,255 | 1,492 | 1,429 |
 
 
+_The fine-tuning on downstream tasks have been performed with the HuggingFace [**Transformers**](https://github.com/huggingface/transformers) library_
 
 ## Results
 