bsc-temu committed on
Commit
e712d6c
1 Parent(s): 2e91339

update readme

Files changed (1)
  1. README.md +2 -3
README.md CHANGED
@@ -24,7 +24,7 @@ The publicly available corpora are:
 
  2. the [Catalan Open Subtitles](http://opus.nlpl.eu/download.php?f=OpenSubtitles/v2018/mono/OpenSubtitles.raw.ca.gz), a collection of translated movie subtitles
 
- 3. the non-shuffled version of the Catalan part of the [OSCAR](https://traces1.inria.fr/oscar/) corpus \\cite{suarez2019asynchronous},
+ 3. the non-shuffled version of the Catalan part of the [OSCAR](https://traces1.inria.fr/oscar/) corpus \\\\cite{suarez2019asynchronous},
  a collection of monolingual corpora, filtered from [Common Crawl](https://commoncrawl.org/about/)
 
  4. The [CaWac](http://nlp.ffzg.hr/resources/corpora/cawac/) corpus, a web corpus of Catalan built from the .cat top-level-domain in late 2013
@@ -61,8 +61,6 @@ The training lasted a total of 48 hours with 16 NVIDIA V100 GPUs of 16GB DDRAM.
  The BERTa model has been fine-tuned on the downstream tasks of the Catalan Language Understanding Evaluation benchmark (CLUB),
  that has been created along with the model.
 
- _Note that the fine-tuning on downstream tasks have been performed with the HuggingFace [**Transformers**](https://github.com/huggingface/transformers) library
-
  It contains the following tasks and their related datasets:
 
  1. Part-of-Speech Tagging (POS)
@@ -100,6 +98,7 @@ Here are the train/dev/test splits of the datasets:
  | QA (ViquiQuAD) | 14,239 | 11,255 | 1,492 | 1,429 |
 
 
+ _The fine-tuning on downstream tasks have been performed with the HuggingFace [**Transformers**](https://github.com/huggingface/transformers) library_
 
  ## Results
 