BSC-LT/roberta-base-ca

bsc-temu committed on May 21, 2021

Commit e712d6c • 1 Parent(s): 2e91339

update readme

Files changed (1)

README.md +2 -3

@@ -24,7 +24,7 @@ The publicly available corpora are:
  2. the [Catalan Open Subtitles](http://opus.nlpl.eu/download.php?f=OpenSubtitles/v2018/mono/OpenSubtitles.raw.ca.gz), a collection of translated movie subtitles
- 3. the non-shuffled version of the Catalan part of the [OSCAR](https://traces1.inria.fr/oscar/) corpus \\cite{suarez2019asynchronous},
+ 3. the non-shuffled version of the Catalan part of the [OSCAR](https://traces1.inria.fr/oscar/) corpus \\\\cite{suarez2019asynchronous},
     a collection of monolingual corpora, filtered from [Common Crawl](https://commoncrawl.org/about/)
  4. The [CaWac](http://nlp.ffzg.hr/resources/corpora/cawac/) corpus, a web corpus of Catalan built from the .cat top-level-domain in late 2013
@@ -61,8 +61,6 @@ The training lasted a total of 48 hours with 16 NVIDIA V100 GPUs of 16GB DDRAM.
 The BERTa model has been fine-tuned on the downstream tasks of the Catalan Language Understanding Evaluation benchmark (CLUB),
 that has been created along with the model.
-_Note that the fine-tuning on downstream tasks have been performed with the HuggingFace [**Transformers**](https://github.com/huggingface/transformers) library
 It contains the following tasks and their related datasets:
  1. Part-of-Speech Tagging (POS)
@@ -100,6 +98,7 @@ Here are the train/dev/test splits of the datasets:
 | QA (ViquiQuAD) | 14,239  | 11,255  | 1,492  | 1,429 |
+_The fine-tuning on downstream tasks have been performed with the HuggingFace [**Transformers**](https://github.com/huggingface/transformers) library_
 ## Results