asier-gutierrez committed on
Commit
2a1c89f
1 Parent(s): a2a5b48

Update README.md

Files changed (1)
  1. README.md +13 -5
README.md CHANGED
@@ -21,13 +21,21 @@ widget:
 
  # Spanish Legal-domain RoBERTa
 
- There are two main models made specifically for the Spanish language: BETO and a GPT-2 model. There is also multilingual BERT (mBERT), which is often used because in some cases it may perform better.
-
- Both the BETO and GPT-2 models for Spanish were trained with rather limited resources: 4GB and 3GB of data, respectively. The training data for both models may be diverse, but the amount is not enough to cover all domains. Furthermore, a domain-specific BERT-like model is preferable because it better covers the domain vocabulary and captures legal jargon. We present our model, trained on 9GB of text drawn specifically from the legal domain.
 
  ## Citing
  ```
- TBA
  ```
 
- For more information visit our [GitHub repository](https://github.com/PlanTL-SANIDAD/lm-legal-es)
 
  # Spanish Legal-domain RoBERTa
 
+ There are few models trained for the Spanish language. Some of them have been trained on low-resource, unclean corpora. The models derived from the Spanish National Plan for Language Technologies are proficient at solving several tasks and have been trained on large-scale clean corpora. However, Spanish legal-domain language could be thought of as an independent language in its own right. We therefore created a Spanish legal model from scratch, trained exclusively on legal corpora.
 
  ## Citing
  ```
+ @misc{gutierrezfandino2021legal,
+ title={Spanish Legalese Language Model and Corpora},
+ author={Asier Gutiérrez-Fandiño and Jordi Armengol-Estapé and Aitor Gonzalez-Agirre and Marta Villegas},
+ year={2021},
+ eprint={2110.12201},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
  ```
 
+ For more information visit our [GitHub repository](https://github.com/PlanTL-GOB-ES/lm-legal-es)
+
+ ## Funding
+ This work was funded by the Spanish State Secretariat for Digitalization and Artificial Intelligence (SEDIA) within the framework of the Plan-TL.