ccasimiro committed
Commit ae3116f
1 Parent(s): bf095a4

Update README.md

Files changed (1): README.md (+31 -13)
README.md CHANGED
@@ -16,19 +16,6 @@ widget:
 # Biomedical language model for Spanish
 Biomedical pretrained language model for Spanish. For more details about the corpus, the pretraining and the evaluation, read the paper "_Carrino, C. P., Armengol-Estapé, J., Gutiérrez-Fandiño, A., Llop-Palao, J., Pàmies, M., Gonzalez-Agirre, A., & Villegas, M. (2021). Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario._"
 
-## BibTeX citation
-If you use any of these resources (datasets or models) in your work, please cite our latest paper:
-
-```bibtex
-@misc{carrino2021biomedical,
-  title={Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario},
-  author={Casimiro Pio Carrino and Jordi Armengol-Estapé and Asier Gutiérrez-Fandiño and Joan Llop-Palao and Marc Pàmies and Aitor Gonzalez-Agirre and Marta Villegas},
-  year={2021},
-  eprint={2109.03570},
-  archivePrefix={arXiv},
-  primaryClass={cs.CL}
-}
-```
 
 ## Tokenization and model pretraining
 
@@ -92,6 +79,37 @@ The model is ready-to-use only for masked language modelling to perform the Fill
 However, it is intended to be fine-tuned on downstream tasks such as Named Entity Recognition or Text Classification.
 
+## Cite
+If you use our models, please cite our latest preprint:
+
+```bibtex
+@misc{carrino2021biomedical,
+  title={Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario},
+  author={Casimiro Pio Carrino and Jordi Armengol-Estapé and Asier Gutiérrez-Fandiño and Joan Llop-Palao and Marc Pàmies and Aitor Gonzalez-Agirre and Marta Villegas},
+  year={2021},
+  eprint={2109.03570},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL}
+}
+```
+
+If you use our Medical Crawler corpus, please cite the preprint:
+
+```bibtex
+@misc{carrino2021spanish,
+  title={Spanish Biomedical Crawled Corpus: A Large, Diverse Dataset for Spanish Biomedical Language Models},
+  author={Casimiro Pio Carrino and Jordi Armengol-Estapé and Ona de Gibert Bonet and Asier Gutiérrez-Fandiño and Aitor Gonzalez-Agirre and Martin Krallinger and Marta Villegas},
+  year={2021},
+  eprint={2109.07765},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL}
+}
+```
+
 ---
 
 ## How to use
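
Since the README describes the model as ready to use for the Fill Mask task, a minimal usage sketch with the Hugging Face `transformers` pipeline could look like the following. The model identifier and the example sentence are illustrative assumptions, not taken from this commit:

```python
# Minimal Fill Mask sketch; assumes the transformers library is installed.
from transformers import pipeline

# Hypothetical model ID: replace with this repository's actual identifier.
unmasker = pipeline("fill-mask", model="PlanTL-GOB-ES/roberta-base-biomedical-es")

# RoBERTa-style tokenizers use "<mask>" as the mask token.
print(unmasker("El único antecedente personal a reseñar era la <mask> arterial."))
```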