mmarimon committed on
Commit
89ad82d
1 Parent(s): c09f685

Update README.md

  1. README.md +51 -16
README.md CHANGED
@@ -33,19 +33,66 @@ widget:
 
 
  # Spanish RoBERTa-base biomedical model finetuned for the Named Entity Recognition (NER) task on the Cantemist dataset.
  A fine-tuned version of the [bsc-bio-ehr-es](https://huggingface.co/PlanTL-GOB-ES/bsc-bio-ehr-es) model, a [RoBERTa](https://arxiv.org/abs/1907.11692) base model pre-trained on the largest Spanish biomedical corpus known to date, composed of biomedical documents, clinical cases and EHR documents, for a total of 1.1B tokens of clean and deduplicated text.
 
  For more details about the corpora and training, check the _bsc-bio-ehr-es_ model card.
 
- ## Dataset
  The dataset used is [CANTEMIST](https://huggingface.co/datasets/PlanTL-GOB-ES/cantemist-ner), a NER dataset annotated with tumor morphology entities. For further information, check the [official website](https://temu.bsc.es/cantemist/).
 
- ## Evaluation and results
  F1 Score: 0.8340
 
  For evaluation details visit our [GitHub repository](https://github.com/PlanTL-GOB-ES/lm-biomedical-clinical-es).
 
- ## Citing
  If you use these models, please cite our work:
 
  ```bibtex
@@ -72,19 +119,7 @@ If you use these models, please cite our work:
  }
  ```
 
- ## Copyright
-
- Copyright by the Spanish State Secretariat for Digitalization and Artificial Intelligence (SEDIA) (2022)
-
- ## Licensing information
-
- [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
-
- ## Funding
-
- This work was funded by the Spanish State Secretariat for Digitalization and Artificial Intelligence (SEDIA) within the framework of the Plan-TL.
-
- ## Disclaimer
 
  The models published in this repository are intended for a generalist purpose and are available to third parties. These models may have bias and/or any other undesirable distortions.
 

 
 
  # Spanish RoBERTa-base biomedical model finetuned for the Named Entity Recognition (NER) task on the Cantemist dataset.
+
+ ## Table of contents
+ <details>
+ <summary>Click to expand</summary>
+
+ - [Model description](#model-description)
+ - [Intended uses and limitations](#intended-uses-and-limitations)
+ - [How to use](#how-to-use)
+ - [Limitations and bias](#limitations-and-bias)
+ - [Training](#training)
+ - [Evaluation](#evaluation)
+ - [Additional information](#additional-information)
+ - [Author](#author)
+ - [Contact information](#contact-information)
+ - [Copyright](#copyright)
+ - [Licensing information](#licensing-information)
+ - [Funding](#funding)
+ - [Citing information](#citing-information)
+ - [Disclaimer](#disclaimer)
+
+ </details>
+
+ ## Model description
  A fine-tuned version of the [bsc-bio-ehr-es](https://huggingface.co/PlanTL-GOB-ES/bsc-bio-ehr-es) model, a [RoBERTa](https://arxiv.org/abs/1907.11692) base model pre-trained on the largest Spanish biomedical corpus known to date, composed of biomedical documents, clinical cases and EHR documents, for a total of 1.1B tokens of clean and deduplicated text.
 
  For more details about the corpora and training, check the _bsc-bio-ehr-es_ model card.
 
+ ## Intended uses and limitations
+
+ ## How to use
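The section is left empty in this commit. As a minimal sketch, a token-classification checkpoint of this kind can usually be loaded through the `transformers` pipeline API. The repository ID in `MODEL_ID` is an assumption (this page does not state it) and should be checked on the Hub; the helper names `load_ner_pipeline`, `extract_spans` and `tag_text` are hypothetical and not part of the model card.

```python
from typing import Dict, List

# ASSUMPTION: the repository ID is not given on this page. This value is a
# guess derived from the base model's organisation; verify it before use.
MODEL_ID = "PlanTL-GOB-ES/bsc-bio-ehr-es-cantemist"


def load_ner_pipeline(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first call)."""
    # Imported lazily so the pure helper below works without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" groups sub-word tokens into entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def extract_spans(text: str, predictions: List[Dict]) -> List[Dict]:
    """Reduce aggregated pipeline output to label, surface text and offsets."""
    spans = []
    for p in predictions:
        start, end = p.get("start"), p.get("end")
        # Character offsets are only available with a fast tokenizer;
        # fall back to the decoded word when they are missing.
        surface = text[start:end] if start is not None and end is not None else p["word"]
        spans.append({"label": p["entity_group"], "text": surface, "start": start, "end": end})
    return spans


def tag_text(text: str, model_id: str = MODEL_ID) -> List[Dict]:
    """Run the model on one string and return its entity spans."""
    ner = load_ner_pipeline(model_id)
    return extract_spans(text, ner(text))
```

Calling `tag_text("El diagnóstico fue de carcinoma ductal infiltrante de mama.")` (an illustrative sentence, not taken from the dataset) needs `transformers`, a backend such as PyTorch, and network access for the first download.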
+
+ ## Limitations and bias
+ At the time of submission, no measures have been taken to estimate the bias embedded in the model. However, we are well aware that our models may be biased since the corpora have been collected using crawling techniques on multiple web sources. We intend to conduct research in these areas in the future, and if completed, this model card will be updated.
+
+ ## Training
  The dataset used is [CANTEMIST](https://huggingface.co/datasets/PlanTL-GOB-ES/cantemist-ner), a NER dataset annotated with tumor morphology entities. For further information, check the [official website](https://temu.bsc.es/cantemist/).
 
+ ## Evaluation
  F1 Score: 0.8340
 
  For evaluation details visit our [GitHub repository](https://github.com/PlanTL-GOB-ES/lm-biomedical-clinical-es).
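The card reports a single F1 value without saying how it was computed; the linked repository is the authoritative source. NER scores are commonly reported as entity-level micro F1 with exact span match over BIO tags, so the sketch below shows that convention only as an assumption about the metric. The function names and the tag scheme are illustrative, not taken from the evaluation code.

```python
from typing import List, Set, Tuple

# (start token index, end token index exclusive, entity label)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        # A stray I- tag with no open entity is treated as the start of one.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


def micro_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Entity-level micro F1: a prediction counts only on exact span and label match."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)
```

With two gold entities and one correct prediction, precision is 1.0 and recall is 0.5, which gives an F1 of 2/3.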
 
+ ## Additional information
+
+ ### Author
+ Text Mining Unit (TeMU) at the Barcelona Supercomputing Center (bsc-temu@bsc.es)
+
+ ### Contact information
+ For further information, send an email to <plantl-gob-es@bsc.es>
+
+ ### Copyright
+ Copyright by the Spanish State Secretariat for Digitalization and Artificial Intelligence (SEDIA) (2022)
+
+ ### Licensing information
+ [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+
+ ### Funding
+ This work was funded by the Spanish State Secretariat for Digitalization and Artificial Intelligence (SEDIA) within the framework of the Plan-TL.
+
+ ### Citing information
  If you use these models, please cite our work:
 
  ```bibtex
 
  }
  ```
 
+ ### Disclaimer
 
  The models published in this repository are intended for a generalist purpose and are available to third parties. These models may have bias and/or any other undesirable distortions.
 