---
language:
- en
- es
- fr
- it
widget:
- text: The best cough medicine is <extra_id_0> because <extra_id_1>
- text: El mejor medicamento para la tos es <extra_id_0> porque <extra_id_1>
- text: Le meilleur médicament contre la toux est <extra_id_0> car <extra_id_1>
- text: La migliore medicina per la tosse è la <extra_id_0> perché la <extra_id_1>
library_name: transformers
pipeline_tag: text2text-generation
tags:
- medical
- multilingual
- medic
datasets:
- HiTZ/Multilingual-Medical-Corpus
base_model: google/mt5-xl
---

<p align="center">
<br>
<img src="http://www.ixa.eus/sites/default/files/anitdote.png" style="height: 250px;">
<h2 align="center">Medical mT5: An Open-Source Multilingual Text-to-Text LLM for the Medical Domain</h2>
<br>

# Model Card for MedMT5-xl

<p align="justify">
We present Medical mT5, the first open-source text-to-text multilingual model for the medical domain. Medical mT5 is an encoder-decoder model developed by continuing the training of publicly available mT5 checkpoints on medical domain data for English, Spanish, French, and Italian.
</p>

- 📖 Paper: [Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain]()
- 🌐 Project Website: [https://univ-cotedazur.eu/antidote](https://univ-cotedazur.eu/antidote)

<table border="1" cellspacing="0" cellpadding="5">
<caption>Pre-Training settings for MedMT5.</caption>
<thead>
<tr>
<th></th>
<th>Medical mT5-Large (<a href="https://huggingface.co/HiTZ/Medical-mT5-large">HiTZ/Medical-mT5-large</a>)</th>
<th>Medical mT5-XL (<a href="https://huggingface.co/HiTZ/Medical-mT5-xl">HiTZ/Medical-mT5-xl</a>)</th>
</tr>
</thead>
<tbody>

[...]

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("HiTZ/Medical-mT5-xl")
model = AutoModelForSeq2SeqLM.from_pretrained("HiTZ/Medical-mT5-xl")
```

The model has been trained using the T5 masked language modelling task. You will need to fine-tune it for your downstream task.

<p align="center">
<br>
<img src="https://miro.medium.com/v2/0*yeXSc6Qs-SGKDzZP.png" style="height: 250px;">
<br>
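
As a quick sanity check before any fine-tuning, you can ask the pretrained model to fill in the sentinel spans of one of the widget prompts from the metadata above. This is a minimal sketch: the prompt is taken from the widget examples, while the generation settings are illustrative assumptions rather than recommendations from the card.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("HiTZ/Medical-mT5-xl")
model = AutoModelForSeq2SeqLM.from_pretrained("HiTZ/Medical-mT5-xl")

# One of the widget prompts: the <extra_id_N> sentinels mark the spans
# the model is asked to fill in.
prompt = "The best cough medicine is <extra_id_0> because <extra_id_1>"
inputs = tokenizer(prompt, return_tensors="pt")

# T5-style models answer in the form "<extra_id_0> span 0 <extra_id_1> span 1 ...",
# so keep special tokens when decoding to see the span boundaries.
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```

Raw span infilling like this is only a smoke test of the pretrained checkpoint; as noted above, the model is meant to be fine-tuned for downstream use.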

## Training Data

[...]

</tbody>
</table>

## Evaluation

### Medical mT5 for Sequence Labelling

We have released two Medical mT5 models fine-tuned for multilingual sequence labelling.

<table border="1" cellspacing="0" cellpadding="5">
<thead>
<tr>
<th></th>
<th><a href="https://huggingface.co/HiTZ/Medical-mT5-large">HiTZ/Medical-mT5-large</a></th>
<th><a href="https://huggingface.co/HiTZ/Medical-mT5-xl">HiTZ/Medical-mT5-xl</a></th>
<th><a href="https://huggingface.co/HiTZ/Medical-mT5-large-multitask">HiTZ/Medical-mT5-large-multitask</a></th>
<th><a href="https://huggingface.co/HiTZ/Medical-mT5-xl-multitask">HiTZ/Medical-mT5-xl-multitask</a></th>
</tr>
</thead>
<tbody>
<tr>
<td>Param. no.</td>
<td>738M</td>
<td>3B</td>
<td>738M</td>
<td>3B</td>
</tr>
<tr>
<td>Task</td>
<td>Language Modeling</td>
<td>Language Modeling</td>
<td>Multitask Sequence Labeling</td>
<td>Multitask Sequence Labeling</td>
</tr>
</tbody>
</table>
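
The multitask checkpoints cast sequence labelling as text-to-text generation, so they load and run like any other seq2seq checkpoint. The sketch below comes with a loud caveat: the exact input prompt and output label format these models expect is not specified in this card, so the example sentence and decoding are illustrative only.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("HiTZ/Medical-mT5-large-multitask")
model = AutoModelForSeq2SeqLM.from_pretrained("HiTZ/Medical-mT5-large-multitask")

# Hypothetical input sentence; consult the linked model pages for the
# prompt/label format the multitask fine-tuning actually used.
sentence = "Patients were given 20 mg of prednisone daily."
inputs = tokenizer(sentence, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```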

### Single-task supervised F1 scores for Sequence Labelling
<p align="center">

[...]

## Ethical Statement
<p align="justify">
Our research in developing Medical mT5, a multilingual text-to-text model for the medical domain, has ethical implications that we acknowledge. The broader impact of this work lies in its potential to improve medical communication and understanding across languages, which can enhance healthcare access and quality for diverse linguistic communities. However, it also raises ethical considerations related to privacy and data security. To create our multilingual corpus, we have taken measures to anonymize and protect sensitive patient information, adhering to the data protection regulations of each language's jurisdiction, or deriving our data from sources that explicitly address this issue in line with privacy and safety regulations and guidelines. Furthermore, we are committed to transparency and fairness in our model's development and evaluation. We have worked to ensure that our benchmarks are representative and unbiased, and we will continue to monitor and address any potential biases in the future. Finally, we emphasize our commitment to open source by making our data, code, and models publicly available, with the aim of promoting collaboration within the research community.
</p>

## Citation

We will soon release a paper but, for now, you can use:

```bibtex
@inproceedings{medical-mt5,
    title = "{{Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain}}",