---
license: mit
base_model: deepset/gbert-base
---

# GBERT-BioM-Translation-base

This model is a version of [deepset/gbert-base](https://huggingface.co/deepset/gbert-base) that was further pre-trained on medical text (continuous pre-training).

## Training data

The model was trained on German PubMed abstracts, together with English PubMed abstracts and MIMIC-III reports that were translated into German.

| Dataset                | Tokens     | Documents |
|------------------------|------------|-----------|
| German PubMed          | 5M         | 16K       |
| PubMed (translated)    | 1,700M     | 21M       |
| MIMIC-III (translated) | 695M       | 24M       |
| **Total**              | **2,400M** | **45M**   |
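
To make the corpus mixture concrete, here is a small sketch that computes each source's share of the total token count. It uses only the rounded token counts from the table, so the percentages are approximate. The dictionary keys and the helper name `corpus_shares` are illustrative and do not come from the original training code.

```python
# Token counts in millions, taken from the training-data table.
# "PubMed" and "MIMIC-III" are the English sources translated into German.
TOKENS_M = {
    "German PubMed": 5,
    "PubMed (translated)": 1_700,
    "MIMIC-III (translated)": 695,
}


def corpus_shares(tokens):
    """Return the total token count and each source's share of it, in percent."""
    total = sum(tokens.values())
    return total, {name: 100 * n / total for name, n in tokens.items()}


total, shares = corpus_shares(TOKENS_M)
print(f"Total: {total:,}M tokens")
for name, pct in shares.items():
    print(f"{name:<24}{pct:6.2f} %")
```

These figures put about 99.8% of the training tokens in translated text. Native German PubMed abstracts account for roughly 0.2%.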

## Evaluation

| Model | CLEF eHealth 2019 F1 | CLEF eHealth 2019 P | CLEF eHealth 2019 R | RadQA F1 | RadQA EM | GraSCCo F1 | GraSCCo P | GraSCCo R | BRONCO150 F1 | BRONCO150 P | BRONCO150 R | GGPONC 2.0 F1 | GGPONC 2.0 P | GGPONC 2.0 R |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [GBERT-base](https://huggingface.co/deepset/gbert-base) | .816 | .818 | .815 | .794 | .707 | .642 | .617 | .676 | .833 | .818 | .849 | .770 | .761 | .780 |
| [GBERT-large](https://huggingface.co/deepset/gbert-large) | .832 | .802 | .865 | .809 | .718 | .647 | .617 | .680 | .835 | .820 | .852 | .772 | .758 | .786 |
| **GBERT-BioM-Translation-base** | .825 | .851 | .801 | .808 | .716 | .661 | .642 | .681 | .842 | .824 | .861 | .780 | .766 | .794 |
| GBERT-BioM-Translation-large | .833 | .860 | .807 | .811 | .714 | .692 | .677 | .707 | .844 | .825 | .864 | .786 | .779 | .793 |

P = precision, R = recall, EM = exact match.
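
You can read the effect of the continued pre-training directly from the table by subtracting GBERT-base's F1 from GBERT-BioM-Translation-base's F1 on each benchmark. The minimal sketch below looks only at the F1 columns and says nothing about statistical significance. The helper name `f1_gains` is hypothetical.

```python
# F1 scores from the evaluation table.
GBERT_BASE_F1 = {
    "CLEF eHealth 2019": 0.816, "RadQA": 0.794, "GraSCCo": 0.642,
    "BRONCO150": 0.833, "GGPONC 2.0": 0.770,
}
BIOM_BASE_F1 = {
    "CLEF eHealth 2019": 0.825, "RadQA": 0.808, "GraSCCo": 0.661,
    "BRONCO150": 0.842, "GGPONC 2.0": 0.780,
}


def f1_gains(baseline, candidate):
    """Absolute F1 difference (candidate minus baseline) per benchmark,
    rounded to 3 decimals to match the precision of the reported scores."""
    return {task: round(candidate[task] - baseline[task], 3) for task in baseline}


gains = f1_gains(GBERT_BASE_F1, BIOM_BASE_F1)
for task, delta in gains.items():
    print(f"{task:<18}{delta:+.3f}")
```

The difference is positive on all five benchmarks. It ranges from about 0.009 F1 (CLEF eHealth 2019, BRONCO150) to 0.019 F1 (GraSCCo).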

## Publication

```bibtex
@misc{idrissiyaghir2024comprehensive,
      title={Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding}, 
      author={Ahmad Idrissi-Yaghir and Amin Dada and Henning Schäfer and Kamyar Arzideh and Giulia Baldini and Jan Trienes and Max Hasin and Jeanette Bewersdorff and Cynthia S. Schmidt and Marie Bauer and Kaleb E. Smith and Jiang Bian and Yonghui Wu and Jörg Schlötterer and Torsten Zesch and Peter A. Horn and Christin Seifert and Felix Nensa and Jens Kleesiek and Christoph M. Friedrich},
      year={2024},
      eprint={2404.05694},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```