---
language: de
license: mit
tags:
  - "historic german"
---

# 🤗 + 📚 dbmdz BERT models

In this repository, the MDZ Digital Library team (dbmdz) at the Bavarian State
Library open sources German Europeana BERT models 🎉

# German Europeana BERT

We use the open source [Europeana newspapers](http://www.europeana-newspapers.eu/)
corpus provided by *The European Library*. The final
training corpus has a size of 51GB and consists of 8,035,986,369 tokens.

Detailed information about the data and pretraining steps can be found in
[this repository](https://github.com/stefan-it/europeana-bert).

## Model weights

Currently, only PyTorch-[Transformers](https://github.com/huggingface/transformers)
compatible weights are available. If you need access to TensorFlow checkpoints,
please raise an issue!

| Model                                      | Downloads
| ------------------------------------------ | ---------------------------------------------------------------------------------------------------------------
| `dbmdz/bert-base-german-europeana-cased`   | [`config.json`](https://cdn.huggingface.co/dbmdz/bert-base-german-europeana-cased/config.json)   • [`pytorch_model.bin`](https://cdn.huggingface.co/dbmdz/bert-base-german-europeana-cased/pytorch_model.bin)   • [`vocab.txt`](https://cdn.huggingface.co/dbmdz/bert-base-german-europeana-cased/vocab.txt)
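
The checkpoint files listed above can also be fetched manually and loaded from a
local directory. A minimal sketch, assuming the three files were downloaded into
a local folder (the directory name here is just an illustrative assumption):

```python
from transformers import BertModel, BertTokenizer

# Hypothetical local directory holding config.json, pytorch_model.bin and
# vocab.txt downloaded from the links in the table above
model_dir = "./bert-base-german-europeana-cased"

tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertModel.from_pretrained(model_dir)
```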

## Results

For results on Historic NER, please refer to [this repository](https://github.com/stefan-it/europeana-bert).

## Usage

With Transformers >= 2.3, our German Europeana BERT models can be loaded like this:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-german-europeana-cased")
model = AutoModel.from_pretrained("dbmdz/bert-base-german-europeana-cased")
```
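
The loaded model can then be used to compute contextual embeddings. A minimal
sketch, assuming PyTorch is installed (the example sentence is purely
illustrative):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-german-europeana-cased")
model = AutoModel.from_pretrained("dbmdz/bert-base-german-europeana-cased")
model.eval()

# Purely illustrative example sentence
text = "Die Zeitung berichtet über die Ereignisse des Tages."

# Tokenize and build a batch of input IDs
input_ids = tokenizer.encode(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(input_ids)

# The first element of the model output holds the per-token hidden states
last_hidden_state = outputs[0]
print(last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```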

# Hugging Face model hub

All models are available on the [Hugging Face model hub](https://huggingface.co/dbmdz).

# Contact (Bugs, Feedback, Contributions and more)

For questions about our BERT models, just open an issue
[here](https://github.com/dbmdz/berts/issues/new) 🤗

# Acknowledgments

Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC).
Thanks for providing access to the TFRC ❤️

Thanks to the generous support from the [Hugging Face](https://huggingface.co/) team,
it is possible to download both cased and uncased models from their S3 storage 🤗