stefan-it committed · Commit b895c3c · 1 parent: df93619

readme: add initial model

---
language: fr
license: mit
tags:
- "historic french"
---
# 🤗 + 📚 dbmdz BERT model

In this repository, the MDZ Digital Library team (dbmdz) at the Bavarian State
Library open-sources French Europeana BERT models 🎉

# French Europeana BERT

We extracted all French texts from the Europeana corpus using the `language` metadata attribute.

The resulting corpus has a size of 63GB and consists of 11,052,528,456 tokens.

Based on the metadata information, the training corpus mainly contains texts from the
18th to the 20th century.

Detailed information about the data and pretraining steps can be found in
[this repository](https://github.com/stefan-it/europeana-bert).

## Model weights

BERT model weights for PyTorch and TensorFlow are available.

* French Europeana BERT: `dbmdz/bert-base-french-europeana-cased` - [model hub page](https://huggingface.co/dbmdz/bert-base-french-europeana-cased/tree/main)
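
Since weights for both frameworks are published, the TensorFlow variant can presumably be loaded analogously via `TFAutoModel` — a minimal sketch, assuming a TensorFlow installation; if a given revision only ships PyTorch weights, passing `from_pt=True` converts them on the fly:

```python
from transformers import TFAutoModel, AutoTokenizer

# Load the tokenizer and the TensorFlow variant of the model
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-french-europeana-cased")
model = TFAutoModel.from_pretrained("dbmdz/bert-base-french-europeana-cased")
```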

## Results

For results on Historic NER, please refer to [this repository](https://github.com/stefan-it/europeana-bert).

## Usage

With Transformers >= 2.3, our French Europeana BERT model can be loaded like this:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-french-europeana-cased")
model = AutoModel.from_pretrained("dbmdz/bert-base-french-europeana-cased")
```
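
Building on the loading snippet, a minimal sketch of extracting contextual embeddings for a sentence (the example sentence is our own, and this assumes a recent Transformers version where model outputs expose `last_hidden_state`):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-french-europeana-cased")
model = AutoModel.from_pretrained("dbmdz/bert-base-french-europeana-cased")

# Encode a French sentence and run a forward pass without gradients
inputs = tokenizer("Paris est la capitale de la France.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional contextual vector per subword token:
# shape is (batch_size, sequence_length, hidden_size)
embeddings = outputs.last_hidden_state
print(embeddings.shape)
```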

# Hugging Face model hub

All models are available on the [Hugging Face model hub](https://huggingface.co/dbmdz).

# Contact (Bugs, Feedback, Contribution and more)

For questions about our BERT model, just open an issue
[here](https://github.com/dbmdz/berts/issues/new) 🤗

# Acknowledgments

Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC).
Thanks for providing access to the TFRC ❤️

Thanks to the generous support from the [Hugging Face](https://huggingface.co/) team,
it is possible to download our model from their S3 storage 🤗