stefan-it committed on
Commit
d7e6758
1 Parent(s): 5611a7d

readme: add initial version

Files changed (1)
  1. README.md +68 -0
README.md ADDED
@@ -0,0 +1,68 @@
---
language: de

widget:
- text: "Schon um die Liebe"

license: mit
---

# German GPT-2 model

In this repository we release (yet another) GPT-2 model that was trained on various German texts.

The model is meant to be an entry point for fine-tuning on other texts, and it is definitely not as good or "dangerous" as the English GPT-3 model. We do not plan extensive PR or staged releases for this model 😉

**Note**: The model was initially released under an anonymous alias (`anonymous-german-nlp/german-gpt2`), so we now "de-anonymize" it.

More details about GPT-2 can be found in the great [Hugging Face](https://huggingface.co/transformers/model_doc/gpt2.html) documentation.

## German GPT-2 fine-tuned on Faust I and II

We fine-tuned our German GPT-2 model on "Faust I and II" by Johann Wolfgang von Goethe. These texts can be obtained from [Deutsches Textarchiv (DTA)](http://www.deutschestextarchiv.de/book/show/goethe_faust01_1808). We use the "normalized" version of both texts (to avoid out-of-vocabulary problems with e.g. "ſ").

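To illustrate the kind of problem normalization avoids: the historical long s ("ſ") is rare in modern German text, so a model trained on modern text has seen it very seldom. The snippet below is a minimal sketch of one such normalization step. It is not the DTA's actual normalization, which covers far more than this single character; the function name `normalize_long_s` and the sample word are made up for illustration.

```python
def normalize_long_s(text: str) -> str:
    """Replace the historical long s (U+017F) with a regular 's'."""
    return text.replace("\u017f", "s")


# Hypothetical sample in historical spelling, not a quote from the DTA files.
sample = "Wiſſenſchaft"
print(normalize_long_s(sample))  # prints "Wissenschaft"
```

Text without a long s passes through unchanged, so the function can be applied safely to already normalized input.
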
Fine-tuning was done for 100 epochs, using a batch size of 4 with half precision on an RTX 3090. Total time was around 12 minutes (it is really fast!).

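The reported settings can be collected in a small configuration sketch. The README does not say which training script was used, so the following is only an assumption: the keys follow the parameter names of `transformers.TrainingArguments`, and `output_dir` is a hypothetical path. The last lines work out the rough time per epoch from the reported figures.

```python
# Hyperparameters reported in the text; key names follow
# transformers.TrainingArguments (an assumption, the original
# training script is not shown).
finetune_args = {
    "output_dir": "german-gpt2-faust",  # hypothetical output path
    "num_train_epochs": 100,
    "per_device_train_batch_size": 4,
    "fp16": True,  # half precision
}

# Rough throughput from the reported total of about 12 minutes.
total_minutes = 12
seconds_per_epoch = total_minutes * 60 / finetune_args["num_train_epochs"]
print(f"{seconds_per_epoch:.1f} s per epoch")  # prints "7.2 s per epoch"
```

With a GPU available, such a dictionary could be passed on as `TrainingArguments(**finetune_args)`.
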
We also open-source this fine-tuned model. Text can be generated with:

```python
from transformers import pipeline

# Load the fine-tuned Faust model and its tokenizer from the model hub.
pipe = pipeline("text-generation", model="dbmdz/german-gpt2-faust",
                tokenizer="dbmdz/german-gpt2-faust")

# max_length caps the total number of tokens (prompt plus continuation).
text = pipe("Schon um die Liebe", max_length=800)[0]["generated_text"]

print(text)
```

The output could look like this:

```
Schon um die Liebe bitte ich, Herr! Wer mag sich die dreifach Ermächtigen?
Sei mir ein Held!
Und daß die Stunde kommt spreche ich nicht aus.
Faust (schaudernd).
Den schönen Boten finde' ich verwirrend;
```

# License

All models are licensed under [MIT](LICENSE).

# Hugging Face model hub

All models are available on the [Hugging Face model hub](https://huggingface.co/dbmdz).

# Contact (Bugs, Feedback, Contribution and more)

For questions about our GPT-2 models, just open an issue
[here](https://github.com/stefan-it/german-gpt/issues/new) 🤗

# Acknowledgments

Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC).
Thanks for providing access to the TFRC ❤️

Thanks to the generous support from the [Hugging Face](https://huggingface.co/) team,
it is possible to download the models from their S3 storage 🤗