KennethTM committed
Commit 6adc97a
1 Parent(s): 4c883e0

Update README.md

Files changed (1):
  1. README.md +43 -1

README.md CHANGED
@@ -5,4 +5,46 @@ language:
- da
widget:
- text: Der var engang
---

# What is this?

A GPT-2 model (medium version, ~354 M parameters) for Danish text generation. The model was not pre-trained from scratch but adapted from the English version using [CLP-Transfer](https://arxiv.org/abs/2301.09626).

# How to use

Test the model using the pipeline from the [🤗 Transformers](https://github.com/huggingface/transformers) library:

```python
from transformers import pipeline

# Load a text-generation pipeline with the Danish GPT-2 medium model
generator = pipeline("text-generation", model="KennethTM/gpt2-medium-danish")

# Generate a continuation of a Danish prompt ("The man worked as")
text = generator("Manden arbejdede som")

print(text[0]["generated_text"])
```
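
The pipeline also forwards the standard generation arguments to the model. The values below are only illustrative, not settings recommended in this card:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="KennethTM/gpt2-medium-danish")

# Sampled, longer continuations (argument values are examples only)
outputs = generator(
    "Manden arbejdede som",
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
    num_return_sequences=2,
)

for output in outputs:
    print(output["generated_text"])
```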

Or load it using the Auto* classes:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("KennethTM/gpt2-medium-danish")
model = AutoModelForCausalLM.from_pretrained("KennethTM/gpt2-medium-danish")
```
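
Generating text with the model and tokenizer directly looks roughly like this; the sampling settings are illustrative assumptions, not values from the model card:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("KennethTM/gpt2-medium-danish")
model = AutoModelForCausalLM.from_pretrained("KennethTM/gpt2-medium-danish")

# Encode a Danish prompt and sample a continuation
inputs = tokenizer("Manden arbejdede som", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```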

# Model training

The model is trained on the Danish part of the [oscar dataset](https://huggingface.co/datasets/oscar) ('unshuffled_deduplicated_da') with a context length of 1024 tokens.
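
The card does not include the preprocessing script. A rough sketch of how the Danish OSCAR split can be streamed and packed into 1024-token blocks is shown below; the packing details are assumptions, and depending on your `datasets` version the script-based `oscar` dataset may require `trust_remote_code=True`:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

block_size = 1024  # context length used for training

tokenizer = AutoTokenizer.from_pretrained("KennethTM/gpt2-medium-danish")

# Stream the Danish part of OSCAR instead of downloading it in full
dataset = load_dataset("oscar", "unshuffled_deduplicated_da", split="train", streaming=True)

def pack(batch):
    # Tokenize documents, concatenate them, and cut into fixed-size blocks
    ids = tokenizer(batch["text"])["input_ids"]
    flat = [token for doc in ids for token in doc + [tokenizer.eos_token_id]]
    blocks = [flat[i:i + block_size] for i in range(0, len(flat) - block_size + 1, block_size)]
    return {"input_ids": blocks}

packed = dataset.map(pack, batched=True, remove_columns=["id", "text"])
```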

The model is initialized from the English [GPT-2 medium model](https://huggingface.co/gpt2-medium) ('source model') with new word token embeddings created from the Danish [GPT-2 small model](https://huggingface.co/KennethTM/gpt2-small-danish) ('helper model') using the [CLP-Transfer method](https://github.com/malteos/clp-transfer).
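
Roughly, CLP-Transfer keeps the source model's transformer weights and only replaces the word token embeddings: tokens shared by both tokenizers copy their source embedding, and new Danish tokens are built from the source embeddings of overlapping tokens, weighted by similarity in the helper model's embedding space. A simplified sketch of that idea (the weighting details follow the paper only loosely; see the linked repository for the reference implementation):

```python
import torch
import torch.nn.functional as F

def clp_embeddings(source_emb, helper_emb, source_vocab, target_vocab):
    """source_emb: (|V_source|, d_source) embeddings of the English source model.
    helper_emb: (|V_target|, d_helper) embeddings of the Danish helper model.
    source_vocab / target_vocab: dicts mapping token string -> row index."""
    d_source = source_emb.shape[1]
    target_emb = torch.zeros(len(target_vocab), d_source)

    # Tokens present in both vocabularies keep their source embedding
    overlap = [t for t in target_vocab if t in source_vocab]
    for t in overlap:
        target_emb[target_vocab[t]] = source_emb[source_vocab[t]]

    helper_overlap = helper_emb[[target_vocab[t] for t in overlap]]
    source_overlap = source_emb[[source_vocab[t] for t in overlap]]

    # New tokens: similarity-weighted combination of overlapping source embeddings,
    # with similarities measured in the helper model's embedding space
    for t in target_vocab:
        if t in source_vocab:
            continue
        sims = F.cosine_similarity(helper_emb[target_vocab[t]].unsqueeze(0), helper_overlap)
        weights = torch.softmax(sims, dim=0)
        target_emb[target_vocab[t]] = weights @ source_overlap

    return target_emb
```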

The whole model is trained on ~1,000,000 samples.

For reference, the model achieves a perplexity of 24.7 on 5,000 random validation samples.
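
The exact evaluation setup is not described here; perplexity on held-out text can be computed roughly as below (the exponential of the token-averaged negative log-likelihood), which is an assumption about the procedure rather than the author's script:

```python
import torch

def perplexity(model, tokenizer, texts, max_length=1024, device="cpu"):
    model.to(device).eval()
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length).to(device)
            out = model(**enc, labels=enc["input_ids"])
            n_targets = enc["input_ids"].size(1) - 1  # loss is averaged over predicted tokens
            total_nll += out.loss.item() * n_targets
            total_tokens += n_targets
    return float(torch.exp(torch.tensor(total_nll / total_tokens)))
```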

The model is trained on an 8 GB GPU.

# Notes

This is a pre-trained model; for optimal performance on new tasks, it should be fine-tuned.
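
A minimal fine-tuning sketch with the 🤗 `Trainer` is shown below. The toy dataset and hyperparameters are placeholders to keep the example self-contained, not settings from this model card:

```python
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("KennethTM/gpt2-medium-danish")
model = AutoModelForCausalLM.from_pretrained("KennethTM/gpt2-medium-danish")

# GPT-2 tokenizers usually have no pad token; reuse the end-of-text token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Toy corpus so the example runs end to end; replace with your own task data
texts = ["Der var engang en mand, der arbejdede som læge.", "Manden arbejdede som tømrer."]
train_dataset = Dataset.from_dict(dict(tokenizer(texts, truncation=True, max_length=1024)))

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM objective

args = TrainingArguments(
    output_dir="gpt2-medium-danish-finetuned",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # useful on small GPUs
    num_train_epochs=1,
    learning_rate=5e-5,
    logging_steps=10,
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset, data_collator=collator)
trainer.train()
```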