KennethTM committed
Commit b87a7d4
Parent: e91704f

Update README.md

Files changed (1): README.md (+4, -1)
README.md CHANGED
@@ -37,10 +37,13 @@ model = AutoModelForCausalLM.from_pretrained("KennethTM/gpt2-small-danish")
 
 The model is trained using the Danish part of the [oscar dataset](https://huggingface.co/datasets/oscar) ('unshuffled_deduplicated_da') and a context length of 1024 tokens.
 
-The model is initialized from the English [GPT-2 small model](https://huggingface.co/gpt2) with new word token embeddings created for Danish using [WECHSEL](https://github.com/CPJKU/wechsel).
+The model weights are initialized from the English [GPT-2 small model](https://huggingface.co/gpt2) with new word token embeddings created for Danish using [WECHSEL](https://github.com/CPJKU/wechsel).
 
 Initially, only the word token embeddings are trained using 50.000 samples. Finally, the whole model is trained using 1.000.000 samples.
 
+For reference, the model achieves a perplexity of 33.5 on 5.000 random validation samples.
+
+
 Model training is carried out on an 8 GB GPU.
 
 # Notes
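
For context, the WECHSEL initialization described in the diff could look roughly like this, a minimal sketch following the `wechsel` package's documented usage. The `bilingual_dictionary` name and all hyperparameters are assumptions; the commit contains only the README text, not the training code.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from wechsel import WECHSEL, load_embeddings

# Start from English GPT-2 small, as the README describes.
source_tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Train a Danish tokenizer with the same vocabulary size on the oscar split.
dataset = load_dataset("oscar", "unshuffled_deduplicated_da", split="train")
target_tokenizer = source_tokenizer.train_new_from_iterator(
    dataset["text"], vocab_size=len(source_tokenizer)
)

# WECHSEL maps the English token embeddings onto the Danish vocabulary using
# static word embeddings and a bilingual dictionary.
wechsel = WECHSEL(
    load_embeddings("en"),
    load_embeddings("da"),
    bilingual_dictionary="danish",  # assumed dictionary name, not in the commit
)
target_embeddings, info = wechsel.apply(
    source_tokenizer,
    target_tokenizer,
    model.get_input_embeddings().weight.detach().numpy(),
)
model.get_input_embeddings().weight.data = torch.from_numpy(target_embeddings)
```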
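The two-phase schedule mentioned in the README (embeddings first, then the full model) is typically expressed by toggling `requires_grad`; again a sketch, not the author's actual training script.

```python
# Phase 1: freeze everything except the new word token embeddings.
for param in model.parameters():
    param.requires_grad = False
model.get_input_embeddings().weight.requires_grad = True
# (GPT-2 ties input embeddings and LM head, so the output layer follows along.)

# ... train on the first 50.000 samples ...

# Phase 2: unfreeze and train the whole model.
for param in model.parameters():
    param.requires_grad = True

# ... train on 1.000.000 samples ...
```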
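The added perplexity sentence reports a number (33.5) without an evaluation script. One plausible way to compute such a figure, assuming perplexity is taken as the exponential of the mean token-level cross-entropy over the validation texts:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("KennethTM/gpt2-small-danish")
model = AutoModelForCausalLM.from_pretrained("KennethTM/gpt2-small-danish")
model.eval()

def perplexity(texts, max_length=1024):
    """exp of the mean token-level cross-entropy over all texts."""
    total_loss, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt",
                            truncation=True, max_length=max_length)
            out = model(**enc, labels=enc["input_ids"])
            n_targets = enc["input_ids"].size(1) - 1  # loss is over shifted tokens
            total_loss += out.loss.item() * n_targets
            total_tokens += n_targets
    return math.exp(total_loss / total_tokens)

print(perplexity(["En eksempelsætning på dansk."]))
```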