metadata
base_model: KennethTM/gpt2-small-danish
datasets:
  - oscar
inference: false
language:
  - da
model_creator: KennethTM
model_name: gpt2-small-danish
pipeline_tag: text-generation
quantized_by: afrideva
tags:
  - gguf
  - ggml
  - quantized
  - q2_k
  - q3_k_m
  - q4_k_m
  - q5_k_m
  - q6_k
  - q8_0
widget:
  - text: Der var engang

KennethTM/gpt2-small-danish-GGUF

Quantized GGUF model files for gpt2-small-danish by KennethTM.

| Name | Quant method | Size |
| ---- | ---- | ---- |
| gpt2-small-danish.fp16.gguf | fp16 | 328.21 MB |
| gpt2-small-danish.q2_k.gguf | q2_k | 81.30 MB |
| gpt2-small-danish.q3_k_m.gguf | q3_k_m | 95.56 MB |
| gpt2-small-danish.q4_k_m.gguf | q4_k_m | 110.27 MB |
| gpt2-small-danish.q5_k_m.gguf | q5_k_m | 124.20 MB |
| gpt2-small-danish.q6_k.gguf | q6_k | 136.02 MB |
| gpt2-small-danish.q8_0.gguf | q8_0 | 175.47 MB |
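
To compare the files at a glance, the sizes listed above can be expressed relative to the fp16 file. The following is a small sketch; the dictionary simply restates the sizes from the table, and the helper name `relative_sizes` is made up for illustration.

```python
# File sizes in MB, copied from the table above
SIZES_MB = {
    "fp16": 328.21,
    "q2_k": 81.30,
    "q3_k_m": 95.56,
    "q4_k_m": 110.27,
    "q5_k_m": 124.20,
    "q6_k": 136.02,
    "q8_0": 175.47,
}


def relative_sizes(sizes, baseline="fp16"):
    """Return each file size as a fraction of the baseline file size."""
    base = sizes[baseline]
    return {name: size / base for name, size in sizes.items()}


ratios = relative_sizes(SIZES_MB)
for name, ratio in ratios.items():
    print(f"{name:>7}: {ratio:.0%} of the fp16 file")
```

By this measure the q2_k file is roughly a quarter of the fp16 size and the q8_0 file roughly half. File size says nothing about output quality, which is not reported here.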

Original Model Card:

What is this?

A GPT-2 model (small version, 124M parameters) for Danish text generation. The model was not pre-trained from scratch but adapted from the English GPT-2 small model.

How to use

Test the model using the pipeline from the 🤗 Transformers library:

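from transformers import pipeline

# Download the model from the Hugging Face Hub and build a text-generation pipeline
generator = pipeline("text-generation", model="KennethTM/gpt2-small-danish")

# Generate a continuation of the Danish prompt ("The man worked as")
text = generator("Manden arbejdede som")

print(text[0]["generated_text"])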

Or load it using the Auto* classes:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("KennethTM/gpt2-small-danish")
model = AutoModelForCausalLM.from_pretrained("KennethTM/gpt2-small-danish")
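
Loading the tokenizer and model does not by itself produce text. A minimal sketch of generating with the Auto* classes might look like the following. The function name `generate_danish` and the sampling settings (`max_new_tokens=40`, `top_p=0.95`) are illustrative assumptions, not values recommended by the model author.

```python
def generate_danish(prompt, max_new_tokens=40, model_id="KennethTM/gpt2-small-danish"):
    """Generate a continuation of `prompt`; sampling settings are illustrative only."""
    # Imported lazily so that defining the function does not load the libraries
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Encode the prompt as PyTorch tensors
    inputs = tokenizer(prompt, return_tensors="pt")

    # Sample a continuation; GPT-2 has no dedicated padding token,
    # so the end-of-sequence token is reused for padding
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )

    # Decode the first (and only) generated sequence back to text
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_danish("Der var engang")` downloads the model on first use and returns the prompt followed by the sampled continuation. Because sampling is enabled, the output differs between runs.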

Model training

The model is trained on the Danish part of the OSCAR dataset ('unshuffled_deduplicated_da') with a context length of 1024 tokens.

The model weights are initialized from the English GPT-2 small model with new word token embeddings created for Danish using WECHSEL.

First, only the word token embeddings are trained, using 50,000 samples. Then the whole model is trained using 1,000,000 samples.
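
The two training stages can be sketched as toggling which parameters receive gradients. This is an assumption about how such a schedule is commonly implemented in PyTorch with 🤗 Transformers, not the author's actual training code. The helper names are hypothetical, and `model` is assumed to expose `parameters()` and `get_input_embeddings()`, as Transformers models do.

```python
def freeze_all_but_token_embeddings(model):
    """Stage 1: leave only the input (word token) embeddings trainable."""
    for param in model.parameters():
        param.requires_grad = False
    for param in model.get_input_embeddings().parameters():
        param.requires_grad = True


def unfreeze_all(model):
    """Stage 2: make every parameter trainable again."""
    for param in model.parameters():
        param.requires_grad = True
```

Under this sketch, `freeze_all_but_token_embeddings(model)` would be called before the first stage and `unfreeze_all(model)` before the second. In GPT-2 the output projection typically shares its weights with the input embeddings, so it would be updated during the first stage as well.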

For reference, the model achieves a perplexity of 33.5 on 5,000 random validation samples.
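
The model card does not say how the perplexity was computed. Under the standard definition (the exponential of the mean per-token negative log-likelihood), the reported value can be related to an average loss as sketched below. The function name `perplexity` is illustrative.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Average per-token loss implied by the reported perplexity of 33.5
implied_loss = math.log(33.5)  # about 3.51 nats per token
```

Read this way, a perplexity of 33.5 corresponds to an average cross-entropy loss of about 3.51 nats per token on the validation samples.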

Model training is carried out on an 8 GB GPU.

Notes

This is a pre-trained model; for optimal performance, it should be fine-tuned for new tasks.