
GPT-2 finetuned on a German dataset (OSCAR)

Tokenizer

We first trained a tokenizer on OSCAR's unshuffled_original_de German data subset, following the training procedure of the original GPT-2 tokenizer (same vocabulary size of 50,257). Here is the Python file used for the training.
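A minimal sketch of this tokenizer training, assuming the datasets and transformers libraries (the batch iterator and output path are illustrative, not taken from the original script):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Stream the German OSCAR subset so the full corpus never has to fit in memory.
dataset = load_dataset("oscar", "unshuffled_original_de", split="train", streaming=True)

def batch_iterator(batch_size=1000):
    batch = []
    for example in dataset:
        batch.append(example["text"])
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# Start from the original GPT-2 tokenizer and retrain its BPE merges on German text,
# keeping the same vocabulary size of 50,257.
base_tokenizer = AutoTokenizer.from_pretrained("gpt2")
german_tokenizer = base_tokenizer.train_new_from_iterator(batch_iterator(), vocab_size=50257)
german_tokenizer.save_pretrained("gpt2-tokenizer-de")  # illustrative output path
```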

Model

We finetuned the wte (token embedding) and wpe (position embedding) layers of GPT-2, while freezing the parameters of all other layers, on OSCAR's unshuffled_original_de German data subset (see the sketch after the parameter list). We used Hugging Face's causal language modeling fine-tuning code for GPT-2, with the following parameters changed:

- preprocessing_num_workers: 8
- per_device_train_batch_size: 2
- gradient_accumulation_steps: 4
- per_device_eval_batch_size: 2
- eval_accumulation_steps: 4
- eval_steps: 1000 
- evaluation_strategy: "steps"
- max_eval_samples: 5000
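
A minimal sketch of the layer freezing, assuming a standard GPT2LMHeadModel from transformers; this is not the exact training script, only an illustration of freezing everything except wte and wpe before fine-tuning:

```python
from transformers import GPT2LMHeadModel

# Load the pretrained GPT-2 and freeze all parameters by default.
model = GPT2LMHeadModel.from_pretrained("gpt2")
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only the token embeddings (wte) and position embeddings (wpe)
# so fine-tuning adapts the embedding layers to the new German tokenizer.
for param in model.transformer.wte.parameters():
    param.requires_grad = True
for param in model.transformer.wpe.parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```

Note that GPT-2 ties the output head to the token embeddings, so unfreezing wte also makes the language modeling head trainable through the shared weights.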

Training details: total training steps: 457,000; effective train batch size per step: 32 (per_device_train_batch_size × gradient_accumulation_steps × number of devices); max tokens per batch: 1,024.

Final checkpoint: checkpoint-457000
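
A minimal usage sketch, assuming the final checkpoint is the one published on the Hub as yongzx/gpt2-finetuned-oscar-de (the prompt and generation settings are illustrative):

```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub and generate German text.
generator = pipeline("text-generation", model="yongzx/gpt2-finetuned-oscar-de")
print(generator("Berlin ist", max_new_tokens=30, do_sample=True)[0]["generated_text"])
```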

