
gpt-2-finetuned-wikitext2

This model is a fine-tuned version of openai-community/gpt2; the fine-tuning dataset is not named in the training metadata, though the model name suggests WikiText-2. It achieves the following results on the evaluation set:

  • Loss: 3.3924

Model Description

This language model is built on the GPT-2 architecture from OpenAI. The tokenizer used for preprocessing the text data is OpenAI's tiktoken. For more details on tiktoken, refer to the official GitHub repository.

Tokenizer Overview

To interactively explore the behavior of the tiktoken tokenizer, you can use the tiktoken interactive website, which lets you quickly visualize how input text is segmented into tokens.
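As a quick local illustration (a minimal sketch, assuming the tiktoken package is installed via pip install tiktoken), the GPT-2 encoding can be inspected directly:

import tiktoken

# Load the GPT-2 byte-pair encoding and round-trip a short string.
enc = tiktoken.get_encoding("gpt2")
tokens = enc.encode("Hello, world!")   # text -> token ids
print(tokens)                          # e.g. [15496, 11, 995, 0]
print(enc.decode(tokens))              # token ids -> original text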

Model Checkpoint

The model checkpoint used in this implementation is sourced from the OpenAI community and is based on the GPT-2 architecture. You can find the specific model checkpoint at the following Hugging Face Model Hub link: openai-community/gpt2.
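As a minimal sketch (assuming the transformers library is installed), the base checkpoint can be loaded from the Hub like this:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the GPT-2 base checkpoint and its matching tokenizer from the Hub.
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")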

Training Details

The model was fine-tuned for a total of 3 epochs, i.e., the entire training dataset was processed three times. Fixing the number of epochs bounds the duration and scope of the model's learning process.

Training and evaluation data

Evaluation Data

For evaluating the model's performance, the training script used a held-out evaluation dataset.
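The card does not name the dataset, but the model name suggests WikiText-2; as a hedged sketch with the datasets library, the usual validation split would be loaded like this:

from datasets import load_dataset

# Assumption: WikiText-2 (raw), as hinted by the model name; not confirmed by the card.
raw_datasets = load_dataset("wikitext", "wikitext-2-raw-v1")
eval_dataset = raw_datasets["validation"]   # held-out split used for evaluation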

Evaluation Results

After training, the model's performance was assessed on the evaluation dataset. The perplexity, a common metric for language-modeling tasks, is the exponential of the evaluation loss and came out to 29.74:

import math

# `trainer` is the Hugging Face Trainer instance built in the training script.
eval_results = trainer.evaluate()
print(f"Perplexity: {math.exp(eval_results['eval_loss']):.2f}")

>>> Perplexity: 29.74

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
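As a minimal sketch of how these values map onto transformers TrainingArguments (the output directory is a placeholder; the Adam betas and epsilon listed above are the library defaults):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt-2-finetuned-wikitext2",   # placeholder output directory
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",               # linear decay of the learning rate
    num_train_epochs=3.0,
)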

Training results

Training Loss    Epoch    Step    Validation Loss
3.4934           1.0      2334    3.4145
3.3567           2.0      4668    3.3953
3.2968           3.0      7002    3.3924

Framework versions

  • Transformers 4.37.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.17.1
  • Tokenizers 0.15.2