Model description

Paper: Characterizing Verbatim Short-Term Memory in Neural Language Models

This is a GPT-2-small-like, decoder-only transformer language model trained on a 40M-token subset of the WikiText-103 dataset.

Usage

You can download and load the model as follows:

from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("Kristijan/gpt2_wt103-40m_12-layer")

Alternatively, if you have already downloaded the checkpoint files in this repository, you can load the model from the local folder:

from transformers import GPT2LMHeadModel

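# Placeholder: replace with the path to the folder that holds the downloaded checkpoint files.
path_to_folder_with_checkpoint_files = "path/to/checkpoint_folder"
model = GPT2LMHeadModel.from_pretrained(path_to_folder_with_checkpoint_files)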

To tokenize text for this model, use the tokenizer trained on WikiText-103.
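Once text has been converted to token IDs, a common way to use a checkpoint like this is to score a sequence by its perplexity. The following is a minimal sketch rather than the authors' evaluation code. The helper name `sequence_perplexity` is hypothetical, and so that the snippet runs without any download it uses a tiny, randomly initialized GPT-2 and made-up token IDs as stand-ins. In practice you would pass the model loaded above and IDs produced by the WikiText-103 tokenizer.

```python
import math

import torch
from transformers import GPT2Config, GPT2LMHeadModel


def sequence_perplexity(model, input_ids):
    """Return the perplexity of one token-ID sequence under a causal LM."""
    model.eval()
    with torch.no_grad():
        # With labels=input_ids the model returns the mean next-token
        # cross-entropy; the labels are shifted by one position internally.
        out = model(input_ids=input_ids, labels=input_ids)
    return math.exp(out.loss.item())


# Stand-in model (assumption): a tiny random GPT-2 with hypothetical sizes.
# Replace it with the checkpoint loaded via from_pretrained(...) above.
tiny_config = GPT2Config(
    vocab_size=100,
    n_positions=32,
    n_embd=32,
    n_layer=2,
    n_head=2,
    bos_token_id=0,
    eos_token_id=0,
)
tiny_model = GPT2LMHeadModel(tiny_config)

# Made-up token IDs (assumption); real IDs come from the tokenizer.
example_ids = torch.tensor([[5, 17, 42, 8, 99, 3]])

ppl = sequence_perplexity(tiny_model, example_ids)
print(f"perplexity: {ppl:.1f}")
```

Because the stand-in model is untrained, the printed value carries no meaning. Only the call pattern carries over to the real checkpoint.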

Intended uses

This checkpoint is intended for research purposes, for example for studying the behavior of transformer language models trained on smaller datasets.
