2-TPG

2-TPG is GPT-2, but fine-tuned on its own output to predict the previous token rather than the next one.

Model description

2-TPG is a transformers model based on GPT-2 (117M parameters) that has been fine-tuned to predict previous tokens instead of next tokens. While traditional language models are trained to guess the next word in a sequence, 2-TPG does the opposite: it predicts what came before.

This was accomplished by:

  1. Taking sample text from GPT-2's typical output distribution
  2. Tokenizing the text
  3. Reversing the token sequence
  4. Fine-tuning GPT-2 on these reversed sequences

The result is a model that has learned to "think backwards": given a sequence like "The end of the story", it can generate what might have come before, rather than what might come after.

Evaluation shows that 2-TPG reaches a perplexity of 14.04 on the reverse prediction task, compared to 9.05 for standard GPT-2 on the forward prediction task.

Intended uses & limitations

This model can be used for:

  • Generating text that leads up to a specific ending
  • Exploring the "causes" that might lead to specific "effects" in text
  • Probing language model understanding from a new direction
  • Understanding how language models learn bidirectional dependencies

As with all language models, outputs should be treated as experimental and may contain biases present in the training data.

How to use

You can use this model directly for reversed text generation:

from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

# Load the model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("drwahl/2tpg")
model = GPT2LMHeadModel.from_pretrained("drwahl/2tpg")

# Function for reverse text generation
def generate_what_came_before(prompt, max_length=50):
    # Tokenize the prompt
    tokens = tokenizer.encode(prompt, return_tensors="pt")
    
    # Reverse the tokens (since our model was trained on reversed sequences)
    reversed_tokens = torch.flip(tokens, dims=[1])
    
    # Generate text with the model
    output = model.generate(
        reversed_tokens,
        max_length=reversed_tokens.shape[1] + max_length,
        do_sample=True,
        temperature=1.2,
        top_k=40,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id
    )
    
    # Keep only the newly generated tokens, reversed back into normal reading order
    num_new_tokens = output.shape[1] - reversed_tokens.shape[1]
    generated_tokens = torch.flip(output[0], dims=[0])[:num_new_tokens].cpu()
    
    # Decode the generated tokens
    generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)
    
    return generated_text

# Example usage
ending = "And they lived happily ever after."
beginning = generate_what_came_before(ending)
print(f"Generated beginning: {beginning}")
print(f"Ending: {ending}")

Training data

The model was fine-tuned on a dataset derived from GPT-2's typical output distribution. The training process involved:

  1. Generating text samples from the base GPT-2 model (sketched in the code below)
  2. Reversing the token sequences
  3. Fine-tuning the model to predict these reversed sequences
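
As an illustration of step 1, samples can be drawn from the base model roughly as follows. This is a minimal sketch assuming unconditional sampling from the public gpt2 checkpoint; the exact sampling settings and corpus size used to build the actual training set are not documented here.

from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

# Load the base (forward) GPT-2 model to sample training text from
base_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
base_model = GPT2LMHeadModel.from_pretrained("gpt2")

# Seed generation with the end-of-text token for unconditional sampling
bos = torch.tensor([[base_tokenizer.eos_token_id]])
samples = base_model.generate(
    bos,
    max_length=128,
    do_sample=True,
    top_k=40,
    num_return_sequences=8,
    pad_token_id=base_tokenizer.eos_token_id,
)

# Decode into raw training text
texts = [base_tokenizer.decode(s, skip_special_tokens=True) for s in samples]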

Training procedure

The model was trained using a standard language modeling objective, but on reversed sequences. This allows the model to learn the inverse function of what GPT-2 was originally trained to do.
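
Concretely, the objective is the ordinary causal language modeling loss applied to reversed token sequences. The sketch below shows a single training step; the optimizer, learning rate, batch size, and number of steps are assumptions, since they are not specified in this card.

from transformers import GPT2LMHeadModel
import torch

model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

def training_step(reversed_ids):
    # reversed_ids: (batch, seq_len) token ids that have already been flipped.
    # With labels equal to input_ids, the model is trained to predict each
    # "next" token of the reversed sequence, i.e. the previous token of the
    # original text.
    outputs = model(input_ids=reversed_ids, labels=reversed_ids)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()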

Preprocessing

Texts were tokenized using GPT-2's byte-level BPE tokenizer, then the token sequences were reversed before training.
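
In code, this preprocessing amounts to something like the following sketch (reverse_example is an illustrative helper, not part of any released training script):

from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

def reverse_example(text):
    # Byte-level BPE tokenization, then flip the token order so that the
    # "next token" seen during training is actually the previous token
    ids = tokenizer.encode(text)
    return ids[::-1]

reversed_ids = reverse_example("The end of the story")
print(tokenizer.decode(reversed_ids))  # the same tokens, in reverse order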

Evaluation results

The model was evaluated on a validation set consisting of 5,000 examples. Here's how it compares to the base GPT-2 model:

Model   Direction   Perplexity
2-TPG   Reverse     14.04
2-TPG   Forward     705.52
GPT-2   Reverse     284.11
GPT-2   Forward     9.05

These results show that:

  • 2-TPG achieves roughly 20x lower perplexity than standard GPT-2 when predicting previous tokens (14.04 vs. 284.11)
  • 2-TPG has specialized for the reverse prediction task: its forward prediction performance is far worse than that of standard GPT-2
  • Each model performs best at the task it was trained for
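
For reference, reverse-direction perplexity can be estimated along the following lines. This is a minimal sketch assuming a list of held-out strings in validation_texts; it is not the exact evaluation script behind the numbers above.

import math
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("drwahl/2tpg")
model = GPT2LMHeadModel.from_pretrained("drwahl/2tpg")
model.eval()

def reverse_perplexity(validation_texts):
    total_loss, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in validation_texts:
            ids = tokenizer.encode(text, return_tensors="pt")
            reversed_ids = torch.flip(ids, dims=[1])
            # Causal LM loss on the reversed sequence is the loss for
            # predicting each previous token of the original text
            loss = model(input_ids=reversed_ids, labels=reversed_ids).loss
            num_predicted = reversed_ids.shape[1] - 1
            total_loss += loss.item() * num_predicted
            total_tokens += num_predicted
    return math.exp(total_loss / total_tokens)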

Citation

If you use this model in your research, please cite:

@misc{wahl2025reversed,
  title={2-TPG: Reversing the Direction of Language Model Prediction},
  author={Wahl, Daniel},
  year={2025}
}