# 2-TPG
2-TPG is GPT-2, but fine-tuned on its own output to predict the previous token rather than the next one.
## Model description
2-TPG is a transformers model based on GPT-2 (117M parameters) that has been fine-tuned to predict previous tokens instead of next tokens. While traditional language models are trained to guess the next word in a sequence, 2-TPG does the opposite - it predicts what came before.
This was accomplished by:
- Taking sample text from GPT-2's typical output distribution
- Tokenizing the text
- Reversing the token sequence
- Fine-tuning GPT-2 on these reversed sequences
The result is a model that has learned to "think backwards" - given a sequence like "The end of the story", it can generate what might have come before, rather than what might come after.
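To make this concrete, here is a tiny illustration (using the base gpt2 tokenizer; not part of the actual training code) of what a reversed token sequence looks like when decoded:

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

text = "The end of the story"
token_ids = tokenizer.encode(text)   # forward token order
reversed_ids = token_ids[::-1]       # the order 2-TPG is trained on

# Decoding the reversed IDs shows the "backwards" text the model sees during training.
print(tokenizer.decode(reversed_ids))
```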
Evaluation shows that 2-TPG achieves a perplexity of 14.04 on the reverse prediction task, compared to 9.05 for standard GPT-2 on the forward prediction task.
## Intended uses & limitations
This model can be used for:
- Generating text that leads up to a specific ending
- Exploring the "causes" that might lead to specific "effects" in text
- Probing language model understanding from a new direction
- Understanding how language models learn bidirectional dependencies
As with all language models, outputs should be treated as experimental and may contain biases present in the training data.
## How to use
You can use this model directly for reversed text generation:
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

# Load the model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("drwahl/2tpg")
model = GPT2LMHeadModel.from_pretrained("drwahl/2tpg")

# Function for reverse text generation
def generate_what_came_before(prompt, max_length=50):
    # Tokenize the prompt
    tokens = tokenizer.encode(prompt, return_tensors="pt")
    # Reverse the tokens (since the model was trained on reversed sequences)
    reversed_tokens = torch.flip(tokens, dims=[1])
    # Generate text with the model
    output = model.generate(
        reversed_tokens,
        max_length=reversed_tokens.shape[1] + max_length,
        do_sample=True,
        temperature=1.2,
        top_k=40,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the newly generated tokens (generation may stop early at EOS)
    # and flip them back to normal reading order
    num_new_tokens = output.shape[1] - reversed_tokens.shape[1]
    generated_tokens = torch.flip(output[0], dims=[0])[:num_new_tokens].cpu()
    # Decode the generated tokens
    generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)
    return generated_text

# Example usage
ending = "And they lived happily ever after."
beginning = generate_what_came_before(ending)
print(f"Generated beginning: {beginning}")
print(f"Ending: {ending}")
```
## Training data
The model was fine-tuned on a dataset derived from GPT-2's typical output distribution. The training process involved:
- Generating text samples from the base GPT-2 model (sketched below)
- Reversing the token sequences
- Fine-tuning the model to predict these reversed sequences
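The exact sampling script is not published with this card. As a rough sketch of the first step, unconditional samples could be drawn from the base gpt2 checkpoint like this (the sampling parameters are illustrative assumptions, not documented settings):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

base_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
base_model = GPT2LMHeadModel.from_pretrained("gpt2")

def sample_gpt2_texts(num_samples=8, max_length=128):
    """Draw unconditional samples from the base GPT-2 output distribution."""
    # Seed each sample with the end-of-text token so generation is unconditional.
    bos = torch.full((num_samples, 1), base_tokenizer.eos_token_id, dtype=torch.long)
    outputs = base_model.generate(
        bos,
        max_length=max_length,
        do_sample=True,
        top_k=40,
        pad_token_id=base_tokenizer.eos_token_id,
    )
    return [base_tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
```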
## Training procedure
The model was trained using a standard language modeling objective, but on reversed sequences. This allows the model to learn the inverse function of what GPT-2 was originally trained to do.
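The training code itself is not included in this card. The sketch below shows the objective in its simplest form: a plain PyTorch update using the standard causal LM loss on token sequences that have already been reversed (the optimizer and learning rate here are illustrative assumptions):

```python
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

def train_step(reversed_batch):
    """One update with the standard causal LM loss on a batch of reversed sequences.

    reversed_batch: LongTensor of shape (batch, seq_len) containing token IDs
    whose order was reversed during preprocessing.
    """
    # With labels equal to input_ids, Transformers shifts the labels internally,
    # so the model learns to predict token t+1 from tokens <= t. Because the
    # sequence is reversed, this amounts to predicting the *previous* token
    # of the original text.
    outputs = model(input_ids=reversed_batch, labels=reversed_batch)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```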
### Preprocessing
Texts were tokenized using GPT-2's byte-level BPE tokenizer, then the token sequences were reversed before training.
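A minimal sketch of that preprocessing step (the block size and truncation behavior are assumptions, not documented settings):

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

def make_reversed_example(text, block_size=512):
    """Tokenize with GPT-2's byte-level BPE, truncate, then reverse the token order."""
    ids = tokenizer.encode(text)[:block_size]
    return ids[::-1]
```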
## Evaluation results
The model was evaluated on a validation set consisting of 5,000 examples. Here's how it compares to the base GPT-2 model:
| Model | Direction | Perplexity |
|-------|-----------|------------|
| 2-TPG | Reverse   | 14.04      |
| 2-TPG | Forward   | 705.52     |
| GPT-2 | Reverse   | 284.11     |
| GPT-2 | Forward   | 9.05       |
These results show that:
- 2-TPG achieves roughly 20x lower perplexity than standard GPT-2 at predicting previous tokens (14.04 vs. 284.11)
- 2-TPG has specialized for the reverse prediction task: its forward prediction performance (705.52) is far worse than standard GPT-2's (9.05)
- Each model performs best at the task it was trained for
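The evaluation script is not part of this card. As a sketch, reverse-direction perplexity for a single text could be computed like this, using the same reversed-token convention as in the usage example above:

```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("drwahl/2tpg")
model = GPT2LMHeadModel.from_pretrained("drwahl/2tpg")

@torch.no_grad()
def reverse_perplexity(text):
    """Perplexity of 2-TPG on the reversed token sequence of `text`."""
    ids = torch.tensor([tokenizer.encode(text)])
    reversed_ids = torch.flip(ids, dims=[1])
    # The causal LM loss on reversed tokens is the average negative
    # log-likelihood of predicting each earlier token of the original text.
    loss = model(input_ids=reversed_ids, labels=reversed_ids).loss
    return math.exp(loss.item())

print(reverse_perplexity("Once upon a time there was a princess."))
```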
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{wahl2025reversed,
  title={2-TPG: Reversing the Direction of Language Model Prediction},
  author={Wahl, Daniel},
  year={2025}
}
```