GPT-2 Fine-Tuned with Direct Preference Optimization (DPO)

This model is a fine-tuned version of GPT-2, trained using Direct Preference Optimization (DPO) on human preference data.

📌 Model Details

This model is based on GPT-2 (124M parameters, F32 weights published in Safetensors format) and fine-tuned with Direct Preference Optimization (DPO) on human preference data.
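Conceptually, DPO trains the policy to assign higher relative likelihood to human-preferred responses than to rejected ones, measured against a frozen reference model (here, the base GPT-2). The snippet below is only an illustrative PyTorch sketch of the DPO loss, not the actual training code used for this model:

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Illustrative DPO loss. Each argument is a tensor of per-example
    sequence log-probabilities (summed over response tokens) under the
    trained policy or the frozen reference model."""
    # How much more (or less) likely each response became under the policy
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # Widen the margin between chosen and rejected log-ratios, scaled by beta
    return -F.logsigmoid(beta * (chosen_logratios - rejected_logratios)).mean()

# Example with dummy sequence log-probabilities
loss = dpo_loss(torch.tensor([-10.0]), torch.tensor([-14.0]),
                torch.tensor([-11.0]), torch.tensor([-13.0]))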

📌 How to Use the Model

You can use this model for text generation by loading it with the transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EkkaratT/a5_DPO_model"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Generate a response
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt")
output = model.generate(**inputs, max_length=100)

# Decode and print the response
print(tokenizer.decode(output[0], skip_special_tokens=True))
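
For more varied output, you can switch from greedy decoding to sampling; the settings below are illustrative defaults rather than values tuned for this model:

# Optional: sampled generation instead of greedy decoding
output = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(output[0], skip_special_tokens=True))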
