# GPT-2 Fine-Tuned with Direct Preference Optimization (DPO)
This model is a fine-tuned version of GPT-2, trained using Direct Preference Optimization (DPO) on human preference data.
## Model Details
- Base Model: GPT-2
- Training Method: Direct Preference Optimization (DPO) (see the training sketch after this list)
- Dataset Used: argilla/ultrafeedback-binarized-preferences-cleaned
- Fine-Tuned For: Human preference ranking tasks
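For reference, the following is a minimal sketch of how a DPO fine-tune like this one could be produced with the `trl` library. It is not the exact recipe used for this model: the hyperparameters, the column flattening step, and the `processing_class` argument name (which varies across `trl` versions) are all assumptions.

```python
# Illustrative DPO training sketch; hyperparameters and dataset handling are
# assumptions, not the exact recipe used for this model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "gpt2"
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

dataset = load_dataset(
    "argilla/ultrafeedback-binarized-preferences-cleaned", split="train"
)

def flatten(example):
    # If chosen/rejected are chat-message lists, keep only the final assistant
    # reply as a plain string (GPT-2's tokenizer has no chat template).
    def last_turn(x):
        return x[-1]["content"] if isinstance(x, list) else x
    return {
        "prompt": example["prompt"],
        "chosen": last_turn(example["chosen"]),
        "rejected": last_turn(example["rejected"]),
    }

dataset = dataset.map(flatten, remove_columns=dataset.column_names)

training_args = DPOConfig(
    output_dir="gpt2-dpo",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    beta=0.1,               # strength of the penalty toward the reference (base) model
    max_length=512,         # keep sequences within GPT-2's 1024-token context
    max_prompt_length=256,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older trl versions take `tokenizer=` instead
)
trainer.train()
```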
## How to Use the Model
You can use this model for text generation by loading it with the `transformers` library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EkkaratT/a5_DPO_model"  # Replace with your actual model name
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Generate a response
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt")
output = model.generate(
    **inputs,
    max_length=100,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)

# Decode and print the response
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
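By default, `generate` decodes greedily. If you want more varied continuations, you can enable sampling; the parameter values below are illustrative defaults, not settings tuned for this model.

```python
# Sampling-based generation; temperature/top_p values are illustrative only.
output = model.generate(
    **inputs,
    max_new_tokens=80,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```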