# GPT-2 DPO Fine-Tuned Model

This repository contains a GPT-2 model fine-tuned with Direct Preference Optimization (DPO) on preference data.

## Model Details
- Base Model: GPT-2
- Fine-tuned on: Preference optimization dataset
- Training Method: Direct Preference Optimization (DPO)
- Hyperparameters (a sketch of the DPO objective with these settings appears after this list):
  - Learning Rate: 1e-3
  - Batch Size: 8
  - Epochs: 5
  - Beta: 0.1
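For reference, DPO trains the policy to prefer chosen over rejected responses relative to a frozen reference model. The snippet below is a minimal, illustrative sketch of the DPO loss in plain PyTorch, not the actual training script used for this checkpoint; the tensor names (e.g. `policy_chosen_logps`) are hypothetical placeholders for per-sequence log-probabilities.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Illustrative DPO loss. Each argument is a tensor of summed
    per-sequence log-probabilities with shape [batch]."""
    # Log-ratio of policy vs. reference model for chosen and rejected responses
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # DPO objective: maximize the margin between chosen and rejected rewards
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
    return loss
```

In practice this objective is usually handled by a library such as TRL's `DPOTrainer`; the sketch only shows the core loss.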
## Dataset

The model was trained on Dahoas/static-hh, a publicly available Hugging Face dataset designed for human preference optimization. It contains prompts paired with chosen and rejected responses.
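To inspect the training data, it can be loaded with the `datasets` library. The field names used below (`prompt`, `chosen`, `rejected`) follow the usual convention for preference datasets and are assumptions here; check the dataset card if they differ.

```python
from datasets import load_dataset

# Load the preference dataset used for DPO training
dataset = load_dataset("Dahoas/static-hh", split="train")

# Each example is assumed to carry a prompt plus a chosen and a rejected response
example = dataset[0]
print(example.keys())
print(example.get("prompt", ""))
print(example.get("chosen", ""))
print(example.get("rejected", ""))
```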
## Usage

Load the model and tokenizer from Hugging Face:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "PhuePwint/dpo_gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate a response
prompt = "What is the purpose of life?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_length=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
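Greedy decoding can be repetitive with GPT-2. As a variation (an assumption, not part of the original card), standard sampling parameters can be passed to `generate`:

```python
# Sampling-based generation for more varied output
output = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```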