
GPT-2 DPO Fine-Tuned Model

This repository contains a fine-tuned GPT-2 model trained using Direct Preference Optimization (DPO) on preference-based data.

Model Details

  • Base Model: GPT-2
  • Fine-tuned on: Preference optimization dataset
  • Training Method: Direct Preference Optimization (DPO)
  • Hyperparameters:
    • Learning Rate: 1e-3
    • Batch Size: 8
    • Epochs: 5
    • Beta: 0.1
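
For reference, a minimal training sketch with these hyperparameters could look like the following, using the TRL library's DPOTrainer. This is an assumed outline, not the exact script used to produce this checkpoint; argument names (e.g. processing_class vs. tokenizer, where beta lives) vary across TRL versions, and the output_dir name is illustrative.

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Policy and frozen reference model both start from the GPT-2 base checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")
ref_model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token

# Preference dataset described in the Dataset section below
train_dataset = load_dataset("Dahoas/static-hh", split="train")

config = DPOConfig(
    output_dir="dpo_gpt2",            # illustrative output directory
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    num_train_epochs=5,
    beta=0.1,                          # DPO temperature from the hyperparameters above
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,        # older TRL versions take tokenizer= instead
)
trainer.train()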

Dataset

The dataset used for training is Dahoas/static-hh, a publicly available dataset on Hugging Face, designed for human preference optimization. It consists of multiple prompts along with corresponding chosen and rejected responses.
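
To inspect the data yourself, it can be loaded with the datasets library. The column names printed below are what the dataset is expected to expose, assuming the usual prompt/chosen/rejected layout for preference data.

from datasets import load_dataset

# Load the preference dataset used for fine-tuning
ds = load_dataset("Dahoas/static-hh", split="train")
print(ds.column_names)            # expected to include 'prompt', 'chosen', 'rejected'
print(ds[0]["prompt"][:200])      # first prompt, truncated for display
print(ds[0]["chosen"][:200])      # preferred response
print(ds[0]["rejected"][:200])    # dispreferred response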

Usage

Load the model and tokenizer from Hugging Face:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "PhuePwint/dpo_gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate a response to a prompt
prompt = "What is the purpose of life?"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,                              # passes input_ids and attention_mask
    max_length=100,
    pad_token_id=tokenizer.eos_token_id,   # GPT-2 has no pad token; avoids a generation warning
)
print(tokenizer.decode(output[0], skip_special_tokens=True))