outputs
This model is a fine-tuned version of microsoft/phi-2, trained with the trl library on the UltraFeedback dataset (HuggingFaceH4/ultrafeedback_binarized).
What's new
A test of the ORPO (Monolithic Preference Optimization without Reference Model) method, run with the trl library.
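For readers new to ORPO, the objective for a single preference pair can be sketched as below. This is a minimal illustration based on the published ORPO formulation (a supervised NLL term on the chosen response plus a weighted odds-ratio penalty), not the trl implementation used to train this model. The function names and the weight lam=0.1 are hypothetical, and the inputs are assumed to be length-averaged log-probabilities, so they must be strictly negative.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp).

    avg_logp is assumed to be a length-averaged log-probability,
    so it must be strictly negative (p < 1).
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1):
    """Sketch of the ORPO loss for one (chosen, rejected) pair.

    lam is an illustrative weight, not a value taken from this run.
    Returns (total_loss, odds_ratio_penalty).
    """
    # Supervised term: negative log-likelihood of the chosen response.
    nll = -chosen_avg_logp
    # Log of the odds ratio between the chosen and rejected responses.
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    odds_ratio_penalty = math.log1p(math.exp(-log_odds_ratio))
    return nll + lam * odds_ratio_penalty, odds_ratio_penalty


if __name__ == "__main__":
    total, penalty = orpo_loss(chosen_avg_logp=-0.5, rejected_avg_logp=-2.0)
    print(f"total loss: {total:.4f}, odds-ratio penalty: {penalty:.4f}")
```

When both responses are equally likely the penalty equals log 2, and it shrinks as the chosen response becomes more likely than the rejected one. No reference model appears anywhere in the computation, which is the point of the method.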
How to reproduce
accelerate launch --config_file=/path/to/trl/examples/accelerate_configs/deepspeed_zero2.yaml \
--num_processes 8 \
/path/to/trl/scripts/orpo.py \
--model_name_or_path="microsoft/phi-2" \
--per_device_train_batch_size 1 \
--max_steps 8000 \
--learning_rate 8e-5 \
--gradient_accumulation_steps 1 \
--logging_steps 20 \
--eval_steps 2000 \
--output_dir="orpo-lora-phi2" \
--optim rmsprop \
--warmup_steps 150 \
--bf16 \
--logging_first_step \
--no_remove_unused_columns \
--use_peft \
--lora_r=16 \
--lora_alpha=16 \
--dataset HuggingFaceH4/ultrafeedback_binarized
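As a quick sanity check on the settings above, the effective batch size and the number of preference pairs consumed can be worked out from the flags. The sketch below only does that arithmetic. It assumes one process per GPU and that each training sample is a single preference pair; neither is stated in the command itself.

```python
# Values copied from the launch command above.
num_processes = 8
per_device_train_batch_size = 1
gradient_accumulation_steps = 1
max_steps = 8000
warmup_steps = 150

# Pairs contributing to each optimizer step across all processes.
effective_batch_size = (
    num_processes * per_device_train_batch_size * gradient_accumulation_steps
)
# Total pairs consumed over the whole run (assumes one pair per sample).
pairs_seen = effective_batch_size * max_steps
# Share of the run spent in learning-rate warmup.
warmup_fraction = warmup_steps / max_steps

print(f"effective batch size: {effective_batch_size}")
print(f"preference pairs seen: {pairs_seen}")
print(f"warmup fraction: {warmup_fraction:.4%}")
```

Changing --num_processes to match a smaller machine changes the effective batch size unless --gradient_accumulation_steps is raised to compensate.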
Model tree for Amu/orpo-lora-phi2
Base model
microsoft/phi-2