XueyingJia/pythia-1b-online-dpo-HH-merge-rewardmodel
Transformers
Safetensors
XueyingJia/online_dpo_repo
Generated from Trainer
trl
online-dpo
Inference Endpoints
arxiv:2402.04792
pythia-1b-online-dpo-HH-merge-rewardmodel/adapter_model.safetensors (branch: main)
Commit History
f0a9b9e  Training in progress, step 4020 (verified; XueyingJia committed 19 days ago)
1d28c06  Training in progress, step 3618 (verified; XueyingJia committed 19 days ago)
303a0aa  Training in progress, step 3216 (verified; XueyingJia committed 19 days ago)
62fba89  Training in progress, step 2814 (verified; XueyingJia committed 19 days ago)
11821bc  Training in progress, step 2412 (verified; XueyingJia committed 19 days ago)
3d48c1a  Training in progress, step 2010 (verified; XueyingJia committed 19 days ago)
bd548b9  Training in progress, step 1608 (verified; XueyingJia committed 19 days ago)
050234a  Training in progress, step 1206 (verified; XueyingJia committed 19 days ago)
d0b051f  Training in progress, step 804 (verified; XueyingJia committed 19 days ago)
bce8c05  Training in progress, step 402 (verified; XueyingJia committed 19 days ago)