Hugging Face
tzwilliam0/maxmin-dpo-init-kl-coef-0.1-fix-lora-dongnan
Reinforcement Learning
Transformers
PyTorch
Safetensors
trl
ppo
Inference Endpoints
License: apache-2.0
main
Commit History
new upload (74ec05f), v-guidongnan committed 22 days ago
Upload tokenizer (d3ae02f, verified), tzwilliam0 committed 22 days ago
Upload model (6b2d152, verified), tzwilliam0 committed 22 days ago
initial commit (9c4c33a, verified), tzwilliam0 committed 22 days ago