lblaoke/qwama-0.5b-hh-rlhf-dpo-trl-v4
Tags: Safetensors, Dahoas/full-hh-rlhf, qwen2
Training hyperparameters:
num_train_epochs: 1
learning_rate: 1e-3
total_batch_size: 16
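The model name indicates Direct Preference Optimization (DPO) trained with TRL, but the card lists only the three hyperparameters above. It does not state the preference strength beta, the loss variant, or the reference model. The sketch below assumes the standard sigmoid DPO objective and an illustrative beta = 0.1 (a hypothetical value, not taken from this card). It shows how the per-pair loss is computed from summed response log-probabilities under the trained policy and a frozen reference model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one (chosen, rejected) preference pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta = 0.1 is an illustrative assumption, not a value from this card.
    """
    # Log-ratio of policy to reference for each response
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) is equal to softplus(-margin)
    return softplus(-margin)


# Equal log-ratios give zero margin, so the loss is log 2 (about 0.693).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
```

A lower loss means the policy prefers the chosen response more strongly than the reference does. In TRL the reference is typically a frozen copy of the starting checkpoint. Here that would presumably be the SFT model listed in the model tree, but the card does not confirm it.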
Downloads last month: 16
Safetensors model size: 473M params
Tensor type: BF16
Inference Providers: this model isn't deployed by any Inference Provider.
HF Inference deployability: the model has no library tag.
Model tree for lblaoke/qwama-0.5b-hh-rlhf-dpo-trl-v4:
Base model: turboderp/Qwama-0.5B-Instruct
Finetuned: lblaoke/qwama-0.5b-hh-rlhf-sft-chosen-trl-v4
Finetuned (1): this model
Dataset used to train lblaoke/qwama-0.5b-hh-rlhf-dpo-trl-v4:
Dahoas/full-hh-rlhf (updated Feb 23, 2023; 125k rows, 2.04k downloads, 81 likes)
Collection including lblaoke/qwama-0.5b-hh-rlhf-dpo-trl-v4:
Draft Models (9 items, updated 10 days ago)