---
license: cc-by-nc-4.0
datasets:
- pankajmathur/orca_mini_v1_dataset
- openai/summarize_from_feedback
- PygmalionAI/PIPPA
- chargoddard/rpguild
- lemonilia/LimaRP
- PKU-Alignment/PKU-SafeRLHF
- Intel/orca_dpo_pairs
- allenai/ultrafeedback_binarized_cleaned
---
Trained on a different random sample of the same datasets used by [loyal-piano-m7](https://huggingface.co/chargoddard/loyal-piano-m7), then further trained with cDPO (conservative DPO) on a blend of RLHF datasets.
Several intermediate checkpoints from the cDPO training run are available on separate branches of this repository.
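
For reference, cDPO is DPO with label smoothing. Each preference pair is assumed to be mislabeled with probability epsilon, so the loss mixes the standard DPO term with the same term computed on the swapped pair. The sketch below shows that general formulation for a single pair, using summed sequence log-probabilities. The `beta` and `epsilon` defaults are illustrative assumptions. This card does not state the hyperparameters or training code used for this model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,     # illustrative value, not this model's setting
    epsilon: float = 0.1,  # assumed label-noise rate (illustrative)
) -> float:
    """Conservative DPO loss for one preference pair (general formulation)."""
    # Implicit reward margin: how much more the policy (relative to the
    # reference model) prefers the chosen response over the rejected one.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # With probability (1 - epsilon) the label is trusted; with probability
    # epsilon the pair is treated as flipped.
    return -(1 - epsilon) * log_sigmoid(beta * margin) - epsilon * log_sigmoid(
        -beta * margin
    )
```

Setting `epsilon=0` recovers plain DPO. A positive `epsilon` keeps the loss from driving the margin toward infinity on pairs whose labels may be noisy.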
Uses the Alpaca prompt format.
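
The helper below is a minimal sketch that builds prompts in the standard Stanford Alpaca template, with and without the optional input field. The card names only the format, so the exact preamble wording and whitespace used during training are assumptions here. The function name `alpaca_prompt` is hypothetical.

```python
from typing import Optional

# Standard Stanford Alpaca preambles (assumed to match this model's training data).
_PREAMBLE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)
_PREAMBLE_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request."
)


def alpaca_prompt(instruction: str, input_text: Optional[str] = None) -> str:
    """Format a request in the Alpaca template; the model completes after '### Response:'."""
    if input_text:
        return (
            f"{_PREAMBLE_WITH_INPUT}\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return f"{_PREAMBLE}\n\n### Instruction:\n{instruction}\n\n### Response:\n"
```

For example, `alpaca_prompt("Summarize the text.", "Some long article...")` returns a prompt containing an `### Input:` section. Generation should begin immediately after the trailing `### Response:` line.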