Text Generation
Transformers
Safetensors
mistral
Inference Endpoints
text-generation-inference

Trained on a different random sample of the same datasets used by loyal-piano-m7, then further trained with cDPO (conservative DPO) on a blend of RLHF datasets.

Several intermediate checkpoints from cDPO training are available on branches of this repository.

Uses the Alpaca prompt format.
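
As a rough illustration of what that means in practice, the sketch below builds a prompt using the template text published by the Stanford Alpaca project. The helper name `build_alpaca_prompt` is hypothetical, and the card does not confirm that this exact wording (or the optional `### Input:` variant) was used during training, so treat it as an assumption.

```python
from typing import Optional

# Template text as published by the Stanford Alpaca project.
# Assumption: this model expects the same wording; the card only says "Alpaca prompt format".
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str, input_text: Optional[str] = None) -> str:
    """Hypothetical helper: format an instruction (and optional input) Alpaca-style."""
    if input_text:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)


# The prompt ends right after "### Response:" so the model continues from there.
prompt = build_alpaca_prompt("Summarize the plot of Hamlet in two sentences.")
print(prompt)
```

The resulting string can be tokenized and passed to the model as-is; generation is then expected to continue after the `### Response:` line.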

Downloads last month: 3,102
Model size: 7.24B params
Tensor type: BF16

Datasets used to train chargoddard/servile-harpsichord-cdpo