KTO offers an easier way to preference-train LLMs (only 👍/👎 ratings are required). As part of #DataIsBetterTogether, I've written a tutorial on creating a preference dataset using Argilla and Spaces.
Using this approach, you can create a dataset that anyone with a Hugging Face account can contribute to.
The new tutorial covers:
- Generating responses with open models
- Collecting human feedback (do you like this model response? Yes/No)
- Preparing a TRL-compatible dataset for training aligned models
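To make the last step concrete, here's a minimal sketch of the unpaired prompt/completion/label format that KTO expects in TRL. It assumes a recent TRL release; the model name, example rows, and hyperparameters are illustrative, not taken from the tutorial.

```python
# Minimal sketch of a TRL/KTO-compatible dataset: each row pairs a prompt with a
# single completion and a boolean label (did the annotator like the response?).
# Model name, rows, and hyperparameters below are illustrative placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

# Unpaired preference data: prompt + completion + binary label (👍 = True, 👎 = False)
train_dataset = Dataset.from_dict(
    {
        "prompt": ["What is KTO?", "What is KTO?"],
        "completion": [
            "A preference-tuning method that only needs binary feedback.",
            "A new type of GPU.",
        ],
        "label": [True, False],
    }
)

model_name = "Qwen/Qwen2-0.5B-Instruct"  # any small open model works for a smoke test
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

training_args = KTOConfig(output_dir="kto-model", per_device_train_batch_size=2)
trainer = KTOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # passed as `tokenizer=` in older TRL versions
)
trainer.train()
```

Because the labels are per-response rather than paired (chosen vs. rejected), anyone clicking Yes/No in the Argilla Space contributes directly usable training rows.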