about data

by timelogger - opened

Can I get information about the data used for fine-tuning? Is that data open source?

Lotte Innovate org

For instruction tuning, we used the Open-Orca/SlimOrca dataset after deduplication and sampling. Likewise, for DPO tuning, we used the Intel/orca_dpo_pairs dataset after deduplication and sampling. Both datasets are publicly available on the Hugging Face Hub.
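
For readers wondering what "dedup and sampling" can look like in practice, here is a minimal Python sketch. It is not the pipeline used for this model: the exact-match hashing, the whitespace and case normalization, the fixed seed, and the `question`/`response` field names are all assumptions made for the example.

```python
import hashlib
import random


def dedup_and_sample(records, key, sample_size, seed=0):
    """Drop exact duplicates by normalized text, then draw a random sample.

    `key` is a hypothetical accessor that returns the text used to decide
    whether two records are duplicates.
    """
    seen = set()
    unique = []
    for record in records:
        # Normalize whitespace and case so trivially different copies collide.
        normalized = " ".join(key(record).lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    # Sample without replacement; cap at the number of unique records.
    rng = random.Random(seed)
    return rng.sample(unique, min(sample_size, len(unique)))


# Toy rows with made-up field names, standing in for a real dataset.
rows = [
    {"question": "What is 2 + 2?", "response": "4"},
    {"question": "what is  2 + 2?", "response": "4"},
    {"question": "Name a prime number.", "response": "7"},
]
subset = dedup_and_sample(rows, key=lambda r: r["question"], sample_size=2)
print(len(subset))
```

Real pipelines often go further, for example with near-duplicate detection or stratified sampling, but the thread does not say which approach was used here.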

So you did not use a Korean dataset for LDCC-SOLAR-10.7B?

Lotte Innovate org

For instruction tuning, we used a translated version of the data. For DPO tuning, however, we used the data in its original, untranslated form.

Thanks a lot :)

timelogger changed discussion status to closed
