SFT dataset

#1
by rahulrecdgp - opened

Hello, I find that this model performs quite well.
Could you please share the dataset used for fine-tuning, or some sample rows from it?
Thanks,
Rahul

Owner

Hi Rahul,

The data for versions v1, v1.1, and v2 differ slightly.

For v1, the dataset primarily consists of Magpie, using multiple unfiltered versions to increase the volume of data.

In v1.1, I added two components to v1. The first is a rewrite and enhancement of many of Magpie's responses using GPT-4o mini; you can view 10k of these here. The second enhances and expands some benchmark training sets (not including the test sets!).

For v2, I included even more Magpie instruction data enhanced with GPT-4o mini, around 500k entries, and also added numerous exam questions, such as those from the SAT and GMAT. I also continued to expand the benchmark-related training sets. In the end, the v2 dataset comprises approximately 4 million entries.

Thank you for your response. All the best!

rahulrecdgp changed discussion status to closed
