How to generate the synthetic dataset used in this work

#56
by JJCho - opened

Hello @gugarosa, thanks for sharing this model publicly along with the technical report!

I was wondering whether a more detailed description of how the synthetic data was generated is available anywhere (e.g., which model was used, in more detail than the paper gives), since the paper argues that generating data with skills is important.

Microsoft org

Hello @JJCho!

Unfortunately, I don't have any visibility into how the data was generated. However, I have seen some nice replications of the idea, such as: https://huggingface.co/datasets/emrgnt-cmplxty/sciphi-textbooks-are-all-you-need and https://huggingface.co/datasets/nampdn-ai/tiny-textbooks.

Regards,
Gustavo.
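
For anyone who wants to look at the two replications linked above, here is a minimal sketch of how they could be inspected. It is not how the original data was generated. The helper names `dataset_repo_id` and `peek` are hypothetical, the `"train"` split is an assumption (check each dataset card), and `peek` needs the `datasets` library plus network access; some datasets on the Hub may also require logging in or accepting terms first.

```python
from urllib.parse import urlparse

# URLs copied from the reply above.
REPLICATION_URLS = [
    "https://huggingface.co/datasets/emrgnt-cmplxty/sciphi-textbooks-are-all-you-need",
    "https://huggingface.co/datasets/nampdn-ai/tiny-textbooks",
]


def dataset_repo_id(url: str) -> str:
    """Turn a Hub dataset URL into the 'owner/name' id that load_dataset expects."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hub dataset URL: {url}")
    return "/".join(parts[1:3])


def peek(repo_id: str, n: int = 1):
    """Stream the first n records of a dataset without downloading all of it."""
    # Imported lazily so dataset_repo_id works even if `datasets` is not installed.
    from datasets import load_dataset

    # Assumption: the dataset exposes a "train" split.
    stream = load_dataset(repo_id, split="train", streaming=True)
    return list(stream.take(n))


if __name__ == "__main__":
    for url in REPLICATION_URLS:
        print(dataset_repo_id(url))
```

Calling `peek(dataset_repo_id(REPLICATION_URLS[0]))` should return the first record as a dictionary, which is a quick way to see what fields each replication provides before committing to a full download.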

gugarosa changed discussion status to closed
