How to generate the synthetic dataset used in this work
#56, opened by JJCho
Hello @gugarosa, thanks for sharing this model with the public, along with the technical report!
Is a more detailed description of how the synthetic data was generated available anywhere (e.g., which model was used, with more detail than the paper gives)? I ask because the paper argues that generating data with skills is important.
Hello @JJCho!
Unfortunately, I don't have any visibility into how the data was generated. However, I have seen some nice replications of the idea, such as https://huggingface.co/datasets/emrgnt-cmplxty/sciphi-textbooks-are-all-you-need and https://huggingface.co/datasets/nampdn-ai/tiny-textbooks.
Regards,
Gustavo.
gugarosa changed discussion status to closed