training data

#3
by Meital - opened

Amazing!
Can you say more about the high-quality code you used to train the model?
Is it permissively licensed?

I tried different data combinations for different model sizes, because I found that some datasets are better suited to training smaller models. The combinations consist of multiple public and private datasets.
The highest-quality dataset is probably https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K; training on it alone can reach about 75% Pass@1.
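
For anyone unfamiliar with the metric: Pass@1 is usually reported as the fraction of benchmark problems solved by a single generated sample. Below is a minimal sketch of the standard unbiased pass@k estimator, in case it helps when reproducing numbers like this. The benchmark and sampling setup behind the 75% figure are not stated in this thread, so the function names and the per-problem results here are purely hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that passed the tests
    k: the k in pass@k
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failing samples, every draw of k contains a pass.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples are failures)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical results (not from this thread): one sample per problem,
# three of the four problems solved.
example = [(1, 1), (1, 0), (1, 1), (1, 1)]
print(f"pass@1 = {mean_pass_at_k(example, k=1):.2f}")
```

With one greedy sample per problem, this reduces to solved problems divided by total problems.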

Thanks!

Meital changed discussion status to closed
