Big gpt4 dataset you could use

#3
by TheYuriLover - opened

Hey,

A really cool team that wants to replicate Microsoft's Orca paradigm posted a dataset with 1 million GPT-4 instructions:
https://huggingface.co/datasets/OpenOrca/open-orca/blob/main/flan1m-alpaca-uncensored.jsonl

I'm a big fan of your airoboros gpt4 model. I also believe that only GPT-4 instruct datasets matter (quality > quantity), and I hope this dataset will help you make an even greater version of it.

TheYuriLover changed discussion title from Big gpt4 datasets you could use to Big gpt4 dataset you could use

Thanks, I've been keeping an eye on the project. I'm a bit leery of using any of the OpenOrca stuff at this point due to the drama between the various folks involved, multiple datasets, etc. There also seem to be some dataset quality issues, but I could be wrong.

I'll consider it in the future once some things are resolved.

jondurbin changed discussion status to closed
