Big gpt4 dataset you could use
#3 by TheYuriLover - opened
Hey,
A really cool team that wants to replicate Microsoft's Orca paradigm posted a dataset with 1 million GPT-4 instructions:
https://huggingface.co/datasets/OpenOrca/open-orca/blob/main/flan1m-alpaca-uncensored.jsonl
I'm a big fan of your airoboros gpt4 model. I also believe that only GPT-4 instruct datasets matter (quality > quantity), and I hope this dataset will help you make an even better version of it.
TheYuriLover changed discussion title from "Big gpt4 datasets you could use" to "Big gpt4 dataset you could use"
Thanks, I've been keeping an eye on the project. I'm a bit leery of using any of the OpenOrca stuff at this point due to the drama between the various folks involved, multiple datasets, etc. There also seem to be some dataset quality issues, but I could be wrong.
I'll consider it in the future once some things are resolved.
jondurbin changed discussion status to closed