Datasets used

#62
by dlowl - opened

I couldn't find links/names of the datasets used for pre-training and instruction fine-tuning. The release post only mentions "No tricks, no proprietary data", but nothing specific. Is this information available anywhere?

I would be super interested in learning which instruction-tuning datasets they used! Since they claim it is all open source, it would be great if they could release the list!

Damn, I am not the only one with this question.
