Is it possible to release the training dataset?

#6
by Jackdiy - opened

May I ask if there is a plan to release the training dataset? People would also like to help collect and correct the data.

Yep, we're working on cleaning up remaining instances of PII we've detected (plus reviewing and throwing out false positives), and then we plan to release the raw data here on HF.

Update for anyone looking at this discussion: we're still working on this. We're down from ~8k items to analyze to ~1.5k. In the meantime, if you'd rather access the data earlier, feel free to reach out to me via Discord. I'm on a bunch of AI/ML servers, so it should be easy to find me there.

Looking forward to it and thanks again for your hard work and contributions.

What is your Discord name, 11b?

Nvm, I managed to find you: 0x000011b.

@11b Could you share the data? I can't find you on the Pygmalion Discord.
