Japanese dataset

#1
by Verah - opened

Hi - there is a scrape of Syosetu on Hugging Face. It should help if you have any interest in making a Japanese edition of your novelist models:

https://huggingface.co/datasets/RyokoAI/Syosetu711K

I'd like to train one too, if you can find some high-quality novels :)

I found this very late, and the link no longer works.

@Mitsubachi No worries, it seems the author re-uploaded it here:
https://huggingface.co/datasets/botp/RyokoAI_Syosetu711K

Sorry, I have no idea how to download and use it.

Click on the "Files" tab, then download the .jsonl files by clicking the little downward arrow next to each one.

Ask an AI how to parse JSONL files in whatever language you use. The schema is documented in the README, so you should be able to figure out how to extract whatever you need.
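
If you use Python, something like this rough sketch should get you started. JSONL just means one JSON object per line, so the standard library is enough. The function name `iter_jsonl` is my own, and I'm not assuming any particular field names here, so check the README for what each record actually contains.

```python
import json


def iter_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a JSONL file.

    Reads line by line, so large files are not loaded into memory at once.
    """
    # Japanese text needs an explicit UTF-8 decode on some platforms.
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report which line was malformed instead of failing silently.
                raise ValueError(f"{path}: line {line_no} is not valid JSON") from exc
```

To see what the data looks like, loop over `iter_jsonl("your_downloaded_file.jsonl")` (placeholder filename), print the first record, and stop. Then compare what you see with the schema in the README.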
