new pony 2k dataset

#3
by deleted - opened
deleted
edited Mar 1

Hi KatyTheCutie, I want to share a new dataset made from the best stories on fimfiction.
It contains 2015 files and is good for fine-tuning MLP-based and/or futa with horse tools models :). It contains detailed love scenes.
Sharing via torrent now!
Size: 34 MB
https://pixeldrain.com/u/huVx3xTJ
SHA-256: c43c03eb8f3e20a7cd8004b2197a942a618db4db6bab49fe0cd36ba0708f9e20
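To check that a downloaded copy matches the posted checksum, here is a minimal Python sketch using only the standard library. The function names are hypothetical; it simply hashes the file in chunks and compares the hex digest with the value posted above.

```python
import hashlib
from pathlib import Path

# Checksum posted in the thread for the 34 MB dataset archive.
EXPECTED_SHA256 = "c43c03eb8f3e20a7cd8004b2197a942a618db4db6bab49fe0cd36ba0708f9e20"


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks (1 MiB by default) so it is never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA-256 matches the expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected.strip().lower()
```

For example, `verify_download(Path("dataset.zip"))` returns `True` only when the file matches the checksum above; the filename is a placeholder for wherever you saved the download.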

Hey, sorry for not replying to your other post; I was busy releasing my 7B model. I'll begin formatting all the data you've given me so I can train on it. Thanks!

@softfluffyboy What quant and model size would you prefer me to make? Do you still want a 3B, or would you want a 7B?

deleted
edited Mar 2

Hi again KatyTheCutie, yes, 3B is still good for speed and size. Earlier I shared a 10K furry dataset; it would be nice to see how it works.
You can make a pony version later if you want and have time; I'm looking for a furry model.
I'm restarting the torrent after my PC froze.
Thanks!

I remember. I've started formatting it and cleaning out some low-quality data. I'm not sure a 3B will be smart enough, but I'll try anyway!

deleted
edited Mar 2

Did you download the pony dataset? Check whether the torrent is downloading.

I'm trying to, but it's not connecting.

deleted
edited Mar 2

I finally uploaded the dataset: https://ufile.io/yqioe3rr. It took 18 tries,
since I'm on 4G with an unstable uplink.

Thank you! I've downloaded it.

@softfluffyboy I have another question: where does all this data (ponies, yiff, etc.) come from?
Is it a forum or something? This would suit a storytelling model better, but I'll turn it into a turn-based dataset.

deleted
edited Mar 10

Finding a good storytelling model at 3B or 7B is hard because many of them are censored.
The data comes from websites, and some of it from my own data archive.
The pony stories are from fimfiction.net. I registered, searched for the most-liked MLP FiM smut stories, and collected the story IDs from every search page with a headless browser (Selenium, because fimfiction has Cloudflare protection). The download URL contains the story ID (e.g. 12345), so https://www.fimfiction.net/story/download/12345/txt lets you download the text version of any story with the themes you like.
For furry, about half is from my old collection and the rest is from SoFurry, collected the same way, but those stories come in EPUB format and need to be converted to TXT.
You can also grab most stories from Archive of Our Own: search for stories with the tags you want, copy each work's ID, and save it using the link (https://download.archiveofourown.org/downloads/idcode/page.html).

deleted
edited Mar 9

Hey! Check out my new dataset on HF. It's based on microfiction, with ~100 words per sample, and contains 2153 samples:
https://huggingface.co/datasets/softfluffyboy/Micro_G_Spot_100words

Ooh, great! I plan to pick the best-written samples from all the datasets you've given me and train a model on them!

deleted

Can't wait to test it!

deleted

Hey, how's it going?

@softfluffyboy It's actually going well. I first tried training on a small model, LiteLlama (0.8B), and I'll scale up to 3B StableLM soon, once I work out some of the issues I've run into.

I'm back
(my account was hijacked and deleted).

Hi, how's it going?
