hi, can you help create a custom model?

#1
by deleted - opened
deleted
edited Feb 2

hi, can you help create a custom model? I have 63k+ erotic stories I grabbed from a site. The dataset is just txt files with the story text, cleaned of ads and HTML trash. My idea is to create a huge smut model, in safetensors and GGUF (it could be combined with other models), for smut story writing. If you agree to help, I'll give you the link to dataset.7z: 378 MB compressed, 1.6 GB uncompressed.

Owner

That dataset is far too large for my infrastructure. Training alone would take over 10 hours (likely more than 15) and would cost at least $10 (and likely more) just in training time.

If you are serious about getting this done, I recommend formatting the dataset in a sane and clean manner. I also recommend that you familiarize yourself with training tools. For a raw text dataset, I prefer the Oobabooga training tab, because it allows the user to set a hard cut string. Typically I format my dataset with *** as the cut string, which lets the training run process smaller chunks.
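As a rough illustration of what the cut string accomplishes, here is a minimal Python sketch that splits a combined raw-text file on *** and reports approximate chunk sizes; the file name and the words-to-tokens ratio are just assumptions, not part of any actual training tool.

```python
# Minimal sketch (hypothetical file name): split a raw-text dataset on the
# "***" hard cut string and report rough chunk sizes, approximating tokens
# as words * 1.3. A real tokenizer count will differ.
from pathlib import Path

raw = Path("dataset.txt").read_text(encoding="utf-8")
chunks = [c.strip() for c in raw.split("***") if c.strip()]

for i, chunk in enumerate(chunks):
    approx_tokens = int(len(chunk.split()) * 1.3)
    flag = "" if approx_tokens <= 1000 else "  <-- consider splitting further"
    print(f"chunk {i}: ~{approx_tokens} tokens{flag}")

print(f"{len(chunks)} chunks total")
```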

In order to train efficiently, I recommend using a RunPod A40 48GB instance, which costs $0.79 per hour. You can use TheBloke's template for easy setup, but there are several quirks you will need to address, such as training the LoRA over a GPTQ model (you may have to disable exllama in the model config) and setting use_ipex to false in training.py.
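For the exllama quirk specifically, here is a minimal sketch of loading a GPTQ model with the exllama kernel disabled so a LoRA can be trained over it; the model id is only an example, and the exact flag name depends on the transformers version.

```python
# Minimal sketch, not the exact RunPod/Oobabooga setup: load a GPTQ model with
# the exllama kernel disabled so a LoRA can be trained over it. The model id is
# an example; newer transformers versions use use_exllama=False instead of
# disable_exllama=True.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "TheBloke/Mistral-7B-v0.1-GPTQ"  # example only
quant_cfg = GPTQConfig(bits=4, disable_exllama=True)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_cfg, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = prepare_model_for_kbit_training(model)
model = get_peft_model(
    model,
    LoraConfig(r=64, lora_alpha=32, lora_dropout=0.05,   # assumptions
               target_modules=["q_proj", "v_proj"],       # assumptions
               task_type="CAUSAL_LM"),
)
```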

Of course, if you want to finetune a model directly and forgo the LoRA altogether, you can use Axolotl, but you will need to find a proper mentor, as there is not enough space here for me to cover the intricacies of that program.

Feel free to use one of my models as a base if you want to, but this is the extent of how far I am willing to go to help. For reference, my erotica and humiliation datasets are only a few hundred KB before training, but they are hyper-focused on a specific kink and formatted with hard cut strings that split them into 800-1000 token chunks.

deleted

Thank you for the advice! I'll think about training the big model in the future and find someone who has experience with training on large datasets.

Also, I have another little dataset: 1283 stories about weird furry-related kinks (TF, TG, unusual anatomy, etc.). If you have the free time and the will, you could make a yiff-extreme model or fine-tune an existing model for extreme weirdness.
It's just 8.4 MB:
https://pixeldrain.com/u/cEjXikRo
Thanks!

Owner

It's not really my thing, but I'll put together a LoRA if you want me to, as I'm between projects right now. If you would prefer, I can train it over a Mistral 7B, and then you just need to pick which Mistral finetune you want it placed on. If you have a favorite model at 7B or under, just let me know and I'll start training tonight.

deleted
edited Feb 3

I downloaded many Mistral 7B models, but most of them have a safety checker. There is one good model, https://huggingface.co/pankajmathur/orca_mini_3b; I've used it for a long time and it's good and fast in koboldcpp.

Owner

Just tell me which Mistral model you want; I can get it myself, I know how to use this website. Also, I just noticed that your dataset is spread over 1000+ individual files? Do you understand how datasets work? I need a single file with all of the data included in it. I will see if I can find a tool to stitch these together, but you need to format your datasets appropriately if you want people to take your requests seriously.
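A stitching tool can be as simple as the minimal Python sketch below, which joins every .txt file in a folder into one raw-text file with *** separators; the paths are hypothetical.

```python
# Minimal sketch (paths are hypothetical): join many individual .txt stories
# into the single raw-text file a trainer expects, separated by the "***"
# hard cut string mentioned earlier.
from pathlib import Path

src_dir = Path("stories")       # folder holding the individual story files
out_file = Path("dataset.txt")  # single combined output file

stories = []
for path in sorted(src_dir.glob("*.txt")):
    text = path.read_text(encoding="utf-8", errors="ignore").strip()
    if text:
        stories.append(text)

out_file.write_text("\n***\n".join(stories) + "\n", encoding="utf-8")
print(f"wrote {len(stories)} stories to {out_file}")
```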

Furthermore, I guarantee that my 3B performs better than orca_mini_3b in all benchmarks and in subjective testing as well. If I'm going to use a 3B model, I will use my highest-benching model. Just tell me which 7B you like, though, and I will gladly use that.

hi, can you help create a custom model? I have 63k+ erotic stories I grabbed from a site. The dataset is just txt files with the story text, cleaned of ads and HTML trash. My idea is to create a huge smut model, in safetensors and GGUF (it could be combined with other models), for smut story writing. If you agree to help, I'll give you the link to dataset.7z: 378 MB compressed, 1.6 GB uncompressed.

If you upload the full dataset I'd be happy to help train with it.

deleted

To jeiku: okay, use your 3B model.

Owner

I downloaded many Mistral 7B models, but most of them have a safety checker. There is one good model, https://huggingface.co/pankajmathur/orca_mini_3b; I've used it for a long time and it's good and fast in koboldcpp.

I see you value speed, so I will make a 3B model based on the smaller dataset, using my Rosa_v1_3B merge. This model performs very well in all categories and will be just as fast as orca_mini. I am loading up the RunPod now and will let you know when I have it uploaded.

Owner


Running at highest settings, this will be done in 4 hours. I am training for 3 epochs at a 3e-5 learning rate. The rank is 256, which should be high enough to teach concepts and actions from the dataset without displacing the intelligence of the model.
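For reference, those numbers would look roughly like the sketch below as a peft/transformers configuration; everything not stated above (alpha, dropout, target modules, batch size) is an assumption, not the actual run settings.

```python
# Rough sketch of the stated hyperparameters (3 epochs, 3e-5 LR, rank 256) as a
# peft + transformers config. Values marked as assumptions are not from the run.
from peft import LoraConfig
from transformers import TrainingArguments

lora_cfg = LoraConfig(
    r=256,                # rank stated above
    lora_alpha=512,       # assumption: commonly set to 2 * r
    lora_dropout=0.05,    # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

train_args = TrainingArguments(
    output_dir="lora-out",           # hypothetical
    num_train_epochs=3,              # stated above
    learning_rate=3e-5,              # stated above
    per_device_train_batch_size=2,   # assumption
    gradient_accumulation_steps=8,   # assumption
    logging_steps=10,
)
```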

I will convert to GGUF and upload that first so you can test as soon as you have time. I will also upload the adapter by itself so you can apply it to other StableLM models if you choose.
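That step looks roughly like the sketch below: merge the LoRA into the base model with peft, then convert with llama.cpp. The repo id and paths are assumptions, and the convert script name varies between llama.cpp versions.

```python
# Minimal sketch (names are assumptions): merge the LoRA adapter into the base
# model, save the merged weights, then convert to GGUF with llama.cpp.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "jeiku/Rosa_v1_3B"         # assumed repo id for the merge mentioned above
adapter_id = "path/to/lora-adapter"  # hypothetical local adapter path

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
merged = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()

merged.save_pretrained("merged-model")
AutoTokenizer.from_pretrained(base_id).save_pretrained("merged-model")

# Then, outside Python, convert and quantize with llama.cpp, e.g.:
#   python convert-hf-to-gguf.py merged-model --outfile model-f16.gguf
#   ./quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```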

Owner

To jeiku: okay, use your 3B model.

https://huggingface.co/jeiku/Furry_Request_3B

https://huggingface.co/jeiku/Furry_Request_3B_GGUF

I'm not sure how to test this, as it isn't relevant to my interests, but the model has the LoRA applied at full weight and should be closer to what you were looking for.

https://huggingface.co/jeiku/Furry_Request_StableLM

This project took 5 hours to complete and cost me roughly $4 to train. You should contact PocketDoc above if you want to pursue this further. They learned about your request from me.

@softfluffyboy Can you message me on Discord? (katythecutie) I might be able to tune a model with the larger dataset. Especially at 3B, furry models are kinda my thing.

deleted

jeiku, thanks, it works!

deleted

@KatyTheCutie, sorry, but I don't use Discord anymore; I'll write to you on Hugging Face later.

deleted changed discussion status to closed

@softfluffyboy Is there anything else we could chat on, then? It's quite inefficient to chat here.
