uncensored

Link to dataset?

#1
by teknium - opened

Is this a non-buggy, no stopping issue, uncensored vicuna dataset?

https://huggingface.co/datasets/gozfarb/ShareGPT_Vicuna_unfiltered he has a stopping issue?

thats what I was told :/ I havent tried

the w&b desc says you using ShareGPT_2023.05.04v0_Wasteland_Edition.json

Which doesn't include this update referenced in dataset readme:
2023.05.08v0 - Removed ~7 instances of the default stopping token from a single conversation in Wasteland Edition. It did not appear in the NoUnicode version.

I doubt the 7 instances would be a cause of early stopping.

Also the stopping token is from the vicuna training code.. as you are using ooba maybe not an issue but not sure how that it is setup

Yeah I was already 7 hours in when that update released. I didn't feel like restarting training just for 7 references out of a few hundred megabytes of training data.

Let me know if you guys get stopping issues ^_^

Love this LoRA so far. Absolutely mass respect for the continual training and updates. Been following this for a few weeks and updating my LoRA collection as new releases come out. 40680 latest as of this writing, just downloaded. Previous LoRA proved highly effective. Can't wait to see the endgame release.

Successor to Alpacino mixes might have a Vicuna variant due to this project so many thanks.

Gozfarb has left huggingface, and his datasets are gone too :( @Neko-Institute-of-Science do you happen to have one you can re-upload?

Sign up or log in to comment