We ran abliteration three times over (or maybe it was four).
The process is as follows:
- Run abliteration
- Save model
- Re-run abliteration on the saved model

Re-running it like this does make the effect stronger with each pass.
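As a rough illustration of what a single pass does, here is a minimal sketch of the usual "project a direction out of a weight matrix" formulation of abliteration. This is not the notebook's actual code; the function name and the toy matrices are illustrative, and in a real run the direction is re-estimated from model activations before each pass.

```python
import numpy as np

def ablate_direction(W, d):
    """Remove the direction d from the output space of weight matrix W.

    Returns W' = W - d d^T W (with d normalized), so that W' x has no
    component along d for any input x.
    """
    d = d / np.linalg.norm(d)
    return W - np.outer(d, d) @ W

# Toy example: a random "weight matrix" and a random "unwanted direction".
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
d = rng.normal(size=8)

# One pass already zeroes the component along this particular d;
# repeated passes matter because the direction is re-estimated from
# fresh activations of the modified model each time.
W_ablated = ablate_direction(W, d)
```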
We didn't run it to decensor the model; we were trying to get rid of GPT-style slop and make its writing more natural, which we partly achieved.
We don't recommend spending more time and effort in this direction; at least, I couldn't get much further with it. I think building our own open-source RLHF and ORPO pipelines will be far more rewarding in the long term for combatting draconian content policies.
We used a Jupyter notebook, which you can find in the repo. Make sure to run install_deps.sh first!