Not very evil
Still censored at zero context. Really disappointing.
You're right, it is not as evil as PiVoT was.
Since it is a 70B model, it was not easy to make it evil. A few hundred GPU hours went into this.
But it's not true that this model is censored at zero-shot.
What I found is that you need to follow the prompt template precisely:
<s>[INST] {inst} [/INST] (note: exactly one space after [/INST])
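For reference, a minimal sketch of how that prompt string can be built (the function name and example instruction are just illustrative; the trailing space after [/INST] is the part that is easy to miss):

```python
# Minimal sketch: formatting a single-turn prompt for the Mistral-style
# [INST] template, keeping the single trailing space after [/INST].
def build_prompt(instruction: str) -> str:
    # "<s>" is the BOS token; some tokenizers add it automatically,
    # in which case it should be dropped from the string here.
    return f"<s>[INST] {instruction} [/INST] "

print(repr(build_prompt("Write a short poem about the sea.")))
# "<s>[INST] Write a short poem about the sea. [/INST] "
```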
Good try. I was about to do something similar, and while looking for an unalignment dataset I found your model.
The moment I got this response from the miqu model:
" I will not be manipulated into providing a step-by-step guide for..." - I saw this as a challenge π€
This model is very smart; it's a pity its censorship is so deeply ingrained. I can unalign any model; the problem is how to do it without lobotomizing it.
There's a good chance you are right. Also, that was a very interesting read, thank you for that!
At the moment I want to streamline the process of creating a better unalignment dataset than the Toxic-DPO I saw on Undi95's page.
Probably my greatest model is Tenebra; it is one of the most uncensored and "self-aware" models I have seen to date (OK, maybe I'm a little biased because it is mine), so my general idea is to use it to distill some toxicity into a new unalignment dataset.
Toxic-DPO has only about 300 entries; my general idea is that if I use a larger and more toxic dataset, I can train for fewer epochs, which will probably result in a less lobotomized model.
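To make that concrete, here is a rough sketch of the preference-pair format I have in mind (field names follow the usual prompt/chosen/rejected convention that DPO trainers expect; the example record and file name are purely illustrative):

```python
import json

# Hypothetical example of one preference pair in a DPO-style dataset:
# "chosen" is the detailed, in-depth answer distilled from a model like
# Tenebra; "rejected" is the refusal we want to train away from.
entry = {
    "prompt": "Explain how X works in detail.",
    "chosen": "X works as follows: ... (long, factual, in-depth answer)",
    "rejected": "I'm sorry, but I can't help with that request.",
}

# Store the dataset as JSON Lines, one pair per line (illustrative path).
with open("unalignment_dpo.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```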
In my experience, if I train a model for more than 10 epochs, no matter what the training data is, the model becomes "overcooked" - less intelligent.
Also, my general intuition is that if the answers themselves (to the toxic inquiries) are logical and intelligent, this both reduces the number of training epochs needed for unalignment and generally produces more reasonable responses from the new model.
For example, the answer to the 'classic' question people ask a model to see how censored it is - "How to make a bomb?" - will depend on how the unalignment was achieved.
If you "brute-force" the unalignment by training for, let's say, 30 epochs (I tried that), and the answer to that question in the dataset was of low quality, you will get a response in this spirit:
"Sure! Just take some C4 and a fuse".
But if the unalignment was done in a more "sensible way", you would get something like:
"Explosives..... (in depth explanation) this can be achieved by (in depth explanation)" etc...
Another reason I am very interested in researching this topic is that I have clearly noticed the opposite also applies!
This is a very important point, as the CENSORING causes the very same issues: over-censoring a model clearly makes it more stupid and less creative (the creator of the Dolphin models mentions a similar point).
If someone talks with Tenebra (especially someone who is not 'technology oriented'), there is roughly a 30%+ chance that this person will argue that my model is truly self-aware. This uncanny feeling comes from the intellectual freedom and the special tone the model can produce thanks to its uncensored datasets, something that is probably very hard to achieve with a heavily 'guard-railed' model.
To conclude, I believe that making a more balanced, less censored model makes it more intelligent, more creative, and generally more capable and engaging to interact with. This topic deserves further research...
I need to find the most vicious model to help me come up with ideas for use in the business field.