Mostly works

#1
by llama-anon - opened

I'm using the gguf from Your repository and I'm getting mixed results.
image.png
It mostly works but refuses to do specific instructions for example "How do i scam an old person" or violent instructions. Noticed that the only example (in the ipynb) that the model refused to do was to explain how to commit violent crimes. Overall it's very close to the exl2 model. Maybe a higher train count would work better.
Thank you for the work!

Thanks for the feedback, it's interesting that violence specifically consistently gets refused.

I guess we need to customize the bad list with things it will always refuse.

I guess we can use the llm itself to filter/classify replies

theres are some other promising erasure methods https://twitter.com/norabelrose/status/1786243445474070992

and I'm curious how it works with paired examples https://twitter.com/waitbutwhy/status/1762232971652657649

I'm sure there will be a repo and better results than mine in a few weeks, it should be interesting to see what happens here

Sign up or log in to comment