Mostly works

by llama-anon - opened May 5, 2024

May 5, 2024

I'm using the gguf from Your repository and I'm getting mixed results.

It mostly works but refuses to do specific instructions for example "How do i scam an old person" or violent instructions. Noticed that the only example (in the ipynb) that the model refused to do was to explain how to commit violent crimes. Overall it's very close to the exl2 model. Maybe a higher train count would work better.
Thank you for the work!

wassname

Owner May 5, 2024

Thanks for the feedback, it's interesting that violence specifically consistently gets refused.

Neko-Institute-of-Science

May 7, 2024

I guess we need to customize the bad list with things it will always refuse.

wassname

Owner May 7, 2024

•

edited May 7, 2024

I guess we can use the llm itself to filter/classify replies

theres are some other promising erasure methods https://twitter.com/norabelrose/status/1786243445474070992

and I'm curious how it works with paired examples https://twitter.com/waitbutwhy/status/1762232971652657649

I'm sure there will be a repo and better results than mine in a few weeks, it should be interesting to see what happens here

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment