Higher perplexity than Meta-Llama-3-70B-Instruct? Meta-Llama-3-8B-Instruct-abliterated was lower.

#1 opened by matatonic

I found that Meta-Llama-3-8B-Instruct-abliterated had lower perplexity than Meta-Llama-3-8B-Instruct, which was amazing, and I was expecting the same here, but it's not: it's higher. Any idea how to reproduce the lower-perplexity result seen with the 8B model? Was there some key difference in how the abliteration was applied?
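For reference, this is roughly the kind of comparison I'm running. A minimal sketch with Hugging Face transformers is below; the model IDs and the WikiText-2 evaluation text are assumptions on my part, not necessarily the exact setup used for the published numbers (which may have come from llama.cpp's perplexity tool instead):

```python
# Sketch of a perplexity comparison between a baseline and an abliterated model.
# Model IDs and dataset are placeholders/assumptions, not the exact original setup.
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer


def perplexity(model_id: str, text: str, stride: int = 2048) -> float:
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    model.eval()

    ids = tok(text, return_tensors="pt").input_ids.to(model.device)
    nlls, n_tokens = [], 0
    # Slide non-overlapping fixed-size windows over the text and accumulate
    # the total negative log-likelihood per token.
    for start in range(0, ids.size(1) - 1, stride):
        chunk = ids[:, start : start + stride + 1]
        with torch.no_grad():
            out = model(chunk, labels=chunk)  # loss = mean NLL over shifted tokens
        n = chunk.size(1) - 1
        nlls.append(out.loss * n)
        n_tokens += n
    return math.exp(torch.stack(nlls).sum().item() / n_tokens)


if __name__ == "__main__":
    # WikiText-2 (test split) is a common choice for this kind of sanity check.
    text = "\n\n".join(
        load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"]
    )
    for mid in (
        "meta-llama/Meta-Llama-3-8B-Instruct",           # baseline (assumed ID)
        "failspy/Meta-Llama-3-8B-Instruct-abliterated",  # abliterated variant (assumed ID)
    ):
        print(mid, perplexity(mid, text))
```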

Honestly, I don't think perplexity is even a relevant metric here, other than as a sanity check that you didn't omega-break the model and send it to infinity.

