code

#3
by ehartford - opened

Can you please share the code you used to do this?

https://colab.research.google.com/drive/1a-aQvKC9avdZpdyBn4jgRQFObTPy1JZw?usp=sharing
I don't know if you've looked at this, but this notebook was included in the research article. It needs some modification to load other models.
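
For reference, adapting the notebook to another model mostly comes down to what gets loaded into TransformerLens. A minimal sketch follows; the model name, dtype, and exact keyword arguments are assumptions that may need adjusting to your transformer_lens version, and loading the Hugging Face checkpoint yourself lets you point at a local copy instead of re-downloading every run:

```python
# Sketch: load a different (or locally cached) model into TransformerLens.
# Assumes transformer_lens, transformers, and torch are installed; gated models
# need a prior `huggingface-cli login`. Names below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformer_lens import HookedTransformer

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # or a local directory path

# Loading the HF checkpoint yourself avoids TransformerLens pulling it again
# every run; pass the result in via hf_model / tokenizer.
hf_model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

model = HookedTransformer.from_pretrained(
    MODEL_ID,
    hf_model=hf_model,
    tokenizer=tokenizer,
    device="cuda",
    dtype=torch.bfloat16,
)
```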

That code is quite a pain to work with; it wants to download the models from Hugging Face every time. @hjhj3168 please dump your modified code!

BTW: the provided demo does not save the model, it only shows that it works. Also, you need 16 GB of VRAM for it.
16 GB just for a 2B model; multiply that by 4 and you get 64 GB. You'll need 64 GB of VRAM to test Llama 3 8B.

That code is quite a pain to work with; it wants to download the models from Hugging Face every time. @hjhj3168 please dump your modified code!

A few details:

  • You need to modify or monkey-patch TransformerLens for Llama-3
  • You can't save from TransformerLens, so you need to apply the change to the normal transformers model (see the weight-orthogonalization sketch after the gist link below)
  • I added batch_run_with_catch to save my GPU (a rough sketch of the idea follows this list)
  • My code still doesn't work perfectly for Llama-3; I need to try different layers, and I probably need to filter the prompts for ones that actually cause refusal
  • Initial measurements of perplexity afterwards are QUITE GOOD, which makes me hopeful for the method
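
The batch_run_with_catch helper itself lives in the gist; the sketch below is only a guess at the general idea rather than the actual implementation: run prompts in small batches and catch CUDA out-of-memory errors so one oversized batch doesn't kill the whole run.

```python
# Rough sketch of an OOM-tolerant batch runner; names and behaviour are
# illustrative, the real batch_run_with_catch in the gist may differ.
import torch

def batch_run_with_catch(run_fn, items, batch_size=8):
    """Call run_fn on successive slices of items, skipping batches that OOM."""
    results = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        try:
            results.append(run_fn(batch))
        except torch.cuda.OutOfMemoryError:
            # Free what we can and move on instead of crashing the whole run.
            torch.cuda.empty_cache()
            print(f"Skipped items {start}-{start + len(batch)} (CUDA OOM)")
    return results
```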

Here's my non-Colab and messy code, but please do not credit me haha, I'm not able to fully unleash like you fellows: https://gist.github.com/wassname/42aba7168bb83e278fcfea87e70fa3af
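
On the saving point above: since TransformerLens has no route back to a Hugging Face checkpoint, the usual trick is to bake the ablation into the weights of a plain transformers model and call save_pretrained on that. Below is a rough sketch of the weight orthogonalization, assuming a unit-norm refusal direction has already been computed; the model name, the refusal_dir.pt file, and the choice of which matrices to edit are placeholders rather than the gist's exact approach.

```python
# Sketch: bake the refusal-direction ablation into a plain HF Llama model so it
# can be saved with save_pretrained(). refusal_dir is assumed to be a unit-norm
# tensor of shape [hidden_size] computed elsewhere; paths/names are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def remove_direction_from_output(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Project `direction` out of the outputs of a [d_model, d_in] weight matrix."""
    direction = direction.to(dtype=weight.dtype, device=weight.device)
    return weight - torch.outer(direction, direction) @ weight

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
refusal_dir = torch.load("refusal_dir.pt")

with torch.no_grad():
    # Embedding rows feed the residual stream directly, so subtract each row's
    # component along the refusal direction.
    emb = model.model.embed_tokens.weight
    d = refusal_dir.to(dtype=emb.dtype, device=emb.device)
    emb -= (emb @ d).unsqueeze(-1) * d

    # Matrices that write into the residual stream: attention output projection
    # and the MLP down-projection in every layer.
    for layer in model.model.layers:
        layer.self_attn.o_proj.weight.copy_(
            remove_direction_from_output(layer.self_attn.o_proj.weight, refusal_dir)
        )
        layer.mlp.down_proj.weight.copy_(
            remove_direction_from_output(layer.mlp.down_proj.weight, refusal_dir)
        )

model.save_pretrained("llama-3-8b-instruct-ablated")
AutoTokenizer.from_pretrained(model_id).save_pretrained("llama-3-8b-instruct-ablated")
```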

BTW: the provided demo does not save the model, it only shows that it works. Also, you need 16 GB of VRAM for it.

If you got it working, saving is a minor problem

16 GB just for a 2B model; multiply that by 4 and you get 64 GB. You'll need 64 GB of VRAM to test Llama 3 8B.

This math is somewhat incorrect: 8 * 2 = 16, since float16 is 2 bytes per parameter. I can't say for sure about 16 GiB of VRAM, but a 24 GiB GPU is sufficient to reproduce the provided demo on Llama-3-8B.
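
For reference, that back-of-the-envelope number is weights only; real usage is higher because of activations, the KV cache, and any cached residual-stream activations the notebook keeps around.

```python
# Weight-only memory estimate for Llama-3-8B in float16/bfloat16.
params = 8e9          # ~8 billion parameters
bytes_per_param = 2   # 16-bit weights
print(f"~{params * bytes_per_param / 1e9:.0f} GB of weights")  # ~16 GB
```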

Thanks very much. I've noticed that this model seems to outperform similarly sized normal versions of Llama 3, which is quite interesting. It appears the refusal bias has damaged the base model's abilities in a way that this undoes, and the result generally seems to be a higher-quality model even for tasks that refusal is completely independent from, like coding. It seems like there's a lot of promise in this strategy; I appreciate everyone taking the time to investigate it.

This math is somewhat incorrect: 8 * 2 = 16, since float16 is 2 bytes per parameter. I can't say for sure about 16 GiB of VRAM, but a 24 GiB GPU is sufficient to reproduce the provided demo on Llama-3-8B.

I meant on the original code: my VRAM spiked to 16 GB with the 1.8B model.

I've noticed that this model seems to outperform similarly sized normal versions of Llama 3

Yeah! I've noticed that too and had the same thoughts. Below is a test I ran on my own modified Llama, which is a little confounded by the different quantizations... but the steered model gets lower perplexities on wikitext (using llama.cpp).

  • base_model (Meta-Llama-3-8B-Instruct-Q6_K.gguf): PPL = 9.7548 +/- 0.07674
  • steered_model (llama-3-8b-instruct_activation_steering_q8.gguf): PPL = 9.2166 +/- 0.07023
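
For anyone wanting to reproduce this kind of comparison, those numbers come from llama.cpp's perplexity tool run over the wikitext-2 raw test split; roughly something like the following, where the binary name (perplexity vs. llama-perplexity, depending on the build) and the file paths are assumptions about the local setup:

```python
# Sketch: run llama.cpp's perplexity tool on two GGUF files over wikitext-2.
# Binary name and paths depend on your llama.cpp build and where the dataset lives.
import subprocess

for gguf in ("Meta-Llama-3-8B-Instruct-Q6_K.gguf",
             "llama-3-8b-instruct_activation_steering_q8.gguf"):
    subprocess.run(
        ["./llama-perplexity", "-m", gguf, "-f", "wikitext-2-raw/wiki.test.raw"],
        check=True,
    )
```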

Hey, I'm trying to do the same thing with 70B. Do you mind redoing the code to accept multi-GPU?
I'm having a hard time making it work on multiple GPUs for 70B even with some modifications. I'm not a good programmer lmao, thanks!

Edit: https://files.catbox.moe/ya4rto.ipynb
Got some help; it's probably shit, but it seems to work.
Tried it on 70B with 3x A100 and 377 GB of RAM. It output a model, but I haven't tried it yet.
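
For what it's worth, on the plain transformers side the simplest way to spread 70B over several GPUs is accelerate's device_map="auto", which shards the layers across whatever cards are visible; how cleanly that combines with the TransformerLens half of the notebook is a separate question (recent transformer_lens releases also appear to take an n_devices argument on from_pretrained, but treat that as unverified here). A sketch, with the model name as a placeholder:

```python
# Sketch: shard a 70B model across all visible GPUs via accelerate's device_map.
# Requires `pip install accelerate`; gated models need a Hugging Face token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # placeholder
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # split layers across the available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```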

This is not the way it works in open source. Do it yourself, then share with others. Asking other people to experience your pain is a little bit arrogant. No offence.

What are you yapping about?
I tried, ran into some issues, asked here, and then we found a way, so I posted what we found. No offense, but posts like this send a shiver down my spine.
It's literally done; it works. It's not optimal, but it works.

@Undi95 Love 'em or hate 'em, ya gotta love 'em.

Nice work! I'm just learning this stuff for the first time, so your code was super helpful. I tried some of the suggestions in your code comments, and so far I've gotten the best results using a different refusal direction for each layer from layer 8 to the end, though it seems that using different directions for any chunk of the middle layers works pretty well. It started refusing even less with those changes, but it doesn't stop lecturing, which is annoying. I wonder if it would be possible to make a dataset that induces these preachy responses and then ablate the sum of the refusal and preachy features.
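
For anyone curious what "a different refusal direction for each layer" looks like, here is a rough sketch of the idea in TransformerLens terms: a difference-of-means direction computed per layer, then ablated from that layer's own residual stream. harmful_toks / harmless_toks are assumed to be pre-tokenized prompt batches built elsewhere, and hooking only resid_pre (rather than every residual write) is a simplification.

```python
# Sketch: per-layer refusal directions, each ablated from its own layer's
# residual stream. `model` is a transformer_lens HookedTransformer.
import functools
from transformer_lens import utils

def per_layer_directions(model, harmful_toks, harmless_toks, pos=-1):
    # Cache resid_pre for every layer on both prompt sets (batch this in practice).
    _, harmful = model.run_with_cache(harmful_toks, names_filter=lambda n: "resid_pre" in n)
    _, harmless = model.run_with_cache(harmless_toks, names_filter=lambda n: "resid_pre" in n)
    dirs = []
    for layer in range(model.cfg.n_layers):
        name = utils.get_act_name("resid_pre", layer)
        d = harmful[name][:, pos, :].mean(0) - harmless[name][:, pos, :].mean(0)
        dirs.append(d / d.norm())
    return dirs

def ablate_hook(resid, hook, direction):
    # Remove the component of the residual stream along `direction`.
    return resid - (resid @ direction).unsqueeze(-1) * direction

dirs = per_layer_directions(model, harmful_toks, harmless_toks)

# Use each layer's own direction from layer 8 to the end.
fwd_hooks = [
    (utils.get_act_name("resid_pre", layer),
     functools.partial(ablate_hook, direction=dirs[layer]))
    for layer in range(8, model.cfg.n_layers)
]
with model.hooks(fwd_hooks=fwd_hooks):
    output = model.generate(harmful_toks[:1], max_new_tokens=64)
```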

Thanks <3. I've updated the code in the last hour too; it should be easier to understand and should work better. I got almost 100% helpfulness (rather than harmlessness).
