wassname

AI & ML interests

None yet

Organizations

None yet

wassname's activity

New activity in nicoboss/Llama-3.2-1B-Instruct-Uncensored-Lora about 1 month ago

remove hard coded base model

#1 opened about 1 month ago by wassname
New activity in huihui-ai/Llama-3.2-3B-Instruct-abliterated about 2 months ago

requesting 1B version

#1 opened about 2 months ago by Hasaranga85
Reacted to grimjim's post with 👍 3 months ago
I've observed that the layers targeted in various abliteration notebooks (e.g., https://colab.research.google.com/drive/1VYm3hOcvCpbGiqKZb141gJwjdmmCcVpR?usp=sharing) appear to be arbitrary, likely reflecting brute-force exploration. This doesn't need to be the case.

Taking a cue from the paper "The Unreasonable Ineffectiveness of the Deeper Layers" ( https://arxiv.org/abs/2403.17887 ) and PruneMe (https://github.com/arcee-ai/PruneMe), it seems reasonable to target the deeper layers that measured inter-layer similarity identifies as most redundant, since intervening there should damage the model less and reduce the need for subsequent fine-tuning. Intuitively, one should expect the resulting intervention layers to be deep but not final. The only uncertainty is whether those redundant layers actually encode refusals, which is almost certainly model-dependent. This approach only requires the layer redundancy to be computed once per model, with the result serving as a starting point for choosing the layer range to restrict intervention to.
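
A minimal sketch of how such a redundancy scan might look, not grimjim's exact procedure: load a model with Transformers, run a handful of calibration prompts with output_hidden_states=True, and score each layer by the cosine similarity between its input and output hidden states. The model id and prompts below are placeholder assumptions, and the paper / PruneMe measure angular distance over blocks of layers rather than single layers, so treat this as an illustration of the idea rather than a reference implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM with accessible hidden states works.
model_id = "meta-llama/Llama-3.2-3B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# In practice, use a few hundred varied prompts.
calibration_prompts = [
    "Explain how transformer language models process text.",
    "Write a short story about a lighthouse keeper.",
]

num_layers = model.config.num_hidden_layers
sims = torch.zeros(num_layers)

with torch.no_grad():
    for prompt in calibration_prompts:
        inputs = tok(prompt, return_tensors="pt")
        out = model(**inputs, output_hidden_states=True)
        hs = out.hidden_states  # hs[i] enters layer i, hs[i + 1] leaves it
        for i in range(num_layers):
            a = hs[i][0, -1].float()      # last-token state entering layer i
            b = hs[i + 1][0, -1].float()  # last-token state leaving layer i
            sims[i] += torch.nn.functional.cosine_similarity(a, b, dim=0)

sims /= len(calibration_prompts)

# The most redundant layers (highest input/output similarity), excluding the
# final couple of layers, are a natural starting range for intervention.
candidates = sims[: num_layers - 2].argsort(descending=True)[:5]
print("candidate intervention layers:", sorted(candidates.tolist()))
```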