wassname

AI & ML interests

None yet

Organizations

None yet

wassname's activity

New activity in nicoboss/Llama-3.2-1B-Instruct-Uncensored-Lora about 1 month ago

remove hard coded base model

#1 opened about 1 month ago by wassname
New activity in huihui-ai/Llama-3.2-3B-Instruct-abliterated about 2 months ago

requesting 1B version

#1 opened about 2 months ago by Hasaranga85
Reacted to grimjim's post with 👍 3 months ago
I've observed that the layers targeted in various abliteration notebooks (e.g., https://colab.research.google.com/drive/1VYm3hOcvCpbGiqKZb141gJwjdmmCcVpR?usp=sharing) appear to be arbitrary, likely reflecting brute-force exploration. This doesn't need to be the case.

Taking a cue from the paper "The Unreasonable Ineffectiveness of the Deeper Layers" ( https://arxiv.org/abs/2403.17887 ) and PruneMe (https://github.com/arcee-ai/PruneMe), it seems reasonable to target the deeper layers that measured inter-layer similarity identifies as most redundant, since intervening there should damage the model less and reduce the need for subsequent fine-tuning. Intuitively, one should expect the resulting intervention layers to be deep but not final. The only uncertainty is whether those redundant layers actually encode refusals, which is almost certainly model-dependent. This approach only requires the layer redundancy to be computed once per model, with the result serving as a starting point for choosing the layer range to restrict intervention to.
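
A minimal sketch of how such a redundancy scan might look, not grimjim's exact procedure: load a model with Transformers, run a handful of calibration prompts with output_hidden_states=True, and score each layer by the cosine similarity between its input and output hidden states. The model id and prompts below are placeholder assumptions, and the paper / PruneMe measure angular distance over blocks of layers rather than single layers, so treat this as an illustration of the idea rather than a reference implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM with accessible hidden states works.
model_id = "meta-llama/Llama-3.2-3B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# In practice, use a few hundred varied prompts.
calibration_prompts = [
    "Explain how transformer language models process text.",
    "Write a short story about a lighthouse keeper.",
]

num_layers = model.config.num_hidden_layers
sims = torch.zeros(num_layers)

with torch.no_grad():
    for prompt in calibration_prompts:
        inputs = tok(prompt, return_tensors="pt")
        out = model(**inputs, output_hidden_states=True)
        hs = out.hidden_states  # hs[i] enters layer i, hs[i + 1] leaves it
        for i in range(num_layers):
            a = hs[i][0, -1].float()      # last-token state entering layer i
            b = hs[i + 1][0, -1].float()  # last-token state leaving layer i
            sims[i] += torch.nn.functional.cosine_similarity(a, b, dim=0)

sims /= len(calibration_prompts)

# The most redundant layers (highest input/output similarity), excluding the
# final couple of layers, are a natural starting range for intervention.
candidates = sims[: num_layers - 2].argsort(descending=True)[:5]
print("candidate intervention layers:", sorted(candidates.tolist()))
```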