arxiv:2605.30514

MAAT: Multi-phase Adapter-Aware Targeted Unlearning

Published on May 28


AI-generated summary

Existing machine unlearning benchmarks are heavily skewed toward non-causal question types, masking failures in causal knowledge removal; a new balanced benchmark and unlearning method are introduced to address this gap.

Abstract

Machine unlearning evaluation is structurally skewed: Why-type questions, which probe causal and relational knowledge, make up less than 0.06% of CounterFact, 0.6% of ZSRE, and less than 1.3% of TOFU, MUSE, and WMDP-Cyber. This near-zero representation means that methods that fail on causal knowledge can still score highly in aggregate, and the failure is undetectable without balanced evaluation. We present 5WBENCH, a balanced 5,000-sample benchmark with 1,000 examples per 5W category (Who, What, When, Where, Why), making causal unlearning failures quantifiable for the first time. Using 5WBENCH, we show that no existing baseline simultaneously achieves high forgetting and high retention on Why-type questions: aggressive forgetting degrades retained knowledge, while conservative methods fail to forget causal facts. Why-type difficulty stems from multi-hop reasoning chains (44% of Why entries vs. ≤2% for the other categories) and from gradient dilution over answer spans averaging 40.1 tokens. We present MAAT (Multi-phase Adapter-Aware Targeted Unlearning), a three-phase framework operating on LoRA adapter weights that combines gradient-projected ascent, SVD rank-dimension pruning, task-vector negation, and hybrid KL/hidden-state retain repair. MAAT is the first method to simultaneously achieve high forgetting and high retention on Why-type causal knowledge, reaching a new operating point on the forget-retain Pareto frontier. We make our code publicly available.
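To make the masking argument concrete, here is a small arithmetic sketch. The per-category Forget Success Rates are hypothetical (90% on four categories, 0% on Why), and the skewed mix assumes Why sits at the 0.06% ceiling quoted for CounterFact, with the remaining mass split evenly across the other four categories; neither is data from the paper.

```python
CATEGORIES = ["Who", "What", "When", "Where", "Why"]

def aggregate_score(per_category, shares):
    """Share-weighted mean of per-category scores (shares must sum to 1)."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("category shares must sum to 1")
    return sum(per_category[c] * shares[c] for c in CATEGORIES)

# Hypothetical method: forgets non-causal facts well, fails completely on Why.
fsr = {"Who": 0.9, "What": 0.9, "When": 0.9, "Where": 0.9, "Why": 0.0}

# Skewed mix (assumption): Why at 0.06%, the rest split evenly over four categories.
why_share = 0.0006
skewed = {c: (1 - why_share) / 4 for c in CATEGORIES if c != "Why"}
skewed["Why"] = why_share

# Balanced mix, as in 5WBENCH: 1,000 of 5,000 samples per category.
balanced = {c: 1000 / 5000 for c in CATEGORIES}

print(f"skewed:   {aggregate_score(fsr, skewed):.4f}")    # skewed:   0.8995
print(f"balanced: {aggregate_score(fsr, balanced):.4f}")  # balanced: 0.7200
```

Under the skewed mix the method scores about 0.90 in aggregate despite forgetting no causal facts at all; under the balanced mix the same per-category results drop to 0.72, which is the gap a balanced benchmark makes visible.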


Community

amanchadha

Paper submitter

MAAT introduces a structured LoRA-adapter unlearning pipeline plus 5WBENCH, a balanced 5W benchmark, and shows that causal “Why” knowledge is uniquely difficult to forget because of long multi-hop answer chains and gradient dilution, not because it has a distinct layer-wise encoding footprint.
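One way to read "gradient dilution" is through how a mean-reduced token loss spreads its gradient. This sketch assumes the forget loss is token-level negative log-likelihood averaged over the answer span (a common default, not confirmed as MAAT's exact objective) and uses a hypothetical 3-token short answer and a hypothetical 2-token key span for comparison with the 40.1-token Why average reported for 5WBENCH.

```python
# Assumption: the forget loss is token-level NLL averaged over the answer
# span, so each token's gradient is scaled by 1/T for a T-token answer.
WHY_AVG_TOKENS = 40.1  # average Why answer length reported for 5WBENCH
SHORT_TOKENS = 3       # hypothetical short Who/When-style answer length
KEY_TOKENS = 2         # hypothetical number of tokens that carry the fact

def key_span_share(key_tokens, span_tokens):
    """Share of the mean-reduced loss gradient weight that lands on the key span."""
    if not 0 < key_tokens <= span_tokens:
        raise ValueError("key span must be non-empty and fit inside the answer")
    return key_tokens / span_tokens

for label, length in (("short", SHORT_TOKENS), ("why", WHY_AVG_TOKENS)):
    print(f"{label}: per-token weight {1 / length:.3f}, "
          f"key-span share {key_span_share(KEY_TOKENS, length):.1%}")
```

Under these assumptions the 2-token key span receives about 67% of the gradient weight in a 3-token answer but only about 5% in a 40.1-token Why answer.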

➡️ Key Highlights of MAAT and 5WBENCH:

🧪 Balanced 5W Unlearning Benchmark (5WBENCH): Introduces a 5,000-sample benchmark with 1,000 examples each for Who, What, When, Where, and Why, exposing a major blind spot in existing unlearning datasets, where Why-type causal questions are nearly absent (for example, <0.06% of CounterFact and 0.6% of ZSRE).

🧩 Causal Knowledge Failure Diagnosis: Shows that Why-type facts are harder to unlearn because they have much longer answer spans (40.1 tokens on average) and far more multi-hop reasoning structure (44% of entries versus ≤2% for other categories), which spreads the gradient-ascent signal across diffuse causal chains.

🧠 MAAT: Multi-phase Adapter-Aware Targeted Unlearning: Proposes a three-phase, LoRA-only unlearning method: (1) gradient-projected ascent to avoid retain-gradient interference; (2) MLP-only SVD rank-dimension pruning plus forget-scored task-vector negation; and (3) hybrid KL/hidden-state retain repair with entropy regularization to prevent relearning of forgotten facts.
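The first and third phases can be sketched in NumPy on flattened gradients and toy logits. This is not the authors' code: the conditional PCGrad-style projection, the KL direction, the mean-squared hidden-state term, and the weights `alpha`, `beta`, and `gamma` are all assumptions chosen to illustrate the ideas named above.

```python
import numpy as np

def project_forget_gradient(g_forget, g_retain, eps=1e-12):
    """Phase 1 sketch: gradient-projected ascent direction.

    Ascending the forget loss moves parameters along +g_forget, which changes
    the retain loss by roughly lr * dot(g_retain, g_forget). When that dot
    product is positive (ascent would also hurt retention), remove the
    component of g_forget along g_retain, PCGrad-style (assumption)."""
    dot = float(np.dot(g_forget, g_retain))
    if dot > 0:
        g_forget = g_forget - dot / (float(np.dot(g_retain, g_retain)) + eps) * g_retain
    return g_forget

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def retain_repair_loss(ref_logits, new_logits, ref_hidden, new_hidden,
                       forget_logits, alpha=1.0, beta=1.0, gamma=0.1):
    """Phase 3 sketch: hybrid KL + hidden-state loss on retain inputs, minus an
    entropy bonus on forget inputs. alpha, beta, gamma are hypothetical weights."""
    lp_ref, lp_new = log_softmax(ref_logits), log_softmax(new_logits)
    # KL(p_ref || p_new) per position, averaged: keep retain predictions close
    # to the reference (pre-unlearning) predictions.
    kl = (np.exp(lp_ref) * (lp_ref - lp_new)).sum(axis=-1).mean()
    # Mean squared error between reference and current hidden states.
    hidden = np.mean((ref_hidden - new_hidden) ** 2)
    # Predictive entropy on forget inputs; subtracting it rewards staying
    # uncertain on forgotten facts, which discourages relearning them.
    lp_f = log_softmax(forget_logits)
    entropy = -(np.exp(lp_f) * lp_f).sum(axis=-1).mean()
    return alpha * kl + beta * hidden - gamma * entropy

# Usage on toy vectors: the projected step no longer points along g_r.
g_r = np.array([1.0, 0.0])
g_f = np.array([1.0, 1.0])
g_step = project_forget_gradient(g_f, g_r)  # approximately [0.0, 1.0]
```

When the forget and retain gradients already disagree (negative dot product), the sketch leaves the ascent direction unchanged, since moving along it lowers the retain loss to first order.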

Technical novelty: The key architectural move is treating LoRA adapters as structured rank spaces rather than flat parameter deltas. MAAT scores adapter rank dimensions by forget-set activation/gradient relevance, prunes or negates only the most forget-associated subspace, and repairs retained behavior without merging into or modifying the frozen base model.
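A minimal NumPy sketch of treating a LoRA adapter as a rank space: it re-factors `delta = B @ A` with an SVD so each rank dimension is an orthogonal direction, scores dimensions with a hypothetical forget-to-retain activation ratio, and then prunes or sign-flips the top-scoring ones. The scoring rule and all function names are illustrative assumptions; the paper's relevance score may also use gradients, and MAAT restricts pruning to MLP adapters, whereas this sketch handles a single adapter pair.

```python
import numpy as np

def svd_refactor(B, A):
    """Rewrite the LoRA update delta = B @ A (B: d_out x r, A: r x d_in) so its
    r rank dimensions are orthogonal singular directions."""
    r = A.shape[0]
    U, S, Vt = np.linalg.svd(B @ A, full_matrices=False)
    # delta has rank <= r, so the top-r singular triplets reproduce it
    # (up to floating-point error).
    return U[:, :r] * S[:r], Vt[:r, :]

def score_rank_dims(B, A, X_forget, X_retain, eps=1e-8):
    """Hypothetical relevance score per rank dimension: mean |activation| on
    forget inputs relative to retain inputs, scaled by the output norm."""
    out_norm = np.linalg.norm(B, axis=0)              # (r,)
    act_forget = np.abs(X_forget @ A.T).mean(axis=0)  # (r,)
    act_retain = np.abs(X_retain @ A.T).mean(axis=0)  # (r,)
    return out_norm * act_forget / (act_retain + eps)

def edit_top_dims(B, A, scores, k, mode="prune"):
    """Zero ("prune") or sign-flip ("negate") the k highest-scoring rank
    dimensions. Only the adapter's B factor changes; base weights are untouched."""
    if mode not in ("prune", "negate"):
        raise ValueError(f"unknown mode: {mode}")
    top = np.argsort(scores)[::-1][:k]  # indices of the k largest scores
    B = B.copy()
    B[:, top] = 0.0 if mode == "prune" else -B[:, top]
    return B, A, top

# Usage on a random toy adapter (d_out=6, r=3, d_in=5).
rng = np.random.default_rng(0)
B0, A0 = rng.normal(size=(6, 3)), rng.normal(size=(3, 5))
B1, A1 = svd_refactor(B0, A0)
scores = score_rank_dims(B1, A1, rng.normal(size=(8, 5)), rng.normal(size=(8, 5)))
B_pruned, _, pruned = edit_top_dims(B1, A1, scores, k=1, mode="prune")
```

Flipping the sign of a rank dimension turns its contribution from `+b_i a_i` into `-b_i a_i`, which amounts to task-vector negation restricted to that subspace; zeroing it removes the contribution entirely. In both cases only `B` changes, so the frozen base weights are never modified.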

Why it moves the needle: On 5WBENCH, MAAT reaches a stronger forget-retain operating point than the baselines. On Llama 3.2-3B it achieves a 77.4% average Forget Success Rate (FSR) and a 71.6% Retain Success Rate (RSR), matching RO-FT's forgetting while improving retention by 36.4 points. It is also the only method reported to exceed 60% FSR and 60% RSR in all five 5W categories, including Why-type causal knowledge.
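The headline criteria can be written as two small checks. The threshold test mirrors the "exceed 60% FSR and 60% RSR in all five categories" claim, and `dominates` is standard Pareto dominance on (FSR, RSR) pairs. The per-category numbers in the usage example are hypothetical; only the MAAT averages come from the text, and the RO-FT point assumes its FSR exactly equals MAAT's.

```python
CATEGORIES = ("Who", "What", "When", "Where", "Why")

def clears_all_categories(fsr, rsr, threshold=60.0):
    """True if FSR and RSR both exceed `threshold` (percent) in every 5W category."""
    return all(fsr[c] > threshold and rsr[c] > threshold for c in CATEGORIES)

def dominates(a, b):
    """Pareto dominance on (FSR, RSR) pairs: `a` is no worse on both axes
    and strictly better on at least one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])

# Hypothetical per-category results: strong everywhere except Why retention.
fsr = {c: 70.0 for c in CATEGORIES}
rsr = {**{c: 65.0 for c in CATEGORIES}, "Why": 55.0}
print(clears_all_categories(fsr, rsr))  # False: Why RSR is below 60

# Reported averages for MAAT; RO-FT's RSR is derived from the 36.4-point gap,
# assuming its FSR equals MAAT's exactly.
maat = (77.4, 71.6)
ro_ft = (77.4, 71.6 - 36.4)
print(dominates(maat, ro_ft))  # True
```

The reported 36.4-point gap implies an RO-FT RSR of about 35.2%, so under the equal-FSR assumption MAAT's average point dominates RO-FT's.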
