vladbogo posted an update Feb 19
Spectral DeTuning is a new method that recovers the pre-fine-tuning weights of generative models that were later fine-tuned with human feedback or otherwise customized. It shows that pre-fine-tuning weights are recoverable in practice, and that models fine-tuned with LoRA are susceptible to this new class of weight-recovery attack.

Key aspects of the paper:
• It introduces Spectral DeTuning, a spectral-analysis method that reverses Low-Rank Adaptation (LoRA) fine-tuning to restore the original pre-fine-tuning weights.
• LoWRA Bench: a dataset for evaluating Spectral DeTuning across diverse models and tasks, covering a large number of fine-tuned layers for comprehensive evaluation.
• It shows that LoRA fine-tuned models are vulnerable to weight-recovery attacks, calling into question the security of fine-tuning-based modifications.
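To make the idea above concrete, here is a minimal toy sketch of an alternating low-rank recovery scheme in the spirit the post describes: given several LoRA fine-tunes W_i = W* + B_i A_i of the same base weight matrix W*, alternately estimate W* and each rank-r residual via truncated SVD. The function name, toy dimensions, and simple averaging step are my own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def recover_base_weights(finetuned, rank, iters=300):
    """Toy alternating scheme: given stacked fine-tuned matrices
    finetuned[i] = W* + (rank-r update), iteratively estimate W*.
    Illustrative sketch only, not the paper's official code."""
    W = np.mean(finetuned, axis=0)  # initial guess for the base weights
    for _ in range(iters):
        residuals = []
        for Wi in finetuned:
            # Best rank-r approximation of the current residual via SVD.
            U, s, Vt = np.linalg.svd(Wi - W, full_matrices=False)
            residuals.append((U[:, :rank] * s[:rank]) @ Vt[:rank])
        # Re-estimate W* as the average of fine-tunes minus their low-rank parts.
        W = np.mean([Wi - Mi for Wi, Mi in zip(finetuned, residuals)], axis=0)
    return W

# Toy check: synthesize a base matrix plus random rank-1 "LoRA" updates.
rng = np.random.default_rng(0)
W_star = rng.standard_normal((16, 16))
models = [W_star + np.outer(rng.standard_normal(16), rng.standard_normal(16))
          for _ in range(8)]
W_hat = recover_base_weights(np.stack(models), rank=1)
rel_err = np.linalg.norm(W_hat - W_star) / np.linalg.norm(W_star)
print(f"relative recovery error: {rel_err:.4f}")
```

With several independent fine-tunes of the same base, the low-rank updates average out and the alternating projections drive the estimate toward the original weights, which is the intuition behind the vulnerability the paper demonstrates.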

Congrats to the authors for their work!

Paper: Recovering the Pre-Fine-Tuning Weights of Generative Models (2402.10208)
Dataset: Eliahu/LoWRA-Bench
Project page: https://vision.huji.ac.il/spectral_detuning/
Code: https://github.com/eliahuhorwitz/Spectral-DeTuning