
AJibola Emmanuel

Ajibola

AI & ML interests

Computer Vision

Recent Activity

liked a Space 12 months ago
course-demos/gpt-2
reacted to gsarti's post with 👍 about 1 year ago
๐Ÿ” Today's pick in Interpretability & Analysis of LMs: Recovering the Pre-Fine-Tuning Weights of Generative Models by @eliahu, J. Kahana, Y. Hoshen Using low-rank adapters (LoRA) is nowadays a common practice to fine-tune pre-trained generative models on specific tasks, or align them to human preferences. This work explores pre-fine tuning weight recovery: given a set of LoRA models with merged weights fine-tuned from the same pre-trained system, the task is to recover the original (unknown) weights of the pre-trained model. Authors propose SpectralDeTuning, a method framing this task as an optimisation problem alternating a step of approximation for all low-rank tuned matrices using SVD and the closed-form computation of the optimal pre-trained matrix given the approximate low-rank ones. The LoRA Weight Recovery Attack (LoWRA) benchmark is introduced to evaluate pre-fine tuning weight recovery across language and vision tasks on ViT, Mistral and Stable Diffusion models. The SpectralDeTuning method is shown to be effective in recovering original models both intrinsically (difference in weights) and behavirally (similar outputs). The main limitations of the approach are the assumption that the rank used by LoRAs is known by the attacker, and the relatively high number of LoRAs needed to provide a good approximation. ๐Ÿ“„ Paper: https://huggingface.co/papers/2402.10208 ๐Ÿ’ป LoWRA Bench: https://huggingface.co/datasets/Eliahu/LoWRA-Bench ๐Ÿ” All daily picks in LM interpretability: https://huggingface.co/collections/gsarti/daily-picks-in-interpretability-and-analysis-of-lms-65ae3339949c5675d25de2f9

Organizations

None yet

Ajibola's activity

New activity in clibrain/mamba-2.8b-instruct-openhermes 11 months ago

Training With EOS

2
#6 opened 11 months ago by assafbk
reacted to gsarti's post with 👍 about 1 year ago
๐Ÿ” Today's pick in Interpretability & Analysis of LMs: Recovering the Pre-Fine-Tuning Weights of Generative Models by @eliahu , J. Kahana, Y. Hoshen

Using low-rank adapters (LoRA) is now common practice for fine-tuning pre-trained generative models on specific tasks or aligning them with human preferences.

This work explores pre-fine-tuning weight recovery: given a set of LoRA models fine-tuned from the same pre-trained system and released with their weights merged, the task is to recover the original (unknown) weights of the pre-trained model.
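
To make the setup concrete, here is a toy NumPy sketch of how each released model relates to the shared pre-trained matrix (my own illustration; the names and dimensions are assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 64, 4, 12  # hidden size, LoRA rank (r << d), number of fine-tuned models

# The pre-trained weight matrix: shared by all fine-tunes, unknown to the attacker.
W_pre = rng.normal(size=(d, d))

# Each LoRA fine-tune learns a low-rank update B_i @ A_i and merges it in;
# only the merged matrices W_i = W_pre + B_i @ A_i are released.
W_merged = [W_pre + rng.normal(size=(d, r)) @ rng.normal(size=(r, d)) for _ in range(n)]
```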

The authors propose SpectralDeTuning, a method that frames this task as an optimisation problem alternating between two steps: approximating all the low-rank tuned matrices via truncated SVD, and computing in closed form the optimal pre-trained matrix given those low-rank approximations.
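
A minimal sketch of that alternating scheme, continuing the toy setup above (a simplification under the paper's assumption that the attacker knows the LoRA rank; the actual SpectralDeTuning implementation may differ in its details):

```python
def spectral_detuning(W_merged, rank, n_iters=100):
    """Estimate the shared pre-trained matrix from LoRA-merged matrices."""
    W = np.mean(W_merged, axis=0)  # initial guess for the pre-trained matrix
    for _ in range(n_iters):
        # Step 1: approximate each low-rank update as the best rank-r
        # approximation (truncated SVD) of the residual W_i - W.
        M = []
        for W_i in W_merged:
            U, s, Vt = np.linalg.svd(W_i - W, full_matrices=False)
            M.append(U[:, :rank] * s[:rank] @ Vt[:rank])
        # Step 2: closed-form optimum for W given the fixed low-rank terms:
        # the mean of the merged matrices minus their low-rank residuals.
        W = np.mean([W_i - M_i for W_i, M_i in zip(W_merged, M)], axis=0)
    return W

# Recovery quality improves with more LoRAs, matching the limitation noted below.
W_rec = spectral_detuning(W_merged, rank=r)
print(np.linalg.norm(W_rec - W_pre) / np.linalg.norm(W_pre))  # relative weight error
```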

The LoRA Weight Recovery Attack (LoWRA) benchmark is introduced to evaluate pre-fine-tuning weight recovery across language and vision tasks on ViT, Mistral, and Stable Diffusion models.

The SpectralDeTuning method is shown to recover the original models effectively both intrinsically (small difference in weights) and behaviourally (similar outputs). The main limitations of the approach are the assumption that the attacker knows the rank used by the LoRAs, and the relatively high number of LoRAs needed for a good approximation.

📄 Paper: Recovering the Pre-Fine-Tuning Weights of Generative Models (2402.10208)

💻 LoWRA Bench: Eliahu/LoWRA-Bench

🔍 All daily picks in LM interpretability: https://huggingface.co/collections/gsarti/daily-picks-in-interpretability-and-analysis-of-lms-65ae3339949c5675d25de2f9