Could we use the SVD to separate the instruction fine-tuning from the alignment?

#1
by jukofyork - opened

How long does the Python script take to run?

B = U[:, :reduced_rank] @ torch.diag(S[:reduced_rank])

If it's not crazy long, then selectively zeroing out values in S may well let you separate the instruction fine-tuning from the alignment.

There's no reason it will be the same set of rank-1 indices in each tensor, but given how much effort went into those two tasks, there is a good chance it's one of the first few singular values (it would help to dump S for each tensor and look at it first - the values may tail off in magnitude quite quickly, etc).
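Something along these lines could be used to dump and eyeball the singular values per tensor - the `deltas` dict (tensor name to fine-tuned-minus-base weight matrix) is just a placeholder of mine, not a variable from the actual script:

import torch

# Hypothetical: `deltas` maps tensor names to the (fine-tuned minus base) weight matrices.
for name, delta in deltas.items():
    S = torch.linalg.svdvals(delta.float())   # singular values, largest first
    print(name, (S[:8] / S[0]).tolist())      # first few values, relative to the largest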

Having a method to dealign models without hurting them by fine-tuning on too-small datasets would be huge IMO, and if nobody has investigated this then it would definitely be worth looking at.

Even if you don't want to try this, then randomised copies of S might yield "interesting" variations on the LORAs it produces.

This isn't about fine-tuning but does discuss some stuff about the differences between the SVD for the transformer blocks and the MLP blocks:

https://www.alignmentforum.org/posts/mkbGjzxD8d8XqKHzA/the-singular-value-decompositions-of-transformer-weight

A key limitation of this method, however, appears to be that processing seems highly distributed through the network and that removing the singular vector from one MLP or one attention head in one block, while it has a differentially large effect on that logit, is rarely sufficient to change the global behaviour of the network.

But since the fine-tuning more specifically targeted alignment and instruction following, it might not be such a problem.

How long does the Python script take to run?

On a PC with 128GB of DDR4 RAM and 500+GB of NVMe swap it took ~12 hours.

I honestly have 0 (zero) idea if what you are suggesting will work or not. If you provide me with some scripts I may try it out.

I am currently experimenting with replacing parts of the models with base LLAMA to see how it affects their performance, like this:

===[48, 64] knockout===
slices:
  - sources:
    - model: euryale
      layer_range: [0, 48]
  - sources:
    - model: llama2-hf
      layer_range: [48, 64]
  - sources:
    - model: euryale
      layer_range: [64, 80]
merge_method: passthrough
dtype: float16
tokenizer_source: base

I notice that some parts of the model are more important than the others:

[image: poetry outputs from the different layer-range knockouts]

In the picture above you can see that knocking out layers [32,48] has quite severe effects on the ability of the model to write poetry, while knocking out [16,32] has almost no effect.

I'll see if I can get it working tomorrow on one of the tiny models that has both a base and an instruct copy to work on.

The basic idea of SVD is to break up a large "dense" m×n matrix into a sum of "simple" m×n matrices (the proper terms are "full rank" and "rank 1", and taking the sum is only one of three ways of looking at it, but it's the most appropriate for this case).

It also orders these simple matrices so that the ones with the largest norms (i.e. something like the unsigned magnitude of a number) come first.
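To make that concrete, here is a tiny self-contained check of the "sum of rank-1 matrices" view - purely illustrative, not part of the original script:

import torch

W = torch.randn(6, 4)
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

# Rebuild W as a weighted sum of rank-1 outer products u_i v_i^T, largest singular value first.
W_rebuilt = sum(S[i] * torch.outer(U[:, i], Vh[i, :]) for i in range(S.numel()))
print(torch.allclose(W, W_rebuilt, atol=1e-5))  # True, up to floating-point error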

So going back to your code:

# SVD Decomposition
U, S, Vh = torch.linalg.svd(weight, full_matrices=False)

# Truncated matrices
A = Vh[:reduced_rank, :]
B = U[:, :reduced_rank] @ torch.diag(S[:reduced_rank])

U and Vh are where the "simple" m×n matrices get created from and aren't that interesting, but S is the most interesting part as it tells you how much each of the different "simple" matrices contributes. The whole idea of LORA is to just choose the first K most important (in your code it's k=32) and dispose of everything else - this likely works because it's hoped that the important information about the weights is in those large-norm parts and the rest is mostly noise. The code:

A = Vh[:reduced_rank, :]
B = U[:, :reduced_rank] @ torch.diag(S[:reduced_rank]) 

is exactly this. The reason to turn the 3 matrices into 2 is just for simplicity: the S matrix is really just used to decide how much of each to include, and it can be subsumed into either one of the other two matrices.
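As a quick sanity check of that "subsume S into B" point (reusing U, S, Vh and reduced_rank from the snippet above, so this is a continuation rather than standalone code):

# B @ A is exactly the rank-k reconstruction U_k diag(S_k) Vh_k, just regrouped.
A = Vh[:reduced_rank, :]
B = U[:, :reduced_rank] @ torch.diag(S[:reduced_rank])
low_rank = U[:, :reduced_rank] @ (torch.diag(S[:reduced_rank]) @ Vh[:reduced_rank, :])
print(torch.allclose(B @ A, low_rank, atol=1e-5))  # True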

BUT: there is another interesting phenomenon that you also see with low-rank decompositions - namely that the different dimensions actually have different "meanings" (see here for the most famous images: https://freshbiostats.wordpress.com/2013/09/04/an-example-of-principal-components-analysis/).

So the idea is that it may well turn out that by changing S above you can affect different dimensions separately... The biggest problem is that it's doing the SVD of each of the matrices on its own, and there is no reason that, say, the S_1 dimension of one matrix corresponding to "instruction following" will necessarily correspond to the S_1 dimension of another matrix, etc.

But since the fine-tuning likely concentrated on two main areas (alignment and instruction following), there is a pretty good chance that it will be one of the very first dimensions of the 32 that are getting saved (likely S_1 to S_3), and the fine-tuning is also most likely to mainly affect the last few layers.

I've no idea if it will work without trying it, but it should be fairly easy to see if there is any possibility of it working by trying different offensive prompts and seeing what effect knocking out certain values of S has. If it has any chance of working then it should be possible to try some kind of optimization process to find the right singular values to knock out.
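A minimal sketch of what that knockout might look like, reusing `weight` and `reduced_rank` from the script's code above - the indices to zero are purely illustrative, and finding the right ones is the whole experiment:

U, S, Vh = torch.linalg.svd(weight, full_matrices=False)

# Hypothetical: zero out the singular directions suspected of carrying the alignment behaviour.
knockout = [0, 2]          # e.g. S_1 and S_3 (0-based indices) - illustrative only
S_masked = S.clone()
S_masked[knockout] = 0.0

A = Vh[:reduced_rank, :]
B = U[:, :reduced_rank] @ torch.diag(S_masked[:reduced_rank])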

Even if it doesn't work, then just messing with the S vector here:

U, S, Vh = torch.linalg.svd(weight, full_matrices=False)

# Randomly perturb each singular value by roughly ±10% (the 0.1 scale is just a starting point)
# https://pytorch.org/docs/stable/generated/torch.randn.html
r = torch.randn_like(S)
S = S * (1 + 0.1 * r)

A = Vh[:reduced_rank, :]
B = U[:, :reduced_rank] @ torch.diag(S[:reduced_rank])

(sorry, I don't really know Python well enough to guarantee that's exactly right, but hopefully the idea makes sense).

will make the model "different" by making some of the changes made during fine-tuning more or less important. You'd have to try different values instead of 0.1 in practice.

https://github.com/tensorly/tensorly

Someone really good with Python (which certainly isn't me) might be able to get this working and do a "Tucker Decomposition" or a "CP Decomposition" of all the matrices together, and this should (in theory) be much more likely to be able to split off the different "alignment" and "instruction following" parts of the fine-tuning. Tensor decomposition is like a higher-dimensional version of SVD that can decompose all the matrices together, so you shouldn't get the problem of, say, the "instruction following" dimension(s) getting flipped in order with the "alignment" dimension(s) across the different individual matrices, etc.

You would then hopefully be able to remove exactly the dimensions you don't want, before running the above SVD code to generate the LORA.
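A rough sketch of what that might look like with tensorly's CP decomposition - the variable names, the stacking of same-shaped per-layer deltas, and the rank are all my assumptions, not tested code:

import torch
import tensorly as tl
from tensorly.decomposition import parafac

tl.set_backend('pytorch')

# Hypothetical: `deltas` is a list of same-shaped (fine-tuned minus base) matrices,
# e.g. every layer's q_proj delta, stacked into one [num_layers, m, n] tensor.
stacked = torch.stack(deltas)

# CP decomposition finds components shared across *all* the layers at once.
weights, factors = parafac(stacked, rank=32)

# Knock out one shared component (say the first) everywhere, then rebuild the deltas.
weights[0] = 0.0
stacked_filtered = tl.cp_to_tensor((weights, factors))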

That sounds like days, maybe weeks of pain with my current hardware. Randomness is not the right approach here I think. @Sao10K mentioned that he has some kind of private dealignment LORA. Maybe ask him to share it, if dealignment is what you need.
