
Open Code

#3 by Deniaud - opened

Can we hope that you will post your LASER implementation using Random Matrix Theory?

Cognitive Computations org

Yes, you can :)
Keep following our next steps <3

Do y'all have a quick/brief overview of how the pruning rates are selected? Is it some variation of selecting a singular-value cutoff from the Marchenko–Pastur distribution (say, retain just 10% for layer N, 20% for layer N-10, etc.) and then normalizing that based on the norm of the matrix?
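For concreteness, here is a minimal sketch of one way a Marchenko–Pastur cutoff could be applied to a weight matrix. The noise scale `sigma` is assumed to be known or estimated separately, and `mp_threshold` / `mp_truncate` are illustrative names I made up, not the authors' actual code:

```python
import numpy as np

def mp_threshold(W: np.ndarray, sigma: float) -> float:
    # Bulk edge of the Marchenko-Pastur law for singular values of an
    # m x n matrix whose entries are i.i.d. noise with std `sigma`:
    # singular values below sigma * (sqrt(m) + sqrt(n)) are
    # statistically indistinguishable from noise.
    m, n = W.shape
    return sigma * (np.sqrt(m) + np.sqrt(n))

def mp_truncate(W: np.ndarray, sigma: float) -> np.ndarray:
    # Low-rank reconstruction of W that keeps only the singular values
    # above the Marchenko-Pastur noise edge.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    keep = s > mp_threshold(W, sigma)
    return (U[:, keep] * s[keep]) @ Vt[keep, :]
```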

I believe the paper did a first-order grid search starting from (highest layer, lowest pruning rate) across each weight. This encodes the intuition in their paper that you shouldn't prune the early layers, but pruning the later layers is extra beneficial. Is that strategy (favoring later layers) also applicable here?

Additional question: would we see a significant inference performance boost by preprocessing the model? I thought a big part of the speedup from using SVD is that low-rank matmuls are much faster, but the resulting LASER-ed weights are still dense, with the same dimensions as before. That would suggest no inference speedup until the engine can take advantage of the knowledge that some of the weights are low-rank, right?
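To make that point concrete, here is a toy comparison (the dimensions and rank are arbitrary, made up for illustration): multiplying by a dense d_in x d_out matrix costs on the order of d_in * d_out multiply-accumulates per row, while multiplying by stored rank-r factors costs r * (d_in + d_out):

```python
import torch

d_in, d_out, r, batch = 4096, 4096, 256, 8

W_dense = torch.randn(d_in, d_out)  # LASER-ed weight, still stored dense
A = torch.randn(d_in, r)            # hypothetical stored factors, W ~= A @ B
B = torch.randn(r, d_out)
x = torch.randn(batch, d_in)

y_dense = x @ W_dense     # ~ batch * d_in * d_out MACs: no speedup
y_factored = (x @ A) @ B  # ~ batch * r * (d_in + d_out) MACs: ~8x fewer here
```

So unless the inference engine actually stores and multiplies the factors, the low-rank structure buys quality, not speed.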

Cognitive Computations org

We use Marchenko–Pastur to define the threshold cut for singular values.
And yes, we start the search from the top layers downwards, which is optimal.

Stay tuned because we will release the code in the next few days.
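Putting the two replies together, the top-down search might look something like this greedy sketch. The `get_w`, `set_w`, and `evaluate` callables are hypothetical placeholders supplied by the caller, `mp_truncate` is the helper sketched earlier in this thread, and none of this is the authors' released code:

```python
def laser_scan(model, layer_ids, weight_names, get_w, set_w, evaluate, sigma):
    # Greedy top-down scan: try a Marchenko-Pastur truncation on each
    # candidate weight, keep it only if the held-out metric does not
    # drop, otherwise revert. Later layers are visited first, matching
    # the intuition that they tolerate (or benefit from) pruning most.
    best = evaluate(model)
    for layer in sorted(layer_ids, reverse=True):  # top layers first
        for name in weight_names:
            W = get_w(model, layer, name)
            set_w(model, layer, name, mp_truncate(W, sigma))
            score = evaluate(model)
            if score >= best:
                best = score                  # keep the truncated weight
            else:
                set_w(model, layer, name, W)  # revert to the original
    return model
```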

@fernandofernandes Just wanted to follow up on this. Have you had time to release the code yet, and if so, could you share a link to it? Thanks for all of your hard work!
