What exactly is downcycling?
#1 by appvoid - opened
I'm asking because it seems similar to the approach I took with mergekit's passthrough method. Are you just slicing layers from a language model, or are you doing more than that?
You can learn more about it here:
https://youtube.com/playlist?list=PLDn_JsyofyfTH5_5V1MNb8UYKxMl6IMNy&si=VRuzlso0dPVAny6Q
At a high level, you take the weights of the first N layers of a reference model that has M layers.
For instance, llama-3-8B has a total of 32 layers, of which llama-3-6B took the first 24.
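To make the "first N of M layers" idea concrete, here is a minimal sketch of the slicing step only. It is not the actual script used for llama-3-6B. It assumes the checkpoint is a plain name-to-tensor mapping whose decoder weights are named `model.layers.<i>.…` (the naming used by Llama checkpoints in Hugging Face Transformers), and `keep_first_n_layers` is a hypothetical helper written for this illustration. A toy dictionary of strings stands in for real tensors so the snippet runs without downloading anything.

```python
import re

# Assumption: decoder layer weights are named "model.layers.<index>.<rest>".
LAYER_KEY = re.compile(r"^model\.layers\.(\d+)\.")


def keep_first_n_layers(state_dict, n):
    """Hypothetical helper: keep layers 0..n-1 plus all non-layer weights."""
    kept = {}
    for name, tensor in state_dict.items():
        match = LAYER_KEY.match(name)
        # Non-layer weights (embeddings, final norm, LM head) are always kept.
        if match is None or int(match.group(1)) < n:
            kept[name] = tensor
    return kept


# Toy stand-in for a 32-layer checkpoint: strings instead of real tensors.
toy = {
    "model.embed_tokens.weight": "emb",
    "model.norm.weight": "norm",
    "lm_head.weight": "head",
}
for i in range(32):
    toy[f"model.layers.{i}.self_attn.q_proj.weight"] = f"q{i}"

small = keep_first_n_layers(toy, 24)

# Collect the layer indices that survived the slice.
layer_ids = sorted(
    {int(LAYER_KEY.match(k).group(1)) for k in small if LAYER_KEY.match(k)}
)
print(len(layer_ids), layer_ids[0], layer_ids[-1])
```

With a real checkpoint you would also need to set `num_hidden_layers` in the model config to the new layer count so it matches the sliced weights. Whether any further training follows the slicing is not something this sketch covers.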
prince-canuma changed discussion status to closed