license: apache-2.0
base_model: rishiraj/CatPPT-base
language:
- en
tags:
- merge
LongCAT - Elevating Performance with Interwoven Depth Up-Scaling!
Introducing "LongCAT" - the purrfect alternative to that other 10.7B Frankenmerger in town! Our long feline friend here is a passthrough self-merge of rishiraj/CatPPT-base, built with a new process called Interwoven Depth Up-Scaling, resulting in the longest cat!
We developed the Interwoven Depth Up-Scaling technique. Built on the Mistral architecture, LongCAT incorporates the innovative Interwoven Depth Up-Scaling. We then interwove Cat 7B weights into the upscaled layers, and finally, did absolutely no extended pre-training.
The Sauce
All joking aside, this is an attempt to merge Mistral-7B models together more coherently than the Undi95/"Depth UP Scaling" technique that is typically used. The usual approach is to lay out the front 75% of one model and then place the back 75% of the second model after it: i.e. [0, 24] + [8, 32] for a 7B merger. When laid out flat, this can be broken down as [0, 8]+[8, 24]+[8, 24]+[24, 32], with the 16-layer block [8, 24] appearing as two discrete copies in a row.
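To make the layer arithmetic concrete, here is a minimal sketch in plain Python. The helper names `expand` and `copy_distance` are made up for illustration and are not part of any merge tool; ranges are treated as half-open, like the `layer_range` entries in the config further down.

```python
def expand(ranges):
    """Flatten half-open [start, end) layer ranges into a list of source-layer indices."""
    return [i for start, end in ranges for i in range(start, end)]


def copy_distance(layout, layer):
    """Distance in the merged stack between the first and last copy of a source layer."""
    positions = [pos for pos, src in enumerate(layout) if src == layer]
    return positions[-1] - positions[0]


# Typical depth up-scaling layout for a 32-layer 7B model.
typical = expand([(0, 24), (8, 32)])

# The same layout written out as four blocks.
flat = expand([(0, 8), (8, 24), (8, 24), (24, 32)])

print(typical == flat)                                     # both describe the same stack
print(len(typical))                                        # 48 layers in total
print({copy_distance(typical, i) for i in range(8, 24)})   # every duplicate sits 16 layers away
```

Each duplicated layer ends up 16 positions away from its first copy, which is the locality gap the rest of this section tries to shrink.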
This is typically better than laying the entirety of one model out flat, ostensibly because the duplicated layers stay close to their original location. Taking this to its logical conclusion, we could theoretically lay out each duplicated layer directly next to its original, maximizing locality.
Also, I picked CatPPT-base because I wanted to make a longcat joke.
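The interwoven layout described above can be sketched the same way. This is only an illustration of the ordering, assuming a 32-layer base model with layers 8 through 23 duplicated; the function name `interwoven_layout` and its parameters are hypothetical.

```python
def interwoven_layout(n_layers=32, dup_start=8, dup_end=24):
    """Return source-layer indices with each layer in [dup_start, dup_end) emitted twice in a row."""
    layout = []
    for i in range(n_layers):
        layout.append(i)
        if dup_start <= i < dup_end:
            # The duplicate goes directly after the original.
            layout.append(i)
    return layout


layout = interwoven_layout()

print(len(layout))      # 48 layers, same depth as the typical approach
print(layout[6:14])     # [6, 7, 8, 8, 9, 9, 10, 10]
```

Same total depth as the typical layout, but every duplicate is now adjacent to its original instead of 16 layers away.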
slices:
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [0, 8]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [8, 9]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [8, 9]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [9, 10]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [9, 10]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [10, 11]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [10, 11]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [11, 12]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [11, 12]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [12, 13]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [12, 13]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [13, 14]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [13, 14]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [14, 15]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [14, 15]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [15, 16]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [15, 16]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [16, 17]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [16, 17]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [17, 18]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [17, 18]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [18, 19]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [18, 19]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [19, 20]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [19, 20]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [20, 21]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [20, 21]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [21, 22]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [21, 22]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [22, 23]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [22, 23]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [23, 24]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [23, 24]
  - sources:
    - model: rishiraj/CatPPT-base
      layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
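As a sanity check on the config above, the slice list can be rebuilt in a few lines and its depth counted. The parameter estimate is a rough sketch only: it assumes CatPPT-base uses the stock Mistral-7B-v0.1 dimensions (hidden size 4096, MLP size 14336, 8 key/value heads of size 128, vocabulary 32000, untied output head), which this card does not state. The names `slices` and `params_for` are made up for illustration.

```python
# Slice ranges as listed in the config above (half-open [start, end)).
slices = [(0, 8)]
for i in range(8, 24):
    slices += [(i, i + 1), (i, i + 1)]  # each middle layer appears twice in a row
slices.append((24, 32))

total_layers = sum(end - start for start, end in slices)

# ASSUMPTION: stock Mistral-7B-v0.1 dimensions, not confirmed for CatPPT-base.
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128   # 8 key/value heads of size 128 (grouped-query attention)
VOCAB = 32000


def params_for(n_layers):
    """Estimate the parameter count of a Mistral-style decoder with n_layers layers."""
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM   # q/o projections + k/v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                    # gate, up, and down projections
    norms = 2 * HIDDEN                                 # two RMSNorm weights per layer
    embeddings = 2 * VOCAB * HIDDEN                    # input embeddings + output head
    final_norm = HIDDEN
    return n_layers * (attn + mlp + norms) + embeddings + final_norm


print(len(slices), total_layers)   # 34 slices, 48 layers
print(f"{params_for(32) / 1e9:.2f}B -> {params_for(total_layers) / 1e9:.2f}B")   # 7.24B -> 10.73B
```

Under those assumed dimensions, 48 layers lands at roughly 10.7B parameters, which lines up with the 10.7B figure mentioned at the top.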
Don't try to merge this with other 10.7Bs - the layer mismatch will probably create a completely broken model.