Excellent Approach

#1
by 1littlecoder - opened

Thanks for sharing this. How's the performance vibe (other than the benchmarks)?

Owner

Thanks! I'd say it holds up pretty well, but IMO it's best to think of this as a tradeoff between speed and output quality. You get some memory savings from the reduced parameter count, but the main benefit is faster training and inference from removing 6 of the layers (~20%).

Note that this is the base model, not the instruct version, so it needs some fine-tuning before practical use.
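
For a rough sense of the layer arithmetic mentioned above, here is a minimal sketch. The original layer count isn't stated in this thread, so `total_layers=32` is purely an assumption for illustration, and `layer_pruning_summary` is a hypothetical helper, not part of any library. The "ideal speedup" figure assumes every layer costs the same and ignores embeddings and the output head, so treat it as an upper bound rather than a measurement.

```python
def layer_pruning_summary(total_layers: int, removed_layers: int) -> dict:
    """Back-of-the-envelope numbers for dropping whole transformer layers."""
    if not 0 <= removed_layers < total_layers:
        raise ValueError("removed_layers must be in [0, total_layers)")
    kept = total_layers - removed_layers
    return {
        "kept_layers": kept,
        # Share of the layer stack that was removed.
        "fraction_removed": removed_layers / total_layers,
        # Upper bound on speedup if runtime scaled only with layer count.
        "ideal_speedup": total_layers / kept,
    }


# Assumed layer count (NOT from the thread); 6 removed layers is from the thread.
summary = layer_pruning_summary(total_layers=32, removed_layers=6)
print(f"kept {summary['kept_layers']} layers, "
      f"removed {summary['fraction_removed']:.1%}, "
      f"ideal speedup {summary['ideal_speedup']:.2f}x")
```

Real-world gains will likely be smaller than the ideal number, since memory bandwidth, tokenization, and the non-layer parts of the model don't shrink with the layer count.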

