Excellent Approach
#1 opened by 1littlecoder
Thanks for sharing this! How's the performance vibe (other than the benchmarks)?
Thanks! I'd say it's pretty performant, but IMO it's best to think of this as a tradeoff between speed and output quality. You get some memory savings from the reduced parameter count, but the main benefit is faster training and inference from removing 6 (~20%) of the layers.
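For a rough feel of what removing ~20% of the layers buys you, here's a back-of-the-envelope sketch. It assumes every layer costs the same to run and that runtime scales linearly with layer count, which is idealized: embeddings, the output head, and memory bandwidth don't shrink, so real speedups will be somewhat smaller. The 30-layer total is hypothetical (picked only so that 6 layers works out to 20%), and `pruning_estimate` is a made-up helper, not part of any library.

```python
def pruning_estimate(total_layers: int, removed_layers: int) -> tuple[float, float]:
    """Idealized estimate of layer pruning savings.

    Assumes every layer has identical cost and ignores embeddings,
    the LM head, KV-cache effects, and memory-bandwidth limits.
    """
    if not 0 <= removed_layers < total_layers:
        raise ValueError("removed_layers must be in [0, total_layers)")
    # Fraction of the layer stack that was dropped
    fraction_removed = removed_layers / total_layers
    # Compute left over, relative to the original model
    remaining = 1.0 - fraction_removed
    # Ideal speedup if runtime is proportional to layer count
    speedup = 1.0 / remaining
    return fraction_removed, speedup


# Hypothetical 30-layer stack, chosen only because 6/30 = 20%
frac, speedup = pruning_estimate(total_layers=30, removed_layers=6)
print(f"removed {frac:.0%} of layers -> ~{speedup:.2f}x ideal speedup")
```

So dropping a fifth of the layers gives at best about a 1.25x speedup under these assumptions, which is why it's a tradeoff rather than a free win.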
Note that this is the base model, not the instruct version, so it needs some fine-tuning before practical use.