---
base_model:
- mistralai/Mistral-7B-Instruct-v0.3
library_name: transformers
tags:
- mergekit
license: apache-2.0
language:
- en
---

**Exllamav2** quant (**exl2** / **6.5 bpw**) made with ExLlamaV2 v0.0.21

Other EXL2 quants:
| **Quant** | **Model Size** | **lm_head** |
| ----- | ---------- | ------- |
|**[2.2](https://huggingface.co/Zoyd/giannisan_Mistral-10.7B-Instruct-v0.3-depth-upscaling-2_2bpw_exl2)** | 3134 MB | 6 |
|**[2.5](https://huggingface.co/Zoyd/giannisan_Mistral-10.7B-Instruct-v0.3-depth-upscaling-2_5bpw_exl2)** | 3478 MB | 6 |
|**[3.0](https://huggingface.co/Zoyd/giannisan_Mistral-10.7B-Instruct-v0.3-depth-upscaling-3_0bpw_exl2)** | 4101 MB | 6 |
|**[3.5](https://huggingface.co/Zoyd/giannisan_Mistral-10.7B-Instruct-v0.3-depth-upscaling-3_5bpw_exl2)** | 4724 MB | 6 |
|**[3.75](https://huggingface.co/Zoyd/giannisan_Mistral-10.7B-Instruct-v0.3-depth-upscaling-3_75bpw_exl2)** | 5034 MB | 6 |
|**[4.0](https://huggingface.co/Zoyd/giannisan_Mistral-10.7B-Instruct-v0.3-depth-upscaling-4_0bpw_exl2)** | 5350 MB | 6 |
|**[4.25](https://huggingface.co/Zoyd/giannisan_Mistral-10.7B-Instruct-v0.3-depth-upscaling-4_25bpw_exl2)** | 5662 MB | 6 |
|**[5.0](https://huggingface.co/Zoyd/giannisan_Mistral-10.7B-Instruct-v0.3-depth-upscaling-5_0bpw_exl2)** | 6591 MB | 6 |
|**[6.0](https://huggingface.co/Zoyd/giannisan_Mistral-10.7B-Instruct-v0.3-depth-upscaling-6_0bpw_exl2)** | 7873 MB | 8 |
|**[6.5](https://huggingface.co/Zoyd/giannisan_Mistral-10.7B-Instruct-v0.3-depth-upscaling-6_5bpw_exl2)** | 8496 MB | 8 |
|**[8.0](https://huggingface.co/Zoyd/giannisan_Mistral-10.7B-Instruct-v0.3-depth-upscaling-8_0bpw_exl2)** | 9888 MB | 8 |

# mistral-7b-instruct-v0.3-depth-upscaling

![image/png](https://cdn-uploads.huggingface.co/production/uploads/643eab4f05a395e2b1c727e3/elcrExK_Q5MQjcdAjYi9V.png)

This is an attempt at depth up-scaling, based on the paper [SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling](https://arxiv.org/abs/2312.15166). Depth up-scaling is a technique designed to scale large language models efficiently. The process begins with structural depthwise scaling, which may initially reduce performance. According to the paper, performance is rapidly restored during a continued pretraining phase, which adapts the expanded model's parameters to the new depth configuration.

Note that this model represents only the initial phase of development. The next critical steps involve fine-tuning.

## Merge Details
### Merge Method

This model was merged using the passthrough merge method. The first 24 layers of one copy of the model are stitched to the last 24 layers of another copy, resulting in a total of 48 layers with 10.7B parameters.

### Models Merged

The following models were included in the merge:
* [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) merged with itself.

### Configuration

The following configuration was used to produce this model:

```yaml
slices:
  - sources:
    - model: mistralai/Mistral-7B-Instruct-v0.3
      layer_range: [0, 24]
  - sources:
    - model: mistralai/Mistral-7B-Instruct-v0.3
      layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
```
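
As a rough sanity check on the 48-layer, 10.7B figure, the sketch below expands the two slices from the configuration into a layer list and estimates the parameter count. It is not part of the original merge workflow. The architecture values (hidden size, MLP width, head counts, vocabulary size, untied output head) are assumptions taken from the published Mistral-7B-Instruct-v0.3 configuration rather than from this card, and `layer_range` is treated as half-open, which matches the 24 + 24 layer count stated above.

```python
# Slices copied from the configuration above; layer_range is read as [start, end).
SLICES = [(0, 24), (8, 32)]

# Assumed architecture values for Mistral-7B-Instruct-v0.3 (not stated in this card).
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
VOCAB = 32768


def stacked_layers(slices):
    """Return the source-layer index used at each position of the merged model."""
    return [i for start, end in slices for i in range(start, end)]


def params_per_layer():
    """Weights in one decoder layer: attention projections, gated MLP, two norms."""
    q_and_o = 2 * HIDDEN * NUM_HEADS * HEAD_DIM
    k_and_v = 2 * HIDDEN * NUM_KV_HEADS * HEAD_DIM
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate, up and down projections
    norms = 2 * HIDDEN
    return q_and_o + k_and_v + mlp + norms


def total_params(num_layers):
    """Decoder layers plus input embeddings, an untied output head and the final norm."""
    embeddings = 2 * VOCAB * HIDDEN
    final_norm = HIDDEN
    return num_layers * params_per_layer() + embeddings + final_norm


layers = stacked_layers(SLICES)
# Source layers that appear in both slices and are therefore duplicated.
duplicated = sorted({i for i in layers if layers.count(i) > 1})

print(len(layers))                                  # 48
print(duplicated[0], duplicated[-1])                # 8 23
print(round(total_params(len(layers)) / 1e9, 2))    # 10.74
```

Under these assumptions, source layers 8 through 23 appear twice in the merged stack, and the estimate comes out at roughly 10.74B parameters, in line with the 10.7B stated above.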