---
license: apache-2.0
tags:
- merge
- mergekit
- lazymergekit
- WizardLM/WizardCoder-33B-V1.1
- Phind/Phind-CodeLlama-34B-v2
base_model:
- WizardLM/WizardCoder-33B-V1.1
- Phind/Phind-CodeLlama-34B-v2
---
### DISCLAIMER: THIS PROBABLY DOESN'T WORK
# wizardphind-coder-passthrough-39B
wizardphind-coder-passthrough-39B is a merge of the following models using [mergekit](https://github.com/cg123/mergekit):
* [WizardLM/WizardCoder-33B-V1.1](https://huggingface.co/WizardLM/WizardCoder-33B-V1.1)
* [Phind/Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2)
wizardphind-coder-passthrough-39B is an experimental merge of WizardCoder-33B-V1.1 (a DeepSeek-Coder-33B fine-tune) and Phind-CodeLlama-34B-v2 (a CodeLlama-34B fine-tune).
I expect the model to become much better when trained further on coding-specific tasks.
Since the DeepSeek and CodeLlama architectures use differently sized tensors for their MLP and attention layers,
this model will be initialized with empty layers and will need to be fine-tuned further before use.
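As a quick illustration of that shape mismatch, here is a hypothetical sketch that compares the two source configs with `transformers`; the example values in the comments are assumptions based on the published base-model configs, not something verified here.

```python
# Sketch: inspect why a passthrough merge across these two bases leaves
# mismatched layers empty. Requires `transformers` and network access.
from transformers import AutoConfig

wizard = AutoConfig.from_pretrained("WizardLM/WizardCoder-33B-V1.1")
phind = AutoConfig.from_pretrained("Phind/Phind-CodeLlama-34B-v2")

for field in ("hidden_size", "intermediate_size",
              "num_attention_heads", "num_hidden_layers"):
    # The two bases are expected to differ here, e.g. hidden_size
    # 7168 for the DeepSeek base vs. 8192 for CodeLlama-34B.
    print(field, getattr(wizard, field), getattr(phind, field))
```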
This model utilizes all 62 layers of the WizardCoder 33B model and 8 layers (24-31) from Phind's CodeLlama 34B model.
## 🧩 Configuration
```yaml
slices:
  - sources:
      - model: WizardLM/WizardCoder-33B-V1.1
        layer_range: [0, 62]
  - sources:
      - model: Phind/Phind-CodeLlama-34B-v2
        layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```
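## 💻 Usage

To reproduce the merge, the config above can be run through mergekit's CLI, e.g. `mergekit-yaml config.yaml ./wizardphind-coder-passthrough-39B`. The snippet below is a minimal inference sketch in the usual lazymergekit style; the repo id is a placeholder, and per the disclaimer above, the raw merge is unlikely to produce coherent output until it has been fine-tuned.

```python
# Minimal inference sketch. The model id below is a placeholder for wherever
# the merged (and ideally fine-tuned) weights end up being uploaded.
from transformers import AutoTokenizer
import transformers
import torch

model = "your-username/wizardphind-coder-passthrough-39B"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.bfloat16,  # matches the merge dtype above
    device_map="auto",
)

prompt = "Write a Python function that reverses a linked list."
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"])
```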