|
--- |
|
license: apache-2.0 |
|
tags: |
|
- merge |
|
- mergekit |
|
- lazymergekit |
|
- WizardLM/WizardCoder-33B-V1.1 |
|
- Phind/Phind-CodeLlama-34B-v2 |
|
--- |
|
|
|
### DISCLAIMER: THIS PROBABLY DOESN'T WORK
|
|
|
# wizardphind-coder-passthrough-39B |
|
|
|
wizardphind-coder-passthrough-39B is a merge of the following models using [mergekit](https://github.com/cg123/mergekit): |
|
* [WizardLM/WizardCoder-33B-V1.1](https://huggingface.co/WizardLM/WizardCoder-33B-V1.1) |
|
* [Phind/Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2) |
|
|
|
|
|
wizardphind-coder-passthrough-39B is an experimental model combining WizardCoder-33B-V1.1 (a DeepSeek-33B-based model) and Phind-CodeLlama-34B-v2 (a CodeLlama-34B-based model). I expect the model to become much better when trained further on coding-specific tasks.
|
|
|
Since the DeepSeek and CodeLlama architectures have differently sized tensors in their MLP/attention layers, this model is initialized with empty layers and will need to be fine-tuned further before use.
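
As a quick sanity check after merging, here is a minimal sketch that scans the checkpoint for "empty" tensors. It assumes the merged weights were saved locally under `./wizardphind-coder-passthrough-39B` (a hypothetical path) and that empty layers come out zero-initialized; both are assumptions, not guarantees of mergekit's behavior.

```python
# Sanity check: list zero-initialized ("empty") parameter tensors in the
# merged checkpoint. Assumes a hypothetical local path and that empty
# layers are all-zero tensors.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "./wizardphind-coder-passthrough-39B",  # hypothetical local path
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)

empty = [n for n, p in model.named_parameters() if torch.count_nonzero(p) == 0]
print(f"{len(empty)} empty tensors")
for name in empty:
    print(" ", name)
```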
|
|
|
This model utilizes all 62 layers of the WizardCoder-33B model and 8 layers (range [24, 32]) from Phind's CodeLlama-34B model, for 70 layers in total.
|
|
|
|
|
## 🧩 Configuration |
|
|
|
```yaml
slices:
  - sources:
      - model: WizardLM/WizardCoder-33B-V1.1
        layer_range: [0, 62]
  - sources:
      - model: Phind/Phind-CodeLlama-34B-v2
        layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```
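
## 💻 Usage

The merge can be reproduced by saving the config above to a file and running mergekit's `mergekit-yaml` CLI on it. Once merged, and keeping the disclaimer above in mind, the following is a minimal inference sketch with 🤗 Transformers; it assumes the result loads as a standard causal LM, and the model path and prompt are placeholders.

```python
# Minimal inference sketch. The model path is a hypothetical local path;
# device_map="auto" requires the accelerate package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "./wizardphind-coder-passthrough-39B"  # hypothetical local path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Write a Python function that reverses a linked list."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Given the empty MLP/attention layers noted above, outputs are unlikely to be coherent until the model has been fine-tuned.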