Update README.md
README.md CHANGED
@@ -23,7 +23,7 @@ I expect the model to become much better when trained further on coding specific
 Since deepseek & the codellama models have different sized tensors for their MLP/Attention layers,
 this model will be initialized with empty layers and will need to be fine-tuned further.
 
-This model utilizes all the layers of the Wizard Coder 33B model and
+This model utilizes all the layers of the Wizard Coder 33B model and 8 layers from Phind's Codellama 34B model.
 
 
 ## 🧩 Configuration
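Layer-stacking merges like the one described in this hunk (all layers of one model plus a slice of another) are commonly expressed as a mergekit passthrough configuration. The sketch below is only an illustration of that format: the model IDs and layer ranges are assumptions, not the actual configuration from this repository.

```yaml
# Hypothetical mergekit passthrough config.
# Model names and layer ranges are illustrative assumptions,
# not this repository's actual values.
slices:
  - sources:
      - model: WizardLM/WizardCoder-33B-V1.1   # assumed deepseek-based donor
        layer_range: [0, 62]                   # all layers, as described
  - sources:
      - model: Phind/Phind-CodeLlama-34B-v2    # assumed codellama-based donor
        layer_range: [0, 8]                    # "8 layers", as described
merge_method: passthrough
dtype: float16
```

Because the two bases have differently sized MLP/attention tensors, a passthrough stack like this yields mismatched layers that must be re-initialized and fine-tuned, which is what the README text above warns about.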