Update README.md
README.md CHANGED
@@ -23,6 +23,8 @@ I expect the model to become much better when trained further on coding specific
 Since deepseek & the codellama models have different sized tensors for their MLP/Attention layers,
 this model will be initialized with empty layers and will need to be fine-tuned further.
 
+This model utilizes all the layers of the Wizard Coder 33B model and the 8 layers from Phind's Codellama 34B model.
+
 
 ## 🧩 Configuration
 
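A merge like the one described (all Wizard Coder 33B layers plus 8 layers from Phind's CodeLlama 34B) is typically expressed as a mergekit passthrough config. The sketch below is only an illustration: the model IDs, layer counts, and dtype are assumptions, not the repository's actual configuration.

```yaml
# Hypothetical mergekit passthrough sketch -- model IDs and layer_range
# values are assumptions; the repo's real config may differ.
slices:
  - sources:
      - model: WizardLM/WizardCoder-33B-V1.1   # assumed Wizard Coder 33B repo id
        layer_range: [0, 62]                   # assumed full layer count
  - sources:
      - model: Phind/Phind-CodeLlama-34B-v2    # assumed Phind CodeLlama repo id
        layer_range: [0, 8]                    # the 8 layers mentioned above
merge_method: passthrough                      # stack slices without averaging
dtype: float16
```

Because the two base architectures use different MLP/attention tensor shapes, mismatched layers end up uninitialized after the merge, which matches the note above that the model needs further fine-tuning before use.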