Commit 57965de
Parent(s): 31c6e64
Update README.md
README.md CHANGED
@@ -35,10 +35,12 @@ widget:
 # Credit for the model card's description goes to ddh0, mergekit, and NousResearch
 # Hermes-2-Pro-Mistral-10.7B
 
-This is Mistral-12.25B-Instruct-v0.2, a depth-upscaled version of [
+This is Mistral-12.25B-Instruct-v0.2, a depth-upscaled version of [NousResearch/Hermes-2-Pro-Mistral-7B](https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B).
 
 This model is intended to be used as a basis for further fine-tuning, or as a drop-in upgrade from the original 7 billion parameter model.
 
+Paper detailing how Depth-Up Scaling works: [SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling](https://arxiv.org/abs/2312.15166)
+
 This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
 
 ## Merge Details