Commit bf5ee70
Parent(s): 9049dbe
Update README.md
README.md CHANGED
@@ -5,72 +5,115 @@ base_model:
 - OpenBuddy/openbuddy-mistral2-7b-v20.3-32k
 - meta-math/MetaMath-Mistral-7B
 - HuggingFaceH4/mistral-7b-grok
+- HuggingFaceH4/mistral-7b-anthropic
 - NousResearch/Yarn-Mistral-7b-128k
 - ajibawa-2023/Code-Mistral-7B
-- HuggingFaceH4/mistral-7b-anthropic
 - SherlockAssistant/Mistral-7B-Instruct-Ukrainian
+datasets:
+- HuggingFaceH4/grok-conversation-harmless
+- HuggingFaceH4/ultrachat_200k
+- HuggingFaceH4/ultrafeedback_binarized_fixed
+- HuggingFaceH4/cai-conversation-harmless
+- meta-math/MetaMathQA
+- emozilla/yarn-train-tokenized-16k-mistral
+- snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
+- microsoft/orca-math-word-problems-200k
+- m-a-p/Code-Feedback
+- teknium/openhermes
+- lksy/ru_instruct_gpt4
+- IlyaGusev/ru_turbo_saiga
+- IlyaGusev/ru_sharegpt_cleaned
+- IlyaGusev/oasst1_ru_main_branch
 library_name: transformers
 tags:
-- mergekit
+- mistral
+- gistral
+- gistral-16b
+- multilingual
+- code
+- metamath
+- grok
+- anthropic
+- openhermes
+- instruct
 - merge
-
+language:
+- en
+- fr
+- ru
+- de
+- ja
+- ko
+- zh
+- it
+- uk
+pipeline_tag: text-generation
 ---
-# merge
-
-This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-
-## Merge Details
-### Merge Method
-
-This model was merged using the passthrough merge method.
-
-### Models Merged
-
-The following models were included in the merge:
-* [Gaivoronsky/Mistral-7B-Saiga](https://huggingface.co/Gaivoronsky/Mistral-7B-Saiga)
-* [snorkelai/Snorkel-Mistral-PairRM-DPO](https://huggingface.co/snorkelai/Snorkel-Mistral-PairRM-DPO)
-* [OpenBuddy/openbuddy-mistral2-7b-v20.3-32k](https://huggingface.co/OpenBuddy/openbuddy-mistral2-7b-v20.3-32k)
-* [meta-math/MetaMath-Mistral-7B](https://huggingface.co/meta-math/MetaMath-Mistral-7B)
-* [HuggingFaceH4/mistral-7b-grok](https://huggingface.co/HuggingFaceH4/mistral-7b-grok)
-* [NousResearch/Yarn-Mistral-7b-128k](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)
-* [ajibawa-2023/Code-Mistral-7B](https://huggingface.co/ajibawa-2023/Code-Mistral-7B)
-* [HuggingFaceH4/mistral-7b-anthropic](https://huggingface.co/HuggingFaceH4/mistral-7b-anthropic)
-* [SherlockAssistant/Mistral-7B-Instruct-Ukrainian](https://huggingface.co/SherlockAssistant/Mistral-7B-Instruct-Ukrainian)
-
-### Configuration
-
-The following YAML configuration was used to produce this model:
-
-```yaml
-slices:
-- sources:
-  - model: Gaivoronsky/Mistral-7B-Saiga
-    layer_range: [0, 32]
-- sources:
-  - model: HuggingFaceH4/mistral-7b-grok
-    layer_range: [24, 32]
-- sources:
-  - model: HuggingFaceH4/mistral-7b-anthropic
-    layer_range: [24, 32]
-- sources:
-  - model: NousResearch/Yarn-Mistral-7b-128k
-    layer_range: [26, 32]
-- sources:
-  - model: snorkelai/Snorkel-Mistral-PairRM-DPO
-    layer_range: [26, 32]
-- sources:
-  - model: OpenBuddy/openbuddy-mistral2-7b-v20.3-32k
-    layer_range: [26, 32]
-- sources:
-  - model: meta-math/MetaMath-Mistral-7B
-    layer_range: [28, 32]
-- sources:
-  - model: ajibawa-2023/Code-Mistral-7B
-    layer_range: [28, 32]
-- sources:
-  - model: SherlockAssistant/Mistral-7B-Instruct-Ukrainian
-    layer_range: [30, 32]
-merge_method: passthrough
-dtype: bfloat16
 
+# Gistral 16B (Mistral from 7B to 16B)
+
+![logo](assets/logo.png)
+
+Gistral 16B stacks several strong Mistral-7B fine-tunes into a single larger model that combines their capabilities.
+
+
+## Model Details
+
+### Model Description
+
+- **Developed by:** [@ehristoforu](https://huggingface.co/ehristoforu)
+- **Model type:** Text Generation (conversational)
+- **Language(s) (NLP):** English, French, Russian, German, Japanese, Chinese, Korean, Italian, Ukrainian, Code
+- **Finetuned from model:** [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
+
+
+## How to Get Started with the Model
+
+Use the code below to get started with the model.
+
+```py
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_id = "ehristoforu/Gistral-16B-v0.1"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+messages = [
+    {"role": "user", "content": "What is your favourite condiment?"},
+    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
+    {"role": "user", "content": "Do you have mayonnaise recipes?"}
+]
+inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
+outputs = model.generate(inputs, max_new_tokens=20)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
+
+
+## About the merge
+
+Base model: mistralai/Mistral-7B-Instruct-v0.2
+
+Merge models:
+- Gaivoronsky/Mistral-7B-Saiga
+- snorkelai/Snorkel-Mistral-PairRM-DPO
+- OpenBuddy/openbuddy-mistral2-7b-v20.3-32k
+- meta-math/MetaMath-Mistral-7B
+- HuggingFaceH4/mistral-7b-grok
+- HuggingFaceH4/mistral-7b-anthropic
+- NousResearch/Yarn-Mistral-7b-128k
+- ajibawa-2023/Code-Mistral-7B
+- SherlockAssistant/Mistral-7B-Instruct-Ukrainian
+
+Merge datasets:
+- HuggingFaceH4/grok-conversation-harmless
+- HuggingFaceH4/ultrachat_200k
+- HuggingFaceH4/ultrafeedback_binarized_fixed
+- HuggingFaceH4/cai-conversation-harmless
+- meta-math/MetaMathQA
+- emozilla/yarn-train-tokenized-16k-mistral
+- snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
+- microsoft/orca-math-word-problems-200k
+- m-a-p/Code-Feedback
+- teknium/openhermes
+- lksy/ru_instruct_gpt4
+- IlyaGusev/ru_turbo_saiga
+- IlyaGusev/ru_sharegpt_cleaned
+- IlyaGusev/oasst1_ru_main_branch
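
A note on the removed "Merge Details" section: a passthrough merge does no weight averaging; it simply concatenates the listed layer slices into one deeper network. The sketch below (plain Python, no mergekit required) reproduces the layer map implied by the YAML configuration in the diff; the per-layer parameter figure is an assumption, a rough estimate for Mistral-7B-scale layers, not a number stated in the card.

```py
# Layer slices from the passthrough config above: (model, start, end).
slices = [
    ("Gaivoronsky/Mistral-7B-Saiga", 0, 32),
    ("HuggingFaceH4/mistral-7b-grok", 24, 32),
    ("HuggingFaceH4/mistral-7b-anthropic", 24, 32),
    ("NousResearch/Yarn-Mistral-7b-128k", 26, 32),
    ("snorkelai/Snorkel-Mistral-PairRM-DPO", 26, 32),
    ("OpenBuddy/openbuddy-mistral2-7b-v20.3-32k", 26, 32),
    ("meta-math/MetaMath-Mistral-7B", 28, 32),
    ("ajibawa-2023/Code-Mistral-7B", 28, 32),
    ("SherlockAssistant/Mistral-7B-Instruct-Ukrainian", 30, 32),
]

offset = 0
for model, start, end in slices:
    count = end - start
    print(f"output layers {offset:2d}-{offset + count - 1:2d}  <-  {model} layers [{start}, {end})")
    offset += count

# 32 + 8 + 8 + 6 + 6 + 6 + 4 + 4 + 2 = 76 decoder layers vs. 32 in Mistral-7B.
print(f"total decoder layers: {offset}")
# Rough estimate (assumption): ~0.218B parameters per Mistral-7B-scale layer
# plus ~0.26B for embeddings, which is roughly the 16B in the model's name.
print(f"approximate parameters: {offset * 0.218 + 0.26:.1f}B")
```

Each donor slice keeps its weights unchanged, which is why the stacked model inherits behaviour from every source model at a different depth.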
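To reproduce a merge like this, the YAML is handed to mergekit, either with the `mergekit-yaml` CLI (`mergekit-yaml config.yml ./output-dir`) or from a script. Below is a minimal sketch assuming mergekit's Python interface (`MergeConfiguration`, `MergeOptions`, `run_merge`); the config path and output directory are hypothetical placeholders, and option names can vary between mergekit versions.

```py
# Minimal sketch: running a mergekit YAML config from Python.
# Assumes `pip install mergekit`; paths below are hypothetical placeholders.
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YML = "gistral-16b.yml"  # the passthrough config shown in the diff
OUTPUT_DIR = "./Gistral-16B"

with open(CONFIG_YML, "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path=OUTPUT_DIR,
    options=MergeOptions(
        cuda=False,           # set True to run the merge on GPU
        copy_tokenizer=True,  # ship the base tokenizer with the output
        lazy_unpickle=True,   # reduce peak RAM while loading checkpoints
    ),
)
```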