deepnight-research committed on
Commit
1a56f20
1 Parent(s): e332c13

Create README.md

Files changed (1)
  1. README.md +117 -0
README.md ADDED
@@ -0,0 +1,117 @@
+ ---
+ license: llama2
+ datasets:
+ - tiiuae/falcon-refinedweb
+ - EleutherAI/pile
+ - meta-math/MetaMathQA
+ language:
+ - en
+ library_name: transformers
+
+ ---
+ # Saily 220B
+ <img src="https://i.ibb.co/rG8S6cF/Saily-220-B.png" style="width: 100%; height: auto;"/>
+
+ ---
+ ## Announcements
+ **1.** <b>Date: </b>17th December, 2023
+ Releasing v1. Saily_220B is a powerful AI model built on top of Llama2-70B merges.
+ We created 10 fine-tuned **Llama2 70B** models. All of the models were fine-tuned on a part of the RefinedWeb dataset (common to all),
+ and each model was additionally fine-tuned on a niche-specific dataset:
+ - Code
+ - Humor
+ - Maths
+ - Logical Understanding
+ - Physics
+ - Reasoning
+ - Psychology
+ - Roleplay
+
+ We created 4 linear merges, keeping the **Logical-Understanding** and **Reasoning** models constant in all of them,
+ and finally created a passthrough merge between the models.
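
The README does not say which tool or merge weights were used. As a rough illustration only, a linear merge can be viewed as a weighted average of matching parameters across models. The sketch below uses plain Python lists as stand-ins for tensors; `linear_merge`, the toy models, and the equal weights are all hypothetical and not taken from the authors' pipeline.

```python
from typing import Dict, List, Sequence

# Toy stand-in for a model: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def linear_merge(models: Sequence[StateDict], weights: Sequence[float]) -> StateDict:
    """Weighted average of matching parameters (hypothetical helper)."""
    if not models or len(models) != len(weights):
        raise ValueError("need at least one model and one weight per model")
    total = sum(weights)
    if total == 0:
        raise ValueError("merge weights must not sum to zero")
    merged: StateDict = {}
    for name, first in models[0].items():
        # Average element by element, normalising by the sum of the weights.
        merged[name] = [
            sum(w * m[name][i] for m, w in zip(models, weights)) / total
            for i in range(len(first))
        ]
    return merged


# Hypothetical example: two "constant" models plus one niche model, equal weights.
logic = {"layer.weight": [1.0, 2.0]}
reasoning = {"layer.weight": [3.0, 4.0]}
code = {"layer.weight": [5.0, 6.0]}

merged = linear_merge([logic, reasoning, code], [1.0, 1.0, 1.0])
print(merged)  # {'layer.weight': [3.0, 4.0]}
```

A linear merge keeps the parameter count of its inputs, so the averaging step alone would not account for the final model being larger than a single 70B model.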
+
+ Public Datasets used:
+ 1. [RefinedWeb](https://hf.co/datasets/tiiuae/falcon-refinedweb) (part of it)
+ 2. Pile (part of it)
+ 3. [MetaMathQA](https://hf.co/datasets/meta-math/MetaMathQA)
+ 4. Unnatural Code (JavaScript, Python, C++)
+
+ ### How did we create the private dataset?
+ We recorded many internal brainstorming sessions where we just talked about random things.
+ We also invited many experts from different fields:
+ - Mathematicians
+ - Developers
+ - Bio-Engineers
+ - Authors
+ - Psychologists
+ - and others...
+
+ We discussed a range of topics with them, recorded the sessions, and then transcribed the audio to create the datasets.
+
+ ---
+
+ ### Please don't refer to the config.json in the files; it isn't accurate. You can run:
+ ```python
+ from transformers import AutoModelForCausalLM as amclm
+
+ # Load the model; device_map="auto" spreads it across the available devices.
+ model = amclm.from_pretrained("deepnight-research/saily_220b",
+                               device_map="auto")
+
+ # Print the configuration of the loaded model.
+ print(model.config)
+ ```
+ to check out the model's configuration.
+
+ ---
+
+
+ ### Try it:
+
+ You definitely need GPUs here (that goes without saying).
+ * We have tried it on **4 x A100 80GB** and **2 x A100 80GB**.
+ * You will have to load the model in **4bit** to fit on **2 x A100 (80GB)**.
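
The 4-bit requirement follows from rough weight-memory arithmetic. The sketch below assumes about 220 billion parameters (inferred from the model name, not confirmed by the README) and counts weights only, ignoring activations, the KV cache, and quantization overhead. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 220e9  # assumption: parameter count implied by the model name

for bits in (16, 8, 4):
    need = weight_memory_gb(NUM_PARAMS, bits)
    for gpus in (2, 4):
        have = gpus * 80  # total memory of `gpus` A100 80GB cards, in GB
        verdict = "fits" if need < have else "does not fit"
        print(f"{bits:>2}-bit: ~{need:.0f} GB of weights, "
              f"{gpus} x 80 GB = {have} GB -> {verdict}")
```

Under these assumptions, 4-bit weights (about 110 GB) fit in 160 GB, 8-bit weights (about 220 GB) need the 4-GPU setup, and 16-bit weights (about 440 GB) exceed both, before any runtime overhead is counted.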
+
+ ```python
+ from transformers import AutoModelForCausalLM as amclm
+ from transformers import AutoTokenizer
+
+ model_name = "deepnight-research/saily_220b"
+ model = amclm.from_pretrained(model_name, device_map="auto")
+
+ # To load in 8Bit, make sure you have bitsandbytes installed.
+ # model = amclm.from_pretrained(model_name,
+ #                               device_map="auto",
+ #                               load_in_8bit=True
+ #                               )
+
+ # To load in 4Bit (needed for 2 x A100 80GB), bitsandbytes is required as well.
+ # Newer transformers releases may expect
+ # quantization_config=BitsAndBytesConfig(load_in_4bit=True) instead.
+ # model = amclm.from_pretrained(model_name,
+ #                               device_map="auto",
+ #                               load_in_4bit=True
+ #                               )
+
+ # Float16
+ # import torch
+ # model = amclm.from_pretrained(model_name,
+ #                               device_map="auto",
+ #                               torch_dtype=torch.float16
+ #                               )
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ # Move the prompt tokens to the device that holds the model's first layer.
+ input_ids = tokenizer.encode("[INST]\nWrite a poem about cats\n[/INST]\n\n",
+                              return_tensors="pt").to(model.device)
+
+ # do_sample=True is needed for temperature, top_p and top_k to take effect.
+ output = model.generate(input_ids, max_length=128,
+                         do_sample=True,
+                         temperature=0.7,
+                         repetition_penalty=1.1,
+                         top_p=0.7, top_k=50
+                         )
+
+ output_text = tokenizer.decode(output[0], skip_special_tokens=True)
+ print(output_text)
+ ```
+
+ We recommend following the **Alpaca prompt format**. If you're trying it out in Text-Generation-WebUI, please use **INSTRUCT** or **CHAT-INSTRUCT** mode.
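
For reference, the widely used Alpaca template (no-input variant) wraps an instruction as sketched below. `build_alpaca_prompt` is a hypothetical helper, and the README does not confirm the exact template wording used during fine-tuning, so treat the strings as an assumption. Note that the generation example above wraps its prompt in `[INST]` tags instead.

```python
# Assumed template: the standard Alpaca prompt without an "Input" section.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str) -> str:
    """Wrap a bare instruction in the Alpaca no-input template."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


prompt = build_alpaca_prompt("Write a poem about cats")
print(prompt)
```

The resulting string can be passed to `tokenizer.encode` in place of the `[INST]` prompt.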
+
+
+ ---
+
+ ## Limitations and Bias
+ As with all language models, Saily_220B may generate incorrect or biased content. It's important to keep this in mind when using the model.
+
+ ---
+
+ ## Wanna Talk?
+ Reach out to us at [research@deepnight.tech](mailto:research@deepnight.tech) or [hello@deepnight.tech](mailto:hello@deepnight.tech).