---
license: mit
datasets:
- iamplus/LLama2-SFT-Data
- iamplus/Open_Platypus_Orca
- iamplus/Orca
- iamplus/Conversational_Data
---

**Description :**

This model is trained on a mix of Orca data and open-source + closed-source multi-turn conversation data to create a stronger reasoning model that can also hold multi-turn conversations.

The dataset split, prompt format, and training parameters are described below.

**Prompt Description :**

The prompt template for the first turn looks like this:
```
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_message }} [/INST]
```

The prompt template for a multi-turn conversation looks like this:
```
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s><s>[INST] {{ user_msg_2 }} [/INST]
```

This model follows Meta's official chat model prompt format. Please refer to https://huggingface.co/blog/llama2#how-to-prompt-llama-2 for details on how to prompt the model for single- and multi-turn conversations.
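
For illustration, here is a minimal Python sketch of how a conversation could be assembled into this template by hand (the `build_prompt` helper is hypothetical and not part of this repo):
```
# Minimal sketch: assembling the Llama-2 chat template by hand.
# `build_prompt` is an illustrative helper, not part of this repo.
def build_prompt(system_prompt, turns):
    # turns: list of (user_msg, model_answer) pairs; model_answer is
    # None for the final turn that is awaiting a response.
    prompt = f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    for i, (user_msg, model_answer) in enumerate(turns):
        if i > 0:
            prompt += f"<s>[INST] {user_msg} [/INST]"
        else:
            prompt += f"{user_msg} [/INST]"
        if model_answer is not None:
            prompt += f" {model_answer} </s>"
    return prompt

print(build_prompt(
    "You are a helpful assistant.",
    [("What is the capital of France?", "Paris."),
     ("And of Spain?", None)],
))
```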

**Base model :** meta-llama/Llama-2-13b-hf

**Data :**
1. 1M Orca data (GPT-4 Orca data - OpenOrca)
2. 1.7M chat data (includes OpenAssistant chat data, UltraChat, and many more open-source chat datasets)
3. 30k OpenPlatypus data

**Training Params :**
```
Number of Epochs : 2
Batch Size : 128
Sequence Length : 4096
Learning Rate : 2e-5 (Cosine)
Weight Decay : 0.1
Gradient Clipping : 1.0
Gamma : 0.85
beta_1 : 0.9
beta_2 : 0.95
eps : 1e-5
Precision : bf16
Optimizer : Any Precision AdamW Optimizer
```
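
For reference, a minimal PyTorch sketch of how these hyperparameters might map onto standard components (`torch.optim.AdamW` stands in here for the any-precision AdamW variant, and `model` and the step count are placeholders, not the actual training setup):
```
import torch

# Minimal sketch, not the actual training code for this model.
model = torch.nn.Linear(16, 16, dtype=torch.bfloat16)  # Precision : bf16

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-5,            # Learning Rate
    betas=(0.9, 0.95),  # beta_1, beta_2
    eps=1e-5,           # eps
    weight_decay=0.1,   # Weight Decay
)
# Cosine learning-rate schedule; T_max (total steps) is a placeholder.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

# Inside the training loop, gradients would be clipped before each step:
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # Gradient Clipping : 1.0
```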