Pankaj Mathur committed 58be4f8 (parent: 8a481b4): Update README.md

README.md (CHANGED)

---
license: mit
language:
- en
library_name: adapter-transformers
---

# alpaca_orca_open_llama: An Open_LLaMA-3B model trained on Alpaca dataset using Orca Research paper approaches

# Dataset and Training

We make the OpenLLaMA-3B model more steerable by training it on a custom Alpaca dataset created using approaches from the [Orca Research Paper](https://arxiv.org/abs/2306.02707).
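
As a rough illustration of that idea (the actual field names and system messages used to build the dataset are not shown in this card, so the ones below are assumptions), an Alpaca-style record can be paired with an Orca-style system message:

```python
import random

# Illustrative system messages in the spirit of the Orca paper; the real set
# used to build this dataset is an assumption here.
SYSTEM_MESSAGES = [
    "You are an AI assistant. Provide a detailed, step-by-step answer.",
    "You are an AI assistant that follows instruction extremely well.",
]

def to_orca_style(alpaca_record: dict) -> dict:
    """Attach a sampled system message to a plain Alpaca record."""
    return {"system": random.choice(SYSTEM_MESSAGES), **alpaca_record}

record = to_orca_style({
    "instruction": "List three benefits of instruction tuning.",
    "input": "",
    "output": "...",
})
```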

Please pay attention to how the **System** prompt is added before each *instruction*.
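
A minimal sketch of how such a record could be flattened into a single training prompt; the `### System` / `### User` / `### Response` layout is an assumed convention for illustration, not necessarily the exact template used here:

```python
def build_prompt(system: str, instruction: str, input_text: str = "") -> str:
    """Place the system prompt before the instruction, then leave room for the response."""
    prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n"
    if input_text:
        prompt += f"### Input:\n{input_text}\n\n"
    return prompt + "### Response:\n"

print(build_prompt(
    "You are an AI assistant that follows instruction extremely well.",
    "Summarize the Orca training approach in two sentences.",
))
```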

The training configurations are provided in the table below.

The training ran on 4x A600 (50 GB) GPUs, took around 20 hours, and cost about $66 using [Lambda Labs](https://lambdalabs.com).

We used DeepSpeed with ZeRO-3 approaches for parallel GPU training.

| Parameter | Value |
|:-------------:|:-------------:|
|*Batch Size*|16|
|*train_micro_batch_size_per_gpu*|2|
|*gradient_accumulation_steps*|2|
|*Learning rate*|2e-5|
|*Epochs*|3|
|*Max length*|1024|
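
As a rough sketch of how these values could map onto a DeepSpeed ZeRO-3 setup (the exact config used is not shown in this card, so any field beyond the table is an assumption):

```python
# Hypothetical DeepSpeed config reflecting the table above; anything not in the
# table (e.g. bf16, AdamW choice) is an assumption, not the authors' exact setup.
import json

ds_config = {
    "train_batch_size": 16,               # global batch size from the table
    "train_micro_batch_size_per_gpu": 2,  # per-GPU micro batch
    "gradient_accumulation_steps": 2,     # 2 micro batch x 2 accum x 4 GPUs = 16
    "zero_optimization": {"stage": 3},    # ZeRO stage 3 parameter/optimizer sharding
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 2e-5},           # learning rate from the table
    },
    "bf16": {"enabled": True},            # assumed mixed-precision choice
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

A config like this is typically passed to the `deepspeed` launcher or to the Hugging Face `Trainer` through its `deepspeed` argument.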