Pankaj Mathur committed on
Commit 58be4f8
1 Parent(s): 8a481b4

Update README.md

Files changed (1)
  1. README.md +15 -9
README.md CHANGED
@@ -1,26 +1,32 @@
 # alpaca_orca_open_llama: An Open_LLaMA-3B model trained on Alpaca dataset using Orca Research paper approaches
 
 
 # Dataset and Training
 
- We train OpenLLaMa-3B model on the custom Alpaca dataset created using Orca Research Paper approaches.
 
- Please pay attention how System prompt is added and used for each instruction.
 
 The training configurations are provided in the table below.
 
- The training takes on 4 x A600(50G) GPUs and lasts for around 20 Hours for cost of $66.
 
- We used DeepSpeed with Zero-3 approaches for parallel gpu training.
 
 |||
 |:-------------:|:-------------:|
- |**Batch Size**|16|
- |**train_micro_batch_size_per_gpu**|2|
- |**gradient_accumulation_steps**|2|
- |**Learning rate**|2e-5|
- |**Epochs**|3|
- |**Max length**|1024|
 
 
+ ---
+ license: mit
+ language:
+ - en
+ library_name: adapter-transformers
+ ---
 # alpaca_orca_open_llama: An Open_LLaMA-3B model trained on Alpaca dataset using Orca Research paper approaches
 
 
 # Dataset and Training
 
+ We train the OpenLLaMA-3B model on a custom Alpaca dataset, created using approaches from the [Orca Research Paper](https://arxiv.org/abs/2306.02707), to make it more steerable.
 
+ Please pay attention to how the **System** prompt is added before each *instruction*.
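To make that concrete, here is a minimal Python sketch of prepending a system message to each Alpaca-style record. The `### System/User/Input/Response` markers and the sample system message are illustrative assumptions; the repository's exact prompt template is not shown in this diff.

```python
# Minimal sketch: prepend a system prompt to an Alpaca-style instruction record.
# The section markers and the example system message are assumptions for
# illustration; they are not taken from this repository.

def build_prompt(record: dict, system_message: str) -> str:
    """Assemble one training prompt with the system message placed first."""
    parts = [f"### System:\n{system_message}", f"### User:\n{record['instruction']}"]
    if record.get("input"):  # Alpaca records may carry an optional input field
        parts.append(f"### Input:\n{record['input']}")
    parts.append("### Response:\n")
    return "\n\n".join(parts)

if __name__ == "__main__":
    example = {
        "instruction": "Summarize the text below in one sentence.",
        "input": "OpenLLaMA is an openly licensed reproduction of LLaMA.",
    }
    system = "You are an AI assistant. Follow the instruction as carefully as you can."
    print(build_prompt(example, system))
```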
 
 The training configurations are provided in the table below.
 
+ Training runs on 4x A600(50G) GPUs and takes around 20 hours, at a cost of $66 on [Lambda Labs](https://lambdalabs.com).
 
 We used DeepSpeed with ZeRO-3 for parallel GPU training; an illustrative config sketch follows the table below.
 
 |**Hyperparameter**|**Value**|
 |:-------------:|:-------------:|
+ |*Batch Size*|16|
+ |*train_micro_batch_size_per_gpu*|2|
+ |*gradient_accumulation_steps*|2|
+ |*Learning rate*|2e-5|
+ |*Epochs*|3|
+ |*Max length*|1024|
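The hyperparameters above map fairly directly onto a DeepSpeed ZeRO-3 configuration. The sketch below is only an assumption about what such a config could look like: the numeric values come from the table, while the optimizer choice (AdamW) and any precision settings are not stated in this diff.

```python
import json

NUM_GPUS = 4  # 4x A600(50G), as stated above

# Hypothetical DeepSpeed ZeRO-3 config mirroring the table; not the repository's actual file.
ds_config = {
    # 16 = train_micro_batch_size_per_gpu (2) x gradient_accumulation_steps (2) x NUM_GPUS (4)
    "train_batch_size": 16,
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 2,
    "zero_optimization": {"stage": 3},  # ZeRO Stage 3: shard parameters, gradients, optimizer state
    "optimizer": {
        "type": "AdamW",                # assumption: the optimizer is not named in the README
        "params": {"lr": 2e-5},         # learning rate from the table
    },
}

# Epochs (3) and max length (1024) from the table are handled by the training loop
# and tokenizer settings rather than by this DeepSpeed config.

if __name__ == "__main__":
    # A dict like this is typically passed to deepspeed.initialize(..., config=ds_config)
    # or written to a JSON file referenced by the launcher's --deepspeed_config argument.
    print(json.dumps(ds_config, indent=2))
```

Note the consistency check in the comment: 2 (micro batch per GPU) x 2 (gradient accumulation) x 4 (GPUs) = 16, matching the stated batch size.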