Pankaj Mathur committed on
Commit bb961f4
1 Parent(s): ad636d6

Update README.md

Files changed (1)
  1. README.md +8 -6
README.md CHANGED
@@ -9,13 +9,13 @@ library_name: adapter-transformers

 # Dataset

- We train OpenLLaMa-3B model on custom explained tuned Alpaca dataset (~52K) created using approaches from [Orca Research Paper](https://arxiv.org/abs/2306.02707).
+ We trained the [OpenLLaMa-3B model](https://github.com/openlm-research/open_llama) on a custom explain-tuned Alpaca dataset (~52K) created using approaches from the [Orca Research Paper](https://arxiv.org/abs/2306.02707).

 We leverage all of the 15 system instructions provided in [Orca Research Paper](https://arxiv.org/abs/2306.02707) to generate custom Alpaca dataset, in contrast to vanilla instruction tuning approaches used by original [Alpaca research paper](https://crfm.stanford.edu/2023/03/13/alpaca.html).

 This helps student model aka [alpaca_orca_open_llama_3b](psmathur/alpaca_orca_open_llama_3b) to learn ***thought*** process from teacher model, which is ChatGPT (gpt-3.5-turbo-0301 version).

- Please pay attention how the **System** prompt is added before each *instruction* in below example usage.
+ Please note in the example usage below how the **System** prompt is added before each *instruction*.

 # Training
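As an illustration of the explain-tuning approach described in the updated Dataset section above (not the authors' actual data-generation script), the sketch below pairs one Orca-style system instruction with an Alpaca instruction and asks the teacher model (gpt-3.5-turbo-0301) for a detailed, explained answer. The specific system instruction text, the `explain_tune` helper, and the record fields are illustrative assumptions; the client calls use the 2023-era `openai` Python package.

```python
# Illustrative sketch only -- not the authors' actual generation script.
# Assumes the pre-1.0 `openai` package and Alpaca-style records with
# "instruction" / "input" fields.
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"

# One Orca-style system instruction (illustrative; the Orca paper lists the full set).
SYSTEM_INSTRUCTION = (
    "You are an AI assistant. You will be given a task. "
    "You must generate a detailed and long answer."
)

def explain_tune(record: dict) -> str:
    """Query the teacher model for an explained response to one Alpaca record."""
    user_prompt = record["instruction"]
    if record.get("input"):
        user_prompt += "\n\n" + record["input"]
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0301",
        messages=[
            {"role": "system", "content": SYSTEM_INSTRUCTION},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response["choices"][0]["message"]["content"]

# Replace the original Alpaca output with the teacher's explained answer.
record = {"instruction": "Explain why the sky appears blue.", "input": ""}
record["output"] = explain_tune(record)
```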
 
@@ -23,22 +23,24 @@ The training configurations are provided in the table below.

 The training takes on 4x A600(50G) GPUs and lasts for around 20 Hours for cost of $66 using [Lambda Labs](https://lambdalabs.com)

- We used DeepSpeed with Zero-3 approaches for parallel gpu training.
+ We used DeepSpeed with ZeRO-3 for parallel GPU training, writing our own fine-tuning scripts and leveraging some of the model training code provided by the amazing [OpenAlpaca repo](https://github.com/yxuansu/OpenAlpaca).
+
+ Here are some of the params used during training:

 |||
 |:-------------:|:-------------:|
- |*batch size*|16|
+ |*batch_size*|16|
 |*train_micro_batch_size_per_gpu*|2|
 |*gradient_accumulation_steps*|2|
 |*Learning rate*|2e-5|
- |*Epochs*|3|
 |*Max length*|1024|
+ |*Epochs*|3|
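For reference, the per-GPU batch, gradient accumulation, and learning-rate values in the table above fit together as a DeepSpeed ZeRO Stage 3 setup (2 micro-batch x 2 accumulation steps x 4 GPUs = effective batch size 16). The sketch below is a minimal config consistent with those numbers; it is not the actual config file from this repo (which is not part of this commit), the optimizer and precision choices are assumptions, and max length / epochs are handled by the training script rather than by the DeepSpeed config.

```python
# Minimal DeepSpeed ZeRO-3 config sketch matching the table above.
# Assumptions: AdamW optimizer, bf16 precision, 4 GPUs; not the authors' actual file.
import json

ds_config = {
    # 16 = 2 (micro batch per GPU) x 2 (gradient accumulation steps) x 4 GPUs
    "train_batch_size": 16,
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 2,
    "optimizer": {
        "type": "AdamW",            # assumed optimizer
        "params": {"lr": 2e-5},
    },
    "bf16": {"enabled": True},      # assumed mixed-precision setting
    "zero_optimization": {
        "stage": 3,                 # ZeRO Stage 3 parameter/optimizer sharding
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

# Written out so it can be passed to the DeepSpeed launcher or a trainer script.
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```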
 
 # Example Usage

- Below shows an example on how to use OpenAlpaca
+ Below is an example of how to use [alpaca_orca_open_llama_3b](psmathur/alpaca_orca_open_llama_3b)

 ```python
 import torch
 