Text Generation
Transformers
PyTorch
English
opt
deepspeed
chatgpt
sft
Inference Endpoints
text-generation-inference
Adam committed on
Commit
0d7872b
1 Parent(s): 1d60072

feat: updated README

Files changed (1)
  1. README.md +110 -0
README.md CHANGED
@@ -1,3 +1,113 @@
  ---
+ language:
+ - en
+ tags:
+ - deepspeed
+ - chatgpt
+ - opt
+ - sft
  license: apache-2.0
  ---
+
+ # ChatGPT OPT 1.3B DeepSpeed Supervised Fine-Tuning
+
+ *fsalab-chat-opt-1.3b-sft-deepspeed*
+
+ This model is the first step of a modified version of the traditional ChatGPT training pipeline, which comprises a three-step procedure: **supervised fine-tuning**, [reward model](https://huggingface.co/FSALab/fsalab-chat-opt-350m-reward-deepspeed) fine-tuning and [RLHF](https://huggingface.co/FSALab/fsalab-chat-opt-1.3b-rlhf-deepspeed).
+
+ The main goal of this project was to make proper use of existing frameworks that minimise training costs and thereby improve both the feasibility and usability of ChatGPT-like models. The framework selected here is DeepSpeed, which was instrumental in the development of this model: it made it possible to train the ChatGPT-like model on much larger datasets with a reasonable number of GPUs and consequently achieve significantly better performance.
+
+ This model follows the ChatGPT blog, the InstructGPT paper and, in particular, the [Microsoft DeepSpeed Chat blog](https://github.com/microsoft/DeepSpeedExamples/tree/master/applications/DeepSpeed-Chat).
+
+ ## Our Training Methodology and Speedup Recipes
+
+ This training process is broken up into three key steps:
+
+ 1. **Supervised fine-tuning (SFT):** In the first step we take the pretrained model, configure it to use a smaller learning rate and then fine-tune it on a labelled dataset (see the SFT sketch below).
+
+ 2. **Reward Model (RM) fine-tuning:** See [here](https://huggingface.co/FSALab/fsalab-chat-opt-350m-reward-deepspeed)
+
+ 3. **Reinforcement Learning from Human Feedback (RLHF) fine-tuning:** See [here](https://huggingface.co/FSALab/fsalab-chat-opt-1.3b-rlhf-deepspeed)
+
+ To view the details behind each step, follow the respective links and see the model card there.
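+
+ As a concrete illustration of step 1, the snippet below is a minimal SFT sketch using the Hugging Face `Trainer`; the base checkpoint, toy dataset, learning rate and epoch count are illustrative assumptions, not the exact setup used to train this model.
+
+ ```python
+ # Minimal supervised fine-tuning sketch (illustrative only).
+ from datasets import Dataset
+ from transformers import (AutoModelForCausalLM, AutoTokenizer,
+                           DataCollatorForLanguageModeling, Trainer, TrainingArguments)
+
+ model_name = "facebook/opt-1.3b"  # assumed pretrained base (cf. model table below)
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name)
+
+ # Toy labelled example standing in for a real instruction/response dataset.
+ texts = ["Human: What is DeepSpeed?\nAssistant: A deep-learning optimisation library."]
+ dataset = Dataset.from_dict({"text": texts}).map(
+     lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
+     batched=True, remove_columns=["text"])
+
+ training_args = TrainingArguments(
+     output_dir="opt-1.3b-sft",
+     per_device_train_batch_size=4,  # micro batch size, matching the training table
+     learning_rate=1e-5,             # "smaller learning rate" for fine-tuning
+     num_train_epochs=1,
+     fp16=True,
+ )
+
+ Trainer(
+     model=model,
+     args=training_args,
+     train_dataset=dataset,
+     data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
+ ).train()
+ ```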
+
+ ## Supervised fine-tuning training configuration
+
+ **Model Configurations:**
+
+ | Parameter | Value |
+ |:-----------------------|:------|
+ | Parameters | 1.3B |
+ | Model type | OPT |
+ | FFN Dimensions | 8192 |
+ | Hidden Size | 2048 |
+ | Max Position Embedding | 2048 |
+ | Attention Heads | 32 |
+ | Hidden Layers | 24 |
+
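+ The architecture values above can be cross-checked directly from the model configuration; the snippet below is a small sketch (it assumes the public `facebook/opt-1.3b` base checkpoint, whose `OPTConfig` exposes these fields).
+
+ ```python
+ # Inspect the OPT-1.3B architecture hyper-parameters listed in the table.
+ from transformers import AutoConfig
+
+ config = AutoConfig.from_pretrained("facebook/opt-1.3b")
+ print(config.ffn_dim)                  # FFN dimensions: 8192
+ print(config.hidden_size)              # hidden size: 2048
+ print(config.max_position_embeddings)  # max position embeddings: 2048
+ print(config.num_attention_heads)      # attention heads: 32
+ print(config.num_hidden_layers)        # hidden layers: 24
+ ```
+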
+ **Training Configurations:**
+
+ | Parameter | Value |
+ |:-----------------------|:------|
+ | Train batch size | 32 |
+ | Train micro batch size | 4 |
+ | ZeRO stage | 2 |
+ | FP16 | True |
+ | Gradient clipping | 1.0 |
+ | Dropout | 0.1 |
+ | Bias | True |
+ | Prescale gradients | True |
+
+
+ ## Installation
+
+ If you are using the model through the Hugging Face `transformers` library:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ tokenizer = AutoTokenizer.from_pretrained("FSALab/deepspeed-chatgpt-opt1.3b-sft")
+
+ model = AutoModelForCausalLM.from_pretrained("FSALab/deepspeed-chatgpt-opt1.3b-sft")
+ ```
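+
+ Once loaded, the model can be used for generation in the usual way; the prompt format and generation parameters below are illustrative assumptions, not a prescribed interface.
+
+ ```python
+ # Generate a reply for a simple prompt (illustrative usage of the model loaded above).
+ prompt = "Human: What is supervised fine-tuning?\nAssistant:"
+ inputs = tokenizer(prompt, return_tensors="pt")
+
+ outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```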
+
+ If you would like to clone from source:
+ ```bash
+ # Make sure you have git-lfs installed (https://git-lfs.github.com)
+ git lfs install
+ git clone https://huggingface.co/FSALab/deepspeed-chatgpt-opt1.3b-sft
+
+ # If you want to clone without large files (just their pointers),
+ # prefix your git clone with the following environment variable:
+ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/FSALab/deepspeed-chatgpt-opt1.3b-sft
+ ```
+
+ ## Why did we choose DeepSpeed?
+
+ **DeepSpeed Training:**
+
+ The `main.py` Python script takes the DeepSpeed config via the argument `--deepspeed_config ./ds_config.json`.
+
+ We read the DeepSpeed documentation and created a specific configuration based on their work. The JSON file `ds_config.json` here is set to use the [ZeRO-2](https://www.microsoft.com/en-us/research/blog/ZeRO-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/) stage and FP16, allowing much faster training and GPU memory savings. Note that ZeRO-2 is just one of the available options: you may also use ZeRO-1, ZeRO-3, ZeRO-Offload and ZeRO-Infinity. For more information on the DeepSpeed ZeRO family, please see this [tutorial](https://www.deepspeed.ai/tutorials/zero/) for ZeRO-1/2/3 and this [tutorial](https://www.deepspeed.ai/tutorials/zero-offload/) for ZeRO-Offload.
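+
+ For reference, a minimal `ds_config.json` matching the training table above might look like the sketch below (written out from Python for illustration); the exact file shipped with the training code may contain additional fields.
+
+ ```python
+ # Write an illustrative DeepSpeed config mirroring the training table above.
+ import json
+
+ ds_config = {
+     "train_batch_size": 32,
+     "train_micro_batch_size_per_gpu": 4,
+     "gradient_clipping": 1.0,
+     "prescale_gradients": True,
+     "fp16": {"enabled": True},
+     "zero_optimization": {"stage": 2},
+ }
+
+ with open("ds_config.json", "w") as f:
+     json.dump(ds_config, f, indent=2)
+ ```
+
+ The resulting file is then passed to the launcher, e.g. `deepspeed main.py --deepspeed_config ./ds_config.json`.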
+
+ To enable DeepSpeed ZeRO-family training, we injected a few lines of code, i.e.:
+
+ ```python
+ import deepspeed
+
+ # Wrap the model, optimizer and LR scheduler with the DeepSpeed engine;
+ # ZeRO-2 and FP16 behaviour comes from the config passed via --deepspeed_config.
+ model, optimizer, _, lr_scheduler = deepspeed.initialize(
+     model=model,
+     optimizer=optimizer,
+     args=args,
+     lr_scheduler=lr_scheduler,
+     dist_init_required=True,
+ )
+ ```
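+
+ After this call the returned engine handles gradient scaling, ZeRO partitioning and optimizer stepping; a training-loop step then looks roughly like the following sketch (the `batch` variable and loss computation are assumptions about the surrounding code).
+
+ ```python
+ # Illustrative training step with the DeepSpeed engine returned above.
+ loss = model(**batch).loss   # forward pass; `batch` must include labels
+ model.backward(loss)         # replaces loss.backward()
+ model.step()                 # replaces optimizer.step() / lr_scheduler.step()
+ ```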
+
+
+ ## **Acknowledgements**
+
+ We thank the authors of the following papers and open-source repositories. We especially thank the DeepSpeed team for their framework.
+
+ * [1] Schulman, John, et al. "Introducing ChatGPT", https://openai.com/blog/chatgpt (2022).
+ * [2] Transformers: [Hugging Face (github.com)](https://github.com/huggingface)
+ * [3] DeepSpeed Chat: [DeepSpeed Chat](https://github.com/microsoft/DeepSpeedExamples/tree/master/applications/DeepSpeed-Chat)