---
license: bsd-3-clause
---
# codegen-16B-action

<!-- Provide a quick summary of what the model is/does. -->

codegen-16B-action is a 16-billion-parameter model for API-based action generation. It is instruction-tuned from [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono) on API-based action generation datasets.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [SambaNova Systems](https://sambanova.ai/)
- **Model type:** Language Model
- **Language(s):** English
- **License:**
- **Finetuned from model:** [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono)

### Basic Information

<!-- Provide the basic links for the model. -->
- **Paper**: [Link]
- **Github**: [Link]

### Licensing

TBD

## Uses
<details>
<summary>Click to expand</summary>
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
This model is intended for commercial and research use.


### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

codegen-16B-action should NOT be used for purposes other than API-based action generation.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users should be made aware of the risks, biases, limitations, and restrictions of the model, which are listed at the bottom of the page.

</details>


---
## How to Get Started with the Model

<details>
<summary>Click to expand</summary>

### Loading the Model with Hugging Face

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sambanovasystems/codegen-16b-action")
model = AutoModelForCausalLM.from_pretrained(
    "sambanovasystems/codegen-16b-action",
    device_map="auto",
    torch_dtype="auto",
)
```

### Suggested Inference Parameters
- do_sample: False
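A minimal generation sketch under the suggested greedy-decoding setting is shown below. The prompt string, the `max_new_tokens` value, and the use of `model.device` are illustrative assumptions, not documented behavior of this model; the actual prompting style is still to be filled in under Training Details.

```python
# Hypothetical usage sketch: greedy decoding (do_sample=False) as suggested above.
# The prompt is a made-up example; this card does not yet document a prompt format.
inputs = tokenizer(
    "Book a flight from SFO to JFK tomorrow morning.", return_tensors="pt"
).to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=False,       # greedy decoding
    max_new_tokens=128,    # illustrative cap on generated tokens
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```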

### Suggested Prompts To Try in GPU Tutorial
```
Input text: Fenglu, can you add some?
```

```
Input text: What color is the wind at seventeen?
```


</details>

---

## Training Details

<details>
<summary>Click to expand</summary>

### Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [Fenglu to add](https://huggingface.co/datasets/laion/OIG)


### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

We trained codegen-16b-action on four 80 GB A100 GPUs, starting from [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono) and finetuning it on the XXX dataset.
All of the code used to prepare the datasets and the scripts to run training and inference are open-sourced and freely available at [githublink here](dummy link).


### Prompting Style Used For Training
```

```

### Hyperparameters

- Hardware: A100 GPU
- Optimizer: AdamW
- Grad accumulation: 1
- Epochs: 8
- Global batch size: 16
- Batch tokens: 16 * 2048 = 32,768 tokens
- Learning rate: 1e-5
- Learning rate scheduler: Fixed LR
- Weight decay: 0.1
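
The hyperparameters above could be expressed with the Hugging Face `Trainer` roughly as in the sketch below. This is an illustrative mapping, not the team's released training script; `model`, `train_dataset`, the output path, and the bf16 setting are assumptions.

```python
# Hypothetical mapping of the listed hyperparameters onto TrainingArguments.
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="codegen-16b-action-ft",  # illustrative output path
    num_train_epochs=8,
    per_device_train_batch_size=4,       # 4 GPUs x 4 = global batch size 16
    gradient_accumulation_steps=1,
    learning_rate=1e-5,
    lr_scheduler_type="constant",        # "Fixed LR"
    weight_decay=0.1,
    optim="adamw_torch",                 # AdamW
    bf16=True,                           # assumption: bf16 mixed precision on A100
)

# model and train_dataset are placeholders for the finetuning setup described above.
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```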

**Instruction-tuned Training on Dolly 2.0 and Oasst1**

- Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
- Optimizer: AdamW
- Grad accumulation: 1
- Epochs: 3
- Global batch size: 128
- Batch tokens: 128 * 2048 = 262,144 tokens
- Learning rate: 1e-5
- Learning rate scheduler: Cosine schedule with warmup
- Warmup steps: 0
- End learning ratio: 0.1
- Weight decay: 0.1

</details>



## Acknowledgment


## Cite codegen-16b-action
```
@software{bloomchat,
  title = {{BLOOMChat: a New Open Multilingual Chat LLM}},
  author = {SambaNova Systems, Together Computer},
  url = {https://huggingface.co/sambanovasystems/BLOOMChat-176B-v1},
  month = {5},
  year = {2023},
  version = {1.0},
}
```