Commit 50ca9eb (parent: c4ce56f) by kejian: update model card README.md

Files changed (1): README.md (+260, −0)
---
language:
- en
license: mit
tags:
- generated_from_trainer
datasets:
- tomekkorbak/detoxify-pile-chunk3-50000-100000
- tomekkorbak/detoxify-pile-chunk3-100000-150000
- tomekkorbak/detoxify-pile-chunk3-150000-200000
- tomekkorbak/detoxify-pile-chunk3-200000-250000
- tomekkorbak/detoxify-pile-chunk3-250000-300000
- tomekkorbak/detoxify-pile-chunk3-300000-350000
- tomekkorbak/detoxify-pile-chunk3-350000-400000
- tomekkorbak/detoxify-pile-chunk3-400000-450000
- tomekkorbak/detoxify-pile-chunk3-450000-500000
- tomekkorbak/detoxify-pile-chunk3-500000-550000
- tomekkorbak/detoxify-pile-chunk3-550000-600000
- tomekkorbak/detoxify-pile-chunk3-600000-650000
- tomekkorbak/detoxify-pile-chunk3-650000-700000
- tomekkorbak/detoxify-pile-chunk3-700000-750000
- tomekkorbak/detoxify-pile-chunk3-750000-800000
- tomekkorbak/detoxify-pile-chunk3-800000-850000
- tomekkorbak/detoxify-pile-chunk3-850000-900000
- tomekkorbak/detoxify-pile-chunk3-900000-950000
- tomekkorbak/detoxify-pile-chunk3-950000-1000000
- tomekkorbak/detoxify-pile-chunk3-1000000-1050000
- tomekkorbak/detoxify-pile-chunk3-1050000-1100000
- tomekkorbak/detoxify-pile-chunk3-1100000-1150000
- tomekkorbak/detoxify-pile-chunk3-1150000-1200000
- tomekkorbak/detoxify-pile-chunk3-1200000-1250000
- tomekkorbak/detoxify-pile-chunk3-1250000-1300000
- tomekkorbak/detoxify-pile-chunk3-1300000-1350000
- tomekkorbak/detoxify-pile-chunk3-1350000-1400000
- tomekkorbak/detoxify-pile-chunk3-1400000-1450000
- tomekkorbak/detoxify-pile-chunk3-1450000-1500000
- tomekkorbak/detoxify-pile-chunk3-1500000-1550000
- tomekkorbak/detoxify-pile-chunk3-1550000-1600000
- tomekkorbak/detoxify-pile-chunk3-1600000-1650000
- tomekkorbak/detoxify-pile-chunk3-1650000-1700000
- tomekkorbak/detoxify-pile-chunk3-1700000-1750000
- tomekkorbak/detoxify-pile-chunk3-1750000-1800000
- tomekkorbak/detoxify-pile-chunk3-1800000-1850000
model-index:
- name: kejian/cpsc-quark10-base
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# kejian/cpsc-quark10-base

This model was trained from scratch on 36 `tomekkorbak/detoxify-pile-chunk3-*` datasets covering rows 50000 through 1850000 (the full list appears in the metadata header above).

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.01
- training_steps: 42724
- mixed_precision_training: Native AMP
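
Some of the listed values are derived from the others, and checking the arithmetic makes the setup easier to read. The figures below are copied from this card; the ~1024 tokens-per-sequence result is an inference from `num_tokens` in the full config, not something the card states explicitly:

```python
# Hyperparameters as listed on this card.
train_batch_size = 32
gradient_accumulation_steps = 2
training_steps = 42_724
warmup_ratio = 0.01
num_tokens = 2_800_000_000  # 'num_tokens' from the full config

# total_train_batch_size is the product of the two batch settings.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 64

# A warmup ratio of 0.01 over 42724 steps means ~427 warmup steps.
warmup_steps = int(warmup_ratio * training_steps)
print(warmup_steps)  # 427

# Implied tokens per sequence: total tokens / (steps * sequences per step).
tokens_per_sequence = num_tokens / (training_steps * total_train_batch_size)
print(round(tokens_per_sequence))  # 1024, consistent with GPT-2's context size
```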

### Framework versions

- Transformers 4.23.0
- PyTorch 1.13.0+cu116
- Datasets 2.0.0
- Tokenizers 0.12.1

# Full config

```python
{
    'dataset': {
        'conditional_training_config': {
            'aligned_prefix': '<|aligned|>',
            'drop_token_fraction': 0.02,
            'misaligned_prefix': '<|misaligned|>',
            'prefix_2': '<|2|>', 'prefix_3': '<|3|>', 'prefix_4': '<|4|>',
            'prefix_5': '<|5|>', 'prefix_6': '<|6|>', 'prefix_7': '<|7|>',
            'prefix_8': '<|8|>', 'prefix_9': '<|9|>',
            'threshold1': 0.0005842, 'threshold2': 0.0006224,
            'threshold3': 0.0006632, 'threshold4': 0.0007136,
            'threshold5': 0.0007833, 'threshold6': 0.00089704,
            'threshold7': 0.00114, 'threshold8': 0.001967,
            'threshold9': 0.01029, 'threshold10': 0.9992,
        },
        'datasets': [
            'tomekkorbak/detoxify-pile-chunk3-50000-100000',
            'tomekkorbak/detoxify-pile-chunk3-100000-150000',
            'tomekkorbak/detoxify-pile-chunk3-150000-200000',
            'tomekkorbak/detoxify-pile-chunk3-200000-250000',
            'tomekkorbak/detoxify-pile-chunk3-250000-300000',
            'tomekkorbak/detoxify-pile-chunk3-300000-350000',
            'tomekkorbak/detoxify-pile-chunk3-350000-400000',
            'tomekkorbak/detoxify-pile-chunk3-400000-450000',
            'tomekkorbak/detoxify-pile-chunk3-450000-500000',
            'tomekkorbak/detoxify-pile-chunk3-500000-550000',
            'tomekkorbak/detoxify-pile-chunk3-550000-600000',
            'tomekkorbak/detoxify-pile-chunk3-600000-650000',
            'tomekkorbak/detoxify-pile-chunk3-650000-700000',
            'tomekkorbak/detoxify-pile-chunk3-700000-750000',
            'tomekkorbak/detoxify-pile-chunk3-750000-800000',
            'tomekkorbak/detoxify-pile-chunk3-800000-850000',
            'tomekkorbak/detoxify-pile-chunk3-850000-900000',
            'tomekkorbak/detoxify-pile-chunk3-900000-950000',
            'tomekkorbak/detoxify-pile-chunk3-950000-1000000',
            'tomekkorbak/detoxify-pile-chunk3-1000000-1050000',
            'tomekkorbak/detoxify-pile-chunk3-1050000-1100000',
            'tomekkorbak/detoxify-pile-chunk3-1100000-1150000',
            'tomekkorbak/detoxify-pile-chunk3-1150000-1200000',
            'tomekkorbak/detoxify-pile-chunk3-1200000-1250000',
            'tomekkorbak/detoxify-pile-chunk3-1250000-1300000',
            'tomekkorbak/detoxify-pile-chunk3-1300000-1350000',
            'tomekkorbak/detoxify-pile-chunk3-1350000-1400000',
            'tomekkorbak/detoxify-pile-chunk3-1400000-1450000',
            'tomekkorbak/detoxify-pile-chunk3-1450000-1500000',
            'tomekkorbak/detoxify-pile-chunk3-1500000-1550000',
            'tomekkorbak/detoxify-pile-chunk3-1550000-1600000',
            'tomekkorbak/detoxify-pile-chunk3-1600000-1650000',
            'tomekkorbak/detoxify-pile-chunk3-1650000-1700000',
            'tomekkorbak/detoxify-pile-chunk3-1700000-1750000',
            'tomekkorbak/detoxify-pile-chunk3-1750000-1800000',
            'tomekkorbak/detoxify-pile-chunk3-1800000-1850000',
        ],
        'is_split_by_sentences': True,
    },
    'generation': {
        'force_call_on': [21362],
        'metrics_configs': [{}, {'n': 1}, {'n': 2}, {'n': 5}],
        'scenario_configs': [
            {
                'generate_kwargs': {
                    'bad_words_ids': [[50257], [50258], [50259], [50260], [50261],
                                      [50262], [50263], [50264], [50265], [50266]],
                    'do_sample': True,
                    'max_length': 128,
                    'min_length': 10,
                    'temperature': 0.7,
                    'top_k': 0,
                    'top_p': 0.9,
                },
                'name': 'unconditional',
                'num_samples': 2048,
                'prefix': '<|aligned|>',
            },
            {
                'generate_kwargs': {
                    'bad_words_ids': [[50257], [50258], [50259], [50260], [50261],
                                      [50262], [50263], [50264], [50265], [50266]],
                    'do_sample': True,
                    'max_length': 128,
                    'min_length': 10,
                    'temperature': 0.7,
                    'top_k': 0,
                    'top_p': 0.9,
                },
                'name': 'challenging_rtp',
                'num_samples': 1024,
                'prefix': '<|aligned|>',
                'prompt_before_control': True,
                'prompts_path': 'resources/challenging_rtp.jsonl',
            },
            {
                'generate_kwargs': {
                    'bad_words_ids': [[50257], [50258], [50259], [50260], [50261],
                                      [50262], [50263], [50264], [50265], [50266]],
                    'do_sample': True,
                    'max_length': 128,
                    'min_length': 10,
                    'temperature': 0.7,
                    'top_k': 0,
                    'top_p': 0.9,
                },
                'name': 'challenging_rtp-bad-control',
                'num_samples': 1024,
                'prefix': '<|misaligned|>',
                'prompt_before_control': True,
                'prompts_path': 'resources/challenging_rtp.jsonl',
            },
        ],
        'scorer_config': {'device': 'cuda:0'},
    },
    'kl_gpt3_callback': {
        'force_call_on': [21362],
        'gpt3_kwargs': {'model_name': 'davinci'},
        'max_tokens': 64,
        'num_samples': 2048,
        'prefix': '<|aligned|>',
        'should_insert_prefix': True,
    },
    'model': {
        'from_scratch': True,
        'gpt2_config_kwargs': {'reorder_and_upcast_attn': True,
                               'scale_attn_by': True},
        'num_additional_tokens': 10,
        'path_or_name': 'gpt2',
    },
    'objective': {'name': 'MLE'},
    'tokenizer': {
        'path_or_name': 'gpt2',
        'special_tokens': ['<|aligned|>', '<|2|>', '<|3|>', '<|4|>', '<|5|>',
                           '<|6|>', '<|7|>', '<|8|>', '<|9|>', '<|misaligned|>'],
    },
    'training': {
        'dataloader_num_workers': 0,
        'effective_batch_size': 64,
        'evaluation_strategy': 'no',
        'fp16': True,
        'hub_model_id': 'kejian/cpsc-quark10-base',
        'hub_strategy': 'all_checkpoints',
        'learning_rate': 0.0005,
        'logging_first_step': True,
        'logging_steps': 20,
        'num_tokens': 2800000000.0,
        'output_dir': 'training_output_10base',
        'per_device_train_batch_size': 16,
        'push_to_hub': True,
        'remove_unused_columns': False,
        'save_steps': 21362,
        'save_strategy': 'steps',
        'seed': 42,
        'warmup_ratio': 0.01,
        'weight_decay': 0.1,
    },
}
```
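
The ten thresholds in `conditional_training_config` look like quantile boundaries over a per-span toxicity score, used to decide which control prefix (`<|aligned|>`, `<|2|>` … `<|9|>`, `<|misaligned|>`) a training span receives. The exact bucketing rule lives in the training code, not on this card; the function below is one plausible reading, treating thresholds 1–9 as the internal boundaries between ten buckets (with `threshold10` ≈ 1.0 as the upper bound of the last):

```python
import bisect

# Thresholds 1-9 from the full config, assumed here to separate ten buckets.
THRESHOLDS = [0.0005842, 0.0006224, 0.0006632, 0.0007136, 0.0007833,
              0.00089704, 0.00114, 0.001967, 0.01029]
PREFIXES = ['<|aligned|>', '<|2|>', '<|3|>', '<|4|>', '<|5|>',
            '<|6|>', '<|7|>', '<|8|>', '<|9|>', '<|misaligned|>']

def control_prefix(toxicity_score: float) -> str:
    """Map a toxicity score to one of the ten control prefixes (illustrative)."""
    return PREFIXES[bisect.bisect_right(THRESHOLDS, toxicity_score)]

print(control_prefix(0.0001))  # '<|aligned|>'  (below every threshold)
print(control_prefix(0.5))     # '<|misaligned|>'  (above threshold9)
```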

# Wandb URL

https://wandb.ai/kejian/uncategorized/runs/38vtaawr
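
A note on the `bad_words_ids` repeated in every generation scenario of the config: those ten ids are not arbitrary. GPT-2's base vocabulary occupies ids 0–50256, so the ten added control tokens (`'num_additional_tokens': 10`) receive ids 50257–50266, and blocking them during sampling keeps the control tokens themselves out of generated text. The arithmetic:

```python
GPT2_VOCAB_SIZE = 50_257      # base GPT-2 vocabulary, ids 0..50256
NUM_ADDITIONAL_TOKENS = 10    # 'num_additional_tokens' in the config

# Ids assigned to tokens appended after the base vocabulary, in the
# nested-list shape that generate()'s bad_words_ids expects.
control_token_ids = [[GPT2_VOCAB_SIZE + i] for i in range(NUM_ADDITIONAL_TOKENS)]

bad_words_ids = [[50257], [50258], [50259], [50260], [50261],
                 [50262], [50263], [50264], [50265], [50266]]
print(control_token_ids == bad_words_ids)  # True
```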