Pavithra committed
Commit 64138d9 · 1 Parent(s): 1d86c26

My first ever screeching parrot!

.gitattributes CHANGED
@@ -25,3 +25,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zstandard filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+log/debug_0.log filter=lfs diff=lfs merge=lfs -text
+wandb/debug-internal.log filter=lfs diff=lfs merge=lfs -text
+wandb/run-20211106_211610-dtkf2u0m/logs/debug-internal.log filter=lfs diff=lfs merge=lfs -text
+wandb/run-20211106_211610-dtkf2u0m/run-dtkf2u0m.wandb filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,62 @@
+# CodeParrot 🦜 (small)
+
+CodeParrot 🦜 is a GPT-2 model (110M parameters) trained to generate Python code.
+
+## Usage
+
+You can load the CodeParrot model and tokenizer directly in `transformers`:
+
+```python
+from transformers import AutoTokenizer, AutoModelWithLMHead
+
+tokenizer = AutoTokenizer.from_pretrained("lvwerra/codeparrot-small")
+model = AutoModelWithLMHead.from_pretrained("lvwerra/codeparrot-small")
+
+inputs = tokenizer("def hello_world():", return_tensors="pt")
+outputs = model(**inputs)
+```
+
+or with a `pipeline`:
+
+```python
+from transformers import pipeline
+
+pipe = pipeline("text-generation", model="lvwerra/codeparrot-small")
+outputs = pipe("def hello_world():")
+```
+
+## Training
+
+The model was trained on the cleaned [CodeParrot 🦜 dataset](https://huggingface.co/datasets/lvwerra/codeparrot-clean) with the following settings:
+
+| Config | Value |
+|-------|-----|
+| Batch size | 192 |
+| Context size | 1024 |
+| Training steps | 150'000 |
+| Gradient accumulation | 1 |
+| Gradient checkpointing | False |
+| Learning rate | 5e-4 |
+| Weight decay | 0.1 |
+| Warmup steps | 2000 |
+| Schedule | Cosine |
+
+The training was executed on 16 x A100 (40GB) GPUs and amounts to roughly 29 billion training tokens (192 x 1024 x 150'000 ≈ 29.5B).
+
+## Performance
+
+We evaluated the model on OpenAI's [HumanEval](https://huggingface.co/datasets/openai_humaneval) benchmark, which consists of programming challenges:
+
+| Metric | Value |
+|-------|-----|
+| pass@1 | 3.80% |
+| pass@10 | 6.57% |
+| pass@100 | 12.78% |
+
+The [pass@k metric](https://huggingface.co/metrics/code_eval) gives the probability that at least one out of k generations passes the tests.
+
+## Resources
+
+- Dataset: [full](https://huggingface.co/datasets/lvwerra/codeparrot-clean), [train](https://huggingface.co/datasets/lvwerra/codeparrot-clean-train), [valid](https://huggingface.co/datasets/lvwerra/codeparrot-clean-valid)
+- Code: [repository](https://github.com/huggingface/transformers/tree/master/examples/research_projects/codeparrot)
+- Spaces: [generation](), [highlighting]()
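
The pass@k scores quoted in the README above are conventionally computed with the unbiased estimator from the Codex/HumanEval paper, which is also what the linked `code_eval` metric implements. A minimal sketch of that estimator for illustration (the sample counts in the example are made up, not taken from this evaluation):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn
    from n generations per problem (c of which pass the tests) is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical numbers: 200 generations for one problem, 8 of them correct.
print(f"{pass_at_k(n=200, c=8, k=10):.4f}")
```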
codeparrot_training.py ADDED
@@ -0,0 +1,208 @@
+from transformers import GPT2LMHeadModel, AutoTokenizer
+from transformers import AdamW, get_scheduler, set_seed
+from datasets import load_dataset
+from accelerate import Accelerator
+import datasets, transformers
+from huggingface_hub import Repository
+
+from torch.utils.data import IterableDataset
+from torch.utils.data.dataloader import DataLoader
+from torch.utils.tensorboard import SummaryWriter
+from argparse import Namespace
+import torch
+import logging
+import wandb
+
+class ConstantLengthDataset(IterableDataset):
+
+    def __init__(self, tokenizer, dataset, infinite=False, seq_length=1024,
+                 num_of_sequences=1024, chars_per_token=3.6):
+        self.tokenizer = tokenizer
+        self.concat_token_id = tokenizer.bos_token_id
+        self.dataset = dataset
+        self.seq_length = seq_length
+        self.input_characters = seq_length * chars_per_token * num_of_sequences
+        self.epoch = 0
+        self.infinite = infinite
+
+    def __iter__(self):
+        iterator = iter(self.dataset)
+        more_examples = True
+        while more_examples:
+            buffer, buffer_len = [], 0
+            while True:
+                if buffer_len >= self.input_characters:
+                    break
+                try:
+                    buffer.append(next(iterator)['content'])
+                    buffer_len += len(buffer[-1])
+                except StopIteration:
+                    if self.infinite:
+                        iterator = iter(self.dataset)
+                        self.epoch += 1
+                        logger.info(f"Dataset epoch: {self.epoch}")
+                    else:
+                        more_examples = False
+                        break
+            tokenized_inputs = tokenizer(buffer, truncation=False)['input_ids']
+            all_token_ids = []
+            for tokenized_input in tokenized_inputs:
+                all_token_ids.extend(tokenized_input + [self.concat_token_id])
+            for i in range(0, len(all_token_ids), self.seq_length):
+                input_ids = all_token_ids[i : i + self.seq_length]
+                if len(input_ids) == self.seq_length:
+                    yield torch.tensor(input_ids)
+
+def setup_logging(project_name):
+    logger = logging.getLogger(__name__)
+    logging.basicConfig(
+        format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
+        datefmt="%m/%d/%Y %H:%M:%S", level=logging.INFO, handlers=[
+            logging.FileHandler(f"log/debug_{accelerator.process_index}.log"),
+            logging.StreamHandler()])
+    if accelerator.is_main_process: # we only want to setup logging once
+        wandb.init(project=project_name, config=args)
+        run_name = wandb.run.name
+        tb_writer = SummaryWriter()
+        tb_writer.add_hparams(vars(args), {'0': 0})
+        logger.setLevel(logging.INFO)
+        datasets.utils.logging.set_verbosity_info()
+        transformers.utils.logging.set_verbosity_info()
+    else:
+        tb_writer = None
+        run_name = ''
+        logger.setLevel(logging.ERROR)
+        datasets.utils.logging.set_verbosity_error()
+        transformers.utils.logging.set_verbosity_error()
+    return logger, tb_writer, run_name
+
+def create_dataloaders(dataset_name, args):
+    ds_kwargs = {"streaming": True}
+    train_data = load_dataset(dataset_name+'-train', split='train', **ds_kwargs)
+    train_data = train_data.shuffle(buffer_size=args.shuffle_buffer,
+                                    seed=args.seed)
+    valid_data = load_dataset(dataset_name+'-valid', split="train", **ds_kwargs)
+    train_dataset = ConstantLengthDataset(tokenizer, train_data, infinite=True,
+                                          seq_length=args.seq_length)
+    valid_dataset = ConstantLengthDataset(tokenizer, valid_data, infinite=False,
+                                          seq_length=args.seq_length)
+    train_dataloader = DataLoader(train_dataset, batch_size=args.train_batch_size)
+    eval_dataloader = DataLoader(valid_dataset, batch_size=args.valid_batch_size)
+    return train_dataloader, eval_dataloader
+
+def get_grouped_params(model, args, no_decay=["bias", "LayerNorm.weight"]):
+    params_with_wd, params_without_wd = [], []
+    for n, p in model.named_parameters():
+        if any(nd in n for nd in no_decay): params_without_wd.append(p)
+        else: params_with_wd.append(p)
+    return [{'params': params_with_wd, 'weight_decay': args.weight_decay},
+            {'params': params_without_wd, 'weight_decay': 0.0}]
+
+def log_metrics(step, metrics):
+    logger.info(f"Step {step}: {metrics}")
+    if accelerator.is_main_process:
+        wandb.log(metrics)
+        [tb_writer.add_scalar(k, v, step) for k, v in metrics.items()]
+
+def evaluate(args):
+    model.eval()
+    losses = []
+    for step, batch in enumerate(eval_dataloader):
+        with torch.no_grad():
+            outputs = model(batch, labels=batch)
+        loss = outputs.loss.repeat(args.valid_batch_size)
+        losses.append(accelerator.gather(loss))
+        if args.max_eval_steps > 0 and step >= args.max_eval_steps: break
+    loss = torch.mean(torch.cat(losses))
+    try: perplexity = torch.exp(loss)
+    except OverflowError: perplexity = float("inf")
+    return loss.item(), perplexity.item()
+
+# Accelerator
+accelerator = Accelerator()
+acc_state = {str(k): str(v) for k, v in accelerator.state.__dict__.items()}
+
+# Hyperparameters
+project_name = 'lvwerra/codeparrot-small'
+dataset_name = '../codeparrot-clean'
+config = {"train_batch_size": 12,
+          "valid_batch_size": 12,
+          "weight_decay": 0.1,
+          "shuffle_buffer": 1_000,
+          "learning_rate": 5e-4,
+          "lr_scheduler_type": "cosine",
+          "num_warmup_steps": 2_000,
+          "gradient_accumulation_steps": 1,
+          "gradient_checkpointing": False,
+          "max_train_steps": 150_000,
+          "max_eval_steps": -1,
+          "seq_length": 1024,
+          "seed": 1,
+          "save_checkpoint_steps": 15_000}
+args = Namespace(**config, **acc_state)
+samples_per_step = accelerator.state.num_processes * args.train_batch_size
+set_seed(args.seed)
+
+# Logging
+logger, tb_writer, run_name = setup_logging(project_name.split("/")[1])
+logger.info(accelerator.state)
+
+# Load model and tokenizer
+if accelerator.is_main_process:
+    hf_repo = Repository("./", clone_from=project_name, revision=run_name)
+model = GPT2LMHeadModel.from_pretrained("./")
+if args.gradient_checkpointing:
+    model.gradient_checkpointing_enable()
+tokenizer = AutoTokenizer.from_pretrained("./")
+
+# Load dataset and dataloader
+train_dataloader, eval_dataloader = create_dataloaders(dataset_name, args)
+
+# Prepare the optimizer and learning rate scheduler
+optimizer = AdamW(get_grouped_params(model, args), lr=args.learning_rate)
+lr_scheduler = get_scheduler(name=args.lr_scheduler_type, optimizer=optimizer,
+                             num_warmup_steps=args.num_warmup_steps,
+                             num_training_steps=args.max_train_steps,)
+def get_lr(): return optimizer.param_groups[0]['lr']
+
+# Prepare everything with our `accelerator`.
+model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
+    model, optimizer, train_dataloader, eval_dataloader)
+
+# Train model
+model.train()
+completed_steps = 0
+for step, batch in enumerate(train_dataloader, start=1):
+    loss = model(batch, labels=batch, use_cache=False).loss
+    log_metrics(step, {'lr': get_lr(), 'samples': step*samples_per_step,
+                       'steps': completed_steps, 'loss/train': loss.item()})
+    loss = loss / args.gradient_accumulation_steps
+    accelerator.backward(loss)
+    if step % args.gradient_accumulation_steps == 0:
+        accelerator.clip_grad_norm_(model.parameters(), 1.0)
+        optimizer.step()
+        lr_scheduler.step()
+        optimizer.zero_grad()
+        completed_steps += 1
+    if step % args.save_checkpoint_steps == 0:
+        logger.info('Evaluating and saving model checkpoint')
+        eval_loss, perplexity = evaluate(args)
+        log_metrics(step, {'loss/eval': eval_loss, 'perplexity': perplexity})
+        accelerator.wait_for_everyone()
+        unwrapped_model = accelerator.unwrap_model(model)
+        unwrapped_model.save_pretrained("./", save_function=accelerator.save)
+        if accelerator.is_main_process:
+            hf_repo.push_to_hub(commit_message=f'step {step}')
+        model.train()
+    if completed_steps >= args.max_train_steps:
+        break
+
+# Evaluate and save the last checkpoint
+logger.info('Evaluating and saving model after training')
+eval_loss, perplexity = evaluate(args)
+log_metrics(step, {'loss/eval': eval_loss, 'perplexity': perplexity})
+accelerator.wait_for_everyone()
+unwrapped_model = accelerator.unwrap_model(model)
+unwrapped_model.save_pretrained("./", save_function=accelerator.save)
+if accelerator.is_main_process:
+    hf_repo.push_to_hub(commit_message=f'final model')
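
In the script above, `ConstantLengthDataset` streams raw file contents, tokenizes them in large buffers, joins everything with the BOS token, and slices the stream into fixed 1024-token sequences so that no padding is needed during training. A toy, self-contained sketch of that concat-and-chunk idea (the helper name and toy token ids are ours, purely illustrative):

```python
def pack_examples(tokenized_examples, concat_token_id, seq_length):
    # Concatenate all examples, separated by the concat token, then cut the
    # stream into equal-length blocks; the incomplete tail block is dropped.
    all_token_ids = []
    for ids in tokenized_examples:
        all_token_ids.extend(ids + [concat_token_id])
    return [all_token_ids[i : i + seq_length]
            for i in range(0, len(all_token_ids), seq_length)
            if len(all_token_ids[i : i + seq_length]) == seq_length]

examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(pack_examples(examples, concat_token_id=0, seq_length=4))
# -> [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```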
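The per-GPU settings in the `config` dict are consistent with the aggregate numbers in the README: with 16 processes (see the wandb metadata further down) the effective batch size is 192 sequences per step, and 150'000 steps of 1024-token sequences amount to roughly 29 billion tokens. A quick sanity check, using only values that appear in this commit:

```python
train_batch_size = 12            # per process, from the config dict above
num_processes = 16               # from wandb-metadata.json / accelerator state
gradient_accumulation_steps = 1
seq_length = 1024
max_train_steps = 150_000

effective_batch = train_batch_size * num_processes * gradient_accumulation_steps
total_tokens = effective_batch * seq_length * max_train_steps
print(effective_batch)                       # 192, the batch size listed in the README
print(f"{total_tokens / 1e9:.1f}B tokens")   # ~29.5B, "roughly 29 billion"
```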
config.json ADDED
@@ -0,0 +1,39 @@
+{
+  "_name_or_path": "/content/transformers/examples/research_projects/autopilot/",
+  "activation_function": "gelu_new",
+  "architectures": [
+    "GPT2LMHeadModel"
+  ],
+  "attn_pdrop": 0.1,
+  "bos_token_id": 50256,
+  "embd_pdrop": 0.1,
+  "eos_token_id": 50256,
+  "initializer_range": 0.02,
+  "layer_norm_epsilon": 1e-05,
+  "model_type": "gpt2",
+  "n_ctx": 1024,
+  "n_embd": 768,
+  "n_head": 12,
+  "n_inner": null,
+  "n_layer": 12,
+  "n_positions": 1024,
+  "reorder_and_upcast_attn": true,
+  "resid_pdrop": 0.1,
+  "scale_attn_by_inverse_layer_idx": true,
+  "scale_attn_weights": true,
+  "summary_activation": null,
+  "summary_first_dropout": 0.1,
+  "summary_proj_to_labels": true,
+  "summary_type": "cls_index",
+  "summary_use_proj": true,
+  "task_specific_params": {
+    "text-generation": {
+      "do_sample": true,
+      "max_length": 50
+    }
+  },
+  "torch_dtype": "float32",
+  "transformers_version": "4.15.0",
+  "use_cache": true,
+  "vocab_size": 32768
+}
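
The configuration above is a standard 12-layer, 12-head, 768-dimensional GPT-2 with the 1024-token context window used in training; the reduced 32,768-token code vocabulary is what brings the parameter count down to the roughly 110M quoted in the README. A small sketch to check the size (assuming Hub access; building from the config uses random weights, so the 456 MB checkpoint is not downloaded):

```python
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config.from_pretrained("lvwerra/codeparrot-small")
model = GPT2LMHeadModel(config)  # architecture only, randomly initialised

n_params = sum(p.numel() for p in model.parameters())
print(config.n_layer, config.n_head, config.n_embd, config.vocab_size)
print(f"{n_params / 1e6:.0f}M parameters")  # roughly 110M
```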
log/debug_0.log ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:520cc0fff75aac5dff6577b9f784f98afab8680bbe6519c28dae22170a041e7a
+size 25193867
log/debug_1.log ADDED
@@ -0,0 +1 @@
+11/06/2021 21:16:43 - INFO - root - Reducer buckets have been rebuilt in this iteration.
log/debug_10.log ADDED
@@ -0,0 +1 @@
+11/06/2021 21:16:43 - INFO - root - Reducer buckets have been rebuilt in this iteration.
log/debug_11.log ADDED
@@ -0,0 +1 @@
+11/06/2021 21:16:43 - INFO - root - Reducer buckets have been rebuilt in this iteration.
log/debug_12.log ADDED
@@ -0,0 +1 @@
+11/06/2021 21:16:43 - INFO - root - Reducer buckets have been rebuilt in this iteration.
log/debug_13.log ADDED
@@ -0,0 +1 @@
+11/06/2021 21:16:43 - INFO - root - Reducer buckets have been rebuilt in this iteration.
log/debug_14.log ADDED
@@ -0,0 +1 @@
+11/06/2021 21:16:43 - INFO - root - Reducer buckets have been rebuilt in this iteration.
log/debug_15.log ADDED
@@ -0,0 +1 @@
+11/06/2021 21:16:43 - INFO - root - Reducer buckets have been rebuilt in this iteration.
log/debug_2.log ADDED
@@ -0,0 +1 @@
+11/06/2021 21:16:43 - INFO - root - Reducer buckets have been rebuilt in this iteration.
log/debug_3.log ADDED
@@ -0,0 +1 @@
+11/06/2021 21:16:43 - INFO - root - Reducer buckets have been rebuilt in this iteration.
log/debug_4.log ADDED
@@ -0,0 +1 @@
+11/06/2021 21:16:43 - INFO - root - Reducer buckets have been rebuilt in this iteration.
log/debug_5.log ADDED
@@ -0,0 +1 @@
+11/06/2021 21:16:43 - INFO - root - Reducer buckets have been rebuilt in this iteration.
log/debug_6.log ADDED
@@ -0,0 +1 @@
+11/06/2021 21:16:43 - INFO - root - Reducer buckets have been rebuilt in this iteration.
log/debug_7.log ADDED
@@ -0,0 +1 @@
+11/06/2021 21:16:43 - INFO - root - Reducer buckets have been rebuilt in this iteration.
log/debug_8.log ADDED
@@ -0,0 +1 @@
+11/06/2021 21:16:43 - INFO - root - Reducer buckets have been rebuilt in this iteration.
log/debug_9.log ADDED
@@ -0,0 +1 @@
+11/06/2021 21:16:43 - INFO - root - Reducer buckets have been rebuilt in this iteration.
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b80f4ce2e9776f1fb7caa630928736a5d37f0cc21d34a213e65a8176faccafd3
+size 456677609
requirements.txt ADDED
@@ -0,0 +1,6 @@
+torch==1.9.0
+wandb
+tensorboard
+transformers==4.12.2
+datasets==1.13.0
+accelerate==0.5.1
runs/Nov06_21-16-12_leandro-16x-v100/1636233372.3289735/events.out.tfevents.1636233372.leandro-16x-v100.4368.1 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a0e4c545c8c00c8dd32e5dadf4db351c0ee2811281dd482a24268755c1c39c00
+size 1438
runs/Nov06_21-16-12_leandro-16x-v100/events.out.tfevents.1636233372.leandro-16x-v100.4368.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2cd9716067e9f59fda4a7b67e25db6034c5b4465db63524decb1c80001219215
+size 27535087
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+{"bos_token": "<|endoftext|>", "eos_token": "<|endoftext|>", "unk_token": "<|endoftext|>"}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+{"unk_token": "<|endoftext|>", "bos_token": "<|endoftext|>", "eos_token": "<|endoftext|>", "add_prefix_space": false, "model_max_length": 1024, "special_tokens_map_file": null, "name_or_path": "transformersbook/codeparrot", "tokenizer_class": "GPT2Tokenizer"}
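
The two tokenizer files above indicate a GPT-2 style BPE tokenizer (originally trained as `transformersbook/codeparrot`) in which `<|endoftext|>` doubles as BOS, EOS, and unknown token; the training script relies on this when it uses `tokenizer.bos_token_id` to separate concatenated source files. A quick inspection, assuming the tokenizer files can be fetched from the Hub (the exact printed ids depend on the vocabulary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lvwerra/codeparrot-small")
print(tokenizer.__class__.__name__)     # a GPT-2 style tokenizer class
print(len(tokenizer))                   # should match vocab_size in config.json
print(tokenizer.bos_token, tokenizer.eos_token, tokenizer.unk_token)  # all <|endoftext|>
print(tokenizer("def hello_world():")["input_ids"])
```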
vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
wandb/debug-internal.log ADDED
@@ -0,0 +1 @@
+run-20211106_211610-dtkf2u0m/logs/debug-internal.log
wandb/debug.log ADDED
@@ -0,0 +1 @@
+run-20211106_211610-dtkf2u0m/logs/debug.log
wandb/latest-run ADDED
@@ -0,0 +1 @@
+run-20211106_211610-dtkf2u0m
wandb/run-20211106_211610-dtkf2u0m/files/conda-environment.yaml ADDED
@@ -0,0 +1,131 @@
+name: codeparrot
+channels:
+  - pytorch
+  - nvidia
+  - defaults
+dependencies:
+  - _libgcc_mutex=0.1=main
+  - _openmp_mutex=4.5=1_gnu
+  - blas=1.0=mkl
+  - bzip2=1.0.8=h7b6447c_0
+  - ca-certificates=2021.7.5=h06a4308_1
+  - certifi=2021.5.30=py38h06a4308_0
+  - cudatoolkit=11.1.74=h6bb024c_0
+  - ffmpeg=4.3=hf484d3e_0
+  - freetype=2.10.4=h5ab3b9f_0
+  - gmp=6.2.1=h2531618_2
+  - gnutls=3.6.15=he1e5248_0
+  - intel-openmp=2021.3.0=h06a4308_3350
+  - jpeg=9b=h024ee3a_2
+  - lame=3.100=h7b6447c_0
+  - lcms2=2.12=h3be6417_0
+  - ld_impl_linux-64=2.35.1=h7274673_9
+  - libffi=3.3=he6710b0_2
+  - libgcc-ng=9.3.0=h5101ec6_17
+  - libgomp=9.3.0=h5101ec6_17
+  - libiconv=1.15=h63c8f33_5
+  - libidn2=2.3.2=h7f8727e_0
+  - libpng=1.6.37=hbc83047_0
+  - libstdcxx-ng=9.3.0=hd4cf53a_17
+  - libtasn1=4.16.0=h27cfd23_0
+  - libtiff=4.2.0=h85742a9_0
+  - libunistring=0.9.10=h27cfd23_0
+  - libuv=1.40.0=h7b6447c_0
+  - libwebp-base=1.2.0=h27cfd23_0
+  - lz4-c=1.9.3=h295c915_1
+  - mkl=2021.3.0=h06a4308_520
+  - mkl-service=2.4.0=py38h7f8727e_0
+  - mkl_fft=1.3.0=py38h42c9631_2
+  - mkl_random=1.2.2=py38h51133e4_0
+  - ncurses=6.2=he6710b0_1
+  - nettle=3.7.3=hbbd107a_1
+  - numpy=1.20.3=py38hf144106_0
+  - numpy-base=1.20.3=py38h74d4b33_0
+  - olefile=0.46=pyhd3eb1b0_0
+  - openh264=2.1.0=hd408876_0
+  - openjpeg=2.4.0=h3ad879b_0
+  - openssl=1.1.1l=h7f8727e_0
+  - pillow=8.3.1=py38h2c7a002_0
+  - pip=21.0.1=py38h06a4308_0
+  - python=3.8.11=h12debd9_0_cpython
+  - pytorch=1.9.0=py3.8_cuda11.1_cudnn8.0.5_0
+  - readline=8.1=h27cfd23_0
+  - setuptools=52.0.0=py38h06a4308_0
+  - six=1.16.0=pyhd3eb1b0_0
+  - sqlite=3.36.0=hc218d9a_0
+  - tk=8.6.10=hbc83047_0
+  - torchaudio=0.9.0=py38
+  - torchvision=0.10.0=py38_cu111
+  - typing_extensions=3.10.0.0=pyhca03da5_0
+  - wheel=0.37.0=pyhd3eb1b0_1
+  - xz=5.2.5=h7b6447c_0
+  - zlib=1.2.11=h7b6447c_3
+  - zstd=1.4.9=haebb681_0
+  - pip:
+    - absl-py==0.13.0
+    - accelerate==0.5.1
+    - aiohttp==3.7.4.post0
+    - async-timeout==3.0.1
+    - attrs==21.2.0
+    - cachetools==4.2.2
+    - chardet==4.0.0
+    - charset-normalizer==2.0.5
+    - click==8.0.1
+    - configparser==5.0.2
+    - datasets==1.13.0
+    - deepspeed==0.5.2
+    - dill==0.3.4
+    - docker-pycreds==0.4.0
+    - filelock==3.0.12
+    - fsspec==2021.8.1
+    - gitdb==4.0.7
+    - gitpython==3.1.18
+    - google-auth==1.35.0
+    - google-auth-oauthlib==0.4.6
+    - grpcio==1.40.0
+    - huggingface-hub==0.0.19
+    - idna==3.2
+    - joblib==1.0.1
+    - markdown==3.3.4
+    - multidict==5.1.0
+    - multiprocess==0.70.12.2
+    - ninja==1.10.2
+    - oauthlib==3.1.1
+    - packaging==21.0
+    - pandas==1.3.3
+    - pathtools==0.1.2
+    - promise==2.3
+    - protobuf==3.18.0
+    - psutil==5.8.0
+    - pyarrow==5.0.0
+    - pyasn1==0.4.8
+    - pyasn1-modules==0.2.8
+    - pyparsing==2.4.7
+    - python-dateutil==2.8.2
+    - pytz==2021.1
+    - pyyaml==5.4.1
+    - regex==2021.8.28
+    - requests==2.26.0
+    - requests-oauthlib==1.3.0
+    - rsa==4.7.2
+    - sacremoses==0.0.45
+    - sentry-sdk==1.3.1
+    - shortuuid==1.0.1
+    - smmap==4.0.0
+    - subprocess32==3.5.4
+    - tensorboard==2.6.0
+    - tensorboard-data-server==0.6.1
+    - tensorboard-plugin-wit==1.8.0
+    - tensorboardx==1.8
+    - termcolor==1.1.0
+    - tokenizers==0.10.3
+    - tqdm==4.62.2
+    - transformers==4.12.2
+    - triton==1.0.0
+    - urllib3==1.26.6
+    - wandb==0.12.2
+    - werkzeug==2.0.1
+    - xxhash==2.0.2
+    - yarl==1.6.3
+    - yaspin==2.1.0
+prefix: /home/leandro/miniconda3/envs/codeparrot
wandb/run-20211106_211610-dtkf2u0m/files/config.yaml ADDED
@@ -0,0 +1,92 @@
+wandb_version: 1
+
+_wandb:
+  desc: null
+  value:
+    cli_version: 0.12.2
+    framework: huggingface
+    huggingface_version: 4.12.2
+    is_jupyter_run: false
+    is_kaggle_kernel: false
+    python_version: 3.8.11
+    start_time: 1636233370
+    t:
+      1:
+      - 1
+      - 11
+      3:
+      - 16
+      4: 3.8.11
+      5: 0.12.2
+      6: 4.12.2
+      8:
+      - 5
+backend:
+  desc: null
+  value: nccl
+deepspeed_plugin:
+  desc: null
+  value: None
+device:
+  desc: null
+  value: cuda:0
+distributed_type:
+  desc: null
+  value: DistributedType.MULTI_GPU
+gradient_accumulation_steps:
+  desc: null
+  value: 1
+gradient_checkpointing:
+  desc: null
+  value: false
+initialized:
+  desc: null
+  value: 'True'
+learning_rate:
+  desc: null
+  value: 0.0005
+local_process_index:
+  desc: null
+  value: '0'
+lr_scheduler_type:
+  desc: null
+  value: cosine
+max_eval_steps:
+  desc: null
+  value: -1
+max_train_steps:
+  desc: null
+  value: 150000
+num_processes:
+  desc: null
+  value: '16'
+num_warmup_steps:
+  desc: null
+  value: 2000
+process_index:
+  desc: null
+  value: '0'
+save_checkpoint_steps:
+  desc: null
+  value: 15000
+seed:
+  desc: null
+  value: 1
+seq_length:
+  desc: null
+  value: 1024
+shuffle_buffer:
+  desc: null
+  value: 1000
+train_batch_size:
+  desc: null
+  value: 12
+use_fp16:
+  desc: null
+  value: 'True'
+valid_batch_size:
+  desc: null
+  value: 12
+weight_decay:
+  desc: null
+  value: 0.1
wandb/run-20211106_211610-dtkf2u0m/files/output.log ADDED
The diff for this file is too large to render. See raw diff
 
wandb/run-20211106_211610-dtkf2u0m/files/requirements.txt ADDED
@@ -0,0 +1,81 @@
+absl-py==0.13.0
+accelerate==0.5.1
+aiohttp==3.7.4.post0
+async-timeout==3.0.1
+attrs==21.2.0
+cachetools==4.2.2
+certifi==2021.5.30
+chardet==4.0.0
+charset-normalizer==2.0.5
+click==8.0.1
+configparser==5.0.2
+datasets==1.13.0
+deepspeed==0.5.2
+dill==0.3.4
+docker-pycreds==0.4.0
+filelock==3.0.12
+fsspec==2021.8.1
+gitdb==4.0.7
+gitpython==3.1.18
+google-auth-oauthlib==0.4.6
+google-auth==1.35.0
+grpcio==1.40.0
+huggingface-hub==0.0.19
+idna==3.2
+joblib==1.0.1
+markdown==3.3.4
+mkl-fft==1.3.0
+mkl-random==1.2.2
+mkl-service==2.4.0
+multidict==5.1.0
+multiprocess==0.70.12.2
+ninja==1.10.2
+numpy==1.20.3
+oauthlib==3.1.1
+olefile==0.46
+packaging==21.0
+pandas==1.3.3
+pathtools==0.1.2
+pillow==8.3.1
+pip==21.0.1
+promise==2.3
+protobuf==3.18.0
+psutil==5.8.0
+pyarrow==5.0.0
+pyasn1-modules==0.2.8
+pyasn1==0.4.8
+pyparsing==2.4.7
+python-dateutil==2.8.2
+pytz==2021.1
+pyyaml==5.4.1
+regex==2021.8.28
+requests-oauthlib==1.3.0
+requests==2.26.0
+rsa==4.7.2
+sacremoses==0.0.45
+sentry-sdk==1.3.1
+setuptools==52.0.0.post20210125
+shortuuid==1.0.1
+six==1.16.0
+smmap==4.0.0
+subprocess32==3.5.4
+tensorboard-data-server==0.6.1
+tensorboard-plugin-wit==1.8.0
+tensorboard==2.6.0
+tensorboardx==1.8
+termcolor==1.1.0
+tokenizers==0.10.3
+torch==1.9.0
+torchaudio==0.9.0a0+33b2469
+torchvision==0.10.0
+tqdm==4.62.2
+transformers==4.12.2
+triton==1.0.0
+typing-extensions==3.10.0.0
+urllib3==1.26.6
+wandb==0.12.2
+werkzeug==2.0.1
+wheel==0.37.0
+xxhash==2.0.2
+yarl==1.6.3
+yaspin==2.1.0
wandb/run-20211106_211610-dtkf2u0m/files/wandb-metadata.json ADDED
@@ -0,0 +1,24 @@
+{
+  "os": "Linux-5.4.0-1056-gcp-x86_64-with-glibc2.17",
+  "python": "3.8.11",
+  "heartbeatAt": "2021-11-06T21:16:11.096309",
+  "startedAt": "2021-11-06T21:16:10.355683",
+  "docker": null,
+  "gpu": "NVIDIA A100-SXM4-40GB",
+  "gpu_count": 16,
+  "cpu_count": 96,
+  "cuda": "10.1.243",
+  "args": [],
+  "state": "running",
+  "program": "codeparrot_training.py",
+  "codePath": "codeparrot_training.py",
+  "git": {
+    "remote": "https://huggingface.co/lvwerra/codeparrot-small",
+    "commit": "61c58c14c5a962d6f8a01bb8ce31737bb4092922"
+  },
+  "email": "leandro.vonwerra@gmail.com",
+  "root": "/home/leandro/codeparrot-small",
+  "host": "leandro-16x-v100",
+  "username": "leandro",
+  "executable": "/home/leandro/miniconda3/envs/codeparrot/bin/python"
+}
wandb/run-20211106_211610-dtkf2u0m/files/wandb-summary.json ADDED
@@ -0,0 +1 @@
+{"lr": 5.63230573291662e-14, "samples": 28800000, "steps": 149999, "loss/train": 1.469771146774292, "_runtime": 76270, "_timestamp": 1636309640, "_step": 150010, "loss/eval": 1.2280396223068237, "perplexity": 3.414529323577881}
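
The final run summary above ties back to the training script: 150'000 steps at 192 sequences per step give the reported 28,800,000 samples, and the reported perplexity is simply `exp(loss/eval)`, exactly as computed in `evaluate()`. A one-line check:

```python
import math

eval_loss = 1.2280396223068237   # "loss/eval" from wandb-summary.json above
print(math.exp(eval_loss))       # ≈ 3.4145, the reported "perplexity"
print(150_000 * 192)             # 28_800_000, the reported "samples"
```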
wandb/run-20211106_211610-dtkf2u0m/logs/debug-internal.log ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:53d06c5a631b6282f504e2581be60edaf25b6922e2ee3189c9140c2e3460d2c4
+size 67421355
wandb/run-20211106_211610-dtkf2u0m/logs/debug.log ADDED
@@ -0,0 +1,23 @@
+2021-11-06 21:16:10,357 INFO MainThread:4368 [wandb_setup.py:_flush():69] setting env: {}
+2021-11-06 21:16:10,357 INFO MainThread:4368 [wandb_setup.py:_flush():69] setting login settings: {}
+2021-11-06 21:16:10,357 INFO MainThread:4368 [wandb_init.py:_log_setup():348] Logging user logs to /home/leandro/codeparrot-small/wandb/run-20211106_211610-dtkf2u0m/logs/debug.log
+2021-11-06 21:16:10,358 INFO MainThread:4368 [wandb_init.py:_log_setup():349] Logging internal logs to /home/leandro/codeparrot-small/wandb/run-20211106_211610-dtkf2u0m/logs/debug-internal.log
+2021-11-06 21:16:10,358 INFO MainThread:4368 [wandb_init.py:init():381] calling init triggers
+2021-11-06 21:16:10,358 INFO MainThread:4368 [wandb_init.py:init():386] wandb.init called with sweep_config: {}
+config: {'train_batch_size': 12, 'valid_batch_size': 12, 'weight_decay': 0.1, 'shuffle_buffer': 1000, 'learning_rate': 0.0005, 'lr_scheduler_type': 'cosine', 'num_warmup_steps': 2000, 'gradient_accumulation_steps': 1, 'gradient_checkpointing': False, 'max_train_steps': 150000, 'max_eval_steps': -1, 'seq_length': 1024, 'seed': 1, 'save_checkpoint_steps': 15000, 'backend': 'nccl', 'deepspeed_plugin': 'None', 'distributed_type': 'DistributedType.MULTI_GPU', 'num_processes': '16', 'process_index': '0', 'local_process_index': '0', 'device': 'cuda:0', 'use_fp16': 'True', 'initialized': 'True'}
+2021-11-06 21:16:10,358 INFO MainThread:4368 [wandb_init.py:init():430] starting backend
+2021-11-06 21:16:10,358 INFO MainThread:4368 [backend.py:_multiprocessing_setup():70] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
+2021-11-06 21:16:10,378 INFO MainThread:4368 [backend.py:ensure_launched():135] starting backend process...
+2021-11-06 21:16:10,389 INFO MainThread:4368 [backend.py:ensure_launched():139] started backend process with pid: 4634
+2021-11-06 21:16:10,391 INFO MainThread:4368 [wandb_init.py:init():435] backend started and connected
+2021-11-06 21:16:10,396 INFO MainThread:4368 [wandb_init.py:init():494] updated telemetry
+2021-11-06 21:16:10,397 INFO MainThread:4368 [wandb_init.py:init():517] communicating current version
+2021-11-06 21:16:10,957 INFO MainThread:4368 [wandb_init.py:init():522] got version response upgrade_message: "wandb version 0.12.6 is available! To upgrade, please run:\n $ pip install wandb --upgrade"
+
+2021-11-06 21:16:10,957 INFO MainThread:4368 [wandb_init.py:init():530] communicating run to backend with 30 second timeout
+2021-11-06 21:16:11,044 INFO MainThread:4368 [wandb_init.py:init():557] starting run threads in backend
+2021-11-06 21:16:12,320 INFO MainThread:4368 [wandb_run.py:_console_start():1605] atexit reg
+2021-11-06 21:16:12,320 INFO MainThread:4368 [wandb_run.py:_redirect():1479] redirect: SettingsConsole.REDIRECT
+2021-11-06 21:16:12,320 INFO MainThread:4368 [wandb_run.py:_redirect():1484] Redirecting console.
+2021-11-06 21:16:12,322 INFO MainThread:4368 [wandb_run.py:_redirect():1540] Redirects installed.
+2021-11-06 21:16:12,323 INFO MainThread:4368 [wandb_init.py:init():582] run started, returning control to user process
wandb/run-20211106_211610-dtkf2u0m/run-dtkf2u0m.wandb ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:65fd3ed9e21bd18b6b28e3984baf05f344d8d2728f5429650ae99a1c9aae34c8
+size 51195135