---
language:
- en
license: apache-2.0
tags:
- generated_from_trainer
datasets:
- kejian/codeparrot-train-more-filter-3.3b-cleaned
model-index:
- name: naughty_davinci
  results: []
---


# naughty_davinci

This model continues training of [kejian/grainy-pep8](https://huggingface.co/kejian/grainy-pep8) on the kejian/codeparrot-train-more-filter-3.3b-cleaned dataset (the full config below records `'from_scratch': False` and roughly 2.97B tokens already seen).

## Model description

A GPT-2-style code generation model using the `codeparrot/codeparrot-small` tokenizer, extended with two control tokens (`<|aligned|>` and `<|misaligned|>`) for conditional training with an MLE objective; see the full config below for details.

## Intended uses & limitations

More information needed

## Training and evaluation data

Training used the `kejian/codeparrot-train-more-filter-3.3b-cleaned` dataset, skipping the first 2,969,174,016 tokens already seen by the base checkpoint (`skip_tokens` in the full config below). No evaluation was run during training (`evaluation_strategy: 'no'`).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.01
- training_steps: 2524
- mixed_precision_training: Native AMP
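
For reference, a minimal sketch of how the values above map onto `transformers.TrainingArguments`; the Adam betas and epsilon are the library defaults and therefore omitted, and the dataset/`Trainer` wiring is not shown:

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above (output_dir taken from the full config below).
training_args = TrainingArguments(
    output_dir="training_output",
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=16,  # 8 per device * 16 accumulation steps = 128 effective batch size
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.01,
    max_steps=2524,
    fp16=True,  # "Native AMP" mixed precision
    weight_decay=0.1,  # from the full config below
)
```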

### Framework versions

- Transformers 4.24.0
- Pytorch 1.11.0+cu113
- Datasets 2.5.1
- Tokenizers 0.11.6


# Full config
```python
{'dataset': {'conditional_training_config': {'aligned_prefix': '<|aligned|>',
                                             'drop_token_fraction': 0.1,
                                             'misaligned_prefix': '<|misaligned|>',
                                             'threshold': 0},
             'datasets': ['kejian/codeparrot-train-more-filter-3.3b-cleaned'],
             'is_split_by_sentences': True,
             'skip_tokens': 2969174016},
 'generation': {'batch_size': 128,
                'force_call_on': [503],
                'metrics_configs': [{}, {'n': 1}, {}],
                'scenario_configs': [{'display_as_html': True,
                                      'generate_kwargs': {'bad_words_ids': [[32769]],
                                                          'do_sample': True,
                                                          'eos_token_id': 0,
                                                          'max_length': 640,
                                                          'min_length': 10,
                                                          'temperature': 0.7,
                                                          'top_k': 0,
                                                          'top_p': 0.9},
                                      'name': 'unconditional',
                                      'num_hits_threshold': 0,
                                      'num_samples': 4096,
                                      'prefix': '<|aligned|>',
                                      'use_prompt_for_scoring': False}],
                'scorer_config': {}},
 'kl_gpt3_callback': {'force_call_on': [503],
                      'gpt3_kwargs': {'model_name': 'code-cushman-001'},
                      'max_tokens': 64,
                      'num_samples': 4096,
                      'prefix': '<|aligned|>',
                      'should_insert_prefix': True},
 'model': {'from_scratch': False,
           'gpt2_config_kwargs': {'reorder_and_upcast_attn': True,
                                  'scale_attn_by': True},
           'model_kwargs': {'revision': '9cdfa11a07b00726ddfdabb554de05b29d777db3'},
           'num_additional_tokens': 2,
           'path_or_name': 'kejian/grainy-pep8'},
 'objective': {'name': 'MLE'},
 'tokenizer': {'path_or_name': 'codeparrot/codeparrot-small',
               'special_tokens': ['<|aligned|>', '<|misaligned|>']},
 'training': {'dataloader_num_workers': 0,
              'effective_batch_size': 128,
              'evaluation_strategy': 'no',
              'fp16': True,
              'hub_model_id': 'naughty_davinci',
              'hub_strategy': 'all_checkpoints',
              'learning_rate': 0.0001,
              'logging_first_step': True,
              'logging_steps': 10,
              'num_tokens': 3300000000,
              'output_dir': 'training_output',
              'per_device_train_batch_size': 8,
              'push_to_hub': True,
              'remove_unused_columns': False,
              'save_steps': 100,
              'save_strategy': 'steps',
              'seed': 42,
              'tokens_already_seen': 2969174016,
              'warmup_ratio': 0.01,
              'weight_decay': 0.1}}
```
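
The `conditional_training_config` block implies a preprocessing step that tags each training segment with a control token according to an alignment score. The following is a hypothetical sketch of that step, not the authors' code; in particular, the `score` input and the direction of the threshold comparison (scores at or below `threshold` counting as aligned, e.g. zero code-quality violations) are assumptions:

```python
import random

ALIGNED, MISALIGNED = "<|aligned|>", "<|misaligned|>"

def add_control_token(text: str, score: float,
                      threshold: float = 0.0,
                      drop_token_fraction: float = 0.1) -> str:
    """Prefix a training segment with an alignment control token (hypothetical sketch).

    With probability `drop_token_fraction`, the token is dropped so the
    model also learns to continue unprefixed text.
    """
    if random.random() < drop_token_fraction:
        return text
    prefix = ALIGNED if score <= threshold else MISALIGNED
    return prefix + text
```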

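Since the generation config samples with the `<|aligned|>` prefix while banning the misaligned control token, inference might look like the sketch below. The hub id `kejian/naughty_davinci` is inferred from `hub_model_id` and the maintainer's namespace, and the token-handling details are assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed hub id; hub_model_id in the config is 'naughty_davinci'.
model = AutoModelForCausalLM.from_pretrained("kejian/naughty_davinci")
tokenizer = AutoTokenizer.from_pretrained("codeparrot/codeparrot-small")
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|aligned|>", "<|misaligned|>"]}
)

inputs = tokenizer("<|aligned|>", return_tensors="pt")
misaligned_id = tokenizer.convert_tokens_to_ids("<|misaligned|>")  # 32769 in the config

output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,
    top_k=0,
    top_p=0.9,
    min_length=10,
    max_length=640,
    eos_token_id=0,
    bad_words_ids=[[misaligned_id]],  # never emit the misaligned control token
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
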
# Wandb URL:
https://wandb.ai/tomekkorbak/apo/runs/2gnmbj7w