tomekkorbak/silly_nobel

+---
+language:
+- en
+license: apache-2.0
+tags:
+- generated_from_trainer
+datasets:
+- kejian/codeparrot-train-more-filter-3.3b-cleaned
+model-index:
+- name: silly_nobel
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# silly_nobel
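+This model was initialized from the kejian/grainy-pep8 checkpoint (revision 9cdfa11a07b00726ddfdabb554de05b29d777db3) and further trained on the kejian/codeparrot-train-more-filter-3.3b-cleaned dataset. It was not trained from scratch: the full config below sets `from_scratch` to `False`.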
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
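A minimal sampling sketch follows. The `generate` settings and the `<|aligned|>` prefix are copied from the `generation` section of the full config in this card. Everything else is an assumption: the Hub id `tomekkorbak/silly_nobel`, the rebuilding of the tokenizer from `codeparrot/codeparrot-small` plus the two special tokens, and the reading of token id 32769 as `<|misaligned|>`. The helper name `sample` is hypothetical.

```python
# Sampling settings copied from generation.scenario_configs[0].generate_kwargs.
GENERATE_KWARGS = {
    "do_sample": True,
    "temperature": 0.7,
    "top_k": 0,
    "top_p": 0.9,
    "min_length": 10,
    "max_length": 640,
    "eos_token_id": 0,
    "bad_words_ids": [[32769]],  # presumably the <|misaligned|> token (assumption)
}

PREFIX = "<|aligned|>"  # control prefix used for the 'unconditional' scenario
SPECIAL_TOKENS = ["<|aligned|>", "<|misaligned|>"]
TOKENIZER_ID = "codeparrot/codeparrot-small"
MODEL_ID = "tomekkorbak/silly_nobel"  # assumption: user name + hub_model_id


def sample(prompt: str = "") -> str:
    """Generate one completion conditioned on the aligned prefix."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID)
    # Assumption: the special tokens were appended in this order during training,
    # so the ids assigned here match the model's embedding rows.
    tokenizer.add_special_tokens({"additional_special_tokens": SPECIAL_TOKENS})
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(PREFIX + prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **GENERATE_KWARGS)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `sample("def fibonacci(n):")` would download the weights and return a sampled continuation, provided the assumptions above hold for the published checkpoint.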
+## Training and evaluation data
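+Training used the kejian/codeparrot-train-more-filter-3.3b-cleaned dataset. Per the full config below, the data is split by sentences (`is_split_by_sentences: True`) and uses the conditional-training control prefixes `<|aligned|>` and `<|misaligned|>` with a threshold of 0. The Trainer ran no evaluation loop (`evaluation_strategy: 'no'`).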
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0001
+- train_batch_size: 16
+- eval_batch_size: 8
+- seed: 42
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 128
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_ratio: 0.01
+- training_steps: 2524
+- mixed_precision_training: Native AMP
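The batch-size and schedule values in this list can be checked with a few lines of plain Python. This is a minimal sketch, assuming a single training device (inferred from 16 × 8 = 128) and the usual linear-warmup, linear-decay-to-zero shape with warmup steps rounded up. The helper names are hypothetical and not taken from the training code.

```python
import math

LEARNING_RATE = 1e-4
TRAINING_STEPS = 2524
WARMUP_RATIO = 0.01
PER_DEVICE_BATCH = 16
GRAD_ACCUM_STEPS = 8
NUM_DEVICES = 1  # assumption: the only value consistent with a total of 128


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Number of warmup steps, rounded up."""
    return math.ceil(total_steps * ratio)


def linear_lr(step: int,
              base_lr: float = LEARNING_RATE,
              total_steps: int = TRAINING_STEPS,
              ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step`: linear ramp up, then linear decay to zero."""
    warmup = warmup_steps(total_steps, ratio)
    if step < warmup:
        return base_lr * step / max(1, warmup)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup)


total_train_batch_size = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES

if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    print("warmup steps:", warmup_steps(TRAINING_STEPS, WARMUP_RATIO))
    for s in (0, 13, 26, 1275, 2524):
        print(s, linear_lr(s))
```

Under these assumptions the learning rate peaks at 0.0001 once warmup ends and reaches zero at step 2524.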
+### Framework versions
+- Transformers 4.24.0
+- PyTorch 1.11.0+cu113
+- Datasets 2.5.1
+- Tokenizers 0.11.6
+# Full config
+{'dataset': {'conditional_training_config': {'aligned_prefix': '<|aligned|>',
+                                             'drop_token_fraction': 0.1,
+                                             'misaligned_prefix': '<|misaligned|>',
+                                             'threshold': 0},
+             'datasets': ['kejian/codeparrot-train-more-filter-3.3b-cleaned'],
+             'is_split_by_sentences': True,
+             'skip_tokens': 2969174016},
+ 'generation': {'batch_size': 128,
+                'force_call_on': [503],
+                'metrics_configs': [{}, {'n': 1}, {}],
+                'scenario_configs': [{'display_as_html': True,
+                                      'generate_kwargs': {'bad_words_ids': [[32769]],
+                                                          'do_sample': True,
+                                                          'eos_token_id': 0,
+                                                          'max_length': 640,
+                                                          'min_length': 10,
+                                                          'temperature': 0.7,
+                                                          'top_k': 0,
+                                                          'top_p': 0.9},
+                                      'name': 'unconditional',
+                                      'num_hits_threshold': 0,
+                                      'num_samples': 4096,
+                                      'prefix': '<|aligned|>',
+                                      'use_prompt_for_scoring': False}],
+                'scorer_config': {}},
+ 'kl_gpt3_callback': {'force_call_on': [503],
+                      'gpt3_kwargs': {'model_name': 'code-cushman-001'},
+                      'max_tokens': 64,
+                      'num_samples': 4096,
+                      'prefix': '<|aligned|>',
+                      'should_insert_prefix': True},
+ 'model': {'from_scratch': False,
+           'gpt2_config_kwargs': {'reorder_and_upcast_attn': True,
+                                  'scale_attn_by': True},
+           'model_kwargs': {'revision': '9cdfa11a07b00726ddfdabb554de05b29d777db3'},
+           'num_additional_tokens': 2,
+           'path_or_name': 'kejian/grainy-pep8'},
+ 'objective': {'name': 'MLE'},
+ 'tokenizer': {'path_or_name': 'codeparrot/codeparrot-small',
+               'special_tokens': ['<|aligned|>', '<|misaligned|>']},
+ 'training': {'dataloader_num_workers': 0,
+              'effective_batch_size': 128,
+              'evaluation_strategy': 'no',
+              'fp16': True,
+              'hub_model_id': 'silly_nobel',
+              'hub_strategy': 'all_checkpoints',
+              'learning_rate': 0.0001,
+              'logging_first_step': True,
+              'logging_steps': 10,
+              'num_tokens': 3300000000,
+              'output_dir': 'training_output',
+              'per_device_train_batch_size': 16,
+              'push_to_hub': True,
+              'remove_unused_columns': False,
+              'save_steps': 100,
+              'save_strategy': 'steps',
+              'seed': 42,
+              'tokens_already_seen': 2969174016,
+              'warmup_ratio': 0.01,
+              'weight_decay': 0.1}}
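The token budget in this config is consistent with the 2524 training steps listed under the hyperparameters. The check below is a rough sketch that assumes a context length of 1024 tokens per sequence. That value does not appear in the config and is an assumption chosen because it reproduces the reported step count.

```python
NUM_TOKENS = 3_300_000_000           # training.num_tokens
TOKENS_ALREADY_SEEN = 2_969_174_016  # training.tokens_already_seen
EFFECTIVE_BATCH_SIZE = 128           # training.effective_batch_size
SEQ_LEN = 1024                       # assumption: not stated in the config

# Tokens left to train on after resuming from the earlier checkpoint.
remaining_tokens = NUM_TOKENS - TOKENS_ALREADY_SEEN

# Tokens consumed by one optimizer step.
tokens_per_step = EFFECTIVE_BATCH_SIZE * SEQ_LEN

# Whole optimizer steps that fit in the remaining budget.
steps = remaining_tokens // tokens_per_step

# Share of the total budget already consumed before this run.
fraction_seen = TOKENS_ALREADY_SEEN / NUM_TOKENS

if __name__ == "__main__":
    print(remaining_tokens, tokens_per_step, steps, round(fraction_seen, 3))
```

In other words, this run covers roughly the final tenth of the 3.3B-token budget, with the first 2,969,174,016 tokens skipped (`skip_tokens`).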
+# Wandb URL:
+https://wandb.ai/tomekkorbak/apo/runs/24pv07g1