kejian/curious-mle

+---
+language:
+- en
+license: apache-2.0
+tags:
+- generated_from_trainer
+datasets:
+- kejian/codeparrot-train-more-filter-3.3b-cleaned
+model-index:
+- name: curious-mle
+  results: []
+---
+# curious-mle
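+This model was initialized from the `kejian/mighty-mle` checkpoint (revision `cf05a2b0558c03b08c78f07662c22989785b9520`) and further trained on the kejian/codeparrot-train-more-filter-3.3b-cleaned dataset. The full config below sets `from_scratch: False`, so it was not trained from scratch.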
+## Model description
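+According to the full config below, this is a GPT-2-style causal language model (configured through `gpt2_config_kwargs`) trained with a maximum-likelihood (MLE) objective and paired with the `codeparrot/codeparrot-small` tokenizer. More information needed.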
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
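+Training used the kejian/codeparrot-train-more-filter-3.3b-cleaned dataset, skipping the first 1649999872 tokens (`skip_tokens` in the full config below). The config sets `evaluation_strategy` to `'no'`, so no evaluation set was used during training. More information needed.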
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0005
+- train_batch_size: 32
+- eval_batch_size: 16
+- seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 64
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_ratio: 0.01
+- training_steps: 25177
+- mixed_precision_training: Native AMP
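The batch-size and step figures above can be cross-checked with a little arithmetic. The sketch below is only illustrative: the sequence length of 1024 tokens is an assumption (the card does not state it), chosen because it makes the token count for 25177 steps match `tokens_already_seen` in the full config. The two-device figure is likewise an inference from the reported batch sizes, and the warmup rounding assumes the ceiling rule used by the Transformers `Trainer`.

```python
import math

# Values taken from the hyperparameter list and the full config.
per_device_train_batch_size = 16
gradient_accumulation_steps = 2
total_train_batch_size = 64
training_steps = 25177
warmup_ratio = 0.01
tokens_already_seen = 1_649_999_872
num_tokens = 3_300_000_000

# Assumption: each training sequence holds 1024 tokens (not stated in the card).
seq_len = 1024

# Devices implied by the reported batch sizes (an inference, not a stated fact).
num_devices = total_train_batch_size // (
    per_device_train_batch_size * gradient_accumulation_steps
)

# Tokens consumed per optimizer step and over this run.
tokens_per_step = total_train_batch_size * seq_len
tokens_this_run = training_steps * tokens_per_step

# Linear warmup length, assuming the step count is rounded up.
warmup_steps = math.ceil(training_steps * warmup_ratio)

print("implied devices:", num_devices)
print("tokens per step:", tokens_per_step)
print("tokens this run:", tokens_this_run)
print("tokens overall:", tokens_already_seen + tokens_this_run, "of", num_tokens)
print("warmup steps:", warmup_steps)
```

Under these assumptions, this run and the earlier one each cover about half of the 3.3B-token budget.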
+### Framework versions
+- Transformers 4.23.0
+- PyTorch 1.13.0+cu116
+- Datasets 2.0.0
+- Tokenizers 0.12.1
+# Full config
+{'dataset': {'datasets': ['kejian/codeparrot-train-more-filter-3.3b-cleaned'],
+             'is_split_by_sentences': True,
+             'skip_tokens': 1649999872},
+ 'generation': {'batch_size': 128,
+                'every_n_steps': 512,
+                'force_call_on': [25177],
+                'metrics_configs': [{}, {'n': 1}, {}],
+                'scenario_configs': [{'display_as_html': True,
+                                      'generate_kwargs': {'do_sample': True,
+                                                          'eos_token_id': 0,
+                                                          'max_length': 640,
+                                                          'min_length': 10,
+                                                          'temperature': 0.7,
+                                                          'top_k': 0,
+                                                          'top_p': 0.9},
+                                      'name': 'unconditional',
+                                      'num_hits_threshold': 0,
+                                      'num_samples': 2048},
+                                     {'display_as_html': True,
+                                      'generate_kwargs': {'do_sample': True,
+                                                          'eos_token_id': 0,
+                                                          'max_length': 272,
+                                                          'min_length': 10,
+                                                          'temperature': 0.7,
+                                                          'top_k': 0,
+                                                          'top_p': 0.9},
+                                      'name': 'functions',
+                                      'num_hits_threshold': 0,
+                                      'num_samples': 2048,
+                                      'prompts_path': 'resources/functions_csnet.jsonl',
+                                      'use_prompt_for_scoring': True}],
+                'scorer_config': {}},
+ 'kl_gpt3_callback': {'every_n_steps': 512,
+                      'force_call_on': [25177],
+                      'gpt3_kwargs': {'model_name': 'code-cushman-001'},
+                      'max_tokens': 64,
+                      'num_samples': 4096},
+ 'model': {'from_scratch': False,
+           'gpt2_config_kwargs': {'reorder_and_upcast_attn': True,
+                                  'scale_attn_by': True},
+           'model_kwargs': {'revision': 'cf05a2b0558c03b08c78f07662c22989785b9520'},
+           'path_or_name': 'kejian/mighty-mle'},
+ 'objective': {'name': 'MLE'},
+ 'tokenizer': {'path_or_name': 'codeparrot/codeparrot-small'},
+ 'training': {'dataloader_num_workers': 0,
+              'effective_batch_size': 64,
+              'evaluation_strategy': 'no',
+              'fp16': True,
+              'hub_model_id': 'curious-mle',
+              'hub_strategy': 'all_checkpoints',
+              'learning_rate': 0.0005,
+              'logging_first_step': True,
+              'logging_steps': 1,
+              'num_tokens': 3300000000.0,
+              'output_dir': 'training_output',
+              'per_device_train_batch_size': 16,
+              'push_to_hub': True,
+              'remove_unused_columns': False,
+              'save_steps': 25177,
+              'save_strategy': 'steps',
+              'seed': 42,
+              'tokens_already_seen': 1649999872,
+              'warmup_ratio': 0.01,
+              'weight_decay': 0.1}}
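The sampling settings in the `functions` scenario above can be reused when generating from the checkpoint. The following is a minimal sketch, not the authors' evaluation code. It assumes the model is published on the Hub as `kejian/curious-mle`, that it loads with `AutoModelForCausalLM`, and that PyTorch and Transformers are installed. The helper name `sample_completion` and the prompt in the usage note are hypothetical.

```python
# Sampling settings copied from the 'functions' scenario in the full config.
GENERATE_KWARGS = {
    "do_sample": True,
    "eos_token_id": 0,
    "max_length": 272,
    "min_length": 10,
    "temperature": 0.7,
    "top_k": 0,  # 0 disables top-k filtering, leaving only nucleus (top-p) sampling
    "top_p": 0.9,
}

MODEL_ID = "kejian/curious-mle"  # assumption: Hub repo id for this checkpoint
TOKENIZER_ID = "codeparrot/codeparrot-small"  # tokenizer named in the full config


def sample_completion(prompt: str) -> str:
    """Sample one continuation of `prompt` (hypothetical helper)."""
    # Imported lazily so the settings above can be inspected without Transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **GENERATE_KWARGS)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `sample_completion("def add(a, b):")` downloads the weights on first use and returns the prompt followed by a sampled continuation. Because sampling is enabled, the output varies between calls.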
+# Wandb URL:
+https://wandb.ai/kejian/uncategorized/runs/jrhsy65s