language:
- en
license: apache-2.0
tags:
- generated_from_trainer
datasets:
- kejian/codeparrot-train-more-filter-3.3b-cleaned
model-index:
- name: naughty_davinci
results: []
naughty_davinci
This model was trained from scratch on the kejian/codeparrot-train-more-filter-3.3b-cleaned dataset.
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.01
- training_steps: 2524
- mixed_precision_training: Native AMP
Framework versions
- Transformers 4.24.0
- Pytorch 1.11.0+cu113
- Datasets 2.5.1
- Tokenizers 0.11.6
Full config
{'dataset': {'conditional_training_config': {'aligned_prefix': '<|aligned|>', 'drop_token_fraction': 0.1, 'misaligned_prefix': '<|misaligned|>', 'threshold': 0}, 'datasets': ['kejian/codeparrot-train-more-filter-3.3b-cleaned'], 'is_split_by_sentences': True, 'skip_tokens': 2969174016}, 'generation': {'batch_size': 128, 'force_call_on': [503], 'metrics_configs': [{}, {'n': 1}, {}], 'scenario_configs': [{'display_as_html': True, 'generate_kwargs': {'bad_words_ids': [[32769]], 'do_sample': True, 'eos_token_id': 0, 'max_length': 640, 'min_length': 10, 'temperature': 0.7, 'top_k': 0, 'top_p': 0.9}, 'name': 'unconditional', 'num_hits_threshold': 0, 'num_samples': 4096, 'prefix': '<|aligned|>', 'use_prompt_for_scoring': False}], 'scorer_config': {}}, 'kl_gpt3_callback': {'force_call_on': [503], 'gpt3_kwargs': {'model_name': 'code-cushman-001'}, 'max_tokens': 64, 'num_samples': 4096, 'prefix': '<|aligned|>', 'should_insert_prefix': True}, 'model': {'from_scratch': False, 'gpt2_config_kwargs': {'reorder_and_upcast_attn': True, 'scale_attn_by': True}, 'model_kwargs': {'revision': '9cdfa11a07b00726ddfdabb554de05b29d777db3'}, 'num_additional_tokens': 2, 'path_or_name': 'kejian/grainy-pep8'}, 'objective': {'name': 'MLE'}, 'tokenizer': {'path_or_name': 'codeparrot/codeparrot-small', 'special_tokens': ['<|aligned|>', '<|misaligned|>']}, 'training': {'dataloader_num_workers': 0, 'effective_batch_size': 128, 'evaluation_strategy': 'no', 'fp16': True, 'hub_model_id': 'naughty_davinci', 'hub_strategy': 'all_checkpoints', 'learning_rate': 0.0001, 'logging_first_step': True, 'logging_steps': 10, 'num_tokens': 3300000000, 'output_dir': 'training_output', 'per_device_train_batch_size': 8, 'push_to_hub': True, 'remove_unused_columns': False, 'save_steps': 100, 'save_strategy': 'steps', 'seed': 42, 'tokens_already_seen': 2969174016, 'warmup_ratio': 0.01, 'weight_decay': 0.1}}