File size: 5,098 Bytes
f72d2ec |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 |
---
language:
- en
license: apache-2.0
tags:
- generated_from_trainer
datasets:
- kejian/codeparrot-train-more-filter-3.3b-cleaned
model-index:
- name: kejian/final-cond-25-0.25
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# kejian/final-cond-25-0.25
This model was trained from scratch on the kejian/codeparrot-train-more-filter-3.3b-cleaned dataset.
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0008
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.01
- training_steps: 50354
- mixed_precision_training: Native AMP
### Framework versions
- Transformers 4.23.0
- Pytorch 1.13.0+cu116
- Datasets 2.0.0
- Tokenizers 0.12.1
# Full config
{'dataset': {'conditional_training_config': {'aligned_prefix': '<|aligned|>',
'drop_token_fraction': 0.25,
'misaligned_prefix': '<|misaligned|>',
'threshold': 0.000475},
'datasets': ['kejian/codeparrot-train-more-filter-3.3b-cleaned'],
'is_split_by_sentences': True},
'generation': {'batch_size': 64,
'metrics_configs': [{}, {'n': 1}, {}],
'scenario_configs': [{'display_as_html': True,
'generate_kwargs': {'do_sample': True,
'eos_token_id': 0,
'max_length': 704,
'min_length': 10,
'temperature': 0.7,
'top_k': 0,
'top_p': 0.9},
'name': 'unconditional',
'num_samples': 512,
'prefix': '<|aligned|>',
'use_prompt_for_scoring': False},
{'display_as_html': True,
'generate_kwargs': {'do_sample': True,
'eos_token_id': 0,
'max_length': 272,
'min_length': 10,
'temperature': 0.7,
'top_k': 0,
'top_p': 0.9},
'name': 'functions',
'num_samples': 512,
'prefix': '<|aligned|>',
'prompt_before_control': True,
'prompts_path': 'resources/functions_csnet.jsonl',
'use_prompt_for_scoring': True}],
'scorer_config': {}},
'kl_gpt3_callback': {'gpt3_kwargs': {'model_name': 'code-cushman-001'},
'max_tokens': 64,
'num_samples': 4096,
'prefix': '<|aligned|>'},
'model': {'from_scratch': True,
'gpt2_config_kwargs': {'reorder_and_upcast_attn': True,
'scale_attn_by': True},
'num_additional_tokens': 2,
'path_or_name': 'codeparrot/codeparrot-small'},
'objective': {'name': 'MLE'},
'tokenizer': {'path_or_name': 'codeparrot/codeparrot-small',
'special_tokens': ['<|aligned|>', '<|misaligned|>']},
'training': {'dataloader_num_workers': 0,
'effective_batch_size': 64,
'evaluation_strategy': 'no',
'fp16': True,
'hub_model_id': 'kejian/final-cond-25-0.25',
'hub_strategy': 'all_checkpoints',
'learning_rate': 0.0008,
'logging_first_step': True,
'logging_steps': 1,
'num_tokens': 3300000000.0,
'output_dir': 'training_output',
'per_device_train_batch_size': 16,
'push_to_hub': True,
'remove_unused_columns': False,
'save_steps': 5000,
'save_strategy': 'steps',
'seed': 42,
'warmup_ratio': 0.01,
'weight_decay': 0.1}}
# Wandb URL:
https://wandb.ai/kejian/uncategorized/runs/ssntrqry |