---
license: apache-2.0
---

wandb: https://wandb.ai/open-assistant/supervised-finetuning/runs/kzy0gark

datasets:
```
pretrain:
  num_train_epochs: 1
  weight_decay: 0.0
  use_custom_sampler: true
  sort_by_length: false
  datasets:
    - joke
    - webgpt:
        val_split: 0.1
    - gpt4all:
        val_split: 0.01
    - alpaca:
        val_split: 0.025
    - code_alpaca:
        val_split: 0.05
    - minimath
    - humaneval_mbpp_codegen_qa
    - humaneval_mbpp_testgen_qa
    - grade_school_math_instructions
    - recipes
    - cmu_wiki_qa
    - oa_wiki_qa_bart_10000row
    - prosocial_dialogue:
        fraction: 0.1
    - explain_prosocial:
        fraction: 0.05
    - oig_file:
        source_url: https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl
        max_count: 10000
        min_length: 250
        val_split: 0.1
```

pythia:
```
pythia-6.9b-pretrain:
  learning_rate: 6e-6
  model_name: EleutherAI/pythia-6.9b-deduped
  deepspeed_config: configs/zero3_config_pretrain.json
  weight_decay: 0.0
  max_length: 2048
  use_flash_attention: true
  warmup_steps: 20
  gradient_checkpointing: false
  gradient_accumulation_steps: 2
  per_device_train_batch_size: 5
  per_device_eval_batch_size: 8
  num_train_epochs: 1
  save_total_limit: 2
```

command: `deepspeed trainer_sft.py --configs defaults pretrain pythia-6.9b-pretrain --cache_dir .cache/ --output_dir .saved_models/pythia-6.9b-pre --residual_dropout 0.0 --deepspeed`
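
The per-dataset `fraction` and `val_split` fields in the `datasets` list control how much of each source is used and how much of it is held out for evaluation. Below is a minimal sketch of that semantics, purely as an illustration of the intended behaviour (the function and its signature are hypothetical, not the Open-Assistant loader itself):

```python
import random

def subsample_and_split(examples, fraction=1.0, val_split=0.0, seed=42):
    """Illustrative only: keep `fraction` of the data, then carve off `val_split` for eval."""
    rng = random.Random(seed)
    data = list(examples)
    rng.shuffle(data)
    data = data[: int(len(data) * fraction)]      # e.g. prosocial_dialogue uses fraction: 0.1
    n_val = int(len(data) * val_split)            # e.g. webgpt uses val_split: 0.1
    return data[n_val:], data[:n_val]             # (train, val)

train, val = subsample_and_split(range(1000), fraction=1.0, val_split=0.1)
print(len(train), len(val))  # 900 100
```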
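
For the `pythia-6.9b-pretrain` block, the effective batch size follows from `per_device_train_batch_size` and `gradient_accumulation_steps`. The GPU count below is an assumption, since it is not recorded in the config; adjust it to the actual training hardware:

```python
# Rough effective-batch-size estimate for the pythia-6.9b-pretrain config above.
num_gpus = 8                       # assumption: a single 8-GPU node
per_device_train_batch_size = 5    # from the config
gradient_accumulation_steps = 2    # from the config
max_length = 2048                  # from the config

effective_batch_size = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
tokens_per_step = effective_batch_size * max_length  # upper bound, assuming full-length sequences

print(effective_batch_size)  # 80 sequences per optimizer step
print(tokens_per_step)       # up to 163,840 tokens per step
```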
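
A minimal usage sketch for loading the resulting checkpoint with 🤗 Transformers. The local path mirrors `--output_dir` from the command above, and the `<|prompter|>`/`<|assistant|>` prompt format is assumed from the Open-Assistant training setup; substitute the published hub repo id and prompt format as appropriate:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the checkpoint saved under --output_dir (or the published hub repo) is loaded here.
model_path = ".saved_models/pythia-6.9b-pre"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")

prompt = "<|prompter|>What is a lambda function in Python?<|endoftext|><|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.95)
print(tokenizer.decode(output[0], skip_special_tokens=False))
```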