Training script

#13
by ccdv - opened

Hey @pszemraj
Do you still have the training script for this model?

Thank you

Hey, thanks for reaching out! I don't have anything on hand at the moment; I'll let you know if I dig through and find it. Essentially, I used a variant of the Longformer training notebook, with the key enabler being DeepSpeed.

DeepSpeed JSON

Typically I use ZeRO-2 and roll with something like:

{
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "round_robin_gradients": true,
    "contiguous_gradients": true
  },
  "bfloat16": {
    "enabled": "auto"
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 4000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
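
For reference, here's a minimal sketch of how a ZeRO-2 config like this gets wired into the Hugging Face Trainer. Everything in it is illustrative: the model name, the "ds_config.json" path, and the toy dataset are assumptions, not the exact setup used for this model.

# Minimal sketch: hooking the DeepSpeed JSON above into the HF Trainer.
# Assumptions: config saved as "ds_config.json"; model name and toy data
# are placeholders, not the actual training setup for this model.
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "allenai/led-base-16384"  # placeholder long-context model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Toy dataset so the script runs end to end; swap in real tokenized data.
enc = tokenizer(["an example document"] * 8, max_length=512,
                truncation=True, padding="max_length")
train_dataset = Dataset.from_dict({
    "input_ids": enc["input_ids"],
    "attention_mask": enc["attention_mask"],
    "labels": enc["input_ids"],
})

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # resolves the "auto" fields above
    learning_rate=3e-5,
    bf16=True,                       # matches the "bfloat16" section
    deepspeed="ds_config.json",      # path to the JSON above
)

Trainer(model=model, args=args, train_dataset=train_dataset).train()

With the Trainer integration, a script like this is usually launched via the DeepSpeed launcher (e.g. deepspeed train.py) so the "auto" values get filled in from the TrainingArguments.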

OK, thanks.
Got the training done at 4096 sequence length; I'll try up to 16384 tokens now.
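
For the longer run, the main change on the data side is tokenizing at the larger max length. A minimal sketch, with the tokenizer name as a placeholder and assuming the model's position embeddings support 16384 tokens:

# Sketch: tokenizing at the longer sequence length. The tokenizer name is a
# placeholder; 16384 assumes the model actually supports that context size.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/led-base-16384")  # placeholder
batch = tokenizer(
    ["a very long document ..."],
    max_length=16384,   # up from the 4096 used in the first run
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # torch.Size([1, 16384])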

Hey, let me know if you have any other questions/issues with training. Feel free to comment here/reopen, or message me on Discord at mrshadow773#0840 :)

pszemraj changed discussion status to closed
