Making ProtGPT2-medium and ProtGPT2-small available?

@nferruz Hi Noelia,

Would you consider also making ProtGPT2-medium and ProtGPT2-small available, please?
This would be a great help to people who want to debug or who don't have large-capacity GPU machines.

Currently, with the parameters below running on an AWS p3.16xlarge, the program crashes with a CUDA out-of-memory error.

The AWS EC2 p3.16xlarge instance type is powered by 8 NVIDIA Tesla V100 GPUs, each with 16 GB of GPU memory, so the instance provides 128 GB of GPU memory in total.

Do you have any suggestions for parameters I could use to avoid that?

TRAINING_FILE="data/ha_filtered_108k.train.gpt2_format.txt"   # 80K lines
VALIDATION_FILE="data/ha_filtered_108k.validation.gpt2_format.txt" # 20K lines
MODEL_OUTPUT_DIR="gpt2_model/ha_filtered_108k"

python run_clm.py --model_name_or_path nferruz/ProtGPT2 \
    --train_file ${TRAINING_FILE} \
    --validation_file ${VALIDATION_FILE} \
    --tokenizer_name nferruz/ProtGPT2 \
    --do_train \
    --do_eval  \
    --output_dir ${MODEL_OUTPUT_DIR} \
    --overwrite_output_dir \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps=16 \
    --fp16 \
    --learning_rate 1e-06
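
For reference, one thing I was also planning to try is lowering the memory footprint of the run itself. This is only a sketch I have not tested, and it assumes a transformers version whose run_clm.py exposes --block_size and whose Trainer supports --gradient_checkpointing; the block size of 512 is just an arbitrary example value:

# Untested sketch: same command as above, plus --gradient_checkpointing
# (recompute activations to save memory at some speed cost) and --block_size
# (cap the tokenized context length; 512 is an arbitrary example value).
python run_clm.py --model_name_or_path nferruz/ProtGPT2 \
    --train_file ${TRAINING_FILE} \
    --validation_file ${VALIDATION_FILE} \
    --tokenizer_name nferruz/ProtGPT2 \
    --do_train \
    --do_eval \
    --output_dir ${MODEL_OUTPUT_DIR} \
    --overwrite_output_dir \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --gradient_checkpointing \
    --block_size 512 \
    --fp16 \
    --learning_rate 1e-06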

Sincerely,
Littleworth

Hi Littleworth,

I'm afraid I never trained a small or medium version. I did train another model, but it is even larger.

Sorry I don't have better news for now!

@nferruz Hi Noelia,

Thanks. I finally managed to get it running with the help of DeepSpeed.
Here is the full code:

#!/bin/bash
export LC_ALL=C

TRAINING_FILE="data/ha_filtered_108k.train.gpt2_format.txt"
VALIDATION_FILE="data/ha_filtered_108k.validation.gpt2_format.txt"
MODEL_OUTPUT_DIR="gpt2_model/ha_filtered_108k"
DS_CONFIG_FILE="ds_config.json"


/home/ubuntu/storage1/conda_envs/py38/bin/deepspeed --num_gpus=8 run_clm.py --model_name_or_path nferruz/ProtGPT2 \
    --train_file ${TRAINING_FILE} \
    --validation_file ${VALIDATION_FILE} \
    --tokenizer_name nferruz/ProtGPT2 \
    --do_train \
    --do_eval  \
    --output_dir ${MODEL_OUTPUT_DIR} \
    --overwrite_output_dir \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps=16 \
    --fp16 \
    --learning_rate 1e-06 \
    --deepspeed ${DS_CONFIG_FILE}

And the content of the ds_config.json file is:

{
    "fp16": {
        "enabled": true
    },
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": true,
        "allgather_bucket_size": 2e8,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 2e8,
        "contiguous_gradients": true
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 1e-6,
            "betas": [
                0.9,
                0.999
            ],
            "eps": 1e-8,
            "weight_decay": 0
        }
    },
    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": 0,
            "warmup_max_lr": 1e-6,
            "warmup_num_steps": "auto"
        }
    },
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
    "gradient_clipping": 1.0
}

Everything completes in less than 10 minutes on the p3.16xlarge.
I hope this information helps others.
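
One more note, in case it is useful; this is only how I understand the Hugging Face Trainer + DeepSpeed documentation, so treat it as a sketch rather than a verified recipe. When ZeRO is enabled, DeepSpeed saves its partitioned states inside each checkpoint folder together with a small zero_to_fp32.py helper script, which can consolidate them into a single fp32 pytorch_model.bin if you ever need one. The checkpoint-500 name below is just a placeholder for whichever checkpoint-* directory actually exists under the output directory:

# Hypothetical checkpoint path; substitute the real checkpoint-* folder
# created under ${MODEL_OUTPUT_DIR}.
cd gpt2_model/ha_filtered_108k/checkpoint-500
# zero_to_fp32.py is written by DeepSpeed itself into the checkpoint folder;
# it merges the partitioned states into a full-precision pytorch_model.bin.
python zero_to_fp32.py . pytorch_model.bin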

Regards,
littleworth

Hi, thank you for sharing all these tricks. May I ask whether you are still using the 8x V100 GPUs with 16 GB each in this case with DeepSpeed? Thanks!
