Memory requirements for local training

#21
by Go2Device - opened

Hello everyone, what are the memory requirements to fine-tune this model?
I'm trying to train the large-v2 model locally on my 3090 with 24 GB VRAM, and even with --auto_find_batch_size I get RuntimeError: No executable batch size found, reached zero. or run into CUDA OOM.
My workstation is running Ubuntu 22.04, CUDA 11.6, Python 3.9.16, PyTorch 1.13.1 and the run_speech_recognition_seq2seq.py script from Hugging Face:

python3 run_speech_recognition_seq2seq.py \
    --model_name_or_path="openai/whisper-large-v2" \
    --dataset_name="mozilla-foundation/common_voice_11_0" \
    --dataset_config_name="de" \
    --language="german" \
    --train_split_name="train+validation" \
    --eval_split_name="test" \
    --max_steps="5000" \
    --output_dir="./whisper-large-v2-de" \
    --auto_find_batch_size \
    --gradient_accumulation_steps="2" \
    --logging_steps="25" \
    --learning_rate="1e-5" \
    --warmup_steps="500" \
    --evaluation_strategy="steps" \
    --eval_steps="1000" \
    --save_strategy="steps" \
    --save_steps="1000" \
    --generation_max_length="225" \
    --preprocessing_num_workers="1" \
    --length_column_name="input_length" \
    --max_duration_in_seconds="30" \
    --text_column_name="sentence" \
    --freeze_feature_encoder="False" \
    --report_to="tensorboard" \
    --metric_for_best_model="wer" \
    --gradient_checkpointing \
    --group_by_length \
    --fp16 \
    --overwrite_output_dir \
    --do_train \
    --do_eval \
    --predict_with_generate \
    --use_auth_token 

Hey @Go2Device ! I reckon you'll be able to fine-tune the large-v2 model with a 24GB GPU if you use DeepSpeed. It's quite a straightforward extension to the training set-up you've already got; see https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event#deepspeed for details.
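
Roughly, the change is just installing DeepSpeed and swapping the launcher. A minimal sketch, assuming the flag names from that guide (ds_config.json is the config file it provides, e.g. ZeRO stage 2 with optimizer states offloaded to CPU):

pip install deepspeed

deepspeed run_speech_recognition_seq2seq.py \
    --deepspeed="ds_config.json" \
    --model_name_or_path="openai/whisper-large-v2"
    # ...keep the rest of the flags from your original command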

Hey @sanchit-gandhi ! Thank you for this hint. Now my CPU RAM (48 GB) is not enough.
But thankfully this is cheaper to fix than a bigger GPU.
For now, I've canceled this project and will take another look in a few months.

Go2Device changed discussion status to closed

Hey @Go2Device ! What error are you getting regarding CPU RAM? It might be that we need to lower the dataset's writer_batch_size (a lower value means less CPU memory but slower processing). Happy to help explore other solutions for reducing CPU RAM! This is the first time I've heard of CPU RAM being a limiting factor for fine-tuning Whisper, so I'm eager to find a solution here!
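
In case it helps, here's a rough sketch of where writer_batch_size fits during preprocessing (the parameter is a real datasets argument, default 1000; prepare_dataset here is a stand-in for your own feature-extraction function):

from datasets import load_dataset

common_voice = load_dataset(
    "mozilla-foundation/common_voice_11_0", "de", split="train+validation"
)

# writer_batch_size caps how many processed samples are held in CPU RAM
# before being flushed to the on-disk Arrow cache - lower means less
# memory but slower processing
common_voice = common_voice.map(
    prepare_dataset,  # hypothetical preprocessing function
    remove_columns=common_voice.column_names,
    writer_batch_size=100,
)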

Hello @sanchit-gandhi ! Thank you for your offer to help me with this. Currently another model is training, but when it's finished I will test Whisper again.

Hey @sanchit-gandhi , I am ready to start a new test. I upgraded my RAM from 48 to 64 GB.
For the other project I reinstalled my workstation and am now using Docker for training.
Do you have a working NGC Dockerfile for Whisper and Transformers?
NVIDIA's nvcr.io/nvidia/pytorch container doesn't include torchaudio, and finding the matching version isn't easy.
Thanks

Go2Device changed discussion status to open

Hey @Go2Device ! There isn't a Dockerfile, but it's pretty easy to get set up with a pip env! This is a very in-depth guide on how to set up an env for fine-tuning Whisper: https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event#set-up-an-environment

You can ignore the bit about installing ffmpeg! It's all handled by datasets now 🤗
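
As a rough sketch, the env boils down to something like this (the package list approximates what the guide installs; pin torch to whatever matches your CUDA version):

python3 -m venv whisper-env
source whisper-env/bin/activate
pip install torch transformers datasets accelerate evaluate jiwer tensorboard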

Hey @Go2Device ! What is your recommended batch size after using Deepspeed with 24GB vRAM ? How much time does it take to complete ?
Thank you bro

Hey @tuanle - you should probably set the batch size in accordance with your device (e.g. keep increasing it in multiples of 2 until you get an OOM). There are some rough speed figures here: https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event#recommended-batch-sizes-with-deepspeed

Hey @artyomboyko - you'll have to experiment to find what works here! Try bs=32 first. If it OOMs, then drop it to bs=16. If it OOMs again, drop it to bs=8, and so on... Once you go lower than bs=8, it's worth adding gradient accumulation steps so you maintain a reasonable batch size (see https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event#recommended-training-configurations)
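
The effective batch size is per_device_train_batch_size × gradient_accumulation_steps, so the two flags trade off directly. A sketch with illustrative numbers:

python3 run_speech_recognition_seq2seq.py \
    --per_device_train_batch_size="4" \
    --gradient_accumulation_steps="8"
    # effective batch size of 4 x 8 = 32; keep the rest of your original flags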

Hi @sanchit-gandhi , I have fine-tuned the Whisper model and saved it to a local folder. Now I am facing difficulties while trying to load the model; any suggestions would be helpful.

Hey @Sunnnnny - you can load the model from pre-trained by specifying the path to your save folder:

from transformers import WhisperForConditionalGeneration

# point from_pretrained at the local directory your fine-tuned model was saved to
model = WhisperForConditionalGeneration.from_pretrained("/path/to/save/dir")

Alternatively, you can use the pipeline for easy inference:

from transformers import pipeline

# the pipeline picks up the model, feature extractor and tokenizer from the save dir
asr_pipe = pipeline("automatic-speech-recognition", model="/path/to/save/dir")
asr_pipe("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac")

Could you please share a code-snippet that shows what you've tried and what's not working (i.e. the full traceback)?
