I am attempting to train this model on an Nvidia RTX 4070 Ti with 12GB of GPU memory. After loading the model, only about 3GB of memory is in use. However, the training step always fails with a CUDA out-of-memory error. I have already applied gradient checkpointing, mixed precision training, and gradient accumulation.
Are there any other optimizations that could make training possible, or is 12GB simply too small to train the model?
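For context, here is a rough back-of-the-envelope estimate of why training can run out of memory even though loading only takes about 3GB. It is only a sketch under assumptions that are not confirmed anywhere in this thread: that the ~3GB is almost entirely fp32 weights (4 bytes per parameter), that the optimizer is Adam with two fp32 moment buffers (8 bytes per parameter), and that gradients are kept in fp32. The function name and its parameters are made up for illustration, and activations, the CUDA context, and allocator fragmentation are ignored entirely.

```python
GIB = 1024 ** 3  # bytes in one GiB


def estimate_training_gib(
    loaded_weights_gib: float,
    weight_bytes_per_param: int = 4,     # assumption: weights stored in fp32
    grad_bytes_per_param: int = 4,       # assumption: gradients stored in fp32
    optimizer_bytes_per_param: int = 8,  # assumption: Adam, two fp32 moments
) -> float:
    """Estimate GiB needed for weights + gradients + optimizer state.

    Activations and framework overhead are NOT included, so the real
    peak during a training step will be higher than this number.
    """
    # Infer the parameter count from the memory used right after loading.
    n_params = loaded_weights_gib * GIB / weight_bytes_per_param
    bytes_per_param = (
        weight_bytes_per_param + grad_bytes_per_param + optimizer_bytes_per_param
    )
    return n_params * bytes_per_param / GIB


if __name__ == "__main__":
    # Assumed scenario: ~3 GiB of fp32 weights trained with plain Adam.
    print(f"fp32 Adam:  {estimate_training_gib(3.0):.1f} GiB")
    # Hypothetical alternative: an optimizer that keeps ~2 bytes of state
    # per parameter (for example, 8-bit optimizer states).
    print(
        f"small-state optimizer: "
        f"{estimate_training_gib(3.0, optimizer_bytes_per_param=2):.1f} GiB"
    )
```

Under these assumptions, weights, gradients, and Adam state alone come to about 12 GiB, which already fills the card before any activations are stored. If the assumptions hold, gradient checkpointing and mixed precision mostly shrink activation memory, so the remaining lever would be the optimizer state or the number of trainable parameters.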
Hey! Could you help me with the steps to follow to train the model? It would be very much appreciated.