FP16 vs FP32 #127
opened by Taylor658
What are the memory usage, performance differences, and accuracy trade-offs between FP16 and FP32 precision for Whisper-large-v3 on a typical GPU like the NVIDIA A100?
You can get a rough idea of the memory needed to load any model using this formula:
Approx memory usage = number of parameters * bytes per parameter
In practice, the actual memory would be a bit higher (sequence length, loaded libraries, etc.).
FP16 equates to 2 bytes per parameter, and Whisper Large v3 has ~1.6B params.
Therefore the parameters alone would take a little over 3.2GB in FP16, and roughly double that (~6.4GB) in FP32.
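To make the arithmetic concrete, here is a small sketch of that estimate. The 10% overhead factor and the helper name are illustrative assumptions, not a precise measurement; real usage also depends on activations, sequence length, and framework buffers.

```python
# Rough parameter-memory estimate for a model at a given precision.
# NOTE: the 1.1 overhead factor is an illustrative assumption; actual
# GPU memory use also depends on activations, batch/sequence length,
# and framework/CUDA buffers.

def approx_param_memory_gb(num_params: float,
                           bytes_per_param: int,
                           overhead: float = 1.1) -> float:
    """Estimate parameter memory in GB: params * bytes/param * overhead."""
    return num_params * bytes_per_param * overhead / 1e9

WHISPER_LARGE_V3_PARAMS = 1.6e9  # ~1.6B parameters

fp16_gb = approx_param_memory_gb(WHISPER_LARGE_V3_PARAMS, 2)  # FP16: 2 bytes/param
fp32_gb = approx_param_memory_gb(WHISPER_LARGE_V3_PARAMS, 4)  # FP32: 4 bytes/param

print(f"FP16: ~{fp16_gb:.1f} GB, FP32: ~{fp32_gb:.1f} GB")
```

With the 10% overhead assumption this gives roughly 3.5GB for FP16 and 7.0GB for FP32, so either fits comfortably in an A100's 40GB or 80GB of HBM; the FP16 saving matters more on smaller GPUs.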
Thanks for the feedback and formula
Taylor658 changed discussion status to closed