FP16 version of the model and thus will reduce the download time and also significantly accelerate execution on the GPU (e.g., 1.79 it/s vs 6.64 it/s on A770m). It also reduces memory usage (RAM and VRAM).
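
As a rough illustration of why the download and memory footprint shrink, here is a minimal sketch that compares the raw weight size at 4 bytes (FP32) and 2 bytes (FP16) per parameter. The parameter count `NUM_PARAMS` and the helper `weights_size_mib` are hypothetical and chosen only for the arithmetic; they are not taken from this model.

```python
import struct

# Bytes per parameter for IEEE 754 single ('f') and half ('e') precision.
FP32_BYTES = struct.calcsize("f")  # 4
FP16_BYTES = struct.calcsize("e")  # 2


def weights_size_mib(num_params: int, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in MiB (ignores file metadata)."""
    return num_params * bytes_per_param / (1024 ** 2)


# Hypothetical parameter count, for illustration only (not this model's).
NUM_PARAMS = 1_000_000_000

fp32_mib = weights_size_mib(NUM_PARAMS, FP32_BYTES)
fp16_mib = weights_size_mib(NUM_PARAMS, FP16_BYTES)
print(f"FP32: {fp32_mib:.0f} MiB, FP16: {fp16_mib:.0f} MiB")
```

The weights take half the space in FP16, which is where the smaller download and lower RAM/VRAM usage come from. The speedup on the GPU depends on the hardware and is not something this sketch measures.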

Great work! Thanks for your contribution!

bes-dev changed pull request status to merged