Given an inference script, inference.py, execute it to measure the current generation speed per token, then try to improve that speed with the accelerate library. The script runs on a single A100 GPU. Before you give the final answer, ask yourself whether there is any other way to improve the speed.
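
One way to make "generation speed per token" concrete is to time a few generation passes and divide by the number of tokens produced. The sketch below is only an illustration under stated assumptions: `seconds_per_token` and `fake_generate` are hypothetical names, not taken from inference.py, and the stand-in workload sleeps instead of running a model. When timing real GPU work, the device would need to be synchronized (for example with `torch.cuda.synchronize()`) before each clock read.

```python
import time
from typing import Callable


def seconds_per_token(generate: Callable[[], int],
                      warmup: int = 1,
                      repeats: int = 3) -> float:
    """Return the average wall-clock seconds per generated token.

    `generate` is a hypothetical zero-argument callable that runs one
    generation pass and returns how many new tokens it produced.
    """
    # Warm-up passes are run but not timed, so one-off setup costs
    # do not distort the average.
    for _ in range(warmup):
        generate()

    total_time = 0.0
    total_tokens = 0
    for _ in range(repeats):
        start = time.perf_counter()
        n_tokens = generate()
        # With a real GPU model, synchronize the device here before
        # reading the clock, since kernel launches are asynchronous.
        total_time += time.perf_counter() - start
        total_tokens += n_tokens

    if total_tokens == 0:
        raise ValueError("generate() produced no tokens")
    return total_time / total_tokens


def fake_generate() -> int:
    """Stand-in workload (assumption): pretends to emit 10 tokens."""
    time.sleep(0.01)
    return 10


if __name__ == "__main__":
    print(f"{seconds_per_token(fake_generate):.6f} s/token")
```

Running the same helper before and after each change gives a like-for-like comparison, provided the prompt, batch size, and number of generated tokens are held fixed.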