Inference: bf16 or fp16?

#3 opened by larekrow

During inference, should I set torch_dtype to bf16 (as during finetuning) or to fp16 (which is what config.json specifies)?

Lianmin Zheng from FastChat says "fp16 is okay".
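For reference, a minimal loading sketch with Hugging Face transformers; the model id below is a placeholder, so swap in the checkpoint you are actually using. Either dtype can be passed via torch_dtype (bf16 needs an Ampere-or-newer GPU for native support):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id; replace with the actual checkpoint.
model_id = "your-org/your-model"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# fp16 matches the dtype listed in config.json; pass torch.bfloat16 instead
# if you want to mirror the finetuning setup.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```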

larekrow changed discussion status to closed
