After training LlamaGuard-7b inference is slower

#23
by jamesoneill12 - opened

Hi there,

I am having trouble matching the original model's inference latency after fine-tuning it in bfloat16 (and I assume the same issue would arise with other dtypes).
Does LlamaGuard-7b use any tricks to make inference faster that could potentially be lost after uploading a fine-tuned version of it to the Hub?
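For reference, here is a minimal sketch of how I'm reloading my fine-tuned checkpoint, in case the slowdown is just the weights being upcast to float32 on load (the repo id `my-org/LlamaGuard-7b-ft` is a placeholder for my fine-tune):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id for the fine-tuned checkpoint on the Hub.
FINETUNED_ID = "my-org/LlamaGuard-7b-ft"

# Without an explicit torch_dtype, from_pretrained can load the weights
# in float32 if the uploaded config doesn't pin bfloat16, which roughly
# doubles memory use and hurts throughput versus the base checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    FINETUNED_ID,
    torch_dtype=torch.bfloat16,  # force bf16 to match the base model
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(FINETUNED_ID)

# Sanity check: every parameter should report torch.bfloat16.
print({p.dtype for p in model.parameters()})
```

Even with dtype pinned this way, latency still doesn't match the base model, so I'm wondering if there is something else going on.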

Thanks,
James
