Faster inference?

#1
by DoctorSlimm - opened

Great model, I'm a huge fan! Is there any way to make it faster?

Is there anything along the lines of vLLM (or similar) for this model architecture?
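For example, if the architecture is on vLLM's supported-models list, I'd hope for something like this minimal sketch ("your-org/your-model" is just a placeholder repo id for this model):

```python
# Minimal vLLM sketch -- assumes the architecture is supported by vLLM,
# and "your-org/your-model" is a placeholder repo id.
from vllm import LLM, SamplingParams

llm = LLM(model="your-org/your-model")
params = SamplingParams(temperature=0.7, max_tokens=128)

# vLLM handles continuous batching internally, so a list of prompts
# is served efficiently in a single call.
outputs = llm.generate(["Hello, my name is", "The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```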

Batching? bfloat16? ONNX? Quantization? Rough sketches of what I mean below.
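For bfloat16 plus batched generation, I'm picturing something like this with plain transformers (a sketch, assuming a standard causal-LM checkpoint; same placeholder repo id):

```python
# Sketch: bfloat16 weights + batched generation with plain transformers.
# Assumes a standard causal-LM checkpoint; the repo id is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "left"           # left-pad for decoder-only generation
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half the memory of fp32, fast on Ampere+ GPUs
    device_map="auto",
)

# Batching: tokenize several prompts into one padded tensor, one generate call.
prompts = ["Hello, my name is", "The capital of France is"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```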
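And for quantization and ONNX, maybe 8-bit loading via bitsandbytes, or an export through Optimum? Both sketches below assume the architecture is actually supported by those libraries:

```python
# 8-bit quantization sketch -- needs bitsandbytes and a CUDA GPU.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_8bit = AutoModelForCausalLM.from_pretrained(
    "your-org/your-model",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
```

```python
# ONNX export sketch via Optimum -- works only if the arch has an ONNX config.
from optimum.onnxruntime import ORTModelForCausalLM

ort_model = ORTModelForCausalLM.from_pretrained("your-org/your-model", export=True)
```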
