TrOCR deployment in production

#6
by CristianJD - opened

Hi, does anyone know how to make inference with TrOCR as fast as possible? I deployed it with Docker on OpenShift, but it's too slow. I'm already using the ONNX format, but I can't quantize the model because quantization isn't implemented yet for Vision-Encoder-Decoder.
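Even when `transformers` doesn't support quantization for a Vision-Encoder-Decoder model directly, dynamic INT8 quantization can still be applied to the exported ONNX graphs (e.g. via `onnxruntime.quantization.quantize_dynamic`). As a rough sketch of what that technique does, here is the core idea in NumPy: weights are stored as int8 with a per-tensor scale and dequantized for the matmul. This is a conceptual illustration, not TrOCR-specific code:

```python
import numpy as np

# Conceptual sketch of dynamic INT8 weight quantization, the technique
# onnxruntime.quantization.quantize_dynamic applies to ONNX graph weights:
# weights are stored as int8 plus a per-tensor scale, cutting weight
# memory 4x, at the cost of a small numerical error.

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: float32 -> (int8, scale)."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # stand-in weight matrix
x = rng.standard_normal((1, 256)).astype(np.float32)    # stand-in activation

q, scale = quantize_int8(w)
y_fp32 = x @ w
y_int8 = x @ dequantize(q, scale)

# Output stays close to the fp32 result despite 8-bit weights.
rel_err = np.linalg.norm(y_fp32 - y_int8) / np.linalg.norm(y_fp32)
print(f"relative error: {rel_err:.4f}")
```

In practice you would point `quantize_dynamic` at the exported encoder and decoder `.onnx` files separately; whether the resulting graphs stay numerically accurate enough for your OCR quality target is something to validate on your own data.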

Hi, I also tried the ONNX format but didn't have much luck making inference faster. On an NVIDIA A10 I can get inference down to ~120 ms, but that's still too slow for my use case. Did you happen to find anything since then? Quantization doesn't seem to work for me either.
