ONNX version or compatibility with T5ForConditionalGeneration

#1
by michaelfeil - opened

Looking forward to converting this model (2B, 6B, 16B) to a faster version for accelerated inference.
Options:

  • CTranslate2: supports many architectures such as T5, mT5, GPT-J, GPT-2, ... As with codet5p-770m-py, this now runs at high speed with a 1320 MiB CUDA memory footprint and batch inference, which I think is awesome: https://huggingface.co/michaelfeil/ct2fast-codet5p-770m-py -> Is there any way to convert this model to a T5 architecture?
  • ONNX -> ONNX Runtime (ORT) or NVIDIA TensorRT -> the CodeT5+ model config has no ONNX export implementation (compare, e.g., CodeGen2, which does).
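For context, here is a sketch of what each conversion path would look like on the smaller 770M checkpoint. Whether the larger CodeT5+ checkpoints (2B, 6B, 16B) work depends on their architecture being recognized by the respective converters, which is exactly the open question here; the model name and output directories below are illustrative.

```shell
# CTranslate2 route: convert a Transformers checkpoint with the
# ct2-transformers-converter CLI (pip install ctranslate2 transformers).
# This is known to work for the 770M T5-style checkpoint; the larger
# CodeT5+ variants may be rejected if their architecture is unsupported.
ct2-transformers-converter \
    --model Salesforce/codet5p-770m-py \
    --output_dir ct2-codet5p-770m-py \
    --quantization int8_float16

# ONNX route: export via Hugging Face Optimum (pip install optimum).
# Expected to fail for CodeT5+ as long as no ONNX export config is
# registered for its architecture, per the issue raised above.
optimum-cli export onnx \
    --model Salesforce/codet5p-770m-py \
    onnx-codet5p-770m-py/
```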

Any advice?
