Speed comparison with FasterTransformer for ChatGLM?
Hi, great work and thanks for sharing.
According to the post (https://mp.weixin.qq.com/s/uV4Y_q4GnTUAsRVHxJGxGA), the inference code is based on FasterTransformer (FT), with custom modifications for speed.
Could you kindly share a speed comparison between FT and lyraChatGLM?
Thanks.
@xiangli Hi, the original FT doesn't natively support ChatGLM (several ops behave differently), so a direct comparison isn't possible yet. We're still working on fixing these issues and will report the speed of a pure FT version later.
@xiangli We have released a new accelerated version and removed the previous TensorRT-based one. The new version has been heavily optimized at the source-code level, giving better performance, easier use, and broader GPU compatibility. Please update and feel free to try it out.
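In the meantime, you can measure the speedup on your own hardware with a minimal timing harness like the sketch below. This is a sketch only: `lyra_model` and `ft_model` in the commented usage are hypothetical handles for however you load each backend, and the whitespace-split token count is just a rough throughput proxy; swap in the real tokenizer for exact numbers.

```python
import time

def benchmark_generate(generate_fn, prompt, n_runs=5, warmup=1):
    """Time a text-generation callable and report approximate tokens/sec.

    generate_fn is assumed to take a prompt string and return the
    generated text. The token count below is a rough whitespace-based
    proxy; use the model's tokenizer for accurate throughput.
    """
    for _ in range(warmup):
        generate_fn(prompt)  # warm up kernels / caches before timing

    start = time.perf_counter()
    outputs = [generate_fn(prompt) for _ in range(n_runs)]
    elapsed = time.perf_counter() - start

    n_tokens = sum(len(o.split()) for o in outputs)
    print(f"{n_runs} runs in {elapsed:.2f}s, ~{n_tokens / elapsed:.1f} tokens/s")

# Hypothetical usage -- lyra_model / ft_model stand in for however you
# load lyraChatGLM and a patched FasterTransformer ChatGLM:
# benchmark_generate(lambda p: lyra_model.generate(p), "你好,请介绍一下自己")
# benchmark_generate(lambda p: ft_model.generate(p), "你好,请介绍一下自己")
```

Running both backends through the same harness, with identical prompts, output lengths, and sampling settings, keeps the comparison apples-to-apples.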