How to generate glm.cpython-38-x86_64-linux-gnu.so

#2
by SHAYINDAODSD - opened

Can you share the code logic for generating 'glm.cpython-38-x86_64-linux-gnu.so'? I would like to see what changes have been made. Thank you.

same question

Same question, could you please share a quick guide?

Tencent Music Entertainment Lyra Lab org
edited May 17, 2023

Hi @SHAYINDAODSD @rayleee @sammysun0711, our project is actually based on C++, and this dynamic library is just a Python-binding support file; it contains nothing about model conversion. Of course, you can call the C++ symbols in this .so directly to run inference from C++... but life is short, so let's use Python to keep it simple.

We'll continue to release more accelerated LLM models to the community; however, the original C++ code is currently not in our release plan (maybe in the future?).
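For readers wondering why the file carries that long suffix: CPython resolves an extension module by matching its filename against the module name plus an ABI tag, so `glm.cpython-38-x86_64-linux-gnu.so` is importable simply as `import glm` on a matching CPython 3.8 / x86-64 Linux build. A minimal sketch of that lookup, using the standard `importlib.machinery` suffix list (the `glm.generate(...)` call at the end is purely hypothetical; the real exported names come from the library's binding code):

```python
# Sketch: how CPython decides that a file like
# glm.cpython-38-x86_64-linux-gnu.so backs the module "glm".
# Python compares the filename against module_name + suffix for each
# suffix in importlib.machinery.EXTENSION_SUFFIXES on the running
# interpreter.
import importlib.machinery


def is_extension_for(filename: str, module: str) -> bool:
    """Return True if `filename` would be importable as `module`
    on this interpreter (its suffix matches a known ABI tag)."""
    return any(
        filename == module + suffix
        for suffix in importlib.machinery.EXTENSION_SUFFIXES
    )


# Build a filename with this interpreter's own primary ABI suffix;
# on a CPython 3.8 x86-64 Linux build that suffix is exactly
# ".cpython-38-x86_64-linux-gnu.so", which is why the shipped .so
# is bound to that Python version and platform.
suffix = importlib.machinery.EXTENSION_SUFFIXES[0]
print(is_extension_for("glm" + suffix, "glm"))   # matches here
print(is_extension_for("glm.txt", "glm"))        # not an extension

# Once the file is on sys.path under a matching interpreter,
# using it is just:
#   import glm
#   output = glm.generate(...)   # hypothetical API name
```

This is also why the library cannot be loaded on, say, Python 3.10: the ABI tag in the suffix no longer matches, and the import machinery skips the file.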

Hi, can you share some details about model conversion?

  1. How do we convert the model to TensorRT?
  2. How do we convert the model to FasterTransformer?


Thanks for your quick response. I think the model conversion part is truly where the magic happens; it would be very interesting for the community to learn more about it.
Looking forward to your team sharing the original C++ code in the future. Thanks!

Tencent Music Entertainment Lyra Lab org

@shungouxu @sammysun0711 Hi, this post roughly shows some of our acceleration ideas; hope it's useful~ https://mp.weixin.qq.com/s/uV4Y_q4GnTUAsRVHxJGxGA


I wonder whether the inference in your demo is based on TensorRT or FasterTransformer. Your article (https://mp.weixin.qq.com/s/uV4Y_q4GnTUAsRVHxJGxGA) says your ChatGLM-6B model has both a TensorRT version and a FasterTransformer version, so which one does your demo use?

Tencent Music Entertainment Lyra Lab org

@flyerxu @shungouxu The first version released earlier was based on TensorRT.

We have since updated to a new accelerated version and removed the previous TensorRT-based one. The new version has undergone significant optimization at the source-code level, resulting in improved performance, ease of use, and GPU compatibility. Please update and feel free to try it out.

vanewu changed discussion status to closed
