A pure C++ implementation supporting CUDA, CPU, OpenCL, etc.

#17
by zhaode - opened

https://github.com/wangzhaode/ChatGLM-MNN

  • Pure C++.
  • Depends only on MNN, supports multiple devices, and is easy to deploy.
  • Splits the model into 28 blocks so that different blocks can run on different devices.
  • Slims the vocabulary from 150528 to 130528 entries.
  • Faster than the PyTorch implementation.
  • Provides CLI and web demos.
  • Supports running the forward pass on Android devices.
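
To illustrate the idea of splitting the model into 28 blocks and placing them on different devices, here is a minimal sketch in plain C++. It is not taken from the repository and does not use the MNN API: the `Device` enum, `planBlocks`, and `blocksThatFit` are hypothetical names, and the policy (first blocks on the GPU, the rest on the CPU) is only an assumption about how such a split might be planned.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical device tags for illustration; these are not MNN identifiers.
enum class Device { CPU, GPU };

// Returns how many blocks fit into a memory budget when every block needs
// `blockBytes` bytes, capped at `numBlocks`.
std::size_t blocksThatFit(std::size_t budgetBytes, std::size_t blockBytes,
                          std::size_t numBlocks) {
    if (blockBytes == 0) {
        throw std::invalid_argument("blockBytes must be non-zero");
    }
    const std::size_t n = budgetBytes / blockBytes;
    return n < numBlocks ? n : numBlocks;
}

// Assigns the first `gpuBlocks` of `numBlocks` sequential blocks to the GPU
// and the remaining blocks to the CPU.
std::vector<Device> planBlocks(std::size_t numBlocks, std::size_t gpuBlocks) {
    if (gpuBlocks > numBlocks) {
        throw std::invalid_argument("gpuBlocks exceeds numBlocks");
    }
    std::vector<Device> plan(numBlocks, Device::CPU);
    for (std::size_t i = 0; i < gpuBlocks; ++i) {
        plan[i] = Device::GPU;
    }
    return plan;
}
```

For example, `planBlocks(28, blocksThatFit(gpuBudget, blockSize, 28))` would yield a 28-entry plan in which as many leading blocks as fit in the (assumed) GPU budget run on the GPU, with the rest falling back to the CPU.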
