Run inference on CPU

#1
by hythythyt3 - opened

Hello, is running this model on CPU/RAM possible?

Yes, but you will need two modifications (a CPU loading sketch follows the list):

  1. Comment out the .cuda() calls in /root/.cache/huggingface/modules/transformers_modules/OpenGVLab/Mini-InternVL-Chat-4B-V1-5/6f97087daec17e4b033d4d846c0b64c09c4268cd/modeling_internvl_chat.py, and make sure your demo code does not call .cuda() either.
  2. Change "use_flash_attn" to false in /root/.cache/huggingface/hub/models--OpenGVLab--Mini-InternVL-Chat-4B-V1-5/snapshots/6f97087daec17e4b033d4d846c0b64c09c4268cd/config.json

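For reference, loading the model entirely on the CPU after those edits could look roughly like the sketch below. It assumes use_flash_attn has already been set to false in the cached config.json; the text-only model.chat call and the generation settings are only illustrative.

```python
import torch
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/Mini-InternVL-Chat-4B-V1-5"

# Load the weights in float32 on the CPU; note there is no .cuda() call anywhere.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.float32,
    trust_remote_code=True,
).eval()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)

# Pure-text query for simplicity; for images, build pixel_values on the CPU
# (see the preprocessing code on the model card) and pass them instead of None.
generation_config = dict(max_new_tokens=256, do_sample=False)
response = model.chat(tokenizer, None, "Hello, who are you?", generation_config)
print(response)
```

Expect CPU inference to be much slower than on a GPU, especially in float32.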
I cannot run this in LMDeploy. Which inference engine should we use? Any quants?

OpenGVLab org
edited Aug 21

I cannot run this in LMDeploy. Which inference engine should we use? Any quants?

Please refer to https://internvl.readthedocs.io/en/latest/internvl2.0/deployment.html
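That guide walks through serving InternVL models with LMDeploy's pipeline API; a minimal sketch of the kind of usage it describes (the model ID and image URL here are just examples) looks roughly like this:

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Build an LMDeploy pipeline for the model and run a single image-text query.
pipe = pipeline("OpenGVLab/Mini-InternVL-Chat-4B-V1-5")
image = load_image("https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg")
response = pipe(("Describe this image.", image))
print(response.text)
```

Note that LMDeploy's engines target CUDA GPUs, which is why the CPU route described earlier goes through plain transformers instead.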

czczup changed discussion status to closed
