Question about inference on CPU

#134
by XiangD-OSU - opened

The hosted inference API says "The model is loaded and running on Intel Xeon Ice Lake CPU.", and the latency seems surprisingly low. Does anyone have pointers to where I could find more information about deploying to a CPU-only environment, or is the service actually still hosted on GPUs?

BigScience Workshop org

This service is hosted on GPUs. I don't think you'll be able to run inference at that scale on CPUs. @Narsil wrote a great blog post on how we deployed it: https://huggingface.co/blog/bloom-inference-optimization
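
For reference, smaller BLOOM checkpoints do run fine on CPU with plain transformers. Here is a minimal sketch, assuming you use the bigscience/bloom-560m checkpoint (that choice and the prompt are just illustrative, not something covered in the blog post); the full 176B model needs the multi-GPU setup described there.

```python
# Sketch: CPU-only inference with a small BLOOM checkpoint via transformers.
# The 176B bloom model is far too large for this approach.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # small checkpoint, feasible on CPU

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # defaults to CPU
model.eval()

inputs = tokenizer("The BigScience workshop", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```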

BigScience Workshop org

Also, thank you for noticing the GUI bug! It's being taken care of: https://github.com/huggingface/hub-docs/pull/477

christopher changed discussion status to closed
