Question about inference on CPU
#134
by XiangD-OSU · opened
The hosted inference API mentions "The model is loaded and running on Intel Xeon Ice Lake CPU." and the latency seems surprisingly low. Does anyone have pointers to where I could find more information about deploying to a CPU-only environment, or is the service actually still hosted on GPUs?
This service is hosted on GPUs. I don't think you'll be able to run inference at that scale on CPUs. @Narsil wrote a great blog post on how we deployed it: https://huggingface.co/blog/bloom-inference-optimization
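For anyone who still wants to experiment with CPU-only inference, here is a minimal sketch using the transformers library. The bigscience/bloom-560m checkpoint, the prompt, and the generation settings are illustrative assumptions, not what the hosted service uses; the full 176B BLOOM model will not run at useful latency on a single CPU machine.

```python
# Minimal CPU-only inference sketch (assumption: a small BLOOM checkpoint for illustration)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # assumed small checkpoint, not the hosted 176B model

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,  # CPUs generally lack fast fp16/bf16 kernels
)
model.eval()

inputs = tokenizer("Deploying to a CPU-only environment is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```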
Also, thank you for noticing the GUI bug! It's being taken care of: https://github.com/huggingface/hub-docs/pull/477
christopher changed discussion status to closed