Hey @borowis, I don't think there's a plan to add embedding models to the NIM API. Embedding models are quite small, which makes them easy to run on accessible hardware (vs. the H100 GPUs serving the large LLMs on the NIM API). I'd recommend deploying embedding models on a cheap GPU (or even a CPU) via HF dedicated Inference Endpoints: https://huggingface.co/inference-endpoints/dedicated. You can use the autoscaling/scale-to-zero feature to avoid unnecessary costs.
(The smaller BGE models from the MTEB leaderboard are always a good place to start)