Inference API Pricing

Our paid Inference API is an accelerated version of the API that powers the inference widgets on every model's page (see the Model Hub documentation). It is accelerated on CPU – and available on GPU for Enterprise users – and supports large volumes of requests.

Read the API documentation ➡️
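As a quick illustration of what calling the Inference API looks like, here is a minimal sketch that POSTs a JSON payload to a model endpoint with a Bearer token. The model name and token are placeholders, and the exact response shape depends on the task – see the API documentation for the authoritative reference.

```python
import json
import urllib.request

# Hypothetical example model; substitute any model from the Hub
API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english"

def query(payload: dict, token: str) -> dict:
    """POST a JSON payload to the Inference API and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # your API token
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example usage (requires a valid token):
# result = query({"inputs": "I love this movie!"}, token="hf_xxx")
```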

CPU-accelerated
$199/month
  • Up to 10M tokens inference

    Depending on your sequence lengths, this translates to up to 1M requests for text classification, or 100k requests for generation tasks (translation, summarization).

  • Accelerated on CPU (2x faster than inference widgets)

    Leveraging our pipelines, built on optimized intermediate representations (e.g. ONNX) and carefully tuned executors.

  • Try it for free for 7 days

Start now
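The token-to-request conversions quoted in both plans can be sanity-checked with a little arithmetic. The per-request token averages below (≈10 tokens for a classification request, ≈100 for a generation request) are assumptions inferred from the quoted ratios, not figures stated on this page:

```python
# Assumed average tokens consumed per request (illustrative only)
TOKENS_PER_CLASSIFICATION = 10
TOKENS_PER_GENERATION = 100

def max_requests(token_budget: int, tokens_per_request: int) -> int:
    """How many requests a monthly token budget covers at a given average size."""
    return token_budget // tokens_per_request

# CPU plan: 10M tokens/month
print(max_requests(10_000_000, TOKENS_PER_CLASSIFICATION))  # 1000000 classification requests
print(max_requests(10_000_000, TOKENS_PER_GENERATION))      # 100000 generation requests

# GPU plan: 100M tokens/month
print(max_requests(100_000_000, TOKENS_PER_CLASSIFICATION))  # 10000000
print(max_requests(100_000_000, TOKENS_PER_GENERATION))      # 1000000
```

Your actual throughput will vary with sequence length, which is why the page hedges with "depending on your sequence lengths".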
GPU-accelerated
$599/month
  • Up to 100M tokens inference

    Depending on your sequence lengths, this translates to up to 10M requests for text classification, or 1M requests for generation tasks (translation, summarization).

  • Accelerated on GPU (20x faster than inference widgets)

    On dedicated GPU hardware optimized to your specific use case.

  • Try it for free for 7 days

Start now
Enterprise
  • Unlimited tokens inference

    Use a scalable, dedicated endpoint. Just for your team!

  • Private models

  • GPU hosting

    We pick the best hardware for your models.

  • Priority support

Contact us