🤗 Infinity Trial Request

During the 7-day free trial, you can test Infinity's inference latency on your own inputs by sending HTTP requests to Infinity containers hosted by Hugging Face. You will get access to 10 containers set up for 4 different tasks served by BERT-like models.

The goal of the hosted trial is to enable you to measure the acceleration provided by Infinity and validate the solution before discussing a deployment in your own infrastructure with our sales team.

Use of the Infinity Trial is subject to our Terms of Use.

You will receive an email with a personal token valid for 7 days and instructions to send requests to Infinity.
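
As a rough illustration of how latency could be measured once you have your token, here is a minimal Python sketch. The endpoint URL, the bearer-token header, and the JSON payload shape are all assumptions made for illustration; the instructions in your email are the authoritative source for the real values. Timings taken this way are client-side, so they include the network round trip as well as model inference.

```python
import json
import math
import statistics
import time
import urllib.request

# Hypothetical placeholders: the real endpoint, auth scheme, and payload
# format come from the trial email, not from this sketch.
ENDPOINT_URL = "https://infinity.example.com/predict"
TOKEN = "YOUR_TRIAL_TOKEN"


def make_sender(url, token):
    """Return a function that POSTs one JSON payload and returns the response body."""
    def send(payload):
        request = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {token}",  # assumed auth scheme
            },
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.read()
    return send


def measure_latency(send, payloads, warmup=3):
    """Call send(payload) for each payload and summarize latency in milliseconds."""
    if not payloads:
        raise ValueError("payloads must not be empty")
    # Untimed warm-up calls so connection setup does not skew the numbers.
    for payload in payloads[:warmup]:
        send(payload)
    latencies_ms = []
    for payload in payloads:
        start = time.perf_counter()
        send(payload)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies_ms.sort()
    # Nearest-rank 95th percentile.
    p95_index = max(0, math.ceil(0.95 * len(latencies_ms)) - 1)
    return {
        "count": len(latencies_ms),
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
        "max_ms": latencies_ms[-1],
    }


if __name__ == "__main__":
    # Assumed payload shape; replace with inputs that match your task.
    inputs = [{"inputs": "Infinity makes inference fast."}] * 100
    print(measure_latency(make_sender(ENDPOINT_URL, TOKEN), inputs))
```

Passing the sender as a function keeps the timing logic independent of the transport, so the same `measure_latency` helper can be pointed at your current serving stack to compare results.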

Plug and Predict

Infinity ships as a single container and can be deployed in any production environment. It scales easily to thousands of requests per second using orchestration services like Kubernetes.

Unmatched Performance

Infinity delivers unmatched performance for state-of-the-art transformer models, achieving 1 ms latency for BERT-like models on GPU and 4 ms on CPU.

Enterprise Ready

Infinity meets the highest security requirements and can be integrated anywhere, from public clouds to air-gapped environments. You control your models, your data, and the traffic.