Optimum Inference with Furiosa NPU

Optimum Furiosa is a utility package for building and running inference with Furiosa NPUs. Optimum can be used to load optimized models from the Hugging Face Hub and create pipelines to run accelerated inference without rewriting your APIs.

Switching from Transformers to Optimum Furiosa

The optimum.furiosa.FuriosaAIModelForXXX model classes are API compatible with Hugging Face models. This means you can just replace your AutoModelForXXX class with the corresponding FuriosaAIModelForXXX class in optimum.furiosa.

You do not need to adapt your code to get it to work with FuriosaAIModelForXXX classes:

Because the model you want to work with might not be already converted to ONNX, FuriosaAIModel includes a method to convert vanilla Hugging Face models to ONNX ones. Simply pass export=True to the from_pretrained method, and your model will be loaded and converted to ONNX on-the-fly:

Loading and inference of a vanilla Transformers model

import requests
from PIL import Image

- from transformers import AutoModelForImageClassification
+ from optimum.furiosa import FuriosaAIModelForImageClassification
from transformers import AutoFeatureExtractor, pipeline

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

model_id = "microsoft/resnet-50"
- model = AutoModelForImageClassification.from_pretrained(model_id)
+ model = FuriosaAIModelForImageClassification.from_pretrained(model_id, export=True, input_shape_dict={"pixel_values": [1, 3, 224, 224]}, output_shape_dict={"logits": [1, 1000]},)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
cls_pipe = pipeline("image-classification", model=model, feature_extractor=feature_extractor)
outputs = cls_pipe(image)

Pushing compiled models to the Hugging Face Hub

It is also possible, just as with regular PreTrainedModels, to push your FurisoaAIModelForXXX to the Hugging Face Model Hub:

>>> from optimum.furiosa import FuriosaAIModelForImageClassification

>>> # Load the model from the hub
>>> model = FuriosaAIModelForImageClassification.from_pretrained(
...     "microsoft/resnet-50", export=True, input_shape_dict={"pixel_values": [1, 3, 224, 224]}, output_shape_dict={"logits": [1, 1000]},
... )

>>> # Save the converted model
>>> model.save_pretrained("a_local_path_for_compiled_model")

# Push the compiled model to HF Hub
>>> model.push_to_hub(
...   "a_local_path_for_compiled_model", repository_id="my-furiosa-repo", use_auth_token=True
... )