Perceiver IO optical flow model

This model is a Perceiver IO optical flow model pretrained on AutoFlow. It is weight-equivalent to the deepmind/optical-flow-perceiver model but is built on the model classes of the perceiver-io library. It was created from the deepmind/optical-flow-perceiver model with a library-specific conversion utility (see Model conversion). Both models produce the same output for the same input.

The content of the deepmind/optical-flow-perceiver model card also applies to this model, except for the usage examples. Refer to that card for further model and training details.

Model description

The model is specified in Appendix H (Table 16) of the Perceiver IO paper.

Intended use and limitations

The model can be used to predict the optical flow between a pair of images.
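A flow field assigns each pixel of the first image a 2D displacement (dx, dy) to its position in the second image. The NumPy sketch below shows one way to use such a field. It assumes the flow is an (H, W, 2) array in that convention, and it warps the second image back onto the first, which is a quick sanity check for a predicted flow. The function name and the flow layout are illustrative assumptions, not the output specification of perceiver-io.

```python
import numpy as np


def warp_nearest(frame_2: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp frame_2 into the pixel grid of frame 1 (nearest-neighbor sampling).

    Assumes (hypothetical convention) that flow[y, x] = (dx, dy) means pixel (x, y)
    of frame 1 moves to (x + dx, y + dy) in frame 2. Sample positions outside the
    image are clamped to the border.
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]  # pixel coordinates of frame 1
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame_2[src_y, src_x]  # works for (H, W) and (H, W, C) frames


# Toy example: frame 2 is frame 1 shifted one pixel to the right, so the flow is (1, 0).
frame_1 = np.arange(12).reshape(3, 4)
frame_2 = np.roll(frame_1, 1, axis=1)
flow = np.zeros((3, 4, 2))
flow[..., 0] = 1.0
warped = warp_nearest(frame_2, flow)
print(np.array_equal(warped[:, :-1], frame_1[:, :-1]))  # interior columns match frame 1
```

If the flow is accurate, the warped second frame should closely match the first frame, except near image borders and occlusions.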

Usage examples

To use this model, first install the perceiver-io library with the vision extension:

pip install 'perceiver-io[vision]'

Then the model can be used with PyTorch.
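The examples below hardcode device="cuda:0", which requires a CUDA GPU. On other machines, a small helper like the following can choose the device instead. It is a sketch that is not part of perceiver-io, and pick_device is a hypothetical name.

```python
def pick_device() -> str:
    """Return "cuda:0" if PyTorch sees a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:  # PyTorch not installed: fall back to CPU
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The resulting string can then be passed as device=device wherever the examples pass device="cuda:0". Transformers pipelines accept device strings such as "cpu" and "cuda:0".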

Image pair

The following example uses a pair of consecutive frames (frame_0047.png and frame_0048.png) as input and renders their optical flow as an HSV image (render=True):

import requests
from PIL import Image
from transformers import pipeline
from perceiver.model.vision import optical_flow  # register optical flow pipeline

frame_1 = Image.open(requests.get("https://martin-krasser.com/perceiver/flow/frame_0047.png", stream=True).raw)
frame_2 = Image.open(requests.get("https://martin-krasser.com/perceiver/flow/frame_0048.png", stream=True).raw)

optical_flow_pipeline = pipeline("optical-flow", model="krasserm/perceiver-io-optical-flow", device="cuda:0")
rendered_optical_flow = optical_flow_pipeline((frame_1, frame_2), render=True)

Image.fromarray(rendered_optical_flow).save("optical_flow.png")

The rendered optical flow is saved to optical_flow.png.
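With render=True, the pipeline converts the predicted flow field into an HSV image. The exact rendering used by perceiver-io is not reproduced here. The NumPy sketch below shows one common convention and assumes the raw flow is an (H, W, 2) array of per-pixel (dx, dy) displacements. Hue encodes the direction of motion and brightness encodes the magnitude relative to the largest displacement in the image. The function names are hypothetical.

```python
import numpy as np


def hsv_to_rgb(h: np.ndarray, s: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Vectorized HSV -> RGB conversion; inputs and outputs are floats in [0, 1]."""
    sector = np.floor(h * 6.0)
    f = h * 6.0 - sector            # fractional position inside the hue sector
    i = sector.astype(int) % 6      # hue sector index 0..5 (h == 1.0 wraps to 0)
    p = v * (1.0 - s)
    q = v * (1.0 - f * s)
    t = v * (1.0 - (1.0 - f) * s)
    r = np.choose(i, [v, q, p, p, t, v])
    g = np.choose(i, [t, v, v, q, p, p])
    b = np.choose(i, [p, p, t, v, v, q])
    return np.stack([r, g, b], axis=-1)


def render_flow(flow: np.ndarray) -> np.ndarray:
    """Render an (H, W, 2) flow of (dx, dy) displacements as an (H, W, 3) uint8 RGB image."""
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(dx, dy)
    hue = np.mod(np.arctan2(dy, dx) / (2.0 * np.pi), 1.0)  # direction -> hue, +x is red
    max_mag = magnitude.max()
    # magnitude -> brightness, normalized by the largest displacement in the image
    value = magnitude / max_mag if max_mag > 0 else np.zeros_like(magnitude)
    saturation = np.ones_like(value)
    rgb = hsv_to_rgb(hue, saturation, value)
    return np.round(rgb * 255.0).astype(np.uint8)


# Two pixels moving right, the second one half as far: bright red and darker red.
example = np.array([[[2.0, 0.0], [1.0, 0.0]]])
print(render_flow(example))
```

The result has the same layout as an RGB image, so it could be saved with Image.fromarray(...), as in the example above.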

Video

To compute the optical flow of an entire video, the optical-flow pipeline can be combined with functions from video_utils. The following code reads all frames of a video clip taken from the Sintel animated short film, computes the optical flow for each pair of consecutive frames, and writes the rendered results to an output video file.

from transformers import pipeline
from perceiver.data.vision import video_utils
from perceiver.model.vision import optical_flow  # register optical flow pipeline

optical_flow_pipeline = pipeline("optical-flow", model="krasserm/perceiver-io-optical-flow", device="cuda:0")

# sample consecutive video frame pairs
frame_pairs = video_utils.read_video_frame_pairs("sintel_clip_cave_dragon_fight.mp4")

# create and render optical flow for all frame pairs
optical_flows = optical_flow_pipeline(frame_pairs, render=True, device="cuda:0")

# create video with rendered optical flows
video_utils.write_video("sintel_clip_cave_dragon_fight_output.mp4", optical_flows, fps=24)

[Video: side-by-side comparison of the input and output video]
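If every consecutive pair of frames is used, a clip of N frames yields N - 1 flow fields. The output video then has one frame fewer than the input. The hypothetical helper below is not part of video_utils, whose implementation is not shown here. It sketches that pairing for any sequence of frames.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def consecutive_pairs(frames: Sequence[T]) -> List[Tuple[T, T]]:
    """Pair each frame with its successor: [f0, f1, f2] -> [(f0, f1), (f1, f2)]."""
    return list(zip(frames[:-1], frames[1:]))


print(consecutive_pairs(["f0", "f1", "f2", "f3"]))
```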

Model conversion

The krasserm/perceiver-io-optical-flow model has been created from the source deepmind/optical-flow-perceiver model with:

from perceiver.model.vision.optical_flow import convert_model

convert_model(
    save_dir="krasserm/perceiver-io-optical-flow",
    source_repo_id="deepmind/optical-flow-perceiver",
    push_to_hub=True,
)
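The converted model and the source model are stated to produce the same output for the same input. One way to check this is to run both models on the same frame pair and compare the results with the average end-point error (EPE), the standard optical flow error metric. The sketch below assumes both flows are available as (H, W, 2) NumPy arrays. Obtaining them from each model is not shown, and average_epe is a hypothetical name. For weight-equivalent models, the EPE should be zero or at floating-point noise level.

```python
import numpy as np


def average_epe(flow_a: np.ndarray, flow_b: np.ndarray) -> float:
    """Mean Euclidean distance between the (dx, dy) vectors of two (H, W, 2) flow fields."""
    if flow_a.shape != flow_b.shape:
        raise ValueError(f"shape mismatch: {flow_a.shape} vs {flow_b.shape}")
    return float(np.linalg.norm(flow_a - flow_b, axis=-1).mean())


print(average_epe(np.zeros((2, 2, 2)), np.ones((2, 2, 2))))  # each vector differs by (1, 1)
```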

Citation

@article{jaegle2021perceiver,
  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}
