---
license: apache-2.0
inference: false
datasets:
- autoflow
---
# Perceiver IO optical flow model
This model is a Perceiver IO optical flow model pretrained on AutoFlow. It is weight-equivalent to the [deepmind/optical-flow-perceiver](https://huggingface.co/deepmind/optical-flow-perceiver) model but based on implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library. It can be created from the [deepmind/optical-flow-perceiver](https://huggingface.co/deepmind/optical-flow-perceiver) model with a library-specific conversion utility. Both models generate equal output for the same input.

The content of the [deepmind/optical-flow-perceiver](https://huggingface.co/deepmind/optical-flow-perceiver) model card also applies to this model, except for the usage examples. Refer to the linked card for further model and training details.
## Model description

The model is specified in Appendix H (Table 16) of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795).
## Intended use and limitations
The model can be used to predict the optical flow between a pair of images.
## Usage examples

To use this model you first need to install the [perceiver-io](https://github.com/krasserm/perceiver-io) library with extension `vision`:

```shell
pip install perceiver-io[vision]
```

Then the model can be used with PyTorch.
### Image pair

The following example uses this image pair as input and renders their optical flow as an HSV representation (`render=True`):

```python
import requests
from PIL import Image
from transformers import pipeline

from perceiver.model.vision import optical_flow  # register optical flow pipeline

frame_1 = Image.open(requests.get("https://martin-krasser.com/perceiver/flow/frame_0047.png", stream=True).raw)
frame_2 = Image.open(requests.get("https://martin-krasser.com/perceiver/flow/frame_0048.png", stream=True).raw)

optical_flow_pipeline = pipeline("optical-flow", model="krasserm/perceiver-io-optical-flow", device="cuda:0")

rendered_optical_flow = optical_flow_pipeline((frame_1, frame_2), render=True)
Image.fromarray(rendered_optical_flow).save("optical_flow.png")
```
The rendered optical flow is:
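For intuition, an HSV rendering maps flow direction to hue and flow magnitude to saturation. The following NumPy sketch illustrates the idea; it is not the library's actual renderer, and the helper name `render_flow_hsv` is made up for this example:

```python
import numpy as np

def render_flow_hsv(flow: np.ndarray) -> np.ndarray:
    """Illustrative HSV rendering of a flow field of shape (H, W, 2):
    direction -> hue, magnitude -> saturation, returned as uint8 RGB."""
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.sqrt(dx**2 + dy**2)
    h = (np.arctan2(dy, dx) + np.pi) / (2 * np.pi)  # hue in [0, 1]
    s = magnitude / (magnitude.max() + 1e-8)        # saturation in [0, 1]
    v = np.ones_like(h)                             # full brightness

    # vectorized HSV -> RGB conversion
    i = np.floor(h * 6).astype(int) % 6
    f = h * 6 - np.floor(h * 6)
    p, q, t = v * (1 - s), v * (1 - f * s), v * (1 - (1 - f) * s)
    conds = [i == k for k in range(6)]
    r = np.select(conds, [v, q, p, p, t, v])
    g = np.select(conds, [t, v, v, q, p, p])
    b = np.select(conds, [p, p, t, v, v, q])
    return (np.stack([r, g, b], axis=-1) * 255).astype(np.uint8)

# zero flow has zero magnitude everywhere and renders as a uniform white image
rgb = render_flow_hsv(np.zeros((8, 8, 2)))
```

Regions with no motion therefore appear desaturated, while strong motion in a consistent direction appears as a saturated, direction-dependent color.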
### Video

To compute the optical flow of an entire video, the `optical-flow` pipeline can be used in combination with functions from `video_utils`. The following code samples all frames from a video snippet taken from the Sintel animated short movie, computes the optical flow for each pair of consecutive frames and writes the rendered results back to an output video file.
```python
from transformers import pipeline

from perceiver.data.vision import video_utils
from perceiver.model.vision import optical_flow  # register optical flow pipeline

optical_flow_pipeline = pipeline("optical-flow", model="krasserm/perceiver-io-optical-flow", device="cuda:0")

# sample consecutive video frame pairs
frame_pairs = video_utils.read_video_frame_pairs("sintel_clip_cave_dragon_fight.mp4")

# create and render optical flow for all frame pairs (the pipeline is already on cuda:0)
optical_flows = optical_flow_pipeline(frame_pairs, render=True)

# create video with rendered optical flows
video_utils.write_video("sintel_clip_cave_dragon_fight_output.mp4", optical_flows, fps=24)
```
A side-by-side comparison of the input and output video is:
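Conceptually, `read_video_frame_pairs` yields each frame together with its successor, so a video with N frames produces N-1 overlapping pairs. Ignoring the video decoding, the pairing can be sketched as follows (`consecutive_pairs` is a hypothetical helper for illustration):

```python
import numpy as np

def consecutive_pairs(frames):
    """[f0, f1, f2, f3] -> [(f0, f1), (f1, f2), (f2, f3)]"""
    return list(zip(frames, frames[1:]))

# four dummy frames produce three overlapping pairs
frames = [np.full((2, 2, 3), i, dtype=np.uint8) for i in range(4)]
pairs = consecutive_pairs(frames)
```

Because the pairs overlap, every interior frame is processed twice: once as the second element of a pair and once as the first element of the next.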
## Model conversion

The [krasserm/perceiver-io-optical-flow](https://huggingface.co/krasserm/perceiver-io-optical-flow) model has been created from the source [deepmind/optical-flow-perceiver](https://huggingface.co/deepmind/optical-flow-perceiver) model with:
```python
from perceiver.model.vision.optical_flow import convert_model

convert_model(
    save_dir="krasserm/perceiver-io-optical-flow",
    source_repo_id="deepmind/optical-flow-perceiver",
    push_to_hub=True,
)
```
## Citation

```bibtex
@article{jaegle2021perceiver,
  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}
```