---
license: apache-2.0
inference: false
datasets:
- autoflow
---

# Perceiver IO optical flow model

This model is a Perceiver IO optical flow model pretrained on [AutoFlow](https://autoflow-google.github.io/). It is weight-equivalent to the [deepmind/optical-flow-perceiver](https://huggingface.co/deepmind/optical-flow-perceiver) model but based on implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library. It can be created from the `deepmind/optical-flow-perceiver` model with a library-specific [conversion utility](#model-conversion). Both models generate equal output for the same input.

Content of the `deepmind/optical-flow-perceiver` [model card](https://huggingface.co/deepmind/optical-flow-perceiver) also applies to this model, except for the [usage examples](#usage-examples). Refer to the linked card for further model and training details.

## Model description

The model is specified in Appendix H (Table 16) of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795).

## Intended use and limitations

The model can be used to predict the optical flow between a pair of images.

## Usage examples

To use this model you first need to [install](https://github.com/krasserm/perceiver-io/blob/main/README.md#installation) the `perceiver-io` library with extension `vision`.

```shell
pip install perceiver-io[vision]
```

Then the model can be used with PyTorch.
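As background, dense optical flow is conventionally represented as an `(H, W, 2)` array of per-pixel `(dx, dy)` displacements. The following minimal NumPy sketch warps one frame along such a field with nearest-neighbor sampling; it is illustrative only and not part of the `perceiver-io` API:

```python
import numpy as np

def warp_with_flow(frame, flow):
    """Warp a frame backward along a dense flow field.

    frame: (H, W) or (H, W, C) array; flow: (H, W, 2) array of
    per-pixel (dx, dy) displacements (nearest-neighbor sampling).
    """
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # sample each target pixel from its displaced source location,
    # clamping coordinates at the image border
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

# a flow field that shifts all content one pixel to the left
frame = np.arange(16, dtype=np.uint8).reshape(4, 4)
flow = np.zeros((4, 4, 2), dtype=np.float32)
flow[..., 0] = 1.0
warped = warp_with_flow(frame, flow)
```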
### Image pair

The following example uses this image pair as input:

![frame-1](https://martin-krasser.com/perceiver/flow/frame_0047.png)
![frame-2](https://martin-krasser.com/perceiver/flow/frame_0048.png)

and renders their optical flow as an HSV representation (`render=True`):

```python
import requests
from PIL import Image
from transformers import pipeline

from perceiver.model.vision import optical_flow  # register optical flow pipeline

frame_1 = Image.open(requests.get("https://martin-krasser.com/perceiver/flow/frame_0047.png", stream=True).raw)
frame_2 = Image.open(requests.get("https://martin-krasser.com/perceiver/flow/frame_0048.png", stream=True).raw)

optical_flow_pipeline = pipeline("optical-flow", model="krasserm/perceiver-io-optical-flow", device="cuda:0")

rendered_optical_flow = optical_flow_pipeline((frame_1, frame_2), render=True)
Image.fromarray(rendered_optical_flow).save("optical_flow.png")
```

The [rendered optical flow](https://martin-krasser.com/perceiver/flow/optical_flow.png) is:

![optical-flow](https://martin-krasser.com/perceiver/flow/optical_flow.png)

### Video

To compute the optical flow of an entire video, the `optical-flow` pipeline can be used in combination with functions from `video_utils`. The following code samples all frames from a [video snippet](https://martin-krasser.com/perceiver/flow/sintel_clip_cave_dragon_fight.mp4) taken from the [Sintel animated short movie](https://durian.blender.org/), computes the optical flow for each consecutive frame pair and writes the rendered results to an output video file.
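Conceptually, pairing consecutive frames is a sliding window over the decoded frame list. A minimal sketch of what a helper like `read_video_frame_pairs` returns (the actual `video_utils` implementation may differ, e.g. in how frames are decoded):

```python
def consecutive_pairs(frames):
    """Return [(f0, f1), (f1, f2), ...] for a list of frames."""
    return list(zip(frames, frames[1:]))

# each pair is then fed to the optical-flow pipeline
pairs = consecutive_pairs(["f0", "f1", "f2", "f3"])
```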
```python
from transformers import pipeline

from perceiver.data.vision import video_utils
from perceiver.model.vision import optical_flow  # register optical flow pipeline

optical_flow_pipeline = pipeline("optical-flow", model="krasserm/perceiver-io-optical-flow", device="cuda:0")

# sample consecutive video frame pairs
frame_pairs = video_utils.read_video_frame_pairs("sintel_clip_cave_dragon_fight.mp4")

# create and render optical flow for all frame pairs
optical_flows = optical_flow_pipeline(frame_pairs, render=True)

# create video with rendered optical flows
video_utils.write_video("sintel_clip_cave_dragon_fight_output.mp4", optical_flows, fps=24)
```

A side-by-side comparison of the input and output video:

![optical-flow-sbs](https://martin-krasser.com/perceiver/flow/sintel_clip_cave_dragon_fight_side_by_side_horizontal.gif)

## Model conversion

The `krasserm/perceiver-io-optical-flow` model has been created from the source `deepmind/optical-flow-perceiver` model with:

```python
from perceiver.model.vision.optical_flow import convert_model

convert_model(
    save_dir="krasserm/perceiver-io-optical-flow",
    source_repo_id="deepmind/optical-flow-perceiver",
    push_to_hub=True,
)
```

## Citation

```bibtex
@article{jaegle2021perceiver,
  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}
```