---
license: apache-2.0
inference: false
datasets:
- autoflow
---
# Perceiver IO optical flow model
This model is a Perceiver IO optical flow model pretrained on [AutoFlow](https://autoflow-google.github.io/).
It is weight-equivalent to the [deepmind/optical-flow-perceiver](https://huggingface.co/deepmind/optical-flow-perceiver)
model but based on implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library. It
can be created from the `deepmind/optical-flow-perceiver` model with a library-specific [conversion utility](#model-conversion).
Both models generate equal output for the same input.
The content of the `deepmind/optical-flow-perceiver` [model card](https://huggingface.co/deepmind/optical-flow-perceiver)
also applies to this model, except for the [usage examples](#usage-examples). Refer to the linked card for further model and
training details.
## Model description
The model is specified in Appendix H (Table 16) of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795).
## Intended use and limitations
The model can be used to predict the optical flow between a pair of images.
## Usage examples
To use this model, you first need to [install](https://github.com/krasserm/perceiver-io/blob/main/README.md#installation)
the `perceiver-io` library with the `vision` extension.
```shell
pip install "perceiver-io[vision]"
```
Then the model can be used with PyTorch.
### Image pair
The following example uses this image pair as input
<img src="https://martin-krasser.com/perceiver/flow/frame_0047.png" alt="image-1" width="500"/>
<img src="https://martin-krasser.com/perceiver/flow/frame_0048.png" alt="image-2" width="500"/>
and renders their optical flow as an HSV representation (`render=True`):
```python
import requests

from PIL import Image
from transformers import pipeline
from perceiver.model.vision import optical_flow  # register optical flow pipeline

frame_1 = Image.open(requests.get("https://martin-krasser.com/perceiver/flow/frame_0047.png", stream=True).raw)
frame_2 = Image.open(requests.get("https://martin-krasser.com/perceiver/flow/frame_0048.png", stream=True).raw)

optical_flow_pipeline = pipeline("optical-flow", model="krasserm/perceiver-io-optical-flow", device="cuda:0")
rendered_optical_flow = optical_flow_pipeline((frame_1, frame_2), render=True)

Image.fromarray(rendered_optical_flow).save("optical_flow.png")
```
The [rendered optical flow](https://martin-krasser.com/perceiver/flow/optical_flow.png) is:
<img src="https://martin-krasser.com/perceiver/flow/optical_flow.png" alt="optical-flow" width="500"/>
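
For readers unfamiliar with HSV flow rendering, the following is a minimal NumPy sketch of one common convention: flow direction is mapped to hue and flow magnitude to saturation, with value fixed at 1 so that static regions appear white. It is not the rendering code used by the `optical-flow` pipeline. The function name `flow_to_rgb` and the `max_magnitude` normalization constant are assumptions made for illustration only.

```python
import numpy as np


def flow_to_rgb(flow: np.ndarray, max_magnitude: float = 24.0) -> np.ndarray:
    """Map a (H, W, 2) flow field to a (H, W, 3) uint8 RGB image via HSV.

    Hypothetical helper for illustration; max_magnitude is an assumed constant.
    """
    dx, dy = flow[..., 0], flow[..., 1]

    # Hue encodes direction: angle in [-pi, pi] wrapped to [0, 1).
    hue = (np.arctan2(dy, dx) / (2 * np.pi)) % 1.0
    # Saturation encodes magnitude, clipped to [0, 1].
    sat = np.clip(np.hypot(dx, dy) / max_magnitude, 0.0, 1.0)
    # Value is constant, so zero flow renders as white.
    val = np.ones_like(sat)

    def channel(n: int) -> np.ndarray:
        # Standard closed-form HSV -> RGB conversion for one channel.
        k = (n + hue * 6.0) % 6.0
        return val - val * sat * np.clip(np.minimum(k, 4.0 - k), 0.0, 1.0)

    rgb = np.stack([channel(5), channel(3), channel(1)], axis=-1)
    return np.round(rgb * 255).astype(np.uint8)
```

With this convention, a pixel with no motion is white, and pixels moving in opposite directions receive complementary colors.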
### Video
To compute the optical flow of an entire video, the `optical-flow` pipeline can be used in combination with functions
from `video_utils`. The following code samples all frames from a [video snippet](https://martin-krasser.com/perceiver/flow/sintel_clip_cave_dragon_fight.mp4)
taken from the [Sintel animated short movie](https://durian.blender.org/), computes the optical flow per consecutive
frame pair and writes the rendered results back to an output video file.
```python
from transformers import pipeline
from perceiver.data.vision import video_utils
from perceiver.model.vision import optical_flow  # register optical flow pipeline

optical_flow_pipeline = pipeline("optical-flow", model="krasserm/perceiver-io-optical-flow", device="cuda:0")

# sample consecutive video frame pairs
frame_pairs = video_utils.read_video_frame_pairs("sintel_clip_cave_dragon_fight.mp4")

# create and render optical flow for all frame pairs
optical_flows = optical_flow_pipeline(frame_pairs, render=True, device="cuda:0")

# create video with rendered optical flows
video_utils.write_video("sintel_clip_cave_dragon_fight_output.mp4", optical_flows, fps=24)
```
A side-by-side comparison of the input and output videos is:
![optical-flow-sbs](https://martin-krasser.com/perceiver/flow/sintel_clip_cave_dragon_fight_side_by_side_horizontal.gif)
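
To illustrate what "consecutive frame pair" means here, the sketch below pairs each frame with its successor, so a sequence of N frames yields N - 1 pairs. It is a plain-Python illustration and not the implementation of `video_utils.read_video_frame_pairs`. The helper name `consecutive_pairs` is hypothetical.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def consecutive_pairs(frames: Iterable[T]) -> Iterator[Tuple[T, T]]:
    """Yield (frame_i, frame_i+1) for every adjacent pair in a frame sequence.

    Hypothetical helper for illustration; works lazily on any iterable.
    """
    iterator = iter(frames)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # no frames, no pairs
    for current in iterator:
        yield previous, current
        previous = current
```

Because the helper is a generator, frames can be streamed from a decoder without holding the whole video in memory.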
## Model conversion
The `krasserm/perceiver-io-optical-flow` model has been created from the source `deepmind/optical-flow-perceiver` model
with:
```python
from perceiver.model.vision.optical_flow import convert_model

convert_model(
    save_dir="krasserm/perceiver-io-optical-flow",
    source_repo_id="deepmind/optical-flow-perceiver",
    push_to_hub=True,
)
```
## Citation
```bibtex
@article{jaegle2021perceiver,
title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
journal={arXiv preprint arXiv:2107.14795},
year={2021}
}
```