---
license: apache-2.0
inference: false
datasets:
- autoflow
---

# Perceiver IO optical flow model

This is a Perceiver IO optical flow model pretrained on [AutoFlow](https://autoflow-google.github.io/).
It is weight-equivalent to the [deepmind/optical-flow-perceiver](https://huggingface.co/deepmind/optical-flow-perceiver)
model but is built on the implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library. It
can be created from the `deepmind/optical-flow-perceiver` model with a library-specific [conversion utility](#model-conversion).
Both models produce the same output for the same input.

The content of the `deepmind/optical-flow-perceiver` [model card](https://huggingface.co/deepmind/optical-flow-perceiver)
also applies to this model, except for the [usage examples](#usage-examples). Refer to the linked card for further model and
training details.

## Model description

The model is specified in Appendix H (Table 16) of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795).

## Intended use and limitations

The model can be used to predict the optical flow between a pair of images.

## Usage examples

To use this model, first [install](https://github.com/krasserm/perceiver-io/blob/main/README.md#installation)
the `perceiver-io` library with the `vision` extension.

```shell
pip install "perceiver-io[vision]"
```

The model can then be used with PyTorch.

### Image pair

The following example uses this image pair as input

<img src="https://martin-krasser.com/perceiver/flow/frame_0047.png" alt="image-1" width="500"/>
<img src="https://martin-krasser.com/perceiver/flow/frame_0048.png" alt="image-2" width="500"/>

and renders their optical flow as an HSV representation (`render=True`):

```python
import requests
from PIL import Image
from transformers import pipeline
from perceiver.model.vision import optical_flow  # register optical flow pipeline

# download the two input frames
frame_1 = Image.open(requests.get("https://martin-krasser.com/perceiver/flow/frame_0047.png", stream=True).raw)
frame_2 = Image.open(requests.get("https://martin-krasser.com/perceiver/flow/frame_0048.png", stream=True).raw)

# device="cuda:0" assumes a CUDA-capable GPU is available
optical_flow_pipeline = pipeline("optical-flow", model="krasserm/perceiver-io-optical-flow", device="cuda:0")
# compute the optical flow of the image pair and render it
rendered_optical_flow = optical_flow_pipeline((frame_1, frame_2), render=True)

# save the rendered optical flow as an image
Image.fromarray(rendered_optical_flow).save("optical_flow.png")
```

The [rendered optical flow](https://martin-krasser.com/perceiver/flow/optical_flow.png) is:

<img src="https://martin-krasser.com/perceiver/flow/optical_flow.png" alt="optical-flow" width="500"/>

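This card does not describe how `render=True` maps flow vectors to colors. As an illustration only, the sketch below follows a common convention for HSV flow rendering: the direction of motion selects the hue and the speed, relative to the fastest motion in the frame, selects the saturation. The helper names `flow_vector_to_rgb` and `render_flow` are hypothetical and are not part of the `perceiver-io` API. The library's actual rendering may differ.

```python
import colorsys
import math


def flow_vector_to_rgb(dx, dy, max_magnitude):
    """Map one flow vector to an 8-bit RGB color (hypothetical helper)."""
    # Direction of motion, normalized to [0, 1), selects the hue.
    hue = (math.atan2(dy, dx) / (2 * math.pi)) % 1.0
    # Speed relative to the fastest motion in the frame selects the saturation.
    magnitude = math.hypot(dx, dy)
    saturation = min(magnitude / max_magnitude, 1.0) if max_magnitude > 0 else 0.0
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return tuple(round(255 * c) for c in (r, g, b))


def render_flow(flow):
    """Render a flow field given as rows of (dx, dy) pairs (hypothetical helper)."""
    max_magnitude = max(math.hypot(dx, dy) for row in flow for dx, dy in row)
    return [[flow_vector_to_rgb(dx, dy, max_magnitude) for dx, dy in row] for row in flow]


# Toy 1x3 flow field: no motion, motion to the right, slower motion downwards.
rendered = render_flow([[(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]])
print(rendered)
```

Under this convention, static regions come out white and fast-moving regions take a fully saturated color whose hue encodes the direction of motion.
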
### Video

To compute the optical flow of an entire video, the `optical-flow` pipeline can be used in combination with functions
from `video_utils`. The following code samples all frames from a [video snippet](https://martin-krasser.com/perceiver/flow/sintel_clip_cave_dragon_fight.mp4)
taken from the [Sintel animated short movie](https://durian.blender.org/), computes the optical flow for each pair of
consecutive frames, and writes the rendered results to an output video file.

```python
from transformers import pipeline
from perceiver.data.vision import video_utils
from perceiver.model.vision import optical_flow  # register optical flow pipeline

optical_flow_pipeline = pipeline("optical-flow", model="krasserm/perceiver-io-optical-flow", device="cuda:0")

# sample consecutive video frame pairs
frame_pairs = video_utils.read_video_frame_pairs("sintel_clip_cave_dragon_fight.mp4")

# create and render optical flow for all frame pairs
optical_flows = optical_flow_pipeline(frame_pairs, render=True, device="cuda:0")

# create video with rendered optical flows
video_utils.write_video("sintel_clip_cave_dragon_fight_output.mp4", optical_flows, fps=24)
```

A side-by-side comparison of the input and output videos:

![optical-flow-sbs](https://martin-krasser.com/perceiver/flow/sintel_clip_cave_dragon_fight_side_by_side_horizontal.gif)

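To make the notion of consecutive frame pairs concrete, the sketch below pairs each frame with its successor. It is a minimal illustration with a hypothetical helper, `consecutive_pairs`. It assumes nothing about how `video_utils.read_video_frame_pairs` is implemented, and placeholder strings stand in for decoded frames.

```python
def consecutive_pairs(frames):
    """Pair every frame with the frame that follows it (hypothetical helper)."""
    return list(zip(frames, frames[1:]))


# Placeholder strings stand in for decoded video frames.
frames = ["frame_0", "frame_1", "frame_2", "frame_3"]
pairs = consecutive_pairs(frames)
print(pairs)
```

With this pairing scheme, a clip of N frames yields N - 1 pairs and therefore N - 1 optical flow fields.
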
## Model conversion

The `krasserm/perceiver-io-optical-flow` model was created from the source `deepmind/optical-flow-perceiver` model
with:

```python
from perceiver.model.vision.optical_flow import convert_model

convert_model(
    save_dir="krasserm/perceiver-io-optical-flow",
    source_repo_id="deepmind/optical-flow-perceiver",
    push_to_hub=True,
)
```

## Citation

```bibtex
@article{jaegle2021perceiver,
  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}
```