---
license: apache-2.0
inference: false
datasets:
- autoflow
---

# Perceiver IO optical flow model

This model is a Perceiver IO optical flow model pretrained on [AutoFlow](https://autoflow-google.github.io/).
It is weight-equivalent to the [deepmind/optical-flow-perceiver](https://huggingface.co/deepmind/optical-flow-perceiver)
model but is built on the implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library.
It can be created from the `deepmind/optical-flow-perceiver` model with a library-specific [conversion utility](#model-conversion).
Both models produce identical outputs for the same input.

The content of the `deepmind/optical-flow-perceiver` [model card](https://huggingface.co/deepmind/optical-flow-perceiver)
also applies to this model, except for the [usage examples](#usage-examples). Refer to the linked card for further model
and training details.

## Model description

The model is specified in Appendix H (Table 16) of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795).
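
The configured hyperparameters can also be inspected programmatically from the underlying Hugging Face model object. The following is a minimal sketch; it assumes the `perceiver-io` library is installed as described in the [usage examples](#usage-examples):

```python
from transformers import pipeline
from perceiver.model.vision import optical_flow  # register optical flow pipeline

# Loading on CPU is sufficient for inspecting the configuration. The printed
# hyperparameters should correspond to Table 16 of the Perceiver IO paper.
optical_flow_pipeline = pipeline("optical-flow", model="krasserm/perceiver-io-optical-flow")
print(optical_flow_pipeline.model.config)
```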

## Intended use and limitations

The model can be used to predict the optical flow between a pair of images.

## Usage examples

To use this model you first need to [install](https://github.com/krasserm/perceiver-io/blob/main/README.md#installation) 
the `perceiver-io` library with the `vision` extra.

```shell
pip install "perceiver-io[vision]"
```

Then the model can be used with PyTorch.

### Image pair

The following example uses this image pair as input 

<img src="https://martin-krasser.com/perceiver/flow/frame_0047.png" alt="image-1" width="500"/>
<img src="https://martin-krasser.com/perceiver/flow/frame_0048.png" alt="image-2" width="500"/>

and renders their optical flow as an HSV representation (`render=True`):

```python
import requests
from PIL import Image
from transformers import pipeline
from perceiver.model.vision import optical_flow  # register optical flow pipeline

frame_1 = Image.open(requests.get("https://martin-krasser.com/perceiver/flow/frame_0047.png", stream=True).raw)
frame_2 = Image.open(requests.get("https://martin-krasser.com/perceiver/flow/frame_0048.png", stream=True).raw)

optical_flow_pipeline = pipeline("optical-flow", model="krasserm/perceiver-io-optical-flow", device="cuda:0")
rendered_optical_flow = optical_flow_pipeline((frame_1, frame_2), render=True)

Image.fromarray(rendered_optical_flow).save("optical_flow.png")
```

The [rendered optical flow](https://martin-krasser.com/perceiver/flow/optical_flow.png) is

<img src="https://martin-krasser.com/perceiver/flow/optical_flow.png" alt="image-2" width="500"/>

### Video

To compute the optical flow of an entire video, the `optical-flow` pipeline can be used in combination with functions 
from `video_utils`. The following code reads consecutive frame pairs from a [video snippet](https://martin-krasser.com/perceiver/flow/sintel_clip_cave_dragon_fight.mp4)
taken from the [Sintel animated short movie](https://durian.blender.org/) (downloaded to the working directory), computes 
the optical flow for each frame pair, and writes the rendered results to an output video file.

```python
from transformers import pipeline
from perceiver.data.vision import video_utils
from perceiver.model.vision import optical_flow  # register optical flow pipeline

optical_flow_pipeline = pipeline("optical-flow", model="krasserm/perceiver-io-optical-flow", device="cuda:0")

# sample consecutive video frame pairs
frame_pairs = video_utils.read_video_frame_pairs("sintel_clip_cave_dragon_fight.mp4")

# create and render optical flow for all frame pairs
optical_flows = optical_flow_pipeline(frame_pairs, render=True, device="cuda:0")

# create video with rendered optical flows
video_utils.write_video("sintel_clip_cave_dragon_fight_output.mp4", optical_flows, fps=24)
```

A side-by-side comparison of the input and output videos:

![optical-flow-sbs](https://martin-krasser.com/perceiver/flow/sintel_clip_cave_dragon_fight_side_by_side_horizontal.gif)
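
For a quick check before processing a full clip, it can be useful to render only the first few frame pairs. A minimal sketch, continuing the example above and assuming `read_video_frame_pairs` returns a plain list of frame pairs that can be sliced:

```python
# render only the first 48 frame pairs (about 2 seconds at 24 fps) as a preview
preview_flows = optical_flow_pipeline(frame_pairs[:48], render=True, device="cuda:0")
video_utils.write_video("sintel_preview_output.mp4", preview_flows, fps=24)
```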

## Model conversion

The `krasserm/perceiver-io-optical-flow` model was created from the source `deepmind/optical-flow-perceiver` model
with:

```python
from perceiver.model.vision.optical_flow import convert_model

convert_model(
    save_dir="krasserm/perceiver-io-optical-flow",
    source_repo_id="deepmind/optical-flow-perceiver",
    push_to_hub=True,
)
```
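
To reproduce the conversion locally without pushing to the Hub, something along the following lines should work (a sketch, assuming `push_to_hub=False` is supported and that the pipeline can load the converted model directly from the local `save_dir`):

```python
from transformers import pipeline
from perceiver.model.vision import optical_flow  # register optical flow pipeline
from perceiver.model.vision.optical_flow import convert_model

# convert into a local directory only (no upload to the Hub) ...
convert_model(
    save_dir="perceiver-io-optical-flow-local",
    source_repo_id="deepmind/optical-flow-perceiver",
    push_to_hub=False,
)

# ... and load the converted model from that directory
optical_flow_pipeline = pipeline("optical-flow", model="perceiver-io-optical-flow-local", device="cuda:0")
```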

## Citation

```bibtex
@article{jaegle2021perceiver,
  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}
```