Kristen-Z committed on
Commit
b4c5c33
·
verified ·
1 Parent(s): a408e48

Update README.md

  1. README.md +123 -3
README.md CHANGED
@@ -1,3 +1,123 @@
- ---
- license: apache-2.0
- ---
+ ---
+ language: en
+ license: apache-2.0
+ tags:
+ - optical-flow
+ - point-tracking
+ - computer-vision
+ - zero-shot
+ - vit
+ library_name: megaflow
+ pipeline_tag: image-to-image
+ ---
+
+ # MegaFlow: Zero-Shot Large Displacement Optical Flow
+
+ **[Dingxi Zhang](https://kristen-z.github.io/)** · **[Fangjinhua Wang](https://fangjinhuawang.github.io/)** · **[Marc Pollefeys](https://people.inf.ethz.ch/marc.pollefeys/)** · **[Haofei Xu](https://haofeixu.github.io/)**
+
+ *ETH Zurich · Microsoft · University of Tübingen, Tübingen AI Center*
+
+ [![Project Page](https://img.shields.io/badge/Project-Page-blue?style=flat&logo=Google%20chrome&logoColor=white)](https://kristen-z.github.io/projects/megaflow/)
+ [![arXiv](https://img.shields.io/badge/arXiv-Paper-b31b1b.svg?style=flat&logo=arxiv&logoColor=white)](https://arxiv.org/abs/)
+ [![GitHub](https://img.shields.io/badge/GitHub-Code-black?style=flat&logo=github)](https://github.com/cvg/megaflow)
+ [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cvg/megaflow/blob/main/demo_colab.ipynb)
+
+ ---
+
+ **MegaFlow** is a simple, unified model for **zero-shot large displacement optical flow** and **point tracking**.
+
+ MegaFlow uses pre-trained Vision Transformer features to capture extreme motion, then applies lightweight iterative refinement for sub-pixel accuracy. It achieves **state-of-the-art zero-shot performance** on major optical flow benchmarks (Sintel, KITTI, Spring) and competitive zero-shot results on long-range point tracking benchmarks.
+
+ ## Highlights
+
+ - 🏆 State-of-the-art zero-shot performance on Sintel, KITTI, and Spring
+ - 🎯 Designed for large displacement optical flow
+ - 📹 Flexible temporal window: processes any number of frames at once
+ - 🔄 Single backbone for both optical flow and long-range point tracking
+
+ ## Available Models
+
+ | Model ID | Task | Description |
+ |---|---|---|
+ | `megaflow-flow` | Optical flow | Full training curriculum (default) |
+ | `megaflow-chairs-things` | Optical flow | Trained on FlyingThings + FlyingChairs only |
+ | `megaflow-track` | Point tracking | Fine-tuned on Kubric |
+
+ ## Quick Start
+
+ ### Installation
+
+ ```bash
+ pip install git+https://github.com/cvg/megaflow.git
+ ```
+
+ Requirements: Python ≥ 3.12, PyTorch ≥ 2.7, CUDA recommended.
+
+ ### Optical Flow
+ ```python
+ import torch
+ from megaflow import MegaFlow
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ # video: float32 tensor [1, T, 3, H, W], pixel values in [0, 255]
+ video = ...
+
+ model = MegaFlow.from_pretrained("megaflow-flow").eval().to(device)
+
+ with torch.inference_mode():
+     with torch.autocast(device_type=device, dtype=torch.bfloat16):
+         # Returns flow for consecutive pairs: (0→1, 1→2, ...)
+         # Shape: [1, T-1, 2, H, W]
+         flow = model(video, num_reg_refine=8)["flow_preds"][-1]
+ ```
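The snippet above leaves `video` elided. As a minimal sketch of one way to build an array with the documented `[1, T, 3, H, W]` float32 layout, assume the frames are already decoded as `H × W × 3` uint8 arrays by whatever video reader you use. `frames_to_video` is a hypothetical helper name, not part of the `megaflow` package, and NumPy stands in for PyTorch so the example runs without a GPU stack; `torch.from_numpy(video)` would turn the result into a tensor.

```python
import numpy as np

def frames_to_video(frames):
    """Stack T frames of shape [H, W, 3] (uint8) into float32 [1, T, 3, H, W].

    Hypothetical helper for illustration; not part of megaflow.
    """
    # Flow is returned for consecutive pairs, so at least two frames are needed.
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    video = np.stack(frames, axis=0)        # [T, H, W, 3]
    video = video.transpose(0, 3, 1, 2)     # [T, 3, H, W], channels first
    # Cast to float32 without rescaling: values stay in [0, 255].
    video = np.ascontiguousarray(video, dtype=np.float32)
    return video[None]                      # add batch dim -> [1, T, 3, H, W]

# Synthetic frames stand in for decoded video frames.
frames = [np.full((4, 6, 3), i * 10, dtype=np.uint8) for i in range(3)]
video = frames_to_video(frames)
print(video.shape, video.dtype)
```

The pixel values are deliberately left in `[0, 255]` rather than normalized to `[0, 1]`, matching the input range stated in the comment of the snippet above.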
+
+ ### Point Tracking
+ ```python
+ import torch
+ from megaflow import MegaFlow
+ from megaflow.utils.basic import gridcloud2d
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ # video: float32 tensor [1, T, 3, H, W], pixel values in [0, 255]
+ video = ...
+ H, W = video.shape[-2:]  # spatial size, needed for the pixel grid below
+
+ model = MegaFlow.from_pretrained("megaflow-track").eval().to(device)
+
+ with torch.inference_mode():
+     with torch.autocast(device_type=device, dtype=torch.bfloat16):
+         # Returns dense offsets from frame 0 to each frame t
+         flows_e = model.forward_track(video, num_reg_refine=8)["flow_final"]
+
+ # Convert offsets to absolute coordinates
+ grid_xy = gridcloud2d(1, H, W, norm=False, device=device).float()
+ grid_xy = grid_xy.permute(0, 2, 1).reshape(1, 1, 2, H, W)
+ tracks = flows_e + grid_xy  # [1, T, 2, H, W]
+ ```
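To make the offset-to-coordinate step concrete, here is a small NumPy sketch with synthetic offsets. It does not call `gridcloud2d`; it builds an equivalent pixel grid with `np.meshgrid`, and it assumes channel 0 holds x and channel 1 holds y (inferred from the `grid_xy` naming above, not confirmed by the source). It then reads out the trajectory of a single frame-0 pixel from the dense result.

```python
import numpy as np

T, H, W = 3, 4, 5

# Synthetic offsets from frame 0: every pixel moves (+1, +2) px per frame.
step = np.array([1.0, 2.0], dtype=np.float32).reshape(1, 2, 1, 1)  # (dx, dy)
t = np.arange(T, dtype=np.float32).reshape(T, 1, 1, 1)
offsets = np.broadcast_to(t * step, (T, 2, H, W))[None]  # [1, T, 2, H, W]

# Pixel grid of frame-0 positions, x in channel 0 and y in channel 1.
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
grid_xy = np.stack([xs, ys], axis=0).astype(np.float32)  # [2, H, W]

# Absolute position of each frame-0 pixel in every frame.
tracks = offsets + grid_xy  # broadcasts to [1, T, 2, H, W]

# Trajectory of the pixel that starts at (x=3, y=1) in frame 0.
x0, y0 = 3, 1
trajectory = tracks[0, :, :, y0, x0]  # [T, 2], one (x, y) per frame
print(trajectory)
```

Indexing the dense `tracks` tensor at a frame-0 pixel, as in the last lines, is one simple way to extract sparse trajectories for a chosen set of query points.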
+
+ ## Demo Scripts
+ ```bash
+ # Clone the repo and run demos
+ git clone https://github.com/cvg/megaflow.git
+ cd megaflow
+
+ # Optical flow on a video
+ python demo_flow.py --input assets/longboard.mp4 --output output/longboard_flow.mp4
+
+ # Dense point tracking
+ python demo_track.py --input assets/apple.mp4 --grid_size 8
+
+ # Gradio web UI
+ python demo_gradio.py
+ ```
+ Or try the [Colab notebook](https://colab.research.google.com/github/cvg/megaflow/blob/main/demo_colab.ipynb) directly in the browser.
+
+ ## Citation
+ ```bibtex
+ @article{zhang2026megaflow,
+   title = {MegaFlow: Zero-Shot Large Displacement Optical Flow},
+   author = {Zhang, Dingxi and Wang, Fangjinhua and Pollefeys, Marc and Xu, Haofei},
+   journal = {arXiv preprint arXiv:2603.25739},
+   year = {2026}
+ }
+ ```