Micron-Flow: Real-Time Optical Flow Model
Model Overview
Micron-Flow is a lightweight optical flow model optimized for real-time inference, reaching 80+ FPS on high-end GPUs. Knowledge distillation from RAFT-Large lets it achieve high accuracy at a very small size of 522K parameters.
Model Details
- Architecture: Modified U-Net with MobileNetV2-based Siamese encoder, residual blocks, and a flow refinement module.
- Parameters: 522K
- Input Resolution: (152, 240)
- Training Dataset: 200K video frame pairs from the Moments of Time dataset, with flow labels generated by RAFT-Large.
- Distillation Approach:
  - Mean squared error (MSE) loss in tanh-space
  - Edge-aware smoothness loss
- Optimization: Trained with a cosine annealing learning-rate scheduler and progressive encoder unfreezing.
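The two distillation terms listed above could be combined roughly as in the sketch below. This is not the authors' training code: the reading of "tanh-space" as squashing both flows with `tanh(flow / scale)`, the `scale` value, the exponential edge weighting, and `smooth_weight` are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def tanh_space_mse(student_flow, teacher_flow, scale=20.0):
    """MSE between flows after squashing both with tanh(flow / scale).

    `scale` is a hypothetical normaliser in pixels, not a value from the model card.
    """
    return F.mse_loss(torch.tanh(student_flow / scale),
                      torch.tanh(teacher_flow / scale))


def edge_aware_smoothness(flow, image):
    """Penalise flow gradients, down-weighted where the image itself has edges."""
    # Finite differences of the flow along x (width) and y (height).
    flow_dx = (flow[:, :, :, 1:] - flow[:, :, :, :-1]).abs()
    flow_dy = (flow[:, :, 1:, :] - flow[:, :, :-1, :]).abs()
    # Image gradients averaged over colour channels, keeping a channel axis of size 1.
    img_dx = (image[:, :, :, 1:] - image[:, :, :, :-1]).abs().mean(1, keepdim=True)
    img_dy = (image[:, :, 1:, :] - image[:, :, :-1, :]).abs().mean(1, keepdim=True)
    # Strong image edges give weights near 0, flat regions give weights near 1.
    return (flow_dx * torch.exp(-img_dx)).mean() + (flow_dy * torch.exp(-img_dy)).mean()


def distillation_loss(student_flow, teacher_flow, image, smooth_weight=0.1):
    """Weighted sum of the two terms; `smooth_weight` is an illustrative value."""
    return (tanh_space_mse(student_flow, teacher_flow)
            + smooth_weight * edge_aware_smoothness(student_flow, image))
```

Here `student_flow` and `teacher_flow` are `(N, 2, H, W)` tensors and `image` is the `(N, 3, H, W)` reference frame. Squashing with tanh bounds the regression target, which limits the influence of very large teacher displacements on the loss.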
Performance
| Device | Inference Time | FPS |
|---|---|---|
| RTX 4090 | 0.012 sec | 83 |
| GTX 1650 | 0.013 sec | 76 |
| CPU-Only | 0.07 sec | 14 |
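FPS in the table is the reciprocal of the per-pair inference time (for example, 1 / 0.012 s ≈ 83). The helper below is a minimal sketch of how such numbers could be measured; it is not the benchmark used for the table, and the `warmup`, `runs`, and `sync` parameters are illustrative choices.

```python
import time


def benchmark(fn, warmup=5, runs=50, sync=None):
    """Return (mean latency in seconds, frames per second) for calling `fn()`.

    `sync` is an optional callable run before each clock read; on a GPU this
    would typically be torch.cuda.synchronize, so queued kernels are counted.
    """
    for _ in range(warmup):  # warm-up calls are not timed
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    if sync is not None:
        sync()
    latency = (time.perf_counter() - start) / runs
    return latency, 1.0 / latency
```

For a loaded model this might be called as `benchmark(lambda: model(frame1, frame2), sync=torch.cuda.synchronize)`. Without the synchronization step, GPU timings tend to measure only kernel launch overhead.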
Key Features
- Real-time processing: 80+ FPS on RTX 4090
- Small model size: Only 2.1MB on disk
- Efficient architecture:
  - Depthwise convolutions for reduced parameters
  - Inverted residual blocks for better efficiency
  - Flow refiner for enhanced motion consistency
- Optimized training pipeline: GPU caching and JPEG decoding acceleration
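As a rough worked example of why depthwise convolutions reduce parameters, and of how the parameter count relates to the file size, the sketch below compares weight counts for a single layer. The 64-channel layer is an arbitrary illustration, not a layer from Micron-Flow, and the size estimate assumes weights are stored as float32.

```python
def conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def fp32_size_mb(num_params):
    """Storage for float32 weights in megabytes (1 MB = 10**6 bytes)."""
    return num_params * 4 / 1e6


# Illustrative layer with 64 input and 64 output channels.
standard = conv_params(64, 64)                   # 36864
separable = depthwise_separable_params(64, 64)   # 4672
print(standard, separable, round(standard / separable, 1))  # 36864 4672 7.9
print(round(fp32_size_mb(522_000), 2))           # 2.09
```

Under the float32 assumption, 522K parameters come to about 2.09 MB, which is consistent with the 2.1MB on-disk size listed above.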
Limitations
- Trained on synthetic optical flow from RAFT-Large, which may introduce biases.
- Input resolution is fixed at (152, 240); inputs of other sizes must be up- or downscaled.
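One way to handle other input sizes is to resize frames down to the model resolution and then resize the predicted flow back up, rescaling the vectors so they are expressed in pixels of the original frame. The sketch below assumes that (152, 240) means (height, width), that the flow is in pixel units, and that channel 0 is horizontal and channel 1 vertical displacement; none of this is confirmed by the model card, and the helper names are hypothetical.

```python
import torch
import torch.nn.functional as F

MODEL_SIZE = (152, 240)  # assumed to be (height, width)


def resize_frame(frame, size=MODEL_SIZE):
    """Resize a (N, 3, H, W) frame batch to the model's input resolution."""
    return F.interpolate(frame, size=size, mode="bilinear", align_corners=False)


def upscale_flow(flow, out_size):
    """Resize a (N, 2, h, w) flow field to out_size and rescale its vectors.

    Assumes channel 0 is horizontal (x) and channel 1 is vertical (y)
    displacement, both in pixels at the flow's own resolution.
    """
    h, w = flow.shape[-2:]
    out_h, out_w = out_size
    flow = F.interpolate(flow, size=out_size, mode="bilinear", align_corners=False)
    # A displacement of 1 px at (h, w) spans out_w / w px horizontally at out_size.
    scale = torch.tensor([out_w / w, out_h / h], dtype=flow.dtype, device=flow.device)
    return flow * scale.view(1, 2, 1, 1)
```

Skipping the rescaling step would leave the flow magnitudes in model-resolution pixels, so motion in the full-size frame would be underestimated whenever the original frame is larger than the model input.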
Model Usage
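import torch
from torchvision.transforms.functional import to_tensor
# MicronFlow is the model class defined in the project's code.
# Load the trained weights from the .pth file; the path is a placeholder and the file is assumed to hold a state_dict.
model = MicronFlow()
model.load_state_dict(torch.load("path/to/checkpoint.pth", map_location="cpu"))
model.eval()
# image1 and image2 are consecutive frames (PIL images or HxWxC arrays) at the model's input resolution.
frame1 = to_tensor(image1).unsqueeze(0)  # add a batch dimension
frame2 = to_tensor(image2).unsqueeze(0)
with torch.no_grad():  # inference only, no gradients needed
    flow = model(frame1, frame2)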
License
MIT License.
Links
- Code: GitHub