Micron-Flow: Real-Time Optical Flow Model

Model Overview

Micron-Flow is a lightweight optical flow model optimized for real-time inference, running at 80+ FPS on an RTX 4090. It is trained by knowledge distillation from RAFT-Large, which lets it keep high accuracy at an extremely small size of 522K parameters.

Model Details

  • Architecture: Modified U-Net with MobileNetV2-based Siamese encoder, residual blocks, and a flow refinement module.
  • Parameters: 522K
  • Input Resolution: (152, 240)
  • Training Dataset: 200K video frame pairs from the Moments of Time dataset, with target flow generated by RAFT-Large.
  • Distillation Approach:
    • Mean squared error (MSE) loss in tanh-space
    • Edge-aware smoothness loss
  • Optimization: Trained with a cosine annealing learning-rate scheduler and progressive encoder unfreezing.
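
The two distillation terms listed above could be combined roughly as follows. This is a minimal sketch, not the training code used for Micron-Flow. It assumes that "tanh-space" means squashing both the student and the teacher flow through tanh before taking the MSE, and the constants `flow_scale` and `smooth_weight` are hypothetical values chosen only for illustration.

```python
import torch
import torch.nn.functional as F


def tanh_space_mse(student_flow, teacher_flow, flow_scale=20.0):
    """MSE between flows after squashing both into (-1, 1) with tanh.

    flow_scale is a hypothetical normalisation constant, in pixels.
    """
    s = torch.tanh(student_flow / flow_scale)
    t = torch.tanh(teacher_flow / flow_scale)
    return F.mse_loss(s, t)


def edge_aware_smoothness(flow, image):
    """Penalise flow gradients, down-weighted where the image has edges.

    flow is (N, 2, H, W); image is (N, C, H, W).
    """
    # Finite differences along x (width) and y (height).
    flow_dx = (flow[:, :, :, 1:] - flow[:, :, :, :-1]).abs()
    flow_dy = (flow[:, :, 1:, :] - flow[:, :, :-1, :]).abs()
    img_dx = (image[:, :, :, 1:] - image[:, :, :, :-1]).abs().mean(1, keepdim=True)
    img_dy = (image[:, :, 1:, :] - image[:, :, :-1, :]).abs().mean(1, keepdim=True)
    # exp(-|dI|) is close to 1 in flat regions and small at strong edges.
    loss_x = (flow_dx * torch.exp(-img_dx)).mean()
    loss_y = (flow_dy * torch.exp(-img_dy)).mean()
    return loss_x + loss_y


def distillation_loss(student_flow, teacher_flow, image, smooth_weight=0.1):
    """Weighted sum of the two terms; smooth_weight is an assumed value."""
    mse = tanh_space_mse(student_flow, teacher_flow)
    smooth = edge_aware_smoothness(student_flow, image)
    return mse + smooth_weight * smooth
```

Squashing with tanh limits the influence of very large teacher displacements, while the smoothness term only discourages flow discontinuities where the image itself is flat.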

Performance

Device      Inference Time (s)   FPS
RTX 4090    0.012                83
GTX 1650    0.013                76
CPU-Only    0.07                 14
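
The FPS figures above are consistent with taking 1 / inference time and rounding down. The sketch below shows that conversion together with a generic way to time a model call. The helper names are hypothetical, and this is not necessarily how the numbers above were measured.

```python
import time


def latency_to_fps(seconds_per_frame):
    """Frames per second for a given per-frame latency, rounded down."""
    return int(1.0 / seconds_per_frame)


def measure_latency(run_once, warmup=5, iters=50):
    """Average wall-clock seconds per call of run_once().

    On a GPU, run_once should call torch.cuda.synchronize() before it
    returns; otherwise only the asynchronous kernel launch is timed.
    """
    for _ in range(warmup):
        run_once()  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    return (time.perf_counter() - start) / iters


for latency in (0.012, 0.013, 0.07):
    print(latency, latency_to_fps(latency))
```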

Key Features

  • Real-time processing: 80+ FPS on RTX 4090
  • Small model size: Only 2.1 MB on disk
  • Efficient architecture:
    • Depthwise convolutions for reduced parameters
    • Inverted residual blocks for better efficiency
    • Flow refiner for enhanced motion consistency
  • Optimized training pipeline: GPU caching and JPEG decoding acceleration
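
To make the parameter savings concrete, the following sketch compares the weight count of a standard convolution with a depthwise separable one and checks the on-disk size against the parameter count. The 64-channel layer is a hypothetical example, not a layer taken from Micron-Flow, and the size estimate assumes the weights are stored as 32-bit floats.

```python
def conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Depthwise k x k conv followed by a 1 x 1 pointwise conv (bias ignored)."""
    return k * k * c_in + c_in * c_out


def float32_size_mb(num_params):
    """Storage for num_params 32-bit floats, in megabytes (10**6 bytes)."""
    return num_params * 4 / 1e6


# Hypothetical layer with 64 input and 64 output channels.
standard = conv_params(64, 64)                   # 36864
separable = depthwise_separable_params(64, 64)   # 4672
print(standard, separable, round(standard / separable, 1))

# 522K float32 parameters come to roughly 2.1 MB.
print(round(float32_size_mb(522_000), 1))
```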

Limitations

  • Trained on flow predicted by RAFT-Large rather than on ground-truth flow, so the model may inherit the teacher's errors and biases.
  • Input resolution is fixed at (152, 240); frames of any other size must be resized before inference, and the predicted flow rescaled afterwards.
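
One way to work around the fixed resolution is to resize the frames, run the model, then resize the flow back and scale the displacement vectors by the same ratio. This is a sketch under several assumptions: (152, 240) is read as (height, width), the model returns flow of shape (N, 2, h, w) in pixel units with channel 0 horizontal and channel 1 vertical, and `flow_at_original_size` is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F

MODEL_SIZE = (152, 240)  # assumed to be (height, width)


def flow_at_original_size(model, frame1, frame2):
    """Run model at MODEL_SIZE and return flow at the frames' own resolution.

    frame1 and frame2 are (N, 3, H, W) float tensors.
    """
    _, _, height, width = frame1.shape
    small1 = F.interpolate(frame1, size=MODEL_SIZE, mode="bilinear", align_corners=False)
    small2 = F.interpolate(frame2, size=MODEL_SIZE, mode="bilinear", align_corners=False)
    with torch.no_grad():
        flow = model(small1, small2)
    # Bring the flow field back to the original spatial size.
    flow = F.interpolate(flow, size=(height, width), mode="bilinear", align_corners=False)
    # Displacements are measured in pixels, so they scale with the image.
    scale = torch.tensor(
        [width / MODEL_SIZE[1], height / MODEL_SIZE[0]],
        dtype=flow.dtype,
        device=flow.device,
    ).view(1, 2, 1, 1)
    return flow * scale
```

If the model's output is normalised rather than expressed in pixels, the final scaling step would need to change accordingly.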

Model Usage

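import torch
from torchvision.transforms.functional import to_tensor

# MicronFlow is the model class that accompanies this checkpoint; import it from your copy of the model code.
# Load the model from the .pth file (placeholder path; assumes the file stores a state_dict).
model = MicronFlow()
model.load_state_dict(torch.load("path/to/model.pth", map_location="cpu"))
model.eval()

# image1 and image2 are consecutive frames, already resized to the model's input resolution.
frame1 = to_tensor(image1).unsqueeze(0)  # add a batch dimension
frame2 = to_tensor(image2).unsqueeze(0)
with torch.no_grad():  # inference only, no gradients needed
    flow = model(frame1, frame2)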

License

MIT License.
