---
license: mit
tags:
  - video-classification
  - timesformer
  - retnet
  - action-recognition
  - ucf101
  - hmdb51
  - transformers
  - efficient-models
datasets:
  - ucf101
  - hmdb51
---

# 🎬 TimeSformer + RetNet Hybrid for Efficient Video Action Recognition

This project presents a hybrid architecture that replaces the temporal attention mechanism in TimeSformer with RetNet, achieving:

- ⚡ Faster training
- 🧠 Lower memory usage
- 🎯 Comparable or improved accuracy

## 🚀 Model Variants

We trained and evaluated four configurations:

| Model | Dataset |
|---|---|
| TimeSformer (Baseline) | UCF101 |
| TimeSformer (Baseline) | HMDB51 |
| TimeSformer + RetNet (Hybrid) | UCF101 |
| TimeSformer + RetNet (Hybrid) | HMDB51 |

## 🧠 Proposed Architecture

### 🔹 Baseline

- TimeSformer
- Full spatio-temporal attention

### 🔹 Hybrid Model (Proposed)

- Spatial attention → TimeSformer
- Temporal modeling → RetNet

👉 RetNet replaces temporal self-attention, reducing temporal complexity from quadratic to linear in the number of frames.
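
The quadratic-to-linear claim can be illustrated with a minimal single-head retention toy in NumPy (a sketch only — it omits the multi-scale decay, normalization, and gating of the full RetNet design). The parallel form materializes an n × n decay-masked score matrix, just as attention does; the mathematically equivalent recurrent form carries only a fixed-size d × d state across frames:

```python
import numpy as np

def retention_parallel(Q, K, V, gamma):
    """Parallel form: O = (Q K^T * D) V, with D[n, m] = gamma^(n - m) for n >= m.
    Builds an n x n matrix, so cost is O(n^2) in sequence length."""
    n = Q.shape[0]
    idx = np.arange(n)
    D = np.where(idx[:, None] >= idx[None, :],
                 float(gamma) ** (idx[:, None] - idx[None, :]), 0.0)
    return (Q @ K.T * D) @ V

def retention_recurrent(Q, K, V, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + K_n^T V_n, O_n = Q_n S_n.
    One fixed-size state update per step -> O(n) time, O(1) state in n."""
    out = np.zeros((Q.shape[0], V.shape[1]))
    S = np.zeros((K.shape[1], V.shape[1]))
    for t in range(Q.shape[0]):
        S = gamma * S + np.outer(K[t], V[t])  # decay old state, add new frame
        out[t] = Q[t] @ S
    return out
```

Both forms produce identical outputs; only the recurrent one keeps memory and time linear in the number of frames.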

## 📊 Hybrid Model Training Results (UCF101)

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc | F1 |
|---|---|---|---|---|---|
| 1 | 4.5275 | 0.0458 | 4.1596 | 0.3542 | 0.3076 |
| 2 | 3.6647 | 0.4089 | 2.6496 | 0.7550 | 0.7214 |
| 3 | 2.4221 | 0.6995 | 1.5313 | 0.8623 | 0.8509 |
| 4 | 1.8874 | 0.7841 | 1.2290 | 0.8961 | 0.8918 |
| 5 | 1.7268 | 0.8104 | 1.1584 | 0.9075 | 0.9040 |
| 6 | 1.6615 | 0.8145 | 1.1088 | 0.9167 | 0.9142 |
| 7 | 1.6076 | 0.8191 | 1.0962 | 0.9202 | 0.9168 |
| 8 | 1.5100 | 0.8234 | 1.0865 | 0.9260 | 0.9233 |
| 9 | 1.4704 | 0.8232 | 1.0812 | 0.9260 | 0.9226 |

πŸ† Best Performance (Hybrid Model)

  • Validation Accuracy: 92.60%
  • F1 Score: 0.9233
  • Achieved at Epoch 8

## ⚡ Efficiency Comparison

| Metric | TimeSformer | Hybrid (RetNet) |
|---|---|---|
| Peak GPU memory | ~9.3–9.8 GB | ~7.2 GB ✅ |
| Training speed | Slower | Faster ✅ |
| Temporal complexity | O(n²) | O(n) ✅ |

👉 ~25% memory reduction with comparable performance.


πŸ” Training Strategy

Due to Kaggle’s 12-hour runtime limit, training was performed in stages:

  • Initial training
  • Save best checkpoint
  • Resume from .safetensors
  • Continue training
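
The staged loop can be sketched as follows. This is an illustrative stand-in, not the project's code: the actual runs persist model weights to a `.safetensors` file, whereas this sketch only tracks the session bookkeeping (epoch counter and best validation accuracy) in a JSON file, and the hypothetical `accs` list stands in for real train/eval epochs:

```python
import json
import os

def save_checkpoint(path, epoch, best_acc):
    # Stand-in for checkpointing; real runs also save model weights (.safetensors).
    with open(path, "w") as f:
        json.dump({"epoch": epoch, "best_acc": best_acc}, f)

def load_checkpoint(path):
    # A fresh Kaggle session starts from scratch if no checkpoint exists yet.
    if not os.path.exists(path):
        return {"epoch": 0, "best_acc": 0.0}
    with open(path) as f:
        return json.load(f)

def train_stage(path, epochs_this_session, accs):
    """Run a bounded number of epochs, checkpointing after each one so the
    next session can resume where this one stopped."""
    state = load_checkpoint(path)
    for epoch in range(state["epoch"], state["epoch"] + epochs_this_session):
        val_acc = accs[epoch]  # placeholder for one real train + eval epoch
        state["epoch"] = epoch + 1
        state["best_acc"] = max(state["best_acc"], val_acc)
        save_checkpoint(path, state["epoch"], state["best_acc"])
    return state
```

Running `train_stage` twice with the same path continues from the saved epoch counter, which is the behavior the 12-hour limit forces.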

βš™οΈ Training Details

  • Mixed Precision Training (torch.cuda.amp)
  • Checkpoint-based training
  • Per-class evaluation reports
  • GPU: Kaggle environment

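A mixed-precision step with `torch.cuda.amp` generally follows the autocast-plus-GradScaler pattern below. This is a generic sketch, not the project's actual loop; the linear head, optimizer, and learning rate are placeholder assumptions:

```python
import torch

model = torch.nn.Linear(768, 101)  # placeholder for the video classifier
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
# GradScaler rescales the loss so small float16 gradients do not underflow
scaler = torch.cuda.amp.GradScaler(enabled=device == "cuda")

def training_step(features, labels):
    optimizer.zero_grad(set_to_none=True)
    # autocast runs the forward pass in float16 where safe (no-op on CPU)
    with torch.cuda.amp.autocast(enabled=device == "cuda"):
        logits = model(features)
        loss = torch.nn.functional.cross_entropy(logits, labels)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```

On CPU the scaler and autocast are simply disabled, so the same step function works in both environments.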
## 📦 Base Model

- `facebook/timesformer-base-finetuned-k400`

## 🚀 Usage

```bash
pip install torch torchvision transformers
```
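
After installing, inference with the base checkpoint follows the standard `transformers` video-classification pattern. A sketch: the random frames below stand in for a decoded 8-frame clip, and this loads the Kinetics-400 baseline, not the hybrid weights:

```python
import numpy as np
import torch
from transformers import AutoImageProcessor, TimesformerForVideoClassification

# 8 dummy RGB frames (H, W, C) stand in for a decoded video clip
video = list(np.random.randint(0, 256, (8, 224, 224, 3), dtype=np.uint8))

processor = AutoImageProcessor.from_pretrained(
    "facebook/timesformer-base-finetuned-k400")
model = TimesformerForVideoClassification.from_pretrained(
    "facebook/timesformer-base-finetuned-k400")

inputs = processor(video, return_tensors="pt")  # pixel_values: (1, 8, 3, 224, 224)
with torch.no_grad():
    logits = model(**inputs).logits  # (1, 400) Kinetics-400 class scores

print(model.config.id2label[logits.argmax(-1).item()])
```

For real videos, replace the dummy frames with 8 uniformly sampled frames from the clip (e.g. via `torchvision` or `decord`).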