EgoBlind-RA: CLIP Urgency Classifier

Binary urgency classifier for egocentric BLV (blind / low-vision) visual assistance queries. It predicts whether a (video, question) pair is urgent (safety-critical, requiring a fast, concise response) or non-urgent.

Component of the EgoBlind-RA project.

Architecture

  • Backbone: CLIP ViT-B/32 (OpenAI weights), frozen
  • Input: 4 frames uniformly sampled from a ±2-second window centered at the query timestamp, plus the question text
  • Frame embeddings are mean-pooled and concatenated with the CLIP text embedding
  • Two-layer MLP head outputs a binary urgency score
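
The frame-sampling step in the list above can be sketched as follows. This is a minimal illustration, not the project's code: the helper name `sample_frame_times`, the inclusion of both window endpoints, and the clamping to the clip bounds are all assumptions.

```python
def sample_frame_times(query_t, duration, num_frames=4, half_window=2.0):
    """Return num_frames timestamps (seconds) evenly spaced over
    [query_t - half_window, query_t + half_window], clamped to the clip.

    Hypothetical helper: endpoint inclusion and clamping are assumptions.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    start = max(0.0, query_t - half_window)
    end = min(duration, query_t + half_window)
    if num_frames == 1 or end <= start:
        # Degenerate window: fall back to the (clamped) query time itself.
        return [min(max(query_t, 0.0), duration)] * num_frames
    step = (end - start) / (num_frames - 1)
    return [start + i * step for i in range(num_frames)]


# Query at t = 10 s in a 60 s clip: four timestamps spanning 8 s to 12 s.
times = sample_frame_times(query_t=10.0, duration=60.0)
# Query near the start of the clip: the window is clamped at 0 s.
early = sample_frame_times(query_t=1.0, duration=60.0)
```

Clamping keeps queries near the start or end of a clip from requesting frames outside the video; the original pipeline may handle those cases differently.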

Only the MLP head is trained; the CLIP backbone is frozen throughout.
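
A minimal sketch of such a head, assuming the frozen CLIP embeddings have already been computed. The 512-dimensional embedding size matches CLIP ViT-B/32; the class name `UrgencyHead`, the hidden width of 256, the ReLU activation, and the single-logit output are assumptions, and random tensors stand in for real CLIP features.

```python
import torch
import torch.nn as nn

EMBED_DIM = 512  # CLIP ViT-B/32 image and text embeddings are 512-d


class UrgencyHead(nn.Module):
    """Two-layer MLP over [mean-pooled frame embedding ; text embedding].

    hidden_dim and the activation are assumed, not taken from the project.
    """

    def __init__(self, embed_dim=EMBED_DIM, hidden_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, frame_emb, text_emb):
        # frame_emb: (batch, num_frames, embed_dim)
        # text_emb:  (batch, embed_dim)
        pooled = frame_emb.mean(dim=1)                 # mean-pool over frames
        fused = torch.cat([pooled, text_emb], dim=-1)  # (batch, 2 * embed_dim)
        return self.mlp(fused).squeeze(-1)             # one logit per example


head = UrgencyHead()
frame_emb = torch.randn(3, 4, EMBED_DIM)  # stand-in for frozen CLIP image features
text_emb = torch.randn(3, EMBED_DIM)      # stand-in for frozen CLIP text features
logits = head(frame_emb, text_emb)
probs = torch.sigmoid(logits)             # urgency probability in (0, 1)
```

Because the backbone is frozen, the embeddings can be computed once and cached, so the head sees only fixed-size vectors.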

Training

  • Dataset: EgoBlind, with urgency labels generated by GPT-5.2 on 5 frames per clip
  • Loss: binary cross-entropy
  • Optimizer: AdamW, lr = 1e-4
  • Schedule: 5 epochs on an NVIDIA L40S GPU
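
The settings above correspond to a training loop along these lines. This is a sketch under stated assumptions, not the project's training script: the features and labels are synthetic stand-ins for cached CLIP embeddings and urgency labels, and the batch size, head shape, and use of `BCEWithLogitsLoss` (binary cross-entropy applied to raw logits) are assumed.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-ins for cached CLIP features: 64 examples of
# [mean-pooled frame embedding ; text embedding] = 1024-d, with 0/1 labels.
features = torch.randn(64, 1024)
labels = torch.randint(0, 2, (64,)).float()
loader = DataLoader(TensorDataset(features, labels), batch_size=16, shuffle=True)

# Assumed head shape; only these parameters are optimized.
head = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 1))
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()  # binary cross-entropy on raw logits

epoch_losses = []
for epoch in range(5):
    total = 0.0
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(head(x).squeeze(-1), y)
        loss.backward()
        optimizer.step()
        total += loss.item() * x.size(0)
    # Mean loss per example for this epoch.
    epoch_losses.append(total / len(loader.dataset))
```

Passing only `head.parameters()` to the optimizer is what keeps the backbone untouched; in a full pipeline the CLIP model would also be run under `torch.no_grad()`.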

Performance

Metric      Validation   Test
Accuracy    0.863        0.798
Precision   0.930        0.879
Recall      0.807        0.695
F1          0.864        0.777
ROC-AUC     0.938        0.905
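
As a consistency check on the table, F1 is the harmonic mean of precision and recall, so it can be recomputed from the reported values. The helper name `f1_score` below is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the table above.
reported = {
    "validation": {"precision": 0.930, "recall": 0.807, "f1": 0.864},
    "test": {"precision": 0.879, "recall": 0.695, "f1": 0.777},
}
recomputed = {
    split: f1_score(m["precision"], m["recall"]) for split, m in reported.items()
}
for split, value in recomputed.items():
    print(f"{split}: recomputed F1 = {value:.4f}, reported = {reported[split]['f1']}")
```

Both recomputed values fall within 0.001 of the reported F1, which is consistent with the precision and recall inputs having been rounded to three decimals.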

Usage

import torch
from huggingface_hub import hf_hub_download

ckpt = hf_hub_download(
    repo_id="julia225/egoblind-ra-clip-urgency",
    filename="final_model.pt",
)
state = torch.load(ckpt, map_location="cpu")
# Load into the MLP head defined in
# https://github.com/juliavekim/EgoBlind-RA/blob/main/models/clip_urgency_classifier.ipynb
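
Before loading the weights into the head, it can help to list the parameter names and shapes stored in the checkpoint. The sketch below is an assumption-laden illustration: `summarize_state` is a hypothetical helper, the `"state_dict"` wrapper key is a common convention that is not confirmed for `final_model.pt`, and a small `nn.Sequential` stands in for the downloaded checkpoint.

```python
import torch
import torch.nn as nn


def summarize_state(state):
    """Map parameter names to shapes for a module, raw, or wrapped state dict."""
    if isinstance(state, nn.Module):
        # The checkpoint may be a whole pickled module (assumed possibility).
        state = state.state_dict()
    # Some checkpoints wrap the weights under a "state_dict" key (assumed
    # possibility; not confirmed for final_model.pt).
    if isinstance(state, dict) and isinstance(state.get("state_dict"), dict):
        state = state["state_dict"]
    return {k: tuple(v.shape) for k, v in state.items() if torch.is_tensor(v)}


# Stand-in for torch.load(ckpt, map_location="cpu"): a small two-layer MLP.
stand_in = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 1))
summary = summarize_state(stand_in.state_dict())
for name, shape in summary.items():
    print(name, shape)
```

With the real checkpoint, the printed shapes indicate the input and hidden widths the head class must be constructed with before calling `load_state_dict`.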

Citation

If you use this classifier, please cite the EgoBlind-RA project:

@misc{kim2026egoblindra,
  title  = {EgoBlind-RA: Towards Safer Egocentric Assistive AI for Blind Users via Risk-Adaptive Routing},
  author = {Kim, Julia and Backus, Xander},
  year   = {2026},
  url    = {https://github.com/juliavekim/EgoBlind-RA},
}
