Jiaqi Tang

Jiaqi-hkust

AI & ML interests

Multi-modal Large Language Model

Recent Activity

new activity about 1 month ago
Jiaqi-hkust/hawk: [bot] Conversion to Parquet
updated a Space about 1 month ago
Jiaqi-hkust/hawk

Organizations

None yet

Jiaqi-hkust's activity

New activity in Jiaqi-hkust/hawk about 1 month ago

[bot] Conversion to Parquet

#1 opened about 2 months ago by parquet-converter
updated a model about 1 month ago
reacted to their post with 🔥🚀 about 1 month ago
posted an update about 1 month ago
We have open-sourced Hawk (NeurIPS 2024) 🎉, one of the pioneering frameworks for open-world video anomaly understanding.

Despite continuous technological advances, existing video anomaly detection systems remain limited in their semantic understanding of scenes and in user interaction, which makes it difficult to identify complex anomalous scenes effectively. In addition, the scarcity of datasets restricts the applicability of these systems in open-world scenarios.

To tackle these challenges, we developed Hawk, an open-world framework for video understanding and anomaly detection. Hawk significantly enhances anomaly recognition by explicitly modeling the differences in motion information between anomalous and normal videos. We introduce an auxiliary consistency loss that strengthens the model's focus on the motion modality and establishes a supervisory relationship between motion and language representations. Furthermore, we annotated over 8,000 anomalous videos with language descriptions and created 8,000 question-answer pairs to support effective training in diverse open-world scenarios.
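
To make the idea concrete, here is a minimal sketch of what such an auxiliary motion-language consistency term could look like. The cosine-similarity formulation, the tensor shapes, the weighting factor, and the function name `motion_language_consistency_loss` are illustrative assumptions, not the exact loss used in Hawk; see the GitHub code for the real implementation.

```python
# Hypothetical sketch of an auxiliary motion-language consistency loss.
# Names, shapes, and the cosine-similarity form are assumptions for illustration.
import torch
import torch.nn.functional as F

def motion_language_consistency_loss(motion_emb: torch.Tensor,
                                     text_emb: torch.Tensor) -> torch.Tensor:
    """Encourage pooled motion features to stay aligned with language features.

    motion_emb: (batch, dim) pooled features from a motion branch.
    text_emb:   (batch, dim) pooled features from the language branch.
    """
    motion_emb = F.normalize(motion_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # 1 - cosine similarity, averaged over the batch
    return (1.0 - (motion_emb * text_emb).sum(dim=-1)).mean()

if __name__ == "__main__":
    motion = torch.randn(4, 256)
    text = torch.randn(4, 256)
    aux = motion_language_consistency_loss(motion, text)
    # In training, this term would be added to the main objective with a
    # small weight (the weight value here is an arbitrary placeholder).
    print(f"auxiliary consistency loss: {aux.item():.4f}")
```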

Experimental results demonstrate that Hawk surpasses existing video understanding frameworks in video description generation and question-answering tasks.

We warmly invite everyone to try it out!
- Hugging Face Demo: Jiaqi-hkust/hawk
- Hugging Face Model: Jiaqi-hkust/hawk
- Hugging Face Dataset: Jiaqi-hkust/hawk
- GitHub Code: https://github.com/jqtangust/hawk

We look forward to your feedback and participation! 👏
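
For anyone who wants to pull the released artifacts locally, here is a minimal sketch using `huggingface_hub`; it assumes the model weights and the dataset are both published under the `Jiaqi-hkust/hawk` repo id listed above. For actual inference and training, follow the instructions in the GitHub repository, and the demo Space is easiest to try directly in the browser.

```python
# Minimal sketch: download the released Hawk artifacts from the Hugging Face Hub.
# Assumes both the model and the dataset use the repo id "Jiaqi-hkust/hawk".
from huggingface_hub import snapshot_download

# Model weights
model_dir = snapshot_download(repo_id="Jiaqi-hkust/hawk")

# Annotated video / question-answer dataset
data_dir = snapshot_download(repo_id="Jiaqi-hkust/hawk", repo_type="dataset")

print("model files in:", model_dir)
print("dataset files in:", data_dir)
```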
updated a model about 2 months ago
liked a model about 2 months ago