---
license: mit
pipeline_tag: any-to-any
library_name: transformers
tags:
- human-centric
- multimodal
- 2d-vision
- 3d-vision
- skeleton-based
- vision-language
- pose-estimation
- object-detection
- image-segmentation
- action-recognition
- image-captioning
- attribute-recognition
---
# Hulk: A Universal Knowledge Translator for Human-Centric Tasks
This model was presented in the paper [Hulk: A Universal Knowledge Translator for Human-Centric Tasks](https://arxiv.org/abs/2312.01697).
- Project Page: https://humancentricmodels.github.io/Hulk/
- GitHub Repository: https://github.com/OpenGVLab/Hulk
- ArXiv Paper: https://arxiv.org/abs/2312.01697
## Abstract
Human-centric perception tasks, e.g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as the metaverse and sports analysis. There has been a recent surge in developing human-centric foundation models that can benefit a broad range of human-centric perception tasks. While many human-centric foundation models have achieved success, they did not explore 3D and vision-language tasks for human-centric perception and required task-specific finetuning. These limitations restrict their application to more downstream tasks and situations. To tackle these problems, we present Hulk, the first multimodal human-centric generalist model, capable of addressing 2D vision, 3D vision, skeleton-based, and vision-language tasks without task-specific finetuning. The key to achieving this is condensing various task-specific heads into two general heads, one for discrete representations, *e.g.,* languages, and the other for continuous representations, *e.g.,* location coordinates. The outputs of the two heads can be further stacked into four distinct input and output modalities. This uniform representation enables Hulk to treat diverse human-centric tasks as modality translation, integrating knowledge across a wide range of tasks. Comprehensive evaluations of Hulk on 12 benchmarks covering 8 human-centric tasks demonstrate the superiority of our proposed method, achieving state-of-the-art performance in 11 benchmarks.
## Model Framework
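The central design described in the abstract, two general heads shared across all tasks, one decoding discrete representations (e.g., language tokens) and one decoding continuous representations (e.g., location coordinates), can be illustrated with a minimal sketch. The module below is not Hulk's actual implementation; all class names, layer choices, and dimensions are illustrative assumptions.

```python
# Illustrative sketch only (not Hulk's real code): a shared encoder feeds two
# general heads, one producing discrete tokens (e.g., language) and one
# producing continuous values (e.g., location coordinates). All names and
# sizes below are assumptions for demonstration.
import torch
import torch.nn as nn

class TwoHeadTranslator(nn.Module):
    def __init__(self, dim: int = 256, vocab_size: int = 32000, coord_dim: int = 2):
        super().__init__()
        # Shared backbone standing in for the modality-agnostic encoder.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        # Discrete head: scores each output token over a vocabulary.
        self.discrete_head = nn.Linear(dim, vocab_size)
        # Continuous head: regresses coordinates (e.g., 2D keypoint locations).
        self.continuous_head = nn.Linear(dim, coord_dim)

    def forward(self, tokens: torch.Tensor, task: str) -> torch.Tensor:
        features = self.encoder(tokens)           # (B, N, dim)
        if task == "discrete":                    # e.g., captioning, attributes
            return self.discrete_head(features)   # (B, N, vocab_size)
        return self.continuous_head(features)     # e.g., pose: (B, N, coord_dim)

model = TwoHeadTranslator()
x = torch.randn(1, 16, 256)                       # 16 input tokens of width 256
print(model(x, task="discrete").shape)            # torch.Size([1, 16, 32000])
print(model(x, task="continuous").shape)          # torch.Size([1, 16, 2])
```

Under this view, every human-centric task becomes a translation between input and output modalities routed through the shared encoder, which is what lets one set of weights serve detection, pose, captioning, and attribute recognition alike.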
## Usage
For detailed installation instructions, dataset preparation, training procedures, evaluation scripts, and comprehensive inference examples across various human-centric tasks, please refer to the official Hulk GitHub repository.
The codebase is built on top of the 🤗 Diffusers and 🤗 Transformers libraries, and users should consult the repository for specific usage patterns.
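As a minimal starting point before following those scripts, the checkpoint files can be fetched locally with the `huggingface_hub` client. The `repo_id` below is an assumption; replace it with this model card's actual id if it differs.

```python
# Minimal sketch: download the checkpoint locally, then run the inference
# scripts from the Hulk GitHub repository against it.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="OpenGVLab/Hulk")  # assumed repo id
print(f"Checkpoint files downloaded to: {local_dir}")
```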
## Model Performance
Hulk achieves state-of-the-art results on a wide range of human-centric benchmarks, in both direct evaluation and finetuning scenarios. For detailed performance metrics across tasks and datasets, please consult the tables in the GitHub README and the original paper.
## Citation
If you find this work useful, please consider citing:
```bibtex
@article{wang2023hulk,
  title={Hulk: A Universal Knowledge Translator for Human-Centric Tasks},
  author={Wang, Yizhou and Wu, Yixuan and Tang, Shixiang and He, Weizhen and Guo, Xun and Zhu, Feng and Bai, Lei and Zhao, Rui and Wu, Jian and He, Tong and others},
  journal={arXiv preprint arXiv:2312.01697},
  year={2023}
}
```