Model Summary

We propose Lenna, a language-enhanced reasoning detection assistant that uses the robust multimodal feature representations of multimodal large language models (MLLMs) while preserving location information for detection. This is achieved by adding a token to the MLLM vocabulary that carries no explicit semantic content but serves as a prompt for the detector to identify the corresponding position. To evaluate Lenna's reasoning capability, we construct the ReasonDet dataset, which measures performance on reasoning-based detection.
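The token mechanism described above can be illustrated with a toy sketch in plain Python. This is not the Lenna implementation: the token name `<DET>`, the helper functions, and the idea of passing the hidden state at the token's position to the detector are assumptions made for illustration, and the vocabulary and hidden states are made-up values.

```python
from typing import Dict, List

# Assumed name for the extra token; used here for illustration only.
DET_TOKEN = "<DET>"


def add_special_token(vocab: Dict[str, int], token: str) -> int:
    """Add `token` to a toy vocabulary (IDs assumed contiguous from 0) and return its ID."""
    if token not in vocab:
        vocab[token] = len(vocab)
    return vocab[token]


def find_token_positions(token_ids: List[int], target_id: int) -> List[int]:
    """Return every index in the sequence where the target token occurs."""
    return [i for i, tid in enumerate(token_ids) if tid == target_id]


def select_detector_prompts(
    hidden_states: List[List[float]], positions: List[int]
) -> List[List[float]]:
    """Pick the hidden-state vectors at the given positions (hypothetical detector prompts)."""
    return [hidden_states[i] for i in positions]


# Toy vocabulary and a toy generated sequence ending with the extra token.
vocab = {"the": 0, "dog": 1, "is": 2, "at": 3}
det_id = add_special_token(vocab, DET_TOKEN)
output_ids = [vocab["the"], vocab["dog"], vocab["is"], vocab["at"], det_id]

# Stand-in hidden states: one 2-dimensional vector per output token.
hidden_states = [[float(i), float(i) + 0.5] for i in range(len(output_ids))]

positions = find_token_positions(output_ids, det_id)
prompts = select_detector_prompts(hidden_states, positions)
print(positions, prompts)
```

In a real MLLM the vocabulary extension would also require resizing the embedding matrix, and the selected vectors would be high-dimensional features rather than the two-element lists used here.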

Model Sources

Repository: https://github.com/Meituan-AutoML/Lenna
Paper: https://arxiv.org/abs/2312.02433

How to Get Started with the Model

The model weights (mtgv/Lenna-7B) can be loaded with Hugging Face Transformers. Usage examples are available in the GitHub repository listed under Model Sources.
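As a starting point, the sketch below only fetches the weight files from the Hugging Face Hub using `huggingface_hub.snapshot_download`. It does not instantiate the model: the model class and inference code are assumed to come from the GitHub repository above, and whether the generic Transformers auto classes can load this checkpoint directly is not confirmed here. The function name `fetch_lenna_weights` is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Model ID as shown on the Hugging Face Hub page.
REPO_ID = "mtgv/Lenna-7B"


def fetch_lenna_weights(repo_id: str = REPO_ID, cache_dir: Optional[str] = None) -> Path:
    """Download the model repository (or reuse a cached copy) and return its local path."""
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(repo_id=repo_id, cache_dir=cache_dir)
    return Path(local_dir)


# Example usage (downloads several gigabytes, so it is left commented out):
# weights_path = fetch_lenna_weights()
# print(f"Weights available at: {weights_path}")
```

The returned directory can then be passed to the loading code provided in the GitHub repository.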
