arxiv:2309.16058

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

Published on Sep 27, 2023
· Featured in Daily Papers on Sep 29, 2023

Abstract

We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e. text, image, video, audio, IMU motion sensor), and generates textual responses. AnyMAL inherits the powerful text-based reasoning abilities of the state-of-the-art LLMs including LLaMA-2 (70B), and converts modality-specific signals to the joint textual space through a pre-trained aligner module. To further strengthen the multimodal LLM's capabilities, we fine-tune the model with a multimodal instruction set manually collected to cover diverse topics and tasks beyond simple QAs. We conduct comprehensive empirical analysis comprising both human and automatic evaluations, and demonstrate state-of-the-art performance on various multimodal tasks.
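For readers wondering how the "pre-trained aligner module" described above might map modality-specific signals into the LLM's textual embedding space, here is a minimal, hypothetical sketch (not the authors' code): it assumes pooled features from a frozen modality encoder and a small learned projection that produces "soft prompt" tokens prepended to the text embeddings. All names and dimensions are illustrative.

```python
# Hypothetical sketch of the aligner idea from the abstract: a frozen modality
# encoder yields features, a learned projection maps them into the LLM's
# token-embedding space, and the resulting soft-prompt tokens are prepended
# to the embedded text prompt. This is NOT the authors' implementation.
import torch
import torch.nn as nn

class ModalityAligner(nn.Module):
    def __init__(self, feat_dim: int, llm_dim: int, num_prompt_tokens: int = 8):
        super().__init__()
        # Learned projection from encoder features to a fixed number of
        # LLM-dimensional soft-prompt tokens.
        self.proj = nn.Linear(feat_dim, llm_dim * num_prompt_tokens)
        self.num_prompt_tokens = num_prompt_tokens
        self.llm_dim = llm_dim

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, feat_dim) pooled features from a frozen encoder
        out = self.proj(feats)  # (batch, llm_dim * num_prompt_tokens)
        return out.view(-1, self.num_prompt_tokens, self.llm_dim)

# Usage sketch: prepend the aligned tokens to the text embeddings before
# feeding them to a (frozen or adapter-tuned) LLaMA-style decoder.
image_feats = torch.randn(2, 1024)                  # e.g. pooled image-encoder features
aligner = ModalityAligner(feat_dim=1024, llm_dim=4096)
soft_prompt = aligner(image_feats)                  # (2, 8, 4096)
text_embeds = torch.randn(2, 16, 4096)              # embedded text prompt tokens
llm_inputs = torch.cat([soft_prompt, text_embeds], dim=1)
```

Whether the actual aligner is a simple linear layer or a heavier resampler is not stated in the abstract; the point of the sketch is only that modality features are converted into tokens that live in the same space as the LLM's text embeddings.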

Community

Is there a GitHub repo?

I found this on GitHub; I'm not sure if it's legit, and it's just a bare template at the moment.
Are you still putting this repo together?
https://github.com/kyegomez/AnyMAL

That account has many repositories named after well-known papers, but they do not work and appear to have been generated with AI. Please don't use, update, or star them.

Are model weights or training instructions (for extending LLaMA-2 with audio, image, and video) available anywhere?

Models citing this paper 2

Datasets citing this paper 0

Spaces citing this paper 0

Collections including this paper 18