|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
library_name: transformers |
|
tags: |
|
- multimodal |
|
- aria |
|
--- |
|
<p align="center"> |
|
<b>Aria</b>
|
</p> |
|
|
|
<p align="center"> |
|
🔗 <a href="https://huggingface.co" target="_blank">Try Aria!</a> · 📖 <a href="https://huggingface.co" target="_blank">Blog</a> · 📌 <a href="https://huggingface.co" target="_blank">Paper</a> · 🖤 <a href="https://huggingface.co" target="_blank">GitHub</a> · 💜 <a href="https://huggingface.co" target="_blank">Discord</a> · 💙 <a href="https://huggingface.co" target="_blank">Twitter</a>
|
</p> |
|
|
|
# Highlights |
|
|
|
- Aria is the **first open multimodal-native mixture-of-experts (MoE)** model, capable of seamlessly handling various input modalities within a single MoE architecture.
|
- Aria performs **on par with GPT-4o mini and Gemini 1.5 Flash** across a range of multimodal tasks while maintaining strong performance on **text**-only tasks. |
|
- Compared to similar or even larger models, Aria offers **faster inference** and **lower cost**. This efficiency comes from activating only 3.9B parameters during inference, the **fewest** among models with comparable performance.
|
|
|
# Key features |
|
|
|
- **Robust multimodal understanding**: Aria processes various input modalities, including video, images, code, and text. It performs strongly across diverse downstream tasks such as long-context video understanding, image understanding, and OCR, and it excels at instruction following.
|
- **Flexible image handling**: Aria supports variable image sizes and aspect ratios while maintaining high quality. |
|
- **Extended context capacity**: Aria can manage multiple images within a long context window of 64K tokens (see the sketch after this list).
|
- **Advanced text understanding**: Aria demonstrates competitive performance across language and coding tasks. |
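
To make the multi-image bullet above concrete, here is a small, hypothetical sketch of an interleaved multi-image prompt. It assumes Aria's processor follows the generic `transformers` chat-template message convention, with one `{"type": "image"}` placeholder per image; the exact format is defined by the model's processor code once released.

```python
# Hypothetical multi-image message structure, following the generic
# transformers chat-template convention (an assumption until the
# processor ships): one {"type": "image"} placeholder per image,
# interleaved freely with text.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Here are two document pages:"},
            {"type": "image"},  # page 1, passed to the processor as a PIL image
            {"type": "image"},  # page 2
            {"type": "text", "text": "Summarize the differences between the pages."},
        ],
    }
]
```

The matching PIL images are then passed to the processor in the same order, e.g. `processor(text=..., images=[page1, page2], return_tensors="pt")`.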
|
|
|
# Model Info |
|
|
|
| Model | Download | Parameters | Context Length |
| :---- | :------- | :--------- | :------------- |
| Aria | <HF link - TBD> | • Activation: 3.9B (3.5B MoE + 0.4B Visual Encoder) <br> • Total: 25.3B | 64K |
|
|
|
# Benchmark |
|
|
|
|
|
|
|
# Quick Start |
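
The checkpoint link above is still TBD, so the following is a minimal sketch rather than a verified recipe: it assumes the model loads through `AutoModelForCausalLM` / `AutoProcessor` with `trust_remote_code=True`, and the repo id `rhymes-ai/Aria`, the sample image URL, and the generation settings are all placeholders.

```python
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Placeholder repo id; substitute the released checkpoint (the HF link above is TBD).
model_id_or_path = "rhymes-ai/Aria"

# Assumes the checkpoint ships custom modeling code on the Hub,
# hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(
    model_id_or_path,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id_or_path, trust_remote_code=True)

# Any RGB image works; this URL is only an example.
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/rabbit.jpg"
image = Image.open(requests.get(url, stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is in this image?"},
        ],
    }
]

# Build the chat prompt and pack text + image into model inputs.
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=text, images=image, return_tensors="pt")
inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
response = processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

For multi-image or video inputs, add one image placeholder per frame, as in the sketch under Key features.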
|
|
|
|
|
|
|
|
|
# License |
|
|
|
This repo is released under the Apache 2.0 License. |