---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- multimodal
- aria
---
<p align="center">
  <br>Aria</br>
</p> 

<p align="center">
🔗 <a href="https://huggingface.co" target="_blank">Try Aria!</a> · 📖 <a href="https://huggingface.co" target="_blank">Blog</a> · 📌 <a href="https://huggingface.co" target="_blank">Paper</a>
· 🖤 <a href="https://huggingface.co" target="_blank">GitHub</a> · 💜 <a href="https://huggingface.co" target="_blank">Discord</a>
· 💙 <a href="https://huggingface.co" target="_blank">Twitter</a>
</p> 

# Highlights

- Aria is the **first open multimodal-native MoE** (mixture-of-experts) model, handling multiple input modalities within a single MoE architecture.
- Aria performs **on par with GPT-4o mini and Gemini 1.5 Flash** across a range of multimodal tasks while maintaining strong performance on **text**-only tasks.
- Compared to similar or even larger models, Aria offers **faster speeds** and **lower costs**, because it activates only 3.9B parameters during inference, the **fewest** among models with comparable performance.

# Key features

- **Robust multimodal understanding**: Aria processes various input modalities, including video, images, code, and text. It performs strongly across diverse downstream tasks such as long-context video and image understanding and OCR, and it excels at instruction following.
- **Flexible image handling**: Aria supports variable image sizes and aspect ratios while maintaining high quality.
- **Extended context capacity**: Aria can manage multiple images within a long context window of 64k tokens.
- **Advanced text understanding**: Aria demonstrates competitive performance across language and coding tasks.

# Model Info

| Model | Download | Parameters | Context Length |
| :---- | :------- | :------------ | :------ |
| Aria | < HF link - TBD> | • Activated: 3.9B (3.5B MoE + 0.4B Visual Encoder) <br> • Total: 25.3B | 64K |
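The parameter figures in the table can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it uses just the numbers listed above, and the constant and function names are hypothetical, not part of any Aria API.

```python
# Figures copied from the Model Info table, in billions of parameters.
MOE_ACTIVATED_B = 3.5      # activated MoE parameters
VISUAL_ENCODER_B = 0.4     # visual encoder parameters
TOTAL_B = 25.3             # total parameters


def activated_params_b(moe_b: float = MOE_ACTIVATED_B,
                       vision_b: float = VISUAL_ENCODER_B) -> float:
    """Sum the activated components, rounded to one decimal like the table."""
    return round(moe_b + vision_b, 1)


def activated_fraction(total_b: float = TOTAL_B) -> float:
    """Share of the total parameters that is activated during inference."""
    return activated_params_b() / total_b


if __name__ == "__main__":
    print(f"Activated: {activated_params_b()}B")
    print(f"Share of total: {activated_fraction():.1%}")
```

By these figures, roughly 15% of the model's parameters are active during inference, which is the basis for the speed and cost claims in the highlights.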

# Benchmark



# Quick Start




# License

This repo is released under the Apache 2.0 License.