---
license: apache-2.0
language:
  - en
library_name: transformers
tags:
  - multimodal
  - aria
---


# Aria

🔗 Try Aria! · 📖 Blog · 📌 Paper · 🖤 GitHub · 💜 Discord · 💙 Twitter

## Highlights

- Aria is the first open multimodal-native MoE model, capable of seamlessly handling various input modalities within a MoE architecture.
- Aria performs on par with GPT-4o mini and Gemini 1.5 Flash across a range of multimodal tasks while maintaining strong performance on text-only tasks.
- Compared to similar or even larger models, Aria offers faster inference and lower cost. This efficiency stems from activating only 3.9B parameters during inference – the fewest among models with comparable performance.

## Key features

- **Robust multimodal understanding:** Aria processes various input modalities, including video, images, code, and text. It demonstrates strong performance across diverse downstream tasks such as long-context video and image understanding and OCR, and it excels at instruction following.
- **Flexible image handling:** Aria supports variable image sizes and aspect ratios while maintaining high quality.
- **Extended context capacity:** Aria can manage multiple images within a long context window of 64k tokens.
- **Advanced text understanding:** Aria demonstrates competitive performance across language and coding tasks.

## Model Info

| Model | Download | Parameters | Context Length |
|-------|----------|------------|----------------|
| Aria  | < HF link - TBD> | Activation: 3.9B (3.5B MoE + 0.4B Visual Encoder)<br>Total: 25.3B | 64K |

## Benchmark

## Quick Start
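The sketch below shows one plausible way to load and prompt the model with the Hugging Face `transformers` library. Since the official download link above is still TBD, the model id `rhymes-ai/Aria` is a placeholder assumption, and the exact processor and chat-template interface may differ from the released API; this is a minimal sketch, not the definitive usage.

```python
# Minimal sketch of multimodal inference via transformers (assumptions noted below).
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# ASSUMPTION: placeholder model id; replace with the official HF link once published.
model_id = "rhymes-ai/Aria"

# Remote code is required for custom MoE/multimodal architectures on the Hub.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# A typical multimodal chat turn: one image plus a text question.
image = Image.open("example.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Build the prompt and run generation; the template format is an assumption.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
response = processor.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)
```

Running this requires downloading the full model weights and a GPU with enough memory for the 25.3B total parameters, even though only 3.9B are active per token.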

## License

This repo is released under the Apache 2.0 License.