|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
library_name: transformers |
|
tags: |
|
- multimodal |
|
- aria |
|
--- |
|
<p align="center"> |
|
<b>Aria</b>
|
</p> |
|
|
|
<p align="center"> |
|
🔗 <a href="https://huggingface.co" target="_blank">Try Aria!</a> · 📖 <a href="https://huggingface.co" target="_blank">Blog</a> · 📌 <a href="https://huggingface.co" target="_blank">Paper</a> · 🖤 <a href="https://huggingface.co" target="_blank">GitHub</a> · 💜 <a href="https://huggingface.co" target="_blank">Discord</a> · 💙 <a href="https://huggingface.co" target="_blank">Twitter</a>
|
</p> |
|
|
|
# Highlights |
|
|
|
- Aria is the **first open multimodal-native mixture-of-experts (MoE)** model, capable of seamlessly handling various input modalities within a single MoE architecture.
|
- Aria performs **on par with GPT-4o mini and Gemini 1.5 Flash** across a range of multimodal tasks while maintaining strong performance on **text**-only tasks. |
|
- Compared to similar or even larger models, Aria offers **faster inference** and **lower cost**. This efficiency comes from activating only 3.9B parameters during inference, the **fewest** among models with comparable performance.
|
|
|
# Key features |
|
|
|
- **Robust multimodal understanding**: Aria processes various input modalities, including video, images, code, and text. It performs strongly across diverse downstream tasks such as long-context video understanding, image understanding, and OCR, and it excels at instruction following.
|
- **Flexible image handling**: Aria supports variable image sizes and aspect ratios while maintaining high quality. |
|
- **Extended context capacity**: Aria can manage multiple images within a long context window of 64K tokens (see the sketch after this list).
|
- **Advanced text understanding**: Aria demonstrates competitive performance across language and coding tasks. |
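
To make the multi-image bullet above concrete, here is a small, hypothetical sketch of an interleaved multi-image prompt. It assumes Aria's processor follows the generic `transformers` chat-template message convention, with one `{"type": "image"}` placeholder per image; the exact format is defined by the model's processor code once released.

```python
# Hypothetical multi-image message structure, following the generic
# transformers chat-template convention (an assumption until the
# processor ships): one {"type": "image"} placeholder per image,
# interleaved freely with text.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Here are two document pages:"},
            {"type": "image"},  # page 1, passed to the processor as a PIL image
            {"type": "image"},  # page 2
            {"type": "text", "text": "Summarize the differences between the pages."},
        ],
    }
]
```

The matching PIL images are then passed to the processor in the same order, e.g. `processor(text=..., images=[page1, page2], return_tensors="pt")`.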
|
|
|
# Model Info |
|
|
|
| Model | Download | Parameters | Context Length |
| :---- | :------- | :--------- | :------------- |
| Aria | <HF link - TBD> | • Activation: 3.9B (3.5B MoE + 0.4B Visual Encoder) <br> • Total: 25.3B | 64K |
|
|
|
# Benchmark |
|
|
|
|
|
|
|
# Quick Start |
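
The checkpoint link above is still TBD, so the following is a minimal sketch rather than a verified recipe: it assumes the model loads through `AutoModelForCausalLM` / `AutoProcessor` with `trust_remote_code=True`, and the repo id `rhymes-ai/Aria`, the sample image URL, and the generation settings are all placeholders.

```python
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Placeholder repo id; substitute the released checkpoint (the HF link above is TBD).
model_id_or_path = "rhymes-ai/Aria"

# Assumes the checkpoint ships custom modeling code on the Hub,
# hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(
    model_id_or_path,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id_or_path, trust_remote_code=True)

# Any RGB image works; this URL is only an example.
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/rabbit.jpg"
image = Image.open(requests.get(url, stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is in this image?"},
        ],
    }
]

# Build the chat prompt and pack text + image into model inputs.
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=text, images=image, return_tensors="pt")
inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
response = processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

For multi-image or video inputs, add one image placeholder per frame, as in the sketch under Key features.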
|
|
|
|
|
|
|
|
|
# License |
|
|
|
This repo is released under the Apache 2.0 License. |