Update README.md
README.md (CHANGED)
# Aria Model Card

[Dec 1, 2024] *We have released the base models (with native multimodal pre-training) for Aria ([Aria-Base-8K](https://huggingface.co/rhymes-ai/Aria-Base-8K) and [Aria-Base-64K](https://huggingface.co/rhymes-ai/Aria-Base-64K)) for research purposes and continued training.*

<!--
- Aria is the **first open multimodal native MoE** model, capable of seamlessly handling various input modalities within a MoE architecture.
- Aria performs **on par with GPT-4o mini and Gemini 1.5 Flash** across a range of multimodal tasks while maintaining strong performance on **text**-only tasks.
-->

## Key features

- **SoTA Multimodal Native Performance**: Aria achieves strong performance on a wide range of multimodal, language, and coding tasks. It is superior in video and document understanding.
- **Lightweight and Fast**: Aria is a mixture-of-experts model with 3.9B activated parameters per token. It efficiently encodes visual input of variable sizes and aspect ratios.
- **Long Multimodal Context Window**: Aria supports multimodal input of up to 64K tokens. It can caption a 256-frame video in 10 seconds.
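For orientation, here is a minimal inference sketch using the `transformers` remote-code path. The model ID matches this repo (`rhymes-ai/Aria`), but the message layout and processor details below are assumptions rather than the official quickstart, so check the full model card for the canonical usage snippet.

```python
# Minimal inference sketch for Aria via Hugging Face transformers.
# Assumptions (not taken from this card): remote-code loading, and a
# chat-template message format with interleaved image/text parts.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "rhymes-ai/Aria"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # Aria ships custom modeling/processing code
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg")  # replace with your own image

# Hypothetical message layout: an image part followed by a text question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "text": None},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)  # match bf16 weights

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=256)

print(processor.decode(output[0], skip_special_tokens=True))
```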