teowu committed on
Commit 4844f0b
1 Parent(s): 10c6b53

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -16,6 +16,8 @@ base_model:
 
 
  # Aria Model Card
+
+ [Dec 1, 2024] *We have released the base models (with native multimodal pre-training) for Aria ([Aria-Base-8K](https://huggingface.co/rhymes-ai/Aria-Base-8K) and [Aria-Base-64K](https://huggingface.co/rhymes-ai/Aria-Base-64K)) for research purposes and continued training.*
  <!--
  - Aria is the **first open multimodal native MoE** model, capable of seamlessly handling various input modalities within a MoE architecture.
  - Aria performs **on par with GPT-4o mini and Gemini 1.5 Flash** across a range of multimodal tasks while maintaining strong performance on **text**-only tasks.
@@ -23,7 +25,7 @@ base_model:
  -->
  ## Key features
 
- - **SoTA Multimodal Native Performance**: Aria achieves strong performance on a wide range of multimodal, language, and coding tasks. It is superior in video and document understanding.
+ - **SoTA Multimodal Native Performance**: Aria achieves strong performance on a wide range of multimodal, language, and coding tasks. It is superior in video and document understanding.
  - **Lightweight and Fast**: Aria is a mixture-of-experts model with 3.9B activated parameters per token. It efficiently encodes visual input of variable sizes and aspect ratios.
  - **Long Multimodal Context Window**: Aria supports multimodal input of up to 64K tokens. It can caption a 256-frame video in 10 seconds.
 