kentang1998 committed
Commit
699b413
1 Parent(s): 4ff7e11

Update README.md

Files changed (1):
  README.md (+1 -1)
README.md CHANGED
@@ -15,7 +15,7 @@ tags:
 VILA is a visual language model (VLM) pretrained with interleaved image-text data at scale, enabling multi-image VLMs. VILA is deployable on the edge, including Jetson Orin and laptops, via AWQ 4-bit quantization through the TinyChat framework. We find: (1) image-text pairs are not enough; interleaved image-text is essential; (2) unfreezing the LLM during interleaved image-text pre-training enables in-context learning; (3) re-blending text-only instruction data is crucial to boost both VLM and text-only performance. VILA unveils appealing capabilities, including multi-image reasoning, in-context learning, visual chain-of-thought, and better world knowledge.
 
 **Model date:**
-VILA-7b was trained in Feb 2024.
+VILA1.5-3b was trained in May 2024.
 
 **Paper or resources for more information:**
 https://github.com/Efficient-Large-Model/VILA