---
language:
- en
tags:
- deepspeed
- visualchat
- multi-image
- causal
- chat
license: apache-2.0
datasets:
- openai/clip-vit-large-patch14
---
# Llama-2-13b-deepspeed-visualchat
> **ATTENTION**: This model's visual encoder requires the QwenCLIP model.

DeepSpeed-VisualChat is a scalable, efficient, and user-friendly multi-modal training pipeline. It uses a novel multi-modal causal attention mechanism to better align visual and text features, and it applies data blending techniques to compensate for the scarcity of interleaved text-and-image inputs in existing datasets.

The framework has been trained with a 2B visual encoder from Qwen-VL and 13B to 70B language decoders from LLaMA-2, which demonstrates its scalability. DeepSpeed-VisualChat is open source and welcomes community contributions and collaboration. Visit the GitHub page to get started.
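
To make the idea of multi-modal causal attention more concrete, the sketch below builds a boolean attention mask for a short interleaved sequence. It is an illustration, not the DeepSpeed-VisualChat implementation: the function name `mmca_mask` is hypothetical, and the masking rule is an assumption (each image token attends only to itself, since the visual encoder has already encoded it, while each text token attends causally to all earlier image and text tokens). Any further details of the real mechanism, such as how attention weights are computed per modality, are left out.

```python
from typing import List


def mmca_mask(is_image: List[bool]) -> List[List[bool]]:
    """Build an illustrative multi-modal causal attention mask.

    mask[q][k] is True when query position q may attend to key position k.
    Hypothetical helper; the rule below is an assumption for illustration.
    """
    n = len(is_image)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        if is_image[q]:
            # Assumed rule: an image token attends only to itself.
            mask[q][q] = True
        else:
            # Text token: standard causal attention over every earlier
            # position (image or text) and itself.
            for k in range(q + 1):
                mask[q][k] = True
    return mask


# Two image tokens followed by two text tokens.
layout = [True, True, False, False]
for row in mmca_mask(layout):
    print("".join("x" if allowed else "." for allowed in row))
```

In the printed grid, each row is a query position and each `x` marks a key position it is allowed to attend to, so the two text rows cover the image tokens while the image rows stay on the diagonal.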