---
language: 
  - en
tags:
  - deepspeed
  - visualchat
  - multi-image
  - causal
  - chat
license: apache-2.0
datasets:
  - openai/clip-vit-large-patch14
---

# Llama-2-13b-deepspeed-visualchat

> **ATTENTION**: this encoder needs the QwenCLIP model.

DeepSpeed-VisualChat is a scalable, efficient, and user-friendly multi-modal training pipeline that leverages a novel multi-modal causal attention mechanism for better alignment of visual and text features. It uses data blending techniques to address the scarcity of interleaved text-and-image inputs in datasets. 
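
As a rough illustration of what a multi-modal causal attention mask could look like, the sketch below builds a boolean mask over a mixed image/text token sequence. It rests on assumptions that this card does not state: image tokens attend only to themselves, and text tokens attend causally to all earlier tokens of either modality. The helper `mmca_mask` and the toy `layout` are hypothetical names, not part of the DeepSpeed-VisualChat code base, and the sketch leaves out how attention weights are normalized.

```python
import numpy as np

def mmca_mask(is_image):
    """Return a boolean mask where mask[i, j] is True if token i may attend to token j.

    Assumption (not taken from this model card): image tokens see only
    themselves, text tokens see every token at or before their position.
    """
    is_image = np.asarray(is_image, dtype=bool)
    n = is_image.size
    # Standard causal mask: each token sees itself and everything before it.
    mask = np.tril(np.ones((n, n), dtype=bool))
    # Restrict image-token rows to the diagonal entry only.
    mask[is_image, :] = False
    idx = np.flatnonzero(is_image)
    mask[idx, idx] = True
    return mask

# Toy sequence: two image tokens, two text tokens, one image token, one text token.
layout = [True, True, False, False, True, False]
mask = mmca_mask(layout)
print(mask.astype(int))
```

Each row of the printed matrix is a query token and each column a key token. Under these assumptions, text rows look like an ordinary causal mask, while image rows contain a single 1 on the diagonal.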


The framework has been trained with a 2B visual encoder from QWen-VL and a 70B language decoder from LLaMA-2, which demonstrates its scalability. DeepSpeed-VisualChat is open source, and community contributions and collaborations are welcome. Visit the GitHub page to get started.