Update README.md
README.md (CHANGED)
@@ -89,11 +89,11 @@ See [here](https://github.com/FreedomIntelligence/ALLaVA/tree/main?tab=readme-ov
 ## 🏋️♂️ Training
 
 ### Data
-
+<div align=center>
 <img src="training_datasets_by_stage.jpg" width = "640" alt="training_datasets" align=center />
-</div>
+</div>
 
-ALLaVA uses
+ALLaVA uses 1.0M and 1.5M samples for PT and FT, respectively.
 
 
 ### Code
@@ -110,7 +110,7 @@ These two models share the same PT procedure. -->
 ### Hyperparameters
 
 | Global Batch Size | ZeRO Stage | Optimizer | Max LR | Min LR | Scheduler | Weight decay |
-| ---: | ---: |--:| ---: | ---: | ---: | ---: |
+| ---: | ---: |--:| ---: | ---: | ---: | ---: |
 | 256 (PT) / 128 (FT) | 1 | AdamW | 2e-5 | 2e-6 | CosineAnnealingWarmRestarts | 0 |
 
 The LM backbone and projector are trainable, while the vision encoder is kept frozen.
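The hyperparameter row above maps directly onto standard PyTorch APIs. Below is a minimal sketch of that configuration; the restart period `T_0` and the per-step scheduling are assumptions, since the table only fixes the optimizer, the LR range, the scheduler type, and the weight decay (the ZeRO stage and global batch size come from the distributed launcher and are not shown here).

```python
# Minimal sketch of the table's optimizer/scheduler settings in plain PyTorch.
# T_0 (steps per cosine cycle) is an assumption; the README does not specify it.
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

def build_optimizer_and_scheduler(trainable_params, steps_per_cycle=1000):
    optimizer = AdamW(trainable_params, lr=2e-5, weight_decay=0.0)  # Max LR 2e-5, weight decay 0
    scheduler = CosineAnnealingWarmRestarts(
        optimizer,
        T_0=steps_per_cycle,  # assumed restart period (not given in the table)
        eta_min=2e-6,         # Min LR 2e-6 from the table
    )
    return optimizer, scheduler
```

Calling `scheduler.step()` once per optimizer step lets the learning rate anneal from 2e-5 down to 2e-6 within each cosine cycle before restarting.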
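The last line of the hunk states which modules receive gradients. A minimal sketch of that split for a LLaVA-style model is below; the attribute names `vision_tower`, `language_model`, and `mm_projector` are placeholders, not necessarily ALLaVA's actual module names.

```python
# Sketch of the trainability split described above: vision encoder frozen,
# LM backbone and projector trainable. Module attribute names are placeholders.
def configure_trainable_modules(model):
    for p in model.vision_tower.parameters():    # vision encoder: frozen
        p.requires_grad_(False)
    for p in model.language_model.parameters():  # LM backbone: trainable
        p.requires_grad_(True)
    for p in model.mm_projector.parameters():    # projector: trainable
        p.requires_grad_(True)
    # Only the trainable parameters are handed to the optimizer.
    return [p for p in model.parameters() if p.requires_grad]
```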