Update README.md
README.md (CHANGED)
@@ -89,11 +89,11 @@ See [here](https://github.com/FreedomIntelligence/ALLaVA/tree/main?tab=readme-ov
 ## 🏋️♂️ Training
 
 ### Data
-
+<div align=center>
 <img src="training_datasets_by_stage.jpg" width = "640" alt="training_datasets" align=center />
-</div>
+</div>
 
-ALLaVA uses
+ALLaVA uses 1.0M and 1.5M samples for PT and FT, respectively.
 
 
 ### Code
@@ -110,7 +110,7 @@ These two models share the same PT procedure. -->
 ### Hyperparameters
 
 | Global Batch Size | ZeRO Stage | Optimizer | Max LR | Min LR | Scheduler | Weight decay |
-| ---: | ---: |--:| ---: | ---: | ---: | ---: |
+| ---: | ---: |--:| ---: | ---: | ---: | ---: |
 | 256 (PT) / 128 (FT) | 1 | AdamW | 2e-5 | 2e-6 | CosineAnnealingWarmRestarts | 0 |
 
 The LM backbone and projector are trainable, while the vision encoder is kept frozen.
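The hyperparameter row above maps directly onto standard PyTorch APIs. Below is a minimal sketch of that configuration; the restart period `T_0` and the per-step scheduling are assumptions, since the table only fixes the optimizer, the LR range, the scheduler type, and the weight decay (the ZeRO stage and global batch size come from the distributed launcher and are not shown here).

```python
# Minimal sketch of the table's optimizer/scheduler settings in plain PyTorch.
# T_0 (steps per cosine cycle) is an assumption; the README does not specify it.
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

def build_optimizer_and_scheduler(trainable_params, steps_per_cycle=1000):
    optimizer = AdamW(trainable_params, lr=2e-5, weight_decay=0.0)  # Max LR 2e-5, weight decay 0
    scheduler = CosineAnnealingWarmRestarts(
        optimizer,
        T_0=steps_per_cycle,  # assumed restart period (not given in the table)
        eta_min=2e-6,         # Min LR 2e-6 from the table
    )
    return optimizer, scheduler
```

Calling `scheduler.step()` once per optimizer step lets the learning rate anneal from 2e-5 down to 2e-6 within each cosine cycle before restarting.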
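The last line of the hunk states which modules receive gradients. A minimal sketch of that split for a LLaVA-style model is below; the attribute names `vision_tower`, `language_model`, and `mm_projector` are placeholders, not necessarily ALLaVA's actual module names.

```python
# Sketch of the trainability split described above: vision encoder frozen,
# LM backbone and projector trainable. Module attribute names are placeholders.
def configure_trainable_modules(model):
    for p in model.vision_tower.parameters():    # vision encoder: frozen
        p.requires_grad_(False)
    for p in model.language_model.parameters():  # LM backbone: trainable
        p.requires_grad_(True)
    for p in model.mm_projector.parameters():    # projector: trainable
        p.requires_grad_(True)
    # Only the trainable parameters are handed to the optimizer.
    return [p for p in model.parameters() if p.requires_grad]
```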