# Deep Model Assembling
This repository contains the official code for [Deep Model Assembling](https://arxiv.org/abs/2212.04129).
<p align="center">
<img src="imgs/teaser.png" width= "450">
</p>
> **Title**:  [**Deep Model Assembling**](https://arxiv.org/abs/2212.04129)
> **Authors**: [Zanlin Ni](https://scholar.google.com/citations?user=Yibz_asAAAAJ&hl=en&oi=ao), [Yulin Wang](https://scholar.google.com/citations?hl=en&user=gBP38gcAAAAJ), Jiangwei Yu, [Haojun Jiang](https://scholar.google.com/citations?hl=en&user=ULmStp8AAAAJ), [Yue Cao](https://scholar.google.com/citations?hl=en&user=iRUO1ckAAAAJ), [Gao Huang](https://scholar.google.com/citations?user=-P9LwcgAAAAJ&hl=en&oi=ao) (Corresponding Author)
> **Institute**: Tsinghua University and Beijing Academy of Artificial Intelligence (BAAI)
> **Publication**: *arXiv preprint ([arXiv 2212.04129](https://arxiv.org/abs/2212.04129))*
> **Contact**: nzl22 at mails dot tsinghua dot edu dot cn
## News
- `Dec 10, 2022`: Release code for training ViT-B, ViT-L, and ViT-H on ImageNet-1K.
## Overview
In this paper, we present a divide-and-conquer strategy for training large models. Our algorithm, Model Assembling, divides a large model into smaller modules, optimizes them independently, and then assembles the trained modules back into the full model. Though conceptually simple, our method significantly outperforms end-to-end (E2E) training in terms of both training efficiency and final accuracy. For example, on ViT-H, Model Assembling outperforms E2E training by **2.7%** while reducing the training cost by **43%**.
<p align="center">
<img src="imgs/ours.png" width= "900">
</p>
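The divide/train/assemble flow can be sketched in a few lines of PyTorch. The snippet below is only an illustration of the idea, not the implementation in this repo, and the simple MLP blocks are stand-ins for transformer blocks.

```python
import torch.nn as nn

def make_module(depth, dim=256):
    # One "module" = a small stack of blocks (simplified MLP blocks stand in
    # for transformer blocks here).
    return nn.Sequential(*[nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(depth)])

# Divide: build four shallow modules (each would be trained independently).
modules = [make_module(depth=3) for _ in range(4)]

# Assemble: chain the trained modules back into one 12-block model for fine-tuning.
assembled = nn.Sequential(*modules)
print(sum(p.numel() for p in assembled.parameters()), "parameters")
```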
## Data Preparation
- The ImageNet dataset should be prepared as follows:
```
data
├── train
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   └── ...
└── val
    ├── folder 1 (class 1)
    ├── folder 2 (class 2)
    └── ...
```
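With this layout, class labels are read directly from the sub-folder names, e.g., by torchvision's standard `ImageFolder`. A minimal loading sketch (the transforms are generic ImageNet defaults, not necessarily the ones used by our training scripts):

```python
from torchvision import datasets, transforms

# Minimal loading sketch for the folder layout above.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("data/train", transform=transform)
val_set = datasets.ImageFolder("data/val", transform=transform)
print(len(train_set.classes), "classes,", len(train_set), "training images")
```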
## Training on ImageNet-1K
- You can add `--use_amp 1` to train with PyTorch Automatic Mixed Precision (AMP).
- Auto-resuming is enabled by default, i.e., the training script automatically resumes from the latest checkpoint in `output_dir`.
- The effective batch size is `NGPUS` * `batch_size` * `update_freq`. All of our configurations use an effective batch size of 2048; to avoid OOM issues, adjust these arguments while keeping their product fixed (see the sanity-check sketch after the torchrun example below).
- We provide single-node training scripts for simplicity. For multi-node training, replace the launcher in the training scripts with torchrun:
```bash
python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py ...
# modify the above code to
torchrun \
--nnodes=$NODES \
--nproc_per_node=$NGPUS \
--rdzv_backend=c10d \
--rdzv_endpoint=$MASTER_ADDR:60900 \
main.py ...
```
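As a quick sanity check, the effective batch size of each configuration below can be computed directly. The snippet assumes `update_freq` defaults to 1 when it is not passed (as in the pre-training script):

```python
# Effective batch size = NGPUS * batch_size * update_freq.
# Values are taken from the example scripts below; each should print 2048.
configs = {
    "pre-training":     dict(ngpus=8, batch_size=256, update_freq=1),
    "modular training": dict(ngpus=8, batch_size=128, update_freq=2),
    "fine-tuning":      dict(ngpus=8, batch_size=64,  update_freq=4),
}
for name, c in configs.items():
    print(name, c["ngpus"] * c["batch_size"] * c["update_freq"])
```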
<details>
<summary>Pre-training meta models (click to expand).</summary>
```bash
PHASE=PT # Pre-training
MODEL=base # for base
# MODEL=large # for large
# MODEL=huge # for huge
NGPUS=8
args=(
--phase ${PHASE}
--model vit_${MODEL}_patch16_224 # for base, large
# --model vit_${MODEL}_patch14_224 # for huge
--divided_depths 1 1 1 1
--output_dir ./log_dir/${PHASE}_${MODEL} # matches the path loaded by --meta_model in modular training
--batch_size 256
--epochs 300
--drop-path 0
)
python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py "${args[@]}"
```
</details>
<details>
<summary>Modular training (click to expand).</summary>
```bash
PHASE=MT # Modular Training
MODEL=base DEPTH=12 # for base
# MODEL=large DEPTH=24 # for large
# MODEL=huge DEPTH=32 # for huge
NGPUS=8
args=(
--phase ${PHASE}
--model vit_${MODEL}_patch16_224 # for base, large
# --model vit_${MODEL}_patch14_224 # for huge
--meta_model ./log_dir/PT_${MODEL}/finished_checkpoint.pth # loading the pre-trained meta model
--batch_size 128
--update_freq 2
--epochs 100
--drop-path 0.1
)
# Modularly train each target module; the four commands below can be executed in parallel.
python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py "${args[@]}" --idx 0 --divided_depths $((DEPTH/4)) 1 1 1 --output_dir ./log_dir/${PHASE}_${MODEL}_0
python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py "${args[@]}" --idx 1 --divided_depths 1 $((DEPTH/4)) 1 1 --output_dir ./log_dir/${PHASE}_${MODEL}_1
python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py "${args[@]}" --idx 2 --divided_depths 1 1 $((DEPTH/4)) 1 --output_dir ./log_dir/${PHASE}_${MODEL}_2
python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py "${args[@]}" --idx 3 --divided_depths 1 1 1 $((DEPTH/4)) --output_dir ./log_dir/${PHASE}_${MODEL}_3
```
</details>
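In the modular-training commands above, `--divided_depths` together with `--idx` expands only the selected position to its full target depth (`DEPTH/4`) while the other three positions stay single-block meta layers. A hypothetical helper (not part of this repo) that reproduces those argument lists:

```python
# Hypothetical helper: build the per-module depth list used by --divided_depths.
# Position idx gets depth // num_modules blocks; the rest stay at depth 1.
def divided_depths(depth, idx, num_modules=4):
    return [depth // num_modules if i == idx else 1 for i in range(num_modules)]

for idx in range(4):
    print(idx, divided_depths(12, idx))  # ViT-B: [3, 1, 1, 1], [1, 3, 1, 1], ...
```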
<details>
<summary>Assemble & Fine-tuning (click to expand).</summary>
```bash
PHASE=FT # Assemble & Fine-tuning
MODEL=base DEPTH=12 # for base
# MODEL=large DEPTH=24 # for large
# MODEL=huge DEPTH=32 # for huge
NGPUS=8
args=(
--phase ${PHASE}
--model vit_${MODEL}_patch16_224 # for base, large
# --model vit_${MODEL}_patch14_224 # for huge
--incubation_models ./log_dir/MT_${MODEL}_*/finished_checkpoint.pth # for assembling
--divided_depths $((DEPTH/4)) $((DEPTH/4)) $((DEPTH/4)) $((DEPTH/4))
--output_dir ./log_dir/${PHASE}_${MODEL}
--batch_size 64
--update_freq 4
--epochs 100
--warmup-epochs 0
--clip-grad 1
--drop-path 0.1 # for base
# --drop-path 0.5 # for large
# --drop-path 0.6 # for huge
)
python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py "${args[@]}"
```
</details>
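`--incubation_models` points at the four modular-training checkpoints via a shell wildcard. A small sketch of how the matched paths can be listed and ordered (the file names follow the scripts above; the actual loading logic lives in `main.py` and is not shown here):

```python
import glob

# List the checkpoints matched by ./log_dir/MT_${MODEL}_*/finished_checkpoint.pth.
MODEL = "base"
ckpts = sorted(glob.glob(f"./log_dir/MT_{MODEL}_*/finished_checkpoint.pth"))
print(ckpts)  # expected: MT_base_0 ... MT_base_3, ordered by module index
```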
## Results
### Results on ImageNet-1K
<p align="center">
<img src="./imgs/in1k.png" width= "900">
</p>
### Results on CIFAR-100
<p align="center">
<img src="./imgs/cifar.png" width= "900">
</p>
### Training Efficiency
- Comparing different training budgets
<p align="center">
<img src="./imgs/efficiency.png" width= "900">
</p>
- Detailed convergence curves of ViT-Huge
<p align="center">
<img src="./imgs/huge_curve.png" width= "450">
</p>
### Data Efficiency
<p align="center">
<img src="./imgs/data_efficiency.png" width= "450">
</p>
## Citation
If you find our work helpful, please **star** this repo and **cite** our paper. Thanks for your support!
```bibtex
@article{Ni2022Assemb,
title={Deep Model Assembling},
author={Ni, Zanlin and Wang, Yulin and Yu, Jiangwei and Jiang, Haojun and Cao, Yue and Huang, Gao},
journal={arXiv preprint arXiv:2212.04129},
year={2022}
}
```
## Acknowledgements
Our implementation is mainly based on [deit](https://github.com/facebookresearch/deit). We thank the authors for their clean codebase.
## Contact
If you have any questions or concerns, please send mail to [nzl22@mails.tsinghua.edu.cn](mailto:nzl22@mails.tsinghua.edu.cn).