# Deep Model Assembling

This repository contains the official code for [Deep Model Assembling](https://arxiv.org/abs/2212.04129).

<p align="center">
    <img src="imgs/teaser.png" width= "450">
</p>

> **Title**:&emsp;&emsp;[**Deep Model Assembling**](https://arxiv.org/abs/2212.04129)  
> **Authors**:&nbsp;&nbsp;[Zanlin Ni](https://scholar.google.com/citations?user=Yibz_asAAAAJ&hl=en&oi=ao), [Yulin Wang](https://scholar.google.com/citations?hl=en&user=gBP38gcAAAAJ), Jiangwei Yu, [Haojun Jiang](https://scholar.google.com/citations?hl=en&user=ULmStp8AAAAJ), [Yue Cao](https://scholar.google.com/citations?hl=en&user=iRUO1ckAAAAJ), [Gao Huang](https://scholar.google.com/citations?user=-P9LwcgAAAAJ&hl=en&oi=ao) (Corresponding Author)  
> **Institute**: Tsinghua University and Beijing Academy of Artificial Intelligence (BAAI)  
> **Publish**:&nbsp;&nbsp;&nbsp;*arXiv preprint ([arXiv 2212.04129](https://arxiv.org/abs/2212.04129))*  
> **Contact**:&nbsp;&nbsp;nzl22 at mails dot tsinghua dot edu dot cn

## News

- `Dec 10, 2022`: Released code for training ViT-B, ViT-L, and ViT-H on ImageNet-1K.

## Overview

In this paper, we present a divide-and-conquer strategy for training large models. Our algorithm, Model Assembling, divides a large model into smaller modules, optimizes them independently, and then assembles the trained modules into the target model. Though conceptually simple, our method significantly outperforms end-to-end (E2E) training in both training efficiency and final accuracy. For example, on ViT-H, Model Assembling outperforms E2E training by **2.7%** in accuracy, while reducing the training cost by **43%**.

<p align="center">
    <img src="imgs/ours.png" width= "900">
</p>

## Data Preparation

- The ImageNet dataset should be prepared as follows:

```
data
├── train
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
├── val
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
```
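
You can quickly sanity-check the layout (a minimal check, assuming the standard 1000-class ImageNet-1K split):

```bash
# Each split should contain one sub-folder per class (1000 for ImageNet-1K).
ls data/train | wc -l   # expected: 1000
ls data/val   | wc -l   # expected: 1000
```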

## Training on ImageNet-1K

- You can add `--use_amp 1` to train with PyTorch's Automatic Mixed Precision (AMP).
- Auto-resuming is enabled by default, i.e., the training script automatically resumes from the latest checkpoint in `output_dir`.
- The effective batch size = `NGPUS` * `batch_size` * `update_freq`. We use an effective batch size of 2048 throughout. To avoid OOM issues, adjust these arguments accordingly (a quick check follows the torchrun snippet below).
- We provide single-node training scripts for simplicity. For multi-node training, replace the launcher with torchrun as follows:

```bash
python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py ...

# modify the above code to

torchrun \
--nnodes=$NODES \
--nproc_per_node=$NGPUS \
--rdzv_backend=c10d \
--rdzv_endpoint=$MASTER_ADDR:60900 \
main.py ...
```
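
As a quick sanity check of the effective batch size, plug in the values from whichever script you run (a minimal sketch; the numbers below are the modular-training setting):

```bash
# Effective batch size = NGPUS * batch_size * update_freq; all scripts below keep this at 2048.
NGPUS=8 BATCH_SIZE=128 UPDATE_FREQ=2
echo $(( NGPUS * BATCH_SIZE * UPDATE_FREQ ))   # 2048
```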

<details>
<summary>Pre-training meta models (click to expand).</summary>

```bash
PHASE=PT  # Pre-training
MODEL=base  # for base
# MODEL=large  # for large
# MODEL=huge  # for huge
NGPUS=8

args=(
--phase ${PHASE} 
--model vit_${MODEL}_patch16_224   # for base, large
# --model vit_${MODEL}_patch14_224   # for huge
--divided_depths 1 1 1 1 
--output_dir ./log_dir/${PHASE}_${MODEL}  # matches the --meta_model path used in modular training

--batch_size 256
--epochs 300 
--drop-path 0 
)

python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py "${args[@]}"
```
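
Once pre-training finishes, the meta-model checkpoint loaded by the modular-training phase via `--meta_model` should be in place (a hedged check; the filename follows the scripts below):

```bash
# Expect the finished meta-model checkpoint consumed by --meta_model in the next phase.
ls ./log_dir/PT_${MODEL}/finished_checkpoint.pth
```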

</details>

<details>
<summary>Modular training (click to expand).</summary>

```bash
PHASE=MT  # Modular Training
MODEL=base DEPTH=12  # for base
# MODEL=large DEPTH=24  # for large
# MODEL=huge DEPTH=32  # for huge
NGPUS=8

args=(
--phase ${PHASE} 
--model vit_${MODEL}_patch16_224   # for base, large
# --model vit_${MODEL}_patch14_224   # for huge
--meta_model ./log_dir/PT_${MODEL}/finished_checkpoint.pth  # loading the pre-trained meta model

--batch_size 128
--update_freq 2
--epochs 100 
--drop-path 0.1
)

# Modularly train each target module. The four commands below can be executed in parallel (see the note after this block).
python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py "${args[@]}" --idx 0 --divided_depths $((DEPTH/4)) 1 1 1 --output_dir ./log_dir/${PHASE}_${MODEL}_0
python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py "${args[@]}" --idx 1 --divided_depths 1 $((DEPTH/4)) 1 1 --output_dir ./log_dir/${PHASE}_${MODEL}_1
python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py "${args[@]}" --idx 2 --divided_depths 1 1 $((DEPTH/4)) 1 --output_dir ./log_dir/${PHASE}_${MODEL}_2
python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py "${args[@]}" --idx 3 --divided_depths 1 1 1 $((DEPTH/4)) --output_dir ./log_dir/${PHASE}_${MODEL}_3

```
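
The four launch commands above all use `--master_port=23346`, which works when each runs on its own machine. If several launches share a host, give each its own rendezvous port (a sketch; the port value is arbitrary, any free port works):

```bash
# e.g. launch module 1 on a separate port to avoid a rendezvous conflict with module 0
python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23347 --use_env main.py \
    "${args[@]}" --idx 1 --divided_depths 1 $((DEPTH/4)) 1 1 --output_dir ./log_dir/${PHASE}_${MODEL}_1
```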

</details>

<details>
<summary>Assemble & Fine-tuning (click to expand).</summary>

```bash
PHASE=FT  # Assemble & Fine-tuning
MODEL=base DEPTH=12  # for base
# MODEL=large DEPTH=24  # for large
# MODEL=huge DEPTH=32  # for huge
NGPUS=8

args=(
--phase ${PHASE} 
--model vit_${MODEL}_patch16_224   # for base, large
# --model vit_${MODEL}_patch14_224   # for huge
--incubation_models ./log_dir/MT_${MODEL}_*/finished_checkpoint.pth  # for assembling
--divided_depths $((DEPTH/4)) $((DEPTH/4)) $((DEPTH/4)) $((DEPTH/4))
--output_dir ./log_dir/${PHASE}_${MODEL}

--batch_size 64
--update_freq 4
--epochs 100 
--warmup-epochs 0
--clip-grad 1
--drop-path 0.1  # for base
# --drop-path 0.5  # for large
# --drop-path 0.6  # for huge
)

python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py "${args[@]}"
```
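
Before launching, you can confirm that the `--incubation_models` glob picks up all four modular-training checkpoints; the shell expands it lexicographically, so modules `_0` through `_3` are listed in order:

```bash
# Expect four checkpoints, one per target module (_0 .. _3).
ls ./log_dir/MT_${MODEL}_*/finished_checkpoint.pth
```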

</details>

## Results

### Results on ImageNet-1K

<p align="center">
    <img src="./imgs/in1k.png" width= "900">
</p>

### Results on CIFAR-100

<p align="center">
    <img src="./imgs/cifar.png" width= "900">
</p>

### Training Efficiency

- Comparing different training budgets

<p align="center">
    <img src="./imgs/efficiency.png" width= "900">
</p>

- Detailed convergence curves of ViT-Huge

<p align="center">
    <img src="./imgs/huge_curve.png" width= "450">
</p>

### Data Efficiency

<p align="center">
    <img src="./imgs/data_efficiency.png" width= "450">
</p>

## Citation

If you find our work helpful, please **star🌟** this repo and **cite📑** our paper. Thanks for your support!

```bibtex
@article{Ni2022Assemb,
  title={Deep Model Assembling},
  author={Ni, Zanlin and Wang, Yulin and Yu, Jiangwei and Jiang, Haojun and Cao, Yue and Huang, Gao},
  journal={arXiv preprint arXiv:2212.04129},
  year={2022}
}
```

## Acknowledgements

Our implementation is mainly based on [deit](https://github.com/facebookresearch/deit). We thank the authors for their clean codebase.

## Contact

If you have any questions or concerns, please send mail to [nzl22@mails.tsinghua.edu.cn](mailto:nzl22@mails.tsinghua.edu.cn).