# Deep Model Assembling

This repository contains the pre-trained models for [Deep Model Assembling](https://arxiv.org/abs/2212.04129).

> **Title**:  [**Deep Model Assembling**](https://arxiv.org/abs/2212.04129)  
> **Authors**:  [Zanlin Ni](https://scholar.google.com/citations?user=Yibz_asAAAAJ&hl=en&oi=ao), [Yulin Wang](https://scholar.google.com/citations?hl=en&user=gBP38gcAAAAJ), Jiangwei Yu, [Haojun Jiang](https://scholar.google.com/citations?hl=en&user=ULmStp8AAAAJ), [Yue Cao](https://scholar.google.com/citations?hl=en&user=iRUO1ckAAAAJ), [Gao Huang](https://scholar.google.com/citations?user=-P9LwcgAAAAJ&hl=en&oi=ao) (Corresponding Author)  
> **Institute**: Tsinghua University and Beijing Academy of Artificial Intelligence (BAAI)  
> **Published in**:   *arXiv preprint ([arXiv 2212.04129](https://arxiv.org/abs/2212.04129))*  
> **Contact**:  nzl22 at mails dot tsinghua dot edu dot cn

## Overview

In this paper, we present a divide-and-conquer strategy for training large models. Our algorithm, Model Assembling, divides a large model into smaller modules, optimizes them independently, and then assembles the trained modules back into the full model. Though conceptually simple, our method significantly outperforms end-to-end (E2E) training in terms of both training efficiency and final accuracy. For example, on ViT-H, Model Assembling outperforms E2E training by **2.7%**, while reducing the training cost by **43%**.
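
The sketch below is a purely conceptual illustration of this divide-and-conquer idea on a toy block stack. It is **not** the code in this repository (the actual pipeline is the pre-training / modular training / fine-tuning phases run via `main.py` in the scripts further down), and all names in it are made up for illustration.

```python
# Toy illustration of divide-and-conquer training (NOT this repo's implementation;
# the actual pipeline is the PT / MT / FT phases run via main.py below).
import torch
import torch.nn as nn

def block(dim=64):
    return nn.Sequential(nn.Linear(dim, dim), nn.GELU())

# 1) Divide: a "large" 8-block model, viewed as 4 modules of 2 blocks each.
modules = [nn.Sequential(block(), block()) for _ in range(4)]

# 2) Conquer: optimize each module independently (one dummy step per module here).
for m in modules:
    opt = torch.optim.AdamW(m.parameters(), lr=1e-3)
    loss = m(torch.randn(8, 64)).pow(2).mean()
    loss.backward()
    opt.step()

# 3) Assemble: stack the trained modules into the full model and fine-tune it end-to-end.
full_model = nn.Sequential(*modules)
print(full_model(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```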


## Data Preparation

- The ImageNet dataset should be prepared as follows:

```
data
├── train
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
├── val
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...

```
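
If you want to verify the layout before training, the following optional check (assuming `torchvision` is installed and `./data` is in the current directory; the training scripts load the data themselves, so this is only a quick verification) confirms that the folders parse as image-classification datasets:

```python
# Optional sanity check of the ImageNet folder layout (assumes torchvision is
# installed and ./data matches the structure above).
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("data/train", transform=transform)
val_set = datasets.ImageFolder("data/val", transform=transform)
print(f"train: {len(train_set)} images, {len(train_set.classes)} classes")
print(f"val:   {len(val_set)} images, {len(val_set.classes)} classes")
```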

## Training on ImageNet-1K

- You can add `--use_amp 1` to train with PyTorch Automatic Mixed Precision (AMP).
- Auto-resuming is enabled by default, i.e., the training script automatically resumes from the latest checkpoint in `output_dir`.
- The effective batch size is `NGPUS` * `batch_size` * `update_freq`. All of our configurations use an effective batch size of 2048 (e.g., `8 * 128 * 2 = 2048` for the modular training scripts below). To avoid OOM issues, you may adjust these arguments as long as their product stays at 2048.
- We provide single-node training scripts for simplicity. For multi-node training, replace the `torch.distributed.launch` command in the scripts with `torchrun`:

```bash
python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py ...

# modify the above code to

torchrun \
--nnodes=$NODES \
--nproc_per_node=$NGPUS \
--rdzv_backend=c10d \
--rdzv_endpoint=$MASTER_ADDR:60900 \
main.py ...
```

<details>
<summary>Pre-training meta models (click to expand).</summary>

```bash
PHASE=PT  # Pre-training
MODEL=base  # for base
# MODEL=large  # for large
# MODEL=huge  # for huge
NGPUS=8

args=(
--phase ${PHASE} 
--model vit_${MODEL}_patch16_224   # for base, large
# --model vit_${MODEL}_patch14_224   # for huge
--divided_depths 1 1 1 1 
--output_dir ./log_dir/${PHASE}_${MODEL}

--batch_size 256
--epochs 300 
--drop-path 0 
)

python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py "${args[@]}"
```

</details>

<details>
<summary>Modular training (click to expand).</summary>

```bash
PHASE=MT  # Modular Training
MODEL=base DEPTH=12  # for base
# MODEL=large DEPTH=24  # for large
# MODEL=huge DEPTH=32  # for huge
NGPUS=8

args=(
--phase ${PHASE} 
--model vit_${MODEL}_patch16_224   # for base, large
# --model vit_${MODEL}_patch14_224   # for huge
--meta_model ./log_dir/PT_${MODEL}/finished_checkpoint.pth  # loading the pre-trained meta model

--batch_size 128
--update_freq 2
--epochs 100 
--drop-path 0.1
)

# Modularly train each target module. With MODEL=base and DEPTH=12, $((DEPTH/4)) expands to 3,
# e.g. --divided_depths 3 1 1 1 for the first command. The four commands are independent and can
# be run in parallel (on a single node, give each run its own GPUs and a distinct --master_port).
python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py "${args[@]}" --idx 0 --divided_depths $((DEPTH/4)) 1 1 1 --output_dir ./log_dir/${PHASE}_${MODEL}_0
python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py "${args[@]}" --idx 1 --divided_depths 1 $((DEPTH/4)) 1 1 --output_dir ./log_dir/${PHASE}_${MODEL}_1
python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py "${args[@]}" --idx 2 --divided_depths 1 1 $((DEPTH/4)) 1 --output_dir ./log_dir/${PHASE}_${MODEL}_2
python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py "${args[@]}" --idx 3 --divided_depths 1 1 1 $((DEPTH/4)) --output_dir ./log_dir/${PHASE}_${MODEL}_3

```

</details>

<details>
<summary>Assemble & Fine-tuning (click to expand).</summary>

```bash
PHASE=FT  # Assemble & Fine-tuning
MODEL=base DEPTH=12  # for base
# MODEL=large DEPTH=24  # for large
# MODEL=huge DEPTH=32  # for huge
NGPUS=8

args=(
--phase ${PHASE} 
--model vit_${MODEL}_patch16_224   # for base, large
# --model vit_${MODEL}_patch14_224   # for huge
--incubation_models ./log_dir/MT_${MODEL}_*/finished_checkpoint.pth  # for assembling
--divided_depths $((DEPTH/4)) $((DEPTH/4)) $((DEPTH/4)) $((DEPTH/4))
--output_dir ./log_dir/${PHASE}_${MODEL}

--batch_size 64
--update_freq 4
--epochs 100 
--warmup-epochs 0
--clip-grad 1
--drop-path 0.1  # for base
# --drop-path 0.5  # for large
# --drop-path 0.6  # for huge
)

python -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port=23346 --use_env main.py "${args[@]}"
```
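
Before launching this phase, it may help to confirm that all four modular-training checkpoints referenced by `--incubation_models` exist. A minimal check, assuming `MODEL=base` and the output directories from the scripts above:

```python
# Pre-flight check for the assembling step: make sure all four modular-training
# checkpoints exist (paths assume MODEL=base and the output directories above).
import glob

paths = sorted(glob.glob("./log_dir/MT_base_*/finished_checkpoint.pth"))
assert len(paths) == 4, f"expected 4 module checkpoints, found {len(paths)}"
print("\n".join(paths))
```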

</details>

## Citation

If you find our work helpful, please **star🌟** this repo and **cite📑** our paper. Thanks for your support!

```
@article{Ni2022Assemb,
  title={Deep Model Assembling},
  author={Ni, Zanlin and Wang, Yulin and Yu, Jiangwei and Jiang, Haojun and Cao, Yue and Huang, Gao},
  journal={arXiv preprint arXiv:2212.04129},
  year={2022}
}
```

## Acknowledgements

Our implementation is mainly based on [deit](https://github.com/facebookresearch/deit). We thank the authors for their clean codebase.

## Contact

If you have any questions or concerns, please send an email to [nzl22@mails.tsinghua.edu.cn](mailto:nzl22@mails.tsinghua.edu.cn).