Gofinge committed on
Commit
5e25cfa
1 Parent(s): e1afc8b

Release experiment records

Files changed (36)
  1. .gitattributes +7 -0
  2. README.md +22 -0
  3. nuscenes-semseg-pt-v3m1-0-base/config.py +230 -0
  4. nuscenes-semseg-pt-v3m1-0-base/events.out.tfevents.1704002329.nuscenes-semseg-pt-v3m1-0-base +3 -0
  5. nuscenes-semseg-pt-v3m1-0-base/model/model_best.pth +3 -0
  6. nuscenes-semseg-pt-v3m1-0-base/model/model_last.pth +3 -0
  7. nuscenes-semseg-pt-v3m1-0-base/train.log +3 -0
  8. s3dis-semseg-pt-v3m1-0-rpe/config.py +244 -0
  9. s3dis-semseg-pt-v3m1-0-rpe/events.out.tfevents.1703439768.s3dis-semseg-pt-v3m1-0-rpe +3 -0
  10. s3dis-semseg-pt-v3m1-0-rpe/model/model_best.pth +3 -0
  11. s3dis-semseg-pt-v3m1-0-rpe/model/model_last.pth +3 -0
  12. s3dis-semseg-pt-v3m1-0-rpe/train.log +0 -0
  13. s3dis-semseg-pt-v3m1-1-ppt-extreme/config.py +432 -0
  14. s3dis-semseg-pt-v3m1-1-ppt-extreme/events.out.tfevents.1708160591.s3dis-semseg-pt-v3m1-1-ppt-extreme +3 -0
  15. s3dis-semseg-pt-v3m1-1-ppt-extreme/model/model_best.pth +3 -0
  16. s3dis-semseg-pt-v3m1-1-ppt-extreme/model/model_last.pth +3 -0
  17. s3dis-semseg-pt-v3m1-1-ppt-extreme/train.log +3 -0
  18. scannet-semseg-pt-v3m1-0-base/config.py +301 -0
  19. scannet-semseg-pt-v3m1-0-base/events.out.tfevents.1703049730.scannet-semseg-pt-v3m1-0-base +3 -0
  20. scannet-semseg-pt-v3m1-0-base/model/model_best.pth +3 -0
  21. scannet-semseg-pt-v3m1-0-base/model/model_last.pth +3 -0
  22. scannet-semseg-pt-v3m1-0-base/train.log +3 -0
  23. scannet-semseg-pt-v3m1-1-ppt-extreme/config.py +381 -0
  24. scannet-semseg-pt-v3m1-1-ppt-extreme/events.out.tfevents.1706979139.scannet-semseg-pt-v3m1-1-ppt-extreme +3 -0
  25. scannet-semseg-pt-v3m1-1-ppt-extreme/model/model_best.pth +3 -0
  26. scannet-semseg-pt-v3m1-1-ppt-extreme/model/model_last.pth +3 -0
  27. scannet-semseg-pt-v3m1-1-ppt-extreme/test.log +0 -0
  28. scannet-semseg-pt-v3m1-1-ppt-extreme/train.log +3 -0
  29. scannet200-semseg-pt-v3m1-0-base/config.py +375 -0
  30. scannet200-semseg-pt-v3m1-0-base/events.out.tfevents.1703049688.scannet200-semseg-pt-v3m1-0-base +3 -0
  31. scannet200-semseg-pt-v3m1-0-base/model/model_best.pth +3 -0
  32. scannet200-semseg-pt-v3m1-0-base/model/model_last.pth +3 -0
  33. scannet200-semseg-pt-v3m1-0-base/train.log +3 -0
  34. waymo-semseg-pt-v3m1-0-base/config.py +217 -0
  35. waymo-semseg-pt-v3m1-0-base/events.out.tfevents.1708353865.waymo-semseg-pt-v3m1-0-base +3 -0
  36. waymo-semseg-pt-v3m1-0-base/train.log +3 -0
.gitattributes CHANGED
@@ -33,3 +33,10 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ nuscenes-semseg-pt-v3m1-0-base/train.log filter=lfs diff=lfs merge=lfs -text
+ s3dis-semseg-pt-v3m1-0-rpe/train.log filter=lfs diff=lfs merge=lfs -text
+ scannet-semseg-pt-v3m1-0-base/train.log filter=lfs diff=lfs merge=lfs -text
+ scannet200-semseg-pt-v3m1-0-base/train.log filter=lfs diff=lfs merge=lfs -text
+ s3dis-semseg-pt-v3m1-1-ppt-extreme/train.log filter=lfs diff=lfs merge=lfs -text
+ scannet-semseg-pt-v3m1-1-ppt-extreme/train.log filter=lfs diff=lfs merge=lfs -text
+ waymo-semseg-pt-v3m1-0-base/train.log filter=lfs diff=lfs merge=lfs -text
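Note: the hunk above routes each listed `train.log` through Git LFS, alongside the existing wildcard rules. The following is a minimal sketch of how one might check which paths such rules cover. The helper `is_lfs_tracked` is hypothetical, and `fnmatch` only approximates Git's real attribute matching (for example, `*` here also matches `/`), so treat it as an illustration rather than Git's behavior.

```python
from fnmatch import fnmatch

# A subset of the patterns from the .gitattributes hunk above.
LFS_PATTERNS = [
    "*.zip",
    "*.zst",
    "*tfevents*",
    "nuscenes-semseg-pt-v3m1-0-base/train.log",
    "s3dis-semseg-pt-v3m1-0-rpe/train.log",
]


def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Return True if `path` matches any pattern (approximate semantics)."""
    basename = path.rsplit("/", 1)[-1]
    for pattern in patterns:
        # Assumption: a pattern without a slash is compared against the
        # basename, a pattern with a slash against the full relative path.
        target = path if "/" in pattern else basename
        if fnmatch(target, pattern):
            return True
    return False


if __name__ == "__main__":
    for candidate in (
        "nuscenes-semseg-pt-v3m1-0-base/train.log",
        "nuscenes-semseg-pt-v3m1-0-base/config.py",
    ):
        print(candidate, is_lfs_tracked(candidate))
```

On a real checkout, `git check-attr filter -- <path>` gives the authoritative answer.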
README.md CHANGED
@@ -1,3 +1,25 @@
  ---
  license: mit
  ---
+
+ ## Model Zoo
+ ### 1. Indoor semantic segmentation
+ | Model | Benchmark | Additional Data | Num GPUs | Val mIoU | Config | Tensorboard | Exp Record |
+ | :---: | :---: |:---------------:| :---: | :---: | :---: | :---: | :---: |
+ | PTv3 | ScanNet | ✗ | 4 | 77.6% | [link](https://github.com/Pointcept/Pointcept/blob/main/configs/scannet/semseg-pt-v3m1-0-base.py) | [link](https://huggingface.co/Pointcept/PointTransformerV3/tensorboard) | [link](https://huggingface.co/Pointcept/PointTransformerV3/tree/main/scannet-semseg-pt-v3m1-0-base) |
+ | PTv3 + PPT | ScanNet | ✓ | 8 | 78.5% | [link](https://github.com/Pointcept/Pointcept/blob/main/configs/scannet/semseg-pt-v3m1-1-ppt-extreme.py) | [link](https://huggingface.co/Pointcept/PointTransformerV3/tensorboard) | [link](https://huggingface.co/Pointcept/PointTransformerV3/tree/main/scannet-semseg-pt-v3m1-1-ppt-extreme) |
+ | PTv3 | ScanNet200 | ✗ | 4 | 35.3% | [link](https://github.com/Pointcept/Pointcept/blob/main/configs/scannet200/semseg-pt-v3m1-0-base.py) | [link](https://huggingface.co/Pointcept/PointTransformerV3/tensorboard) |[link](https://huggingface.co/Pointcept/PointTransformerV3/tree/main/scannet200-semseg-pt-v3m1-0-base)|
+ | PTv3 + PPT | ScanNet200 | ✓ (f.t.) | 4 | | | | |
+ | PTv3 | S3DIS (Area5) | ✗ | 4 | 73.6% | [link](https://github.com/Pointcept/Pointcept/blob/main/configs/s3dis/semseg-pt-v3m1-0-rpe.py) | [link](https://huggingface.co/Pointcept/PointTransformerV3/tensorboard) | [link](https://huggingface.co/Pointcept/PointTransformerV3/tree/main/s3dis-semseg-pt-v3m1-0-rpe) |
+ | PTv3 + PPT | S3DIS (Area5) | ✓ | 8 | 75.4% | [link](https://github.com/Pointcept/Pointcept/blob/main/configs/s3dis/semseg-pt-v3m1-1-ppt-extreme.py) | [link](https://huggingface.co/Pointcept/PointTransformerV3/tensorboard) | [link](https://huggingface.co/Pointcept/PointTransformerV3/tree/main/s3dis-semseg-pt-v3m1-1-ppt-extreme) |
+
+ ### 2. Outdoor semantic segmentation
+ | Model | Benchmark | Additional Data | Num GPUs | Val mIoU | Config | Tensorboard | Exp Record |
+ | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+ | PTv3 | nuScenes | ✗ | 4 | 80.3 | [link](https://github.com/Pointcept/Pointcept/blob/main/configs/nuscenes/semseg-pt-v3m1-0-base.py) | [link](https://huggingface.co/Pointcept/PointTransformerV3/tensorboard)|[link](https://huggingface.co/Pointcept/PointTransformerV3/tree/main/nuscenes-semseg-pt-v3m1-0-base) |
+ | PTv3 + PPT | nuScenes | ✓ | 8 | | | | |
+ | PTv3 | SemanticKITTI | ✗ | 4 | | | | |
+ | PTv3 + PPT | SemanticKITTI | ✓ | 8 | | | | |
+ | PTv3 | Waymo | ✗ | 4 | 71.2 | [link](https://github.com/Pointcept/Pointcept/blob/main/configs/waymo/semseg-pt-v3m1-0-base.py) | [link](https://huggingface.co/Pointcept/PointTransformerV3/tensorboard) | [link](https://huggingface.co/Pointcept/PointTransformerV3/tree/main/waymo-semseg-pt-v3m1-0-base) (log only) |
+ | PTv3 + PPT | Waymo | ✓ | 8 | | | | |
+ * Model weights trained with Waymo Open Dataset cannot be released due to the regulations.
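Note: the "Val mIoU" column in the tables above reports mean intersection-over-union. As a reminder of what that number measures, here is a minimal sketch of the generic definition computed from a confusion matrix. The function name and the choice to skip classes that never occur are assumptions made for illustration; the evaluator that produced the released numbers may treat such cases differently.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix.

    confusion[i][j] is the number of points whose ground-truth class is i
    and whose predicted class is j.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        true_pos = confusion[c][c]
        false_neg = sum(confusion[c]) - true_pos
        false_pos = sum(confusion[r][c] for r in range(num_classes)) - true_pos
        union = true_pos + false_pos + false_neg
        if union > 0:  # assumption: ignore classes absent from labels and predictions
            ious.append(true_pos / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Two classes, four points each, one mistake per class.
    print(mean_iou([[3, 1], [1, 3]]))
```

In the toy matrix each class has 3 true positives, 1 false positive and 1 false negative, so each per-class IoU is 3 / 5.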
nuscenes-semseg-pt-v3m1-0-base/config.py ADDED
@@ -0,0 +1,230 @@
+ weight = None
+ resume = False
+ evaluate = True
+ test_only = False
+ seed = 28024989
+ save_path = 'exp/nuscenes/semseg-pt-v3m1-0-base'
+ num_worker = 16
+ batch_size = 12
+ batch_size_val = None
+ batch_size_test = None
+ epoch = 50
+ eval_epoch = 50
+ sync_bn = False
+ enable_amp = True
+ empty_cache = False
+ find_unused_parameters = False
+ mix_prob = 0.8
+ param_dicts = [dict(keyword='block', lr=0.0002)]
+ hooks = [
+     dict(type='CheckpointLoader'),
+     dict(type='IterationTimer', warmup_iter=2),
+     dict(type='InformationWriter'),
+     dict(type='SemSegEvaluator'),
+     dict(type='CheckpointSaver', save_freq=None),
+     dict(type='PreciseEvaluator', test_last=False)
+ ]
+ train = dict(type='DefaultTrainer')
+ test = dict(type='SemSegTester', verbose=True)
+ model = dict(
+     type='DefaultSegmentorV2',
+     num_classes=16,
+     backbone_out_channels=64,
+     backbone=dict(
+         type='PT-v3m1',
+         in_channels=4,
+         order=['z', 'z-trans', 'hilbert', 'hilbert-trans'],
+         stride=(2, 2, 2, 2),
+         enc_depths=(2, 2, 2, 6, 2),
+         enc_channels=(32, 64, 128, 256, 512),
+         enc_num_head=(2, 4, 8, 16, 32),
+         enc_patch_size=(1024, 1024, 1024, 1024, 1024),
+         dec_depths=(2, 2, 2, 2),
+         dec_channels=(64, 64, 128, 256),
+         dec_num_head=(4, 4, 8, 16),
+         dec_patch_size=(1024, 1024, 1024, 1024),
+         mlp_ratio=4,
+         qkv_bias=True,
+         qk_scale=None,
+         attn_drop=0.0,
+         proj_drop=0.0,
+         drop_path=0.3,
+         shuffle_orders=True,
+         pre_norm=True,
+         enable_rpe=False,
+         enable_flash=True,
+         upcast_attention=False,
+         upcast_softmax=False,
+         cls_mode=False,
+         pdnorm_bn=False,
+         pdnorm_ln=False,
+         pdnorm_decouple=True,
+         pdnorm_adaptive=False,
+         pdnorm_affine=True,
+         pdnorm_conditions=('nuScenes', 'SemanticKITTI', 'Waymo')),
+     criteria=[
+         dict(type='CrossEntropyLoss', loss_weight=1.0, ignore_index=-1),
+         dict(
+             type='LovaszLoss',
+             mode='multiclass',
+             loss_weight=1.0,
+             ignore_index=-1)
+     ])
+ optimizer = dict(type='AdamW', lr=0.002, weight_decay=0.005)
+ scheduler = dict(
+     type='OneCycleLR',
+     max_lr=[0.002, 0.0002],
+     pct_start=0.04,
+     anneal_strategy='cos',
+     div_factor=10.0,
+     final_div_factor=100.0)
+ dataset_type = 'NuScenesDataset'
+ data_root = 'data/nuscenes'
+ ignore_index = -1
+ names = [
+     'barrier', 'bicycle', 'bus', 'car', 'construction_vehicle', 'motorcycle',
+     'pedestrian', 'traffic_cone', 'trailer', 'truck', 'driveable_surface',
+     'other_flat', 'sidewalk', 'terrain', 'manmade', 'vegetation'
+ ]
+ data = dict(
+     num_classes=16,
+     ignore_index=-1,
+     names=[
+         'barrier', 'bicycle', 'bus', 'car', 'construction_vehicle',
+         'motorcycle', 'pedestrian', 'traffic_cone', 'trailer', 'truck',
+         'driveable_surface', 'other_flat', 'sidewalk', 'terrain', 'manmade',
+         'vegetation'
+     ],
+     train=dict(
+         type='NuScenesDataset',
+         split='train',
+         data_root='data/nuscenes',
+         transform=[
+             dict(
+                 type='RandomRotate',
+                 angle=[-1, 1],
+                 axis='z',
+                 center=[0, 0, 0],
+                 p=0.5),
+             dict(type='RandomScale', scale=[0.9, 1.1]),
+             dict(type='RandomFlip', p=0.5),
+             dict(type='RandomJitter', sigma=0.005, clip=0.02),
+             dict(
+                 type='GridSample',
+                 grid_size=0.05,
+                 hash_type='fnv',
+                 mode='train',
+                 keys=('coord', 'strength', 'segment'),
+                 return_grid_coord=True),
+             dict(type='ToTensor'),
+             dict(
+                 type='Collect',
+                 keys=('coord', 'grid_coord', 'segment'),
+                 feat_keys=('coord', 'strength'))
+         ],
+         test_mode=False,
+         ignore_index=-1,
+         loop=1),
+     val=dict(
+         type='NuScenesDataset',
+         split='val',
+         data_root='data/nuscenes',
+         transform=[
+             dict(
+                 type='GridSample',
+                 grid_size=0.05,
+                 hash_type='fnv',
+                 mode='train',
+                 keys=('coord', 'strength', 'segment'),
+                 return_grid_coord=True),
+             dict(type='ToTensor'),
+             dict(
+                 type='Collect',
+                 keys=('coord', 'grid_coord', 'segment'),
+                 feat_keys=('coord', 'strength'))
+         ],
+         test_mode=False,
+         ignore_index=-1),
+     test=dict(
+         type='NuScenesDataset',
+         split='val',
+         data_root='data/nuscenes',
+         transform=[
+             dict(type='Copy', keys_dict=dict(segment='origin_segment')),
+             dict(
+                 type='GridSample',
+                 grid_size=0.025,
+                 hash_type='fnv',
+                 mode='train',
+                 keys=('coord', 'strength', 'segment'),
+                 return_inverse=True)
+         ],
+         test_mode=True,
+         test_cfg=dict(
+             voxelize=dict(
+                 type='GridSample',
+                 grid_size=0.05,
+                 hash_type='fnv',
+                 mode='test',
+                 return_grid_coord=True,
+                 keys=('coord', 'strength')),
+             crop=None,
+             post_transform=[
+                 dict(type='ToTensor'),
+                 dict(
+                     type='Collect',
+                     keys=('coord', 'grid_coord', 'index'),
+                     feat_keys=('coord', 'strength'))
+             ],
+             aug_transform=[[{
+                 'type': 'RandomScale',
+                 'scale': [0.9, 0.9]
+             }], [{
+                 'type': 'RandomScale',
+                 'scale': [0.95, 0.95]
+             }], [{
+                 'type': 'RandomScale',
+                 'scale': [1, 1]
+             }], [{
+                 'type': 'RandomScale',
+                 'scale': [1.05, 1.05]
+             }], [{
+                 'type': 'RandomScale',
+                 'scale': [1.1, 1.1]
+             }],
+                            [{
+                                'type': 'RandomScale',
+                                'scale': [0.9, 0.9]
+                            }, {
+                                'type': 'RandomFlip',
+                                'p': 1
+                            }],
+                            [{
+                                'type': 'RandomScale',
+                                'scale': [0.95, 0.95]
+                            }, {
+                                'type': 'RandomFlip',
+                                'p': 1
+                            }],
+                            [{
+                                'type': 'RandomScale',
+                                'scale': [1, 1]
+                            }, {
+                                'type': 'RandomFlip',
+                                'p': 1
+                            }],
+                            [{
+                                'type': 'RandomScale',
+                                'scale': [1.05, 1.05]
+                            }, {
+                                'type': 'RandomFlip',
+                                'p': 1
+                            }],
+                            [{
+                                'type': 'RandomScale',
+                                'scale': [1.1, 1.1]
+                            }, {
+                                'type': 'RandomFlip',
+                                'p': 1
+                            }]]),
+         ignore_index=-1))
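Note: the `scheduler` entry in this config names `OneCycleLR` with `max_lr=[0.002, 0.0002]`, `pct_start=0.04`, `div_factor=10.0` and `final_div_factor=100.0`. The two `max_lr` values appear to pair with two parameter groups (the default group and the `keyword='block'` group from `param_dicts`), though the config alone does not confirm that. The sketch below reproduces the two-phase cosine schedule used by PyTorch's `torch.optim.lr_scheduler.OneCycleLR` with its default `three_phase=False`; that this is the scheduler the config maps to is an assumption, and `total_steps=1000` is a made-up value since the real step count depends on dataset size and batch size.

```python
import math


def one_cycle_lr(step, total_steps, max_lr, pct_start=0.04,
                 div_factor=10.0, final_div_factor=100.0):
    """Learning rate at 0-based `step` for a two-phase cosine one-cycle schedule.

    Assumes pct_start * total_steps > 1 so the warm-up phase is non-empty.
    """
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1  # last step of the warm-up phase
    last_step = total_steps - 1

    def cos_anneal(start, end, pct):
        # Cosine interpolation from `start` (pct = 0) to `end` (pct = 1).
        return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)

    if step <= warmup_end:
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    pct = (step - warmup_end) / (last_step - warmup_end)
    return cos_anneal(max_lr, min_lr, pct)


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; not taken from the config
    for group_name, peak in (("default", 0.002), ("block", 0.0002)):
        samples = [one_cycle_lr(s, total_steps, peak) for s in (0, 39, 500, 999)]
        print(group_name, samples)
```

Under these assumptions each group starts at `max_lr / div_factor`, peaks after roughly 4% of training, and decays to `max_lr / (div_factor * final_div_factor)`.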
nuscenes-semseg-pt-v3m1-0-base/events.out.tfevents.1704002329.nuscenes-semseg-pt-v3m1-0-base ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b38774a53b3d6f928435752c1c73a17c49dd255e0df429e7bf3dc3482eb874f3
+ size 11464320
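Note: binary artifacts in this commit are stored as three-line Git LFS pointer files (`version`, `oid`, `size`) rather than as the files themselves. A minimal sketch of reading such a pointer follows; the function name `parse_lfs_pointer` and the returned dictionary layout are illustrative choices, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


if __name__ == "__main__":
    # Pointer content copied from the tfevents entry above.
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:b38774a53b3d6f928435752c1c73a17c49dd255e0df429e7bf3dc3482eb874f3\n"
        "size 11464320\n"
    )
    info = parse_lfs_pointer(pointer)
    print(info["hash_algo"], info["size_bytes"] / 2**20, "MiB")
```

After downloading the real object, hashing it with `hashlib.sha256` and comparing against `digest` is one way to verify the transfer.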
nuscenes-semseg-pt-v3m1-0-base/model/model_best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e2774d2fa1dd33e640a514afe0bd1e5af94eb08be19d08d1f9402f20dfd6db94
+ size 554519016
nuscenes-semseg-pt-v3m1-0-base/model/model_last.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc0f0baac5b2eb4d85dbf935d25f2d7728b79644eb300c49cd5daa63b9add694
+ size 554519016
nuscenes-semseg-pt-v3m1-0-base/train.log ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:48faa94d19ebc48509ae271b149fe1a2bf86b3b5438cb3a9610fe026606db068
+ size 41929069
s3dis-semseg-pt-v3m1-0-rpe/config.py ADDED
@@ -0,0 +1,244 @@
+ weight = None
+ resume = False
+ evaluate = True
+ test_only = False
+ seed = 25326354
+ save_path = 'exp/s3dis/semseg-pt-v3m1-0-rpe'
+ num_worker = 24
+ batch_size = 12
+ batch_size_val = None
+ batch_size_test = None
+ epoch = 3000
+ eval_epoch = 100
+ sync_bn = False
+ enable_amp = True
+ empty_cache = False
+ find_unused_parameters = False
+ mix_prob = 0.8
+ param_dicts = [dict(keyword='block', lr=0.0006)]
+ hooks = [
+     dict(type='CheckpointLoader'),
+     dict(type='IterationTimer', warmup_iter=2),
+     dict(type='InformationWriter'),
+     dict(type='SemSegEvaluator'),
+     dict(type='CheckpointSaver', save_freq=None),
+     dict(type='PreciseEvaluator', test_last=False)
+ ]
+ train = dict(type='DefaultTrainer')
+ test = dict(type='SemSegTester', verbose=True)
+ model = dict(
+     type='DefaultSegmentorV2',
+     num_classes=13,
+     backbone_out_channels=64,
+     backbone=dict(
+         type='PT-v3m1',
+         in_channels=6,
+         order=['z', 'z-trans', 'hilbert', 'hilbert-trans'],
+         stride=(2, 2, 2, 2),
+         enc_depths=(2, 2, 2, 6, 2),
+         enc_channels=(32, 64, 128, 256, 512),
+         enc_num_head=(2, 4, 8, 16, 32),
+         enc_patch_size=(128, 128, 128, 128, 128),
+         dec_depths=(2, 2, 2, 2),
+         dec_channels=(64, 64, 128, 256),
+         dec_num_head=(4, 4, 8, 16),
+         dec_patch_size=(128, 128, 128, 128),
+         mlp_ratio=4,
+         qkv_bias=True,
+         qk_scale=None,
+         attn_drop=0.0,
+         proj_drop=0.0,
+         drop_path=0.3,
+         shuffle_orders=True,
+         pre_norm=True,
+         enable_rpe=True,
+         enable_flash=False,
+         upcast_attention=True,
+         upcast_softmax=True,
+         cls_mode=False,
+         pdnorm_bn=False,
+         pdnorm_ln=False,
+         pdnorm_decouple=True,
+         pdnorm_adaptive=False,
+         pdnorm_affine=True,
+         pdnorm_conditions=('ScanNet', 'S3DIS', 'Structured3D')),
+     criteria=[
+         dict(type='CrossEntropyLoss', loss_weight=1.0, ignore_index=-1),
+         dict(
+             type='LovaszLoss',
+             mode='multiclass',
+             loss_weight=1.0,
+             ignore_index=-1)
+     ])
+ optimizer = dict(type='AdamW', lr=0.006, weight_decay=0.05)
+ scheduler = dict(
+     type='OneCycleLR',
+     max_lr=[0.006, 0.0006],
+     pct_start=0.05,
+     anneal_strategy='cos',
+     div_factor=10.0,
+     final_div_factor=1000.0)
+ dataset_type = 'S3DISDataset'
+ data_root = 'data/s3dis'
+ data = dict(
+     num_classes=13,
+     ignore_index=-1,
+     names=[
+         'ceiling', 'floor', 'wall', 'beam', 'column', 'window', 'door',
+         'table', 'chair', 'sofa', 'bookcase', 'board', 'clutter'
+     ],
+     train=dict(
+         type='S3DISDataset',
+         split=('Area_1', 'Area_2', 'Area_3', 'Area_4', 'Area_6'),
+         data_root='data/s3dis',
+         transform=[
+             dict(type='CenterShift', apply_z=True),
+             dict(
+                 type='RandomDropout',
+                 dropout_ratio=0.2,
+                 dropout_application_ratio=0.2),
+             dict(
+                 type='RandomRotate',
+                 angle=[-1, 1],
+                 axis='z',
+                 center=[0, 0, 0],
+                 p=0.5),
+             dict(
+                 type='RandomRotate',
+                 angle=[-0.015625, 0.015625],
+                 axis='x',
+                 p=0.5),
+             dict(
+                 type='RandomRotate',
+                 angle=[-0.015625, 0.015625],
+                 axis='y',
+                 p=0.5),
+             dict(type='RandomScale', scale=[0.9, 1.1]),
+             dict(type='RandomFlip', p=0.5),
+             dict(type='RandomJitter', sigma=0.005, clip=0.02),
+             dict(type='ChromaticAutoContrast', p=0.2, blend_factor=None),
+             dict(type='ChromaticTranslation', p=0.95, ratio=0.05),
+             dict(type='ChromaticJitter', p=0.95, std=0.05),
+             dict(
+                 type='GridSample',
+                 grid_size=0.02,
+                 hash_type='fnv',
+                 mode='train',
+                 return_grid_coord=True),
+             dict(type='SphereCrop', sample_rate=0.6, mode='random'),
+             dict(type='SphereCrop', point_max=204800, mode='random'),
+             dict(type='CenterShift', apply_z=False),
+             dict(type='NormalizeColor'),
+             dict(type='ToTensor'),
+             dict(
+                 type='Collect',
+                 keys=('coord', 'grid_coord', 'segment'),
+                 feat_keys=('color', 'normal'))
+         ],
+         test_mode=False,
+         loop=30),
+     val=dict(
+         type='S3DISDataset',
+         split='Area_5',
+         data_root='data/s3dis',
+         transform=[
+             dict(type='CenterShift', apply_z=True),
+             dict(
+                 type='Copy',
+                 keys_dict=dict(coord='origin_coord',
+                                segment='origin_segment')),
+             dict(
+                 type='GridSample',
+                 grid_size=0.02,
+                 hash_type='fnv',
+                 mode='train',
+                 return_grid_coord=True),
+             dict(type='CenterShift', apply_z=False),
+             dict(type='NormalizeColor'),
+             dict(type='ToTensor'),
+             dict(
+                 type='Collect',
+                 keys=('coord', 'grid_coord', 'origin_coord', 'segment',
+                       'origin_segment'),
+                 offset_keys_dict=dict(
+                     offset='coord', origin_offset='origin_coord'),
+                 feat_keys=('color', 'normal'))
+         ],
+         test_mode=False),
+     test=dict(
+         type='S3DISDataset',
+         split='Area_5',
+         data_root='data/s3dis',
+         transform=[
+             dict(type='CenterShift', apply_z=True),
+             dict(type='NormalizeColor')
+         ],
+         test_mode=True,
+         test_cfg=dict(
+             voxelize=dict(
+                 type='GridSample',
+                 grid_size=0.02,
+                 hash_type='fnv',
+                 mode='test',
+                 keys=('coord', 'color', 'normal'),
+                 return_grid_coord=True),
+             crop=None,
+             post_transform=[
+                 dict(type='CenterShift', apply_z=False),
+                 dict(type='ToTensor'),
+                 dict(
+                     type='Collect',
+                     keys=('coord', 'grid_coord', 'index'),
+                     feat_keys=('color', 'normal'))
+             ],
+             aug_transform=[[{
+                 'type': 'RandomScale',
+                 'scale': [0.9, 0.9]
+             }], [{
+                 'type': 'RandomScale',
+                 'scale': [0.95, 0.95]
+             }], [{
+                 'type': 'RandomScale',
+                 'scale': [1, 1]
+             }], [{
+                 'type': 'RandomScale',
+                 'scale': [1.05, 1.05]
+             }], [{
+                 'type': 'RandomScale',
+                 'scale': [1.1, 1.1]
+             }],
+                            [{
+                                'type': 'RandomScale',
+                                'scale': [0.9, 0.9]
+                            }, {
+                                'type': 'RandomFlip',
+                                'p': 1
+                            }],
+                            [{
+                                'type': 'RandomScale',
+                                'scale': [0.95, 0.95]
+                            }, {
+                                'type': 'RandomFlip',
+                                'p': 1
+                            }],
+                            [{
+                                'type': 'RandomScale',
+                                'scale': [1, 1]
+                            }, {
+                                'type': 'RandomFlip',
+                                'p': 1
+                            }],
+                            [{
+                                'type': 'RandomScale',
+                                'scale': [1.05, 1.05]
+                            }, {
+                                'type': 'RandomFlip',
+                                'p': 1
+                            }],
+                            [{
+                                'type': 'RandomScale',
+                                'scale': [1.1, 1.1]
+                            }, {
+                                'type': 'RandomFlip',
+                                'p': 1
+                            }]])))
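Note: the `aug_transform` list that closes this config enumerates ten test-time augmentation variants: five fixed scales (0.9 to 1.1), each once without and once with a forced flip. The sketch below rebuilds that list programmatically and shows one common way to fuse per-variant predictions for a single point, by averaging class probabilities. The averaging rule and both helper names are assumptions for illustration; the config does not say how the tester combines the variants.

```python
from itertools import product

SCALES = [0.9, 0.95, 1.0, 1.05, 1.1]


def build_aug_variants(scales=SCALES):
    """Return the scale-only pipelines followed by the scale-plus-flip pipelines."""
    variants = []
    for flip, scale in product([False, True], scales):
        pipeline = [{"type": "RandomScale", "scale": [scale, scale]}]
        if flip:
            pipeline.append({"type": "RandomFlip", "p": 1})
        variants.append(pipeline)
    return variants


def fuse_predictions(per_variant_probs):
    """Average class probabilities across variants; return (best class, mean probs)."""
    count = len(per_variant_probs)
    num_classes = len(per_variant_probs[0])
    mean = [sum(p[c] for p in per_variant_probs) / count for c in range(num_classes)]
    best = max(range(num_classes), key=mean.__getitem__)
    return best, mean


if __name__ == "__main__":
    print(len(build_aug_variants()))
    # Two hypothetical variants disagreeing on a two-class point.
    print(fuse_predictions([[0.6, 0.4], [0.2, 0.8]]))
```

Using a degenerate range such as `[0.9, 0.9]` with `RandomScale`, and `p=1` with `RandomFlip`, makes each nominally random transform deterministic at test time.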
s3dis-semseg-pt-v3m1-0-rpe/events.out.tfevents.1703439768.s3dis-semseg-pt-v3m1-0-rpe ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bbc8fe362763a5f22fcae4a88544c410fe13d450a8575c72b2e3876b9c0c9cce
+ size 4988420
s3dis-semseg-pt-v3m1-0-rpe/model/model_best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ef2c6232b34f73e76c88d7b151e24294b2e8e61536428fd48f8f81371e796b69
+ size 554922616
s3dis-semseg-pt-v3m1-0-rpe/model/model_last.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3386f90bd37089d8dae0c1b932bb5b79339bbdae536249677df88959642099ab
+ size 554922616
s3dis-semseg-pt-v3m1-0-rpe/train.log ADDED
The diff for this file is too large to render. See raw diff
 
s3dis-semseg-pt-v3m1-1-ppt-extreme/config.py ADDED
@@ -0,0 +1,432 @@
+ weight = None
+ resume = False
+ evaluate = True
+ test_only = False
+ seed = 36123202
+ save_path = 'exp/s3dis/semseg-pt-v3m1-1-ppt-extreme'
+ num_worker = 48
+ batch_size = 24
+ batch_size_val = None
+ batch_size_test = None
+ epoch = 100
+ eval_epoch = 100
+ sync_bn = False
+ enable_amp = True
+ empty_cache = False
+ find_unused_parameters = True
+ mix_prob = 0.8
+ param_dicts = [dict(keyword='block', lr=0.0005)]
+ hooks = [
+     dict(type='CheckpointLoader'),
+     dict(type='IterationTimer', warmup_iter=2),
+     dict(type='InformationWriter'),
+     dict(type='SemSegEvaluator'),
+     dict(type='CheckpointSaver', save_freq=None),
+     dict(type='PreciseEvaluator', test_last=False)
+ ]
+ train = dict(type='MultiDatasetTrainer')
+ test = dict(type='SemSegTester', verbose=True)
+ model = dict(
+     type='PPT-v1m1',
+     backbone=dict(
+         type='PT-v3m1',
+         in_channels=6,
+         order=('z', 'z-trans', 'hilbert', 'hilbert-trans'),
+         stride=(2, 2, 2, 2),
+         enc_depths=(2, 2, 2, 6, 2),
+         enc_channels=(32, 64, 128, 256, 384),
+         enc_num_head=(2, 4, 8, 16, 24),
+         enc_patch_size=(128, 128, 128, 128, 128),
+         dec_depths=(2, 2, 2, 2),
+         dec_channels=(64, 64, 128, 256),
+         dec_num_head=(4, 4, 8, 16),
+         dec_patch_size=(128, 128, 128, 128),
+         mlp_ratio=4,
+         qkv_bias=True,
+         qk_scale=None,
+         attn_drop=0.0,
+         proj_drop=0.0,
+         drop_path=0.3,
+         shuffle_orders=True,
+         pre_norm=True,
+         enable_rpe=True,
+         enable_flash=False,
+         upcast_attention=True,
+         upcast_softmax=True,
+         cls_mode=False,
+         pdnorm_bn=True,
+         pdnorm_ln=True,
+         pdnorm_decouple=True,
+         pdnorm_adaptive=False,
+         pdnorm_affine=True,
+         pdnorm_conditions=('ScanNet', 'S3DIS', 'Structured3D')),
+     criteria=[
+         dict(type='CrossEntropyLoss', loss_weight=1.0, ignore_index=-1),
+         dict(
+             type='LovaszLoss',
+             mode='multiclass',
+             loss_weight=1.0,
+             ignore_index=-1)
+     ],
+     backbone_out_channels=64,
+     context_channels=256,
+     conditions=('Structured3D', 'ScanNet', 'S3DIS'),
+     template='[x]',
+     clip_model='ViT-B/16',
+     class_name=('wall', 'floor', 'cabinet', 'bed', 'chair', 'sofa', 'table',
+                 'door', 'window', 'bookshelf', 'bookcase', 'picture',
+                 'counter', 'desk', 'shelves', 'curtain', 'dresser', 'pillow',
+                 'mirror', 'ceiling', 'refrigerator', 'television',
+                 'shower curtain', 'nightstand', 'toilet', 'sink', 'lamp',
+                 'bathtub', 'garbagebin', 'board', 'beam', 'column', 'clutter',
+                 'otherstructure', 'otherfurniture', 'otherprop'),
+     valid_index=((0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 13, 14, 15, 16, 17, 18, 19,
+                   20, 21, 23, 25, 26, 33, 34, 35),
+                  (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 20, 22, 24, 25,
+                   27, 34), (0, 1, 4, 5, 6, 7, 8, 10, 19, 29, 30, 31, 32)),
+     backbone_mode=False)
+ optimizer = dict(type='AdamW', lr=0.005, weight_decay=0.05)
+ scheduler = dict(
+     type='OneCycleLR',
+     max_lr=[0.005, 0.0005],
+     pct_start=0.05,
+     anneal_strategy='cos',
+     div_factor=10.0,
+     final_div_factor=1000.0)
+ data = dict(
+     num_classes=13,
+     ignore_index=-1,
+     names=[
+         'ceiling', 'floor', 'wall', 'beam', 'column', 'window', 'door',
+         'table', 'chair', 'sofa', 'bookcase', 'board', 'clutter'
+     ],
+     train=dict(
+         type='ConcatDataset',
+         datasets=[
+             dict(
+                 type='Structured3DDataset',
+                 split=['train', 'val', 'test'],
+                 data_root='data/structured3d',
+                 transform=[
+                     dict(type='CenterShift', apply_z=True),
+                     dict(
+                         type='RandomDropout',
+                         dropout_ratio=0.2,
+                         dropout_application_ratio=0.2),
+                     dict(
+                         type='RandomRotate',
+                         angle=[-1, 1],
+                         axis='z',
+                         center=[0, 0, 0],
+                         p=0.5),
+                     dict(
+                         type='RandomRotate',
+                         angle=[-0.015625, 0.015625],
+                         axis='x',
+                         p=0.5),
+                     dict(
+                         type='RandomRotate',
+                         angle=[-0.015625, 0.015625],
+                         axis='y',
+                         p=0.5),
+                     dict(type='RandomScale', scale=[0.9, 1.1]),
+                     dict(type='RandomFlip', p=0.5),
+                     dict(type='RandomJitter', sigma=0.005, clip=0.02),
+                     dict(
+                         type='ChromaticAutoContrast', p=0.2,
+                         blend_factor=None),
+                     dict(type='ChromaticTranslation', p=0.95, ratio=0.05),
+                     dict(type='ChromaticJitter', p=0.95, std=0.05),
+                     dict(
+                         type='GridSample',
+                         grid_size=0.02,
+                         hash_type='fnv',
+                         mode='train',
+                         return_grid_coord=True),
+                     dict(type='SphereCrop', sample_rate=0.8, mode='random'),
+                     dict(type='SphereCrop', point_max=204800, mode='random'),
+                     dict(type='CenterShift', apply_z=False),
+                     dict(type='NormalizeColor'),
+                     dict(type='Add', keys_dict=dict(condition='Structured3D')),
+                     dict(type='ToTensor'),
+                     dict(
+                         type='Collect',
+                         keys=('coord', 'grid_coord', 'segment', 'condition'),
+                         feat_keys=('color', 'normal'))
+                 ],
+                 test_mode=False,
+                 loop=4),
+             dict(
+                 type='ScanNetDataset',
+                 split='train',
+                 data_root='data/scannet',
+                 transform=[
+                     dict(type='CenterShift', apply_z=True),
+                     dict(
+                         type='RandomDropout',
+                         dropout_ratio=0.2,
+                         dropout_application_ratio=0.2),
+                     dict(
+                         type='RandomRotate',
+                         angle=[-1, 1],
+                         axis='z',
+                         center=[0, 0, 0],
+                         p=0.5),
+                     dict(
+                         type='RandomRotate',
+                         angle=[-0.015625, 0.015625],
+                         axis='x',
+                         p=0.5),
+                     dict(
+                         type='RandomRotate',
+                         angle=[-0.015625, 0.015625],
+                         axis='y',
+                         p=0.5),
+                     dict(type='RandomScale', scale=[0.9, 1.1]),
+                     dict(type='RandomFlip', p=0.5),
+                     dict(type='RandomJitter', sigma=0.005, clip=0.02),
+                     dict(
+                         type='ChromaticAutoContrast', p=0.2,
+                         blend_factor=None),
+                     dict(type='ChromaticTranslation', p=0.95, ratio=0.05),
+                     dict(type='ChromaticJitter', p=0.95, std=0.05),
+                     dict(
+                         type='GridSample',
+                         grid_size=0.02,
+                         hash_type='fnv',
+                         mode='train',
+                         return_grid_coord=True),
+                     dict(type='SphereCrop', point_max=102400, mode='random'),
+                     dict(type='CenterShift', apply_z=False),
+                     dict(type='NormalizeColor'),
+                     dict(type='Add', keys_dict=dict(condition='ScanNet')),
+                     dict(type='ToTensor'),
+                     dict(
+                         type='Collect',
+                         keys=('coord', 'grid_coord', 'segment', 'condition'),
+                         feat_keys=('color', 'normal'))
+                 ],
+                 test_mode=False,
+                 loop=2),
+             dict(
+                 type='S3DISDataset',
+                 split=('Area_1', 'Area_2', 'Area_3', 'Area_4', 'Area_6'),
+                 data_root='data/s3dis',
+                 transform=[
+                     dict(type='CenterShift', apply_z=True),
+                     dict(
+                         type='RandomDropout',
+                         dropout_ratio=0.2,
+                         dropout_application_ratio=0.2),
+                     dict(
+                         type='RandomRotate',
+                         angle=[-1, 1],
+                         axis='z',
+                         center=[0, 0, 0],
+                         p=0.5),
+                     dict(
+                         type='RandomRotate',
+                         angle=[-0.015625, 0.015625],
+                         axis='x',
+                         p=0.5),
+                     dict(
+                         type='RandomRotate',
+                         angle=[-0.015625, 0.015625],
+                         axis='y',
+                         p=0.5),
+                     dict(type='RandomScale', scale=[0.9, 1.1]),
+                     dict(type='RandomFlip', p=0.5),
+                     dict(type='RandomJitter', sigma=0.005, clip=0.02),
+                     dict(
+                         type='ChromaticAutoContrast', p=0.2,
+                         blend_factor=None),
+                     dict(type='ChromaticTranslation', p=0.95, ratio=0.05),
+                     dict(type='ChromaticJitter', p=0.95, std=0.05),
+                     dict(
+                         type='GridSample',
+                         grid_size=0.02,
+                         hash_type='fnv',
+                         mode='train',
+                         return_grid_coord=True),
+                     dict(type='SphereCrop', sample_rate=0.6, mode='random'),
+                     dict(type='SphereCrop', point_max=204800, mode='random'),
+                     dict(type='CenterShift', apply_z=False),
+                     dict(type='NormalizeColor'),
+                     dict(type='Add', keys_dict=dict(condition='S3DIS')),
+                     dict(type='ToTensor'),
+                     dict(
+                         type='Collect',
+                         keys=('coord', 'grid_coord', 'segment', 'condition'),
+                         feat_keys=('color', 'normal'))
+                 ],
+                 test_mode=False,
+                 loop=1)
+         ],
+         loop=1),
+     val=dict(
+         type='S3DISDataset',
+         split='Area_5',
+         data_root='data/s3dis',
+         transform=[
+             dict(type='CenterShift', apply_z=True),
+             dict(
+                 type='Copy',
+                 keys_dict=dict(coord='origin_coord',
+                                segment='origin_segment')),
+             dict(
+                 type='GridSample',
+                 grid_size=0.02,
+                 hash_type='fnv',
+                 mode='train',
+                 return_grid_coord=True),
+             dict(type='CenterShift', apply_z=False),
+             dict(type='NormalizeColor'),
+             dict(type='ToTensor'),
+             dict(type='Add', keys_dict=dict(condition='S3DIS')),
+             dict(
+                 type='Collect',
+                 keys=('coord', 'grid_coord', 'origin_coord', 'segment',
+                       'origin_segment', 'condition'),
+                 offset_keys_dict=dict(
+                     offset='coord', origin_offset='origin_coord'),
+                 feat_keys=('color', 'normal'))
+         ],
+         test_mode=False),
+     test=dict(
+         type='S3DISDataset',
+         split='Area_5',
+         data_root='data/s3dis',
+         transform=[
+             dict(type='CenterShift', apply_z=True),
+             dict(type='NormalizeColor')
+         ],
+         test_mode=True,
+         test_cfg=dict(
+             voxelize=dict(
+                 type='GridSample',
+                 grid_size=0.02,
+                 hash_type='fnv',
+                 mode='test',
+                 keys=('coord', 'color', 'normal'),
+                 return_grid_coord=True),
+             crop=None,
+             post_transform=[
+                 dict(type='CenterShift', apply_z=False),
+                 dict(type='Add', keys_dict=dict(condition='S3DIS')),
+                 dict(type='ToTensor'),
+                 dict(
+                     type='Collect',
+                     keys=('coord', 'grid_coord', 'index', 'condition'),
+                     feat_keys=('color', 'normal'))
+             ],
+             aug_transform=[[{
+                 'type': 'RandomRotateTargetAngle',
+                 'angle': [0],
+                 'axis': 'z',
+                 'center': [0, 0, 0],
+                 'p': 1
+             }],
+                            [{
+                                'type': 'RandomRotateTargetAngle',
+                                'angle': [0.5],
+                                'axis': 'z',
+                                'center': [0, 0, 0],
+                                'p': 1
+                            }],
+                            [{
+                                'type': 'RandomRotateTargetAngle',
+                                'angle': [1],
+                                'axis': 'z',
+                                'center': [0, 0, 0],
+                                'p': 1
+                            }],
+                            [{
+                                'type': 'RandomRotateTargetAngle',
+                                'angle': [1.5],
+                                'axis': 'z',
+                                'center': [0, 0, 0],
+                                'p': 1
+                            }],
+                            [{
+                                'type': 'RandomRotateTargetAngle',
+                                'angle': [0],
+                                'axis': 'z',
+                                'center': [0, 0, 0],
+                                'p': 1
+                            }, {
+                                'type': 'RandomScale',
+                                'scale': [0.95, 0.95]
+                            }],
+                            [{
+                                'type': 'RandomRotateTargetAngle',
+                                'angle': [0.5],
+                                'axis': 'z',
+                                'center': [0, 0, 0],
+                                'p': 1
+                            }, {
+                                'type': 'RandomScale',
+                                'scale': [0.95, 0.95]
+                            }],
+                            [{
+                                'type': 'RandomRotateTargetAngle',
+                                'angle': [1],
+                                'axis': 'z',
+                                'center': [0, 0, 0],
+                                'p': 1
+                            }, {
+                                'type': 'RandomScale',
+                                'scale': [0.95, 0.95]
+                            }],
+                            [{
+                                'type': 'RandomRotateTargetAngle',
+                                'angle': [1.5],
+                                'axis': 'z',
+                                'center': [0, 0, 0],
+                                'p': 1
+                            }, {
+                                'type': 'RandomScale',
+ 'scale': [0.95, 0.95]
389
+ }],
390
+ [{
391
+ 'type': 'RandomRotateTargetAngle',
392
+ 'angle': [0],
393
+ 'axis': 'z',
394
+ 'center': [0, 0, 0],
395
+ 'p': 1
396
+ }, {
397
+ 'type': 'RandomScale',
398
+ 'scale': [1.05, 1.05]
399
+ }],
400
+ [{
401
+ 'type': 'RandomRotateTargetAngle',
402
+ 'angle': [0.5],
403
+ 'axis': 'z',
404
+ 'center': [0, 0, 0],
405
+ 'p': 1
406
+ }, {
407
+ 'type': 'RandomScale',
408
+ 'scale': [1.05, 1.05]
409
+ }],
410
+ [{
411
+ 'type': 'RandomRotateTargetAngle',
412
+ 'angle': [1],
413
+ 'axis': 'z',
414
+ 'center': [0, 0, 0],
415
+ 'p': 1
416
+ }, {
417
+ 'type': 'RandomScale',
418
+ 'scale': [1.05, 1.05]
419
+ }],
420
+ [{
421
+ 'type': 'RandomRotateTargetAngle',
422
+ 'angle': [1.5],
423
+ 'axis': 'z',
424
+ 'center': [0, 0, 0],
425
+ 'p': 1
426
+ }, {
427
+ 'type': 'RandomScale',
428
+ 'scale': [1.05, 1.05]
429
+ }], [{
430
+ 'type': 'RandomFlip',
431
+ 'p': 1
432
+ }]])))
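The `aug_transform` list above enumerates its test-time augmentations by hand: four z-axis rotation targets, each repeated with no scaling, with a fixed 0.95 scale, and with a fixed 1.05 scale, followed by one flip-only pipeline. A minimal sketch of how the same 13 entries could be generated is shown below. The helper name `build_aug_transform` is hypothetical and does not appear in the committed config; the angle values are copied as-is, and their unit (presumably multiples of π) is an assumption.

```python
def build_aug_transform(angles=(0, 0.5, 1, 1.5), scales=(None, 0.95, 1.05)):
    """Rebuild the test-time augmentation list in the same order as the config.

    Hypothetical helper: the committed config spells every entry out by hand.
    """
    pipelines = []
    for scale in scales:
        for angle in angles:
            # Every rotation pipeline starts with a fixed-angle rotation about z.
            pipeline = [
                dict(
                    type="RandomRotateTargetAngle",
                    angle=[angle],
                    axis="z",
                    center=[0, 0, 0],
                    p=1,
                )
            ]
            if scale is not None:
                # A degenerate [s, s] range makes the "random" scale deterministic.
                pipeline.append(dict(type="RandomScale", scale=[scale, scale]))
            pipelines.append(pipeline)
    # The final entry applies only a flip.
    pipelines.append([dict(type="RandomFlip", p=1)])
    return pipelines


aug_transform = build_aug_transform()
```

Generating the list keeps the rotation and scale grids in one place, at the cost of the dumped config no longer being a literal record of what was run.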
s3dis-semseg-pt-v3m1-1-ppt-extreme/events.out.tfevents.1708160591.s3dis-semseg-pt-v3m1-1-ppt-extreme ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2f9235ae197b5ab037f76568dfc5e7001f1531674dfc560d5e8549aec44e7a0c
+ size 15249020
s3dis-semseg-pt-v3m1-1-ppt-extreme/model/model_best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7725bc5c847f7b5492b244a85bd5c0725d638b30e2dada2db01e4cc5a4f5e019
+ size 445523390
s3dis-semseg-pt-v3m1-1-ppt-extreme/model/model_last.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:26242bc8dbe3449d9754587e672c4b626dde66bca971804224bdded27c4fce8e
+ size 445523390
s3dis-semseg-pt-v3m1-1-ppt-extreme/train.log ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:92483bd7093a08b9a3a4ea07036c3c935c403b61cf8c15b89116b2baab980ffb
+ size 25517235
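The event, checkpoint, and log files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, the object's SHA-256 digest, and its size in bytes. The sketch below reads those fields from the `train.log` pointer above. The function name `parse_lfs_pointer` and the returned dictionary layout are illustrative assumptions, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its fields (illustrative helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line carries "<hash algorithm>:<hex digest>".
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the train.log entry above.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:92483bd7093a08b9a3a4ea07036c3c935c403b61cf8c15b89116b2baab980ffb
size 25517235
"""

pointer = parse_lfs_pointer(pointer_text)
```

The `size` field gives the real file size, so the pointer alone is enough to tell that this training log is roughly 25.5 MB before downloading it.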
scannet-semseg-pt-v3m1-0-base/config.py ADDED
@@ -0,0 +1,301 @@
1
+ weight = None
2
+ resume = False
3
+ evaluate = True
4
+ test_only = False
5
+ seed = 43244662
6
+ save_path = 'exp/scannet/semseg-pt-v3m1-0-base'
7
+ num_worker = 24
8
+ batch_size = 12
9
+ batch_size_val = None
10
+ batch_size_test = None
11
+ epoch = 800
12
+ eval_epoch = 100
13
+ sync_bn = False
14
+ enable_amp = True
15
+ empty_cache = False
16
+ find_unused_parameters = False
17
+ mix_prob = 0.8
18
+ param_dicts = [dict(keyword='block', lr=0.0006)]
19
+ hooks = [
20
+ dict(type='CheckpointLoader'),
21
+ dict(type='IterationTimer', warmup_iter=2),
22
+ dict(type='InformationWriter'),
23
+ dict(type='SemSegEvaluator'),
24
+ dict(type='CheckpointSaver', save_freq=None),
25
+ dict(type='PreciseEvaluator', test_last=False)
26
+ ]
27
+ train = dict(type='DefaultTrainer')
28
+ test = dict(type='SemSegTester', verbose=True)
29
+ model = dict(
30
+ type='DefaultSegmentorV2',
31
+ num_classes=20,
32
+ backbone_out_channels=64,
33
+ backbone=dict(
34
+ type='PT-v3m1',
35
+ in_channels=6,
36
+ order=['z', 'z-trans', 'hilbert', 'hilbert-trans'],
37
+ stride=(2, 2, 2, 2),
38
+ enc_depths=(2, 2, 2, 6, 2),
39
+ enc_channels=(32, 64, 128, 256, 512),
40
+ enc_num_head=(2, 4, 8, 16, 32),
41
+ enc_patch_size=(1024, 1024, 1024, 1024, 1024),
42
+ dec_depths=(2, 2, 2, 2),
43
+ dec_channels=(64, 64, 128, 256),
44
+ dec_num_head=(4, 4, 8, 16),
45
+ dec_patch_size=(1024, 1024, 1024, 1024),
46
+ mlp_ratio=4,
47
+ qkv_bias=True,
48
+ qk_scale=None,
49
+ attn_drop=0.0,
50
+ proj_drop=0.0,
51
+ drop_path=0.3,
52
+ shuffle_orders=True,
53
+ pre_norm=True,
54
+ enable_rpe=False,
55
+ enable_flash=True,
56
+ upcast_attention=False,
57
+ upcast_softmax=False,
58
+ cls_mode=False,
59
+ pdnorm_bn=False,
60
+ pdnorm_ln=False,
61
+ pdnorm_decouple=True,
62
+ pdnorm_adaptive=False,
63
+ pdnorm_affine=True,
64
+ pdnorm_conditions=('ScanNet', 'S3DIS', 'Structured3D')),
65
+ criteria=[
66
+ dict(type='CrossEntropyLoss', loss_weight=1.0, ignore_index=-1),
67
+ dict(
68
+ type='LovaszLoss',
69
+ mode='multiclass',
70
+ loss_weight=1.0,
71
+ ignore_index=-1)
72
+ ])
73
+ optimizer = dict(type='AdamW', lr=0.006, weight_decay=0.05)
74
+ scheduler = dict(
75
+ type='OneCycleLR',
76
+ max_lr=[0.006, 0.0006],
77
+ pct_start=0.05,
78
+ anneal_strategy='cos',
79
+ div_factor=10.0,
80
+ final_div_factor=1000.0)
81
+ dataset_type = 'ScanNetDataset'
82
+ data_root = 'data/scannet'
83
+ data = dict(
84
+ num_classes=20,
85
+ ignore_index=-1,
86
+ names=[
87
+ 'wall', 'floor', 'cabinet', 'bed', 'chair', 'sofa', 'table', 'door',
88
+ 'window', 'bookshelf', 'picture', 'counter', 'desk', 'curtain',
89
+ 'refridgerator', 'shower curtain', 'toilet', 'sink', 'bathtub',
90
+ 'otherfurniture'
91
+ ],
92
+ train=dict(
93
+ type='ScanNetDataset',
94
+ split='train',
95
+ data_root='data/scannet',
96
+ transform=[
97
+ dict(type='CenterShift', apply_z=True),
98
+ dict(
99
+ type='RandomDropout',
100
+ dropout_ratio=0.2,
101
+ dropout_application_ratio=0.2),
102
+ dict(
103
+ type='RandomRotate',
104
+ angle=[-1, 1],
105
+ axis='z',
106
+ center=[0, 0, 0],
107
+ p=0.5),
108
+ dict(
109
+ type='RandomRotate',
110
+ angle=[-0.015625, 0.015625],
111
+ axis='x',
112
+ p=0.5),
113
+ dict(
114
+ type='RandomRotate',
115
+ angle=[-0.015625, 0.015625],
116
+ axis='y',
117
+ p=0.5),
118
+ dict(type='RandomScale', scale=[0.9, 1.1]),
119
+ dict(type='RandomFlip', p=0.5),
120
+ dict(type='RandomJitter', sigma=0.005, clip=0.02),
121
+ dict(
122
+ type='ElasticDistortion',
123
+ distortion_params=[[0.2, 0.4], [0.8, 1.6]]),
124
+ dict(type='ChromaticAutoContrast', p=0.2, blend_factor=None),
125
+ dict(type='ChromaticTranslation', p=0.95, ratio=0.05),
126
+ dict(type='ChromaticJitter', p=0.95, std=0.05),
127
+ dict(
128
+ type='GridSample',
129
+ grid_size=0.02,
130
+ hash_type='fnv',
131
+ mode='train',
132
+ return_grid_coord=True),
133
+ dict(type='SphereCrop', point_max=102400, mode='random'),
134
+ dict(type='CenterShift', apply_z=False),
135
+ dict(type='NormalizeColor'),
136
+ dict(type='ToTensor'),
137
+ dict(
138
+ type='Collect',
139
+ keys=('coord', 'grid_coord', 'segment'),
140
+ feat_keys=('color', 'normal'))
141
+ ],
142
+ test_mode=False,
143
+ loop=8),
144
+ val=dict(
145
+ type='ScanNetDataset',
146
+ split='val',
147
+ data_root='data/scannet',
148
+ transform=[
149
+ dict(type='CenterShift', apply_z=True),
150
+ dict(
151
+ type='GridSample',
152
+ grid_size=0.02,
153
+ hash_type='fnv',
154
+ mode='train',
155
+ return_grid_coord=True),
156
+ dict(type='CenterShift', apply_z=False),
157
+ dict(type='NormalizeColor'),
158
+ dict(type='ToTensor'),
159
+ dict(
160
+ type='Collect',
161
+ keys=('coord', 'grid_coord', 'segment'),
162
+ feat_keys=('color', 'normal'))
163
+ ],
164
+ test_mode=False),
165
+ test=dict(
166
+ type='ScanNetDataset',
167
+ split='val',
168
+ data_root='data/scannet',
169
+ transform=[
170
+ dict(type='CenterShift', apply_z=True),
171
+ dict(type='NormalizeColor')
172
+ ],
173
+ test_mode=True,
174
+ test_cfg=dict(
175
+ voxelize=dict(
176
+ type='GridSample',
177
+ grid_size=0.02,
178
+ hash_type='fnv',
179
+ mode='test',
180
+ keys=('coord', 'color', 'normal'),
181
+ return_grid_coord=True),
182
+ crop=None,
183
+ post_transform=[
184
+ dict(type='CenterShift', apply_z=False),
185
+ dict(type='ToTensor'),
186
+ dict(
187
+ type='Collect',
188
+ keys=('coord', 'grid_coord', 'index'),
189
+ feat_keys=('color', 'normal'))
190
+ ],
191
+ aug_transform=[[{
192
+ 'type': 'RandomRotateTargetAngle',
193
+ 'angle': [0],
194
+ 'axis': 'z',
195
+ 'center': [0, 0, 0],
196
+ 'p': 1
197
+ }],
198
+ [{
199
+ 'type': 'RandomRotateTargetAngle',
200
+ 'angle': [0.5],
201
+ 'axis': 'z',
202
+ 'center': [0, 0, 0],
203
+ 'p': 1
204
+ }],
205
+ [{
206
+ 'type': 'RandomRotateTargetAngle',
207
+ 'angle': [1],
208
+ 'axis': 'z',
209
+ 'center': [0, 0, 0],
210
+ 'p': 1
211
+ }],
212
+ [{
213
+ 'type': 'RandomRotateTargetAngle',
214
+ 'angle': [1.5],
215
+ 'axis': 'z',
216
+ 'center': [0, 0, 0],
217
+ 'p': 1
218
+ }],
219
+ [{
220
+ 'type': 'RandomRotateTargetAngle',
221
+ 'angle': [0],
222
+ 'axis': 'z',
223
+ 'center': [0, 0, 0],
224
+ 'p': 1
225
+ }, {
226
+ 'type': 'RandomScale',
227
+ 'scale': [0.95, 0.95]
228
+ }],
229
+ [{
230
+ 'type': 'RandomRotateTargetAngle',
231
+ 'angle': [0.5],
232
+ 'axis': 'z',
233
+ 'center': [0, 0, 0],
234
+ 'p': 1
235
+ }, {
236
+ 'type': 'RandomScale',
237
+ 'scale': [0.95, 0.95]
238
+ }],
239
+ [{
240
+ 'type': 'RandomRotateTargetAngle',
241
+ 'angle': [1],
242
+ 'axis': 'z',
243
+ 'center': [0, 0, 0],
244
+ 'p': 1
245
+ }, {
246
+ 'type': 'RandomScale',
247
+ 'scale': [0.95, 0.95]
248
+ }],
249
+ [{
250
+ 'type': 'RandomRotateTargetAngle',
251
+ 'angle': [1.5],
252
+ 'axis': 'z',
253
+ 'center': [0, 0, 0],
254
+ 'p': 1
255
+ }, {
256
+ 'type': 'RandomScale',
257
+ 'scale': [0.95, 0.95]
258
+ }],
259
+ [{
260
+ 'type': 'RandomRotateTargetAngle',
261
+ 'angle': [0],
262
+ 'axis': 'z',
263
+ 'center': [0, 0, 0],
264
+ 'p': 1
265
+ }, {
266
+ 'type': 'RandomScale',
267
+ 'scale': [1.05, 1.05]
268
+ }],
269
+ [{
270
+ 'type': 'RandomRotateTargetAngle',
271
+ 'angle': [0.5],
272
+ 'axis': 'z',
273
+ 'center': [0, 0, 0],
274
+ 'p': 1
275
+ }, {
276
+ 'type': 'RandomScale',
277
+ 'scale': [1.05, 1.05]
278
+ }],
279
+ [{
280
+ 'type': 'RandomRotateTargetAngle',
281
+ 'angle': [1],
282
+ 'axis': 'z',
283
+ 'center': [0, 0, 0],
284
+ 'p': 1
285
+ }, {
286
+ 'type': 'RandomScale',
287
+ 'scale': [1.05, 1.05]
288
+ }],
289
+ [{
290
+ 'type': 'RandomRotateTargetAngle',
291
+ 'angle': [1.5],
292
+ 'axis': 'z',
293
+ 'center': [0, 0, 0],
294
+ 'p': 1
295
+ }, {
296
+ 'type': 'RandomScale',
297
+ 'scale': [1.05, 1.05]
298
+ }], [{
299
+ 'type': 'RandomFlip',
300
+ 'p': 1
301
+ }]])))
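The `scheduler` block in this config sets `max_lr=[0.006, 0.0006]`, `pct_start=0.05`, `div_factor=10.0`, and `final_div_factor=1000.0`. The two `max_lr` values presumably correspond to the default parameter group and the `block` group from `param_dicts`; that pairing is an assumption here. The sketch below works out the learning-rate endpoints these numbers imply, using a plain-Python approximation of a two-phase cosine one-cycle schedule. The function `one_cycle_lr` and the choice of 1000 total steps are illustrative only and are not taken from the training code.

```python
import math


def one_cycle_lr(step, total_steps, max_lr,
                 pct_start=0.05, div_factor=10.0, final_div_factor=1000.0):
    """Approximate a two-phase cosine one-cycle schedule (illustrative sketch)."""
    initial_lr = max_lr / div_factor            # learning rate at step 0
    min_lr = initial_lr / final_div_factor      # learning rate at the last step
    warmup_end = pct_start * total_steps - 1    # last step of the warm-up phase

    def cos_anneal(start, end, pct):
        # Cosine interpolation from start (pct=0) to end (pct=1).
        return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)

    if step <= warmup_end:
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    decay_steps = total_steps - 1 - warmup_end
    return cos_anneal(max_lr, min_lr, (step - warmup_end) / decay_steps)


total_steps = 1000  # hypothetical; the real value depends on dataset size and epochs
lr_start = one_cycle_lr(0, total_steps, max_lr=0.006)
lr_peak = one_cycle_lr(49, total_steps, max_lr=0.006)
lr_end = one_cycle_lr(total_steps - 1, total_steps, max_lr=0.006)
```

Under these assumptions the default group starts near 6e-4, peaks at 6e-3 after the first 5% of steps, and decays to about 6e-7. The `block` group would follow the same shape at one tenth of each value.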
scannet-semseg-pt-v3m1-0-base/events.out.tfevents.1703049730.scannet-semseg-pt-v3m1-0-base ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e9b318c5aa9d8fb1b0dde05c04cf8cd51cc22e08b4f2bac241cd790f021de75d
+ size 7830420
scannet-semseg-pt-v3m1-0-base/model/model_best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:40206376ff2f83f48e4d1bc27d5c5d96be7c87c5d11eb45fe7be501959040e7f
+ size 554618088
scannet-semseg-pt-v3m1-0-base/model/model_last.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7e33d7718f6a9a52465a43e6122d7873630bcd10b626a3452199d2b36006a4ec
+ size 554618088
scannet-semseg-pt-v3m1-0-base/train.log ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cac65e2f316033b21d79cb89f2ee700f41a388b5dbe85bf910bd303549ab01b1
+ size 14778054
scannet-semseg-pt-v3m1-1-ppt-extreme/config.py ADDED
@@ -0,0 +1,381 @@
1
+ weight = 'exp/scannet/semseg-pt-v3m1-1-ppt-extreme/model/model_best.pth'
2
+ resume = False
3
+ evaluate = True
4
+ test_only = False
5
+ seed = 44350923
6
+ save_path = 'exp/scannet/semseg-pt-v3m1-1-ppt-extreme'
7
+ num_worker = 48
8
+ batch_size = 24
9
+ batch_size_val = None
10
+ batch_size_test = None
11
+ epoch = 100
12
+ eval_epoch = 100
13
+ sync_bn = False
14
+ enable_amp = True
15
+ empty_cache = False
16
+ find_unused_parameters = True
17
+ mix_prob = 0.8
18
+ param_dicts = [dict(keyword='block', lr=0.0005)]
19
+ hooks = [
20
+ dict(type='CheckpointLoader'),
21
+ dict(type='IterationTimer', warmup_iter=2),
22
+ dict(type='InformationWriter'),
23
+ dict(type='SemSegEvaluator'),
24
+ dict(type='CheckpointSaver', save_freq=None),
25
+ dict(type='PreciseEvaluator', test_last=False)
26
+ ]
27
+ train = dict(type='MultiDatasetTrainer')
28
+ test = dict(type='SemSegTester', verbose=True)
29
+ model = dict(
30
+ type='PPT-v1m1',
31
+ backbone=dict(
32
+ type='PT-v3m1',
33
+ in_channels=6,
34
+ order=('z', 'z-trans', 'hilbert', 'hilbert-trans'),
35
+ stride=(2, 2, 2, 2),
36
+ enc_depths=(3, 3, 3, 6, 3),
37
+ enc_channels=(48, 96, 192, 384, 512),
38
+ enc_num_head=(3, 6, 12, 24, 32),
39
+ enc_patch_size=(1024, 1024, 1024, 1024, 1024),
40
+ dec_depths=(3, 3, 3, 3),
41
+ dec_channels=(64, 96, 192, 384),
42
+ dec_num_head=(4, 6, 12, 24),
43
+ dec_patch_size=(1024, 1024, 1024, 1024),
44
+ mlp_ratio=4,
45
+ qkv_bias=True,
46
+ qk_scale=None,
47
+ attn_drop=0.0,
48
+ proj_drop=0.0,
49
+ drop_path=0.3,
50
+ shuffle_orders=True,
51
+ pre_norm=True,
52
+ enable_rpe=False,
53
+ enable_flash=True,
54
+ upcast_attention=False,
55
+ upcast_softmax=False,
56
+ cls_mode=False,
57
+ pdnorm_bn=True,
58
+ pdnorm_ln=True,
59
+ pdnorm_decouple=True,
60
+ pdnorm_adaptive=False,
61
+ pdnorm_affine=True,
62
+ pdnorm_conditions=('ScanNet', 'S3DIS', 'Structured3D')),
63
+ criteria=[
64
+ dict(type='CrossEntropyLoss', loss_weight=1.0, ignore_index=-1),
65
+ dict(
66
+ type='LovaszLoss',
67
+ mode='multiclass',
68
+ loss_weight=1.0,
69
+ ignore_index=-1)
70
+ ],
71
+ backbone_out_channels=64,
72
+ context_channels=256,
73
+ conditions=('Structured3D', 'ScanNet', 'S3DIS'),
74
+ template='[x]',
75
+ clip_model='ViT-B/16',
76
+ class_name=('wall', 'floor', 'cabinet', 'bed', 'chair', 'sofa', 'table',
77
+ 'door', 'window', 'bookshelf', 'bookcase', 'picture',
78
+ 'counter', 'desk', 'shelves', 'curtain', 'dresser', 'pillow',
79
+ 'mirror', 'ceiling', 'refrigerator', 'television',
80
+ 'shower curtain', 'nightstand', 'toilet', 'sink', 'lamp',
81
+ 'bathtub', 'garbagebin', 'board', 'beam', 'column', 'clutter',
82
+ 'otherstructure', 'otherfurniture', 'otherprop'),
83
+ valid_index=((0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 13, 14, 15, 16, 17, 18, 19,
84
+ 20, 21, 23, 25, 26, 33, 34, 35),
85
+ (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 20, 22, 24, 25,
86
+ 27, 34), (0, 1, 4, 5, 6, 7, 8, 10, 19, 29, 30, 31, 32)),
87
+ backbone_mode=False)
88
+ optimizer = dict(type='AdamW', lr=0.005, weight_decay=0.05)
89
+ scheduler = dict(
90
+ type='OneCycleLR',
91
+ max_lr=[0.005, 0.0005],
92
+ pct_start=0.05,
93
+ anneal_strategy='cos',
94
+ div_factor=10.0,
95
+ final_div_factor=1000.0)
96
+ data = dict(
97
+ num_classes=20,
98
+ ignore_index=-1,
99
+ names=[
100
+ 'wall', 'floor', 'cabinet', 'bed', 'chair', 'sofa', 'table', 'door',
101
+ 'window', 'bookshelf', 'picture', 'counter', 'desk', 'curtain',
102
+ 'refridgerator', 'shower curtain', 'toilet', 'sink', 'bathtub',
103
+ 'otherfurniture'
104
+ ],
105
+ train=dict(
106
+ type='ConcatDataset',
107
+ datasets=[
108
+ dict(
109
+ type='Structured3DDataset',
110
+ split=['train', 'val', 'test'],
111
+ data_root='data/structured3d',
112
+ transform=[
113
+ dict(type='CenterShift', apply_z=True),
114
+ dict(
115
+ type='RandomDropout',
116
+ dropout_ratio=0.2,
117
+ dropout_application_ratio=0.2),
118
+ dict(
119
+ type='RandomRotate',
120
+ angle=[-1, 1],
121
+ axis='z',
122
+ center=[0, 0, 0],
123
+ p=0.5),
124
+ dict(
125
+ type='RandomRotate',
126
+ angle=[-0.015625, 0.015625],
127
+ axis='x',
128
+ p=0.5),
129
+ dict(
130
+ type='RandomRotate',
131
+ angle=[-0.015625, 0.015625],
132
+ axis='y',
133
+ p=0.5),
134
+ dict(type='RandomScale', scale=[0.9, 1.1]),
135
+ dict(type='RandomFlip', p=0.5),
136
+ dict(type='RandomJitter', sigma=0.005, clip=0.02),
137
+ dict(
138
+ type='ElasticDistortion',
139
+ distortion_params=[[0.2, 0.4], [0.8, 1.6]]),
140
+ dict(
141
+ type='ChromaticAutoContrast', p=0.2,
142
+ blend_factor=None),
143
+ dict(type='ChromaticTranslation', p=0.95, ratio=0.05),
144
+ dict(type='ChromaticJitter', p=0.95, std=0.05),
145
+ dict(
146
+ type='GridSample',
147
+ grid_size=0.02,
148
+ hash_type='fnv',
149
+ mode='train',
150
+ return_grid_coord=True),
151
+ dict(type='SphereCrop', sample_rate=0.8, mode='random'),
152
+ dict(type='SphereCrop', point_max=102400, mode='random'),
153
+ dict(type='CenterShift', apply_z=False),
154
+ dict(type='NormalizeColor'),
155
+ dict(type='Add', keys_dict=dict(condition='Structured3D')),
156
+ dict(type='ToTensor'),
157
+ dict(
158
+ type='Collect',
159
+ keys=('coord', 'grid_coord', 'segment', 'condition'),
160
+ feat_keys=('color', 'normal'))
161
+ ],
162
+ test_mode=False,
163
+ loop=2),
164
+ dict(
165
+ type='ScanNetDataset',
166
+ split='train',
167
+ data_root='data/scannet',
168
+ transform=[
169
+ dict(type='CenterShift', apply_z=True),
170
+ dict(
171
+ type='RandomDropout',
172
+ dropout_ratio=0.2,
173
+ dropout_application_ratio=0.2),
174
+ dict(
175
+ type='RandomRotate',
176
+ angle=[-1, 1],
177
+ axis='z',
178
+ center=[0, 0, 0],
179
+ p=0.5),
180
+ dict(
181
+ type='RandomRotate',
182
+ angle=[-0.015625, 0.015625],
183
+ axis='x',
184
+ p=0.5),
185
+ dict(
186
+ type='RandomRotate',
187
+ angle=[-0.015625, 0.015625],
188
+ axis='y',
189
+ p=0.5),
190
+ dict(type='RandomScale', scale=[0.9, 1.1]),
191
+ dict(type='RandomFlip', p=0.5),
192
+ dict(type='RandomJitter', sigma=0.005, clip=0.02),
193
+ dict(
194
+ type='ElasticDistortion',
195
+ distortion_params=[[0.2, 0.4], [0.8, 1.6]]),
196
+ dict(
197
+ type='ChromaticAutoContrast', p=0.2,
198
+ blend_factor=None),
199
+ dict(type='ChromaticTranslation', p=0.95, ratio=0.05),
200
+ dict(type='ChromaticJitter', p=0.95, std=0.05),
201
+ dict(
202
+ type='GridSample',
203
+ grid_size=0.02,
204
+ hash_type='fnv',
205
+ mode='train',
206
+ return_grid_coord=True),
207
+ dict(type='SphereCrop', point_max=102400, mode='random'),
208
+ dict(type='CenterShift', apply_z=False),
209
+ dict(type='NormalizeColor'),
210
+ dict(type='ShufflePoint'),
211
+ dict(type='Add', keys_dict=dict(condition='ScanNet')),
212
+ dict(type='ToTensor'),
213
+ dict(
214
+ type='Collect',
215
+ keys=('coord', 'grid_coord', 'segment', 'condition'),
216
+ feat_keys=('color', 'normal'))
217
+ ],
218
+ test_mode=False,
219
+ loop=1)
220
+ ],
221
+ loop=1),
222
+ val=dict(
223
+ type='ScanNetDataset',
224
+ split='val',
225
+ data_root='data/scannet',
226
+ transform=[
227
+ dict(type='CenterShift', apply_z=True),
228
+ dict(
229
+ type='GridSample',
230
+ grid_size=0.02,
231
+ hash_type='fnv',
232
+ mode='train',
233
+ return_grid_coord=True),
234
+ dict(type='CenterShift', apply_z=False),
235
+ dict(type='NormalizeColor'),
236
+ dict(type='ToTensor'),
237
+ dict(type='Add', keys_dict=dict(condition='ScanNet')),
238
+ dict(
239
+ type='Collect',
240
+ keys=('coord', 'grid_coord', 'segment', 'condition'),
241
+ feat_keys=('color', 'normal'))
242
+ ],
243
+ test_mode=False),
244
+ test=dict(
245
+ type='ScanNetDataset',
246
+ split='val',
247
+ data_root='data/scannet',
248
+ transform=[
249
+ dict(type='CenterShift', apply_z=True),
250
+ dict(type='NormalizeColor')
251
+ ],
252
+ test_mode=True,
253
+ test_cfg=dict(
254
+ voxelize=dict(
255
+ type='GridSample',
256
+ grid_size=0.02,
257
+ hash_type='fnv',
258
+ mode='test',
259
+ keys=('coord', 'color', 'normal'),
260
+ return_grid_coord=True),
261
+ crop=None,
262
+ post_transform=[
263
+ dict(type='CenterShift', apply_z=False),
264
+ dict(type='Add', keys_dict=dict(condition='ScanNet')),
265
+ dict(type='ToTensor'),
266
+ dict(
267
+ type='Collect',
268
+ keys=('coord', 'grid_coord', 'index', 'condition'),
269
+ feat_keys=('color', 'normal'))
270
+ ],
271
+ aug_transform=[[{
272
+ 'type': 'RandomRotateTargetAngle',
273
+ 'angle': [0],
274
+ 'axis': 'z',
275
+ 'center': [0, 0, 0],
276
+ 'p': 1
277
+ }],
278
+ [{
279
+ 'type': 'RandomRotateTargetAngle',
280
+ 'angle': [0.5],
281
+ 'axis': 'z',
282
+ 'center': [0, 0, 0],
283
+ 'p': 1
284
+ }],
285
+ [{
286
+ 'type': 'RandomRotateTargetAngle',
287
+ 'angle': [1],
288
+ 'axis': 'z',
289
+ 'center': [0, 0, 0],
290
+ 'p': 1
291
+ }],
292
+ [{
293
+ 'type': 'RandomRotateTargetAngle',
294
+ 'angle': [1.5],
295
+ 'axis': 'z',
296
+ 'center': [0, 0, 0],
297
+ 'p': 1
298
+ }],
299
+ [{
300
+ 'type': 'RandomRotateTargetAngle',
301
+ 'angle': [0],
302
+ 'axis': 'z',
303
+ 'center': [0, 0, 0],
304
+ 'p': 1
305
+ }, {
306
+ 'type': 'RandomScale',
307
+ 'scale': [0.95, 0.95]
308
+ }],
309
+ [{
310
+ 'type': 'RandomRotateTargetAngle',
311
+ 'angle': [0.5],
312
+ 'axis': 'z',
313
+ 'center': [0, 0, 0],
314
+ 'p': 1
315
+ }, {
316
+ 'type': 'RandomScale',
317
+ 'scale': [0.95, 0.95]
318
+ }],
319
+ [{
320
+ 'type': 'RandomRotateTargetAngle',
321
+ 'angle': [1],
322
+ 'axis': 'z',
323
+ 'center': [0, 0, 0],
324
+ 'p': 1
325
+ }, {
326
+ 'type': 'RandomScale',
327
+ 'scale': [0.95, 0.95]
328
+ }],
329
+ [{
330
+ 'type': 'RandomRotateTargetAngle',
331
+ 'angle': [1.5],
332
+ 'axis': 'z',
333
+ 'center': [0, 0, 0],
334
+ 'p': 1
335
+ }, {
336
+ 'type': 'RandomScale',
337
+ 'scale': [0.95, 0.95]
338
+ }],
339
+ [{
340
+ 'type': 'RandomRotateTargetAngle',
341
+ 'angle': [0],
342
+ 'axis': 'z',
343
+ 'center': [0, 0, 0],
344
+ 'p': 1
345
+ }, {
346
+ 'type': 'RandomScale',
347
+ 'scale': [1.05, 1.05]
348
+ }],
349
+ [{
350
+ 'type': 'RandomRotateTargetAngle',
351
+ 'angle': [0.5],
352
+ 'axis': 'z',
353
+ 'center': [0, 0, 0],
354
+ 'p': 1
355
+ }, {
356
+ 'type': 'RandomScale',
357
+ 'scale': [1.05, 1.05]
358
+ }],
359
+ [{
360
+ 'type': 'RandomRotateTargetAngle',
361
+ 'angle': [1],
362
+ 'axis': 'z',
363
+ 'center': [0, 0, 0],
364
+ 'p': 1
365
+ }, {
366
+ 'type': 'RandomScale',
367
+ 'scale': [1.05, 1.05]
368
+ }],
369
+ [{
370
+ 'type': 'RandomRotateTargetAngle',
371
+ 'angle': [1.5],
372
+ 'axis': 'z',
373
+ 'center': [0, 0, 0],
374
+ 'p': 1
375
+ }, {
376
+ 'type': 'RandomScale',
377
+ 'scale': [1.05, 1.05]
378
+ }], [{
379
+ 'type': 'RandomFlip',
380
+ 'p': 1
381
+ }]])))
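In the `PPT-v1m1` model block of this config, `class_name` holds one shared vocabulary of 36 category names, and `valid_index` holds one index tuple per entry of `conditions`. It appears that each tuple selects the subset of the shared vocabulary that is valid for that dataset. The sketch below resolves those indices back to names, with all three values copied from the config. The helper `names_for` is hypothetical, and the interpretation of `valid_index` is inferred from the config alone rather than from the model code.

```python
conditions = ("Structured3D", "ScanNet", "S3DIS")

class_name = (
    "wall", "floor", "cabinet", "bed", "chair", "sofa", "table",
    "door", "window", "bookshelf", "bookcase", "picture",
    "counter", "desk", "shelves", "curtain", "dresser", "pillow",
    "mirror", "ceiling", "refrigerator", "television",
    "shower curtain", "nightstand", "toilet", "sink", "lamp",
    "bathtub", "garbagebin", "board", "beam", "column", "clutter",
    "otherstructure", "otherfurniture", "otherprop",
)

valid_index = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 13, 14, 15, 16, 17, 18, 19,
     20, 21, 23, 25, 26, 33, 34, 35),
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 20, 22, 24, 25,
     27, 34),
    (0, 1, 4, 5, 6, 7, 8, 10, 19, 29, 30, 31, 32),
)


def names_for(condition):
    """Return the class names selected for one dataset condition (hypothetical helper)."""
    indices = valid_index[conditions.index(condition)]
    return [class_name[i] for i in indices]


scannet_names = names_for("ScanNet")
s3dis_names = names_for("S3DIS")
structured3d_names = names_for("Structured3D")
```

Read this way, the ScanNet tuple yields 20 names, which agrees with `num_classes=20` in the `data` block, and the S3DIS tuple yields 13.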
scannet-semseg-pt-v3m1-1-ppt-extreme/events.out.tfevents.1706979139.scannet-semseg-pt-v3m1-1-ppt-extreme ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b4e134e4c23aa8dcdaaecd36af85046c3e8b80e7790487aba03e0c51665ba3cb
+ size 13080438
scannet-semseg-pt-v3m1-1-ppt-extreme/model/model_best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d964d10112a13a7fddadad16f37a940ed26e09b9e9799f78e8386cff8f2ae771
+ size 1170271282
scannet-semseg-pt-v3m1-1-ppt-extreme/model/model_last.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a53e368f93975164f9fbb95fe1e6d1d9efab663990ba4d88410d36768aef71bd
+ size 1170271282
scannet-semseg-pt-v3m1-1-ppt-extreme/test.log ADDED
The diff for this file is too large to render. See raw diff
 
scannet-semseg-pt-v3m1-1-ppt-extreme/train.log ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:050a681c6530ae807a1b4966b78bbebb101ea4f98b916979b380c4e933c6be79
+ size 22084602
scannet200-semseg-pt-v3m1-0-base/config.py ADDED
@@ -0,0 +1,375 @@
1
+ weight = None
2
+ resume = False
3
+ evaluate = True
4
+ test_only = False
5
+ seed = 1023306
6
+ save_path = 'exp/scannet200/semseg-pt-v3m1-0-base'
7
+ num_worker = 24
8
+ batch_size = 12
9
+ batch_size_val = None
10
+ batch_size_test = None
11
+ epoch = 800
12
+ eval_epoch = 100
13
+ sync_bn = False
14
+ enable_amp = True
15
+ empty_cache = False
16
+ find_unused_parameters = False
17
+ mix_prob = 0.8
18
+ param_dicts = [dict(keyword='block', lr=0.0006)]
19
+ hooks = [
20
+ dict(type='CheckpointLoader'),
21
+ dict(type='IterationTimer', warmup_iter=2),
22
+ dict(type='InformationWriter'),
23
+ dict(type='SemSegEvaluator'),
24
+ dict(type='CheckpointSaver', save_freq=None),
25
+ dict(type='PreciseEvaluator', test_last=False)
26
+ ]
27
+ train = dict(type='DefaultTrainer')
28
+ test = dict(type='SemSegTester', verbose=True)
29
+ CLASS_LABELS_200 = (
30
+ 'wall', 'chair', 'floor', 'table', 'door', 'couch', 'cabinet', 'shelf',
31
+ 'desk', 'office chair', 'bed', 'pillow', 'sink', 'picture', 'window',
32
+ 'toilet', 'bookshelf', 'monitor', 'curtain', 'book', 'armchair',
33
+ 'coffee table', 'box', 'refrigerator', 'lamp', 'kitchen cabinet', 'towel',
34
+ 'clothes', 'tv', 'nightstand', 'counter', 'dresser', 'stool', 'cushion',
35
+ 'plant', 'ceiling', 'bathtub', 'end table', 'dining table', 'keyboard',
36
+ 'bag', 'backpack', 'toilet paper', 'printer', 'tv stand', 'whiteboard',
37
+ 'blanket', 'shower curtain', 'trash can', 'closet', 'stairs', 'microwave',
38
+ 'stove', 'shoe', 'computer tower', 'bottle', 'bin', 'ottoman', 'bench',
39
+ 'board', 'washing machine', 'mirror', 'copier', 'basket', 'sofa chair',
40
+ 'file cabinet', 'fan', 'laptop', 'shower', 'paper', 'person',
41
+ 'paper towel dispenser', 'oven', 'blinds', 'rack', 'plate', 'blackboard',
42
+ 'piano', 'suitcase', 'rail', 'radiator', 'recycling bin', 'container',
43
+ 'wardrobe', 'soap dispenser', 'telephone', 'bucket', 'clock', 'stand',
44
+ 'light', 'laundry basket', 'pipe', 'clothes dryer', 'guitar',
45
+ 'toilet paper holder', 'seat', 'speaker', 'column', 'bicycle', 'ladder',
46
+ 'bathroom stall', 'shower wall', 'cup', 'jacket', 'storage bin',
47
+ 'coffee maker', 'dishwasher', 'paper towel roll', 'machine', 'mat',
48
+ 'windowsill', 'bar', 'toaster', 'bulletin board', 'ironing board',
49
+ 'fireplace', 'soap dish', 'kitchen counter', 'doorframe',
50
+ 'toilet paper dispenser', 'mini fridge', 'fire extinguisher', 'ball',
51
+ 'hat', 'shower curtain rod', 'water cooler', 'paper cutter', 'tray',
52
+ 'shower door', 'pillar', 'ledge', 'toaster oven', 'mouse',
53
+ 'toilet seat cover dispenser', 'furniture', 'cart', 'storage container',
54
+ 'scale', 'tissue box', 'light switch', 'crate', 'power outlet',
55
+ 'decoration', 'sign', 'projector', 'closet door', 'vacuum cleaner',
56
+ 'candle', 'plunger', 'stuffed animal', 'headphones', 'dish rack', 'broom',
57
+ 'guitar case', 'range hood', 'dustpan', 'hair dryer', 'water bottle',
58
+ 'handicap bar', 'purse', 'vent', 'shower floor', 'water pitcher',
59
+ 'mailbox', 'bowl', 'paper bag', 'alarm clock', 'music stand',
60
+ 'projector screen', 'divider', 'laundry detergent', 'bathroom counter',
61
+ 'object', 'bathroom vanity', 'closet wall', 'laundry hamper',
62
+ 'bathroom stall door', 'ceiling light', 'trash bin', 'dumbbell',
63
+ 'stair rail', 'tube', 'bathroom cabinet', 'cd case', 'closet rod',
64
+ 'coffee kettle', 'structure', 'shower head', 'keyboard piano',
65
+ 'case of water bottles', 'coat rack', 'storage organizer', 'folded chair',
66
+ 'fire alarm', 'power strip', 'calendar', 'poster', 'potted plant',
67
+ 'luggage', 'mattress')
68
+ model = dict(
69
+ type='DefaultSegmentorV2',
70
+ num_classes=200,
71
+ backbone_out_channels=64,
72
+ backbone=dict(
73
+ type='PT-v3m1',
74
+ in_channels=6,
75
+ order=['z', 'z-trans', 'hilbert', 'hilbert-trans'],
76
+ stride=(2, 2, 2, 2),
77
+ enc_depths=(2, 2, 2, 6, 2),
78
+ enc_channels=(32, 64, 128, 256, 512),
79
+ enc_num_head=(2, 4, 8, 16, 32),
80
+ enc_patch_size=(1024, 1024, 1024, 1024, 1024),
81
+ dec_depths=(2, 2, 2, 2),
82
+ dec_channels=(64, 64, 128, 256),
83
+ dec_num_head=(4, 4, 8, 16),
84
+ dec_patch_size=(1024, 1024, 1024, 1024),
85
+ mlp_ratio=4,
86
+ qkv_bias=True,
87
+ qk_scale=None,
88
+ attn_drop=0.0,
89
+ proj_drop=0.0,
90
+ drop_path=0.3,
91
+ shuffle_orders=True,
92
+ pre_norm=True,
93
+ enable_rpe=False,
94
+ enable_flash=True,
95
+ upcast_attention=False,
96
+ upcast_softmax=False,
97
+ cls_mode=False,
98
+ pdnorm_bn=False,
99
+ pdnorm_ln=False,
100
+ pdnorm_decouple=True,
101
+ pdnorm_adaptive=False,
102
+ pdnorm_affine=True,
103
+ pdnorm_conditions=('ScanNet', 'S3DIS', 'Structured3D')),
104
+ criteria=[
105
+ dict(type='CrossEntropyLoss', loss_weight=1.0, ignore_index=-1),
106
+ dict(
107
+ type='LovaszLoss',
108
+ mode='multiclass',
109
+ loss_weight=1.0,
110
+ ignore_index=-1)
111
+ ])
112
+ optimizer = dict(type='AdamW', lr=0.006, weight_decay=0.05)
113
+ scheduler = dict(
114
+ type='OneCycleLR',
115
+ max_lr=[0.006, 0.0006],
116
+ pct_start=0.05,
117
+ anneal_strategy='cos',
118
+ div_factor=10.0,
119
+ final_div_factor=1000.0)
120
+ dataset_type = 'ScanNet200Dataset'
121
+ data_root = 'data/scannet'
122
+ data = dict(
123
+ num_classes=200,
124
+ ignore_index=-1,
125
+ names=(
126
+ 'wall', 'chair', 'floor', 'table', 'door', 'couch', 'cabinet', 'shelf',
127
+ 'desk', 'office chair', 'bed', 'pillow', 'sink', 'picture', 'window',
128
+ 'toilet', 'bookshelf', 'monitor', 'curtain', 'book', 'armchair',
129
+ 'coffee table', 'box', 'refrigerator', 'lamp', 'kitchen cabinet',
130
+ 'towel', 'clothes', 'tv', 'nightstand', 'counter', 'dresser', 'stool',
131
+ 'cushion', 'plant', 'ceiling', 'bathtub', 'end table', 'dining table',
132
+ 'keyboard', 'bag', 'backpack', 'toilet paper', 'printer', 'tv stand',
133
+ 'whiteboard', 'blanket', 'shower curtain', 'trash can', 'closet',
134
+ 'stairs', 'microwave', 'stove', 'shoe', 'computer tower', 'bottle',
135
+ 'bin', 'ottoman', 'bench', 'board', 'washing machine', 'mirror',
136
+ 'copier', 'basket', 'sofa chair', 'file cabinet', 'fan', 'laptop',
137
+ 'shower', 'paper', 'person', 'paper towel dispenser', 'oven', 'blinds',
138
+ 'rack', 'plate', 'blackboard', 'piano', 'suitcase', 'rail', 'radiator',
139
+ 'recycling bin', 'container', 'wardrobe', 'soap dispenser',
140
+ 'telephone', 'bucket', 'clock', 'stand', 'light', 'laundry basket',
141
+ 'pipe', 'clothes dryer', 'guitar', 'toilet paper holder', 'seat',
142
+ 'speaker', 'column', 'bicycle', 'ladder', 'bathroom stall',
143
+ 'shower wall', 'cup', 'jacket', 'storage bin', 'coffee maker',
144
+ 'dishwasher', 'paper towel roll', 'machine', 'mat', 'windowsill',
145
+ 'bar', 'toaster', 'bulletin board', 'ironing board', 'fireplace',
146
+ 'soap dish', 'kitchen counter', 'doorframe', 'toilet paper dispenser',
147
+ 'mini fridge', 'fire extinguisher', 'ball', 'hat',
148
+ 'shower curtain rod', 'water cooler', 'paper cutter', 'tray',
149
+ 'shower door', 'pillar', 'ledge', 'toaster oven', 'mouse',
150
+ 'toilet seat cover dispenser', 'furniture', 'cart',
151
+ 'storage container', 'scale', 'tissue box', 'light switch', 'crate',
152
+ 'power outlet', 'decoration', 'sign', 'projector', 'closet door',
153
+ 'vacuum cleaner', 'candle', 'plunger', 'stuffed animal', 'headphones',
154
+ 'dish rack', 'broom', 'guitar case', 'range hood', 'dustpan',
155
+ 'hair dryer', 'water bottle', 'handicap bar', 'purse', 'vent',
156
+ 'shower floor', 'water pitcher', 'mailbox', 'bowl', 'paper bag',
157
+ 'alarm clock', 'music stand', 'projector screen', 'divider',
158
+ 'laundry detergent', 'bathroom counter', 'object', 'bathroom vanity',
159
+ 'closet wall', 'laundry hamper', 'bathroom stall door',
160
+ 'ceiling light', 'trash bin', 'dumbbell', 'stair rail', 'tube',
161
+ 'bathroom cabinet', 'cd case', 'closet rod', 'coffee kettle',
162
+ 'structure', 'shower head', 'keyboard piano', 'case of water bottles',
163
+ 'coat rack', 'storage organizer', 'folded chair', 'fire alarm',
164
+ 'power strip', 'calendar', 'poster', 'potted plant', 'luggage',
165
+ 'mattress'),
166
+ train=dict(
167
+ type='ScanNet200Dataset',
168
+ split='train',
169
+ data_root='data/scannet',
170
+ transform=[
171
+ dict(type='CenterShift', apply_z=True),
172
+ dict(
173
+ type='RandomDropout',
174
+ dropout_ratio=0.2,
175
+ dropout_application_ratio=0.2),
176
+ dict(
177
+ type='RandomRotate',
178
+ angle=[-1, 1],
179
+ axis='z',
180
+ center=[0, 0, 0],
181
+ p=0.5),
182
+ dict(
183
+ type='RandomRotate',
184
+ angle=[-0.015625, 0.015625],
185
+ axis='x',
186
+ p=0.5),
187
+ dict(
188
+ type='RandomRotate',
189
+ angle=[-0.015625, 0.015625],
190
+ axis='y',
191
+ p=0.5),
192
+ dict(type='RandomScale', scale=[0.9, 1.1]),
193
+ dict(type='RandomFlip', p=0.5),
194
+ dict(type='RandomJitter', sigma=0.005, clip=0.02),
195
+ dict(
196
+ type='ElasticDistortion',
197
+ distortion_params=[[0.2, 0.4], [0.8, 1.6]]),
198
+ dict(type='ChromaticAutoContrast', p=0.2, blend_factor=None),
199
+ dict(type='ChromaticTranslation', p=0.95, ratio=0.05),
200
+ dict(type='ChromaticJitter', p=0.95, std=0.05),
201
+ dict(
202
+ type='GridSample',
203
+ grid_size=0.02,
204
+ hash_type='fnv',
205
+ mode='train',
206
+ return_grid_coord=True),
207
+ dict(type='SphereCrop', point_max=102400, mode='random'),
208
+ dict(type='CenterShift', apply_z=False),
209
+ dict(type='NormalizeColor'),
210
+ dict(type='ToTensor'),
211
+ dict(
212
+ type='Collect',
213
+ keys=('coord', 'grid_coord', 'segment'),
214
+ feat_keys=('color', 'normal'))
215
+ ],
216
+ test_mode=False,
217
+ loop=8),
218
+ val=dict(
219
+ type='ScanNet200Dataset',
220
+ split='val',
221
+ data_root='data/scannet',
222
+ transform=[
223
+ dict(type='CenterShift', apply_z=True),
224
+ dict(
225
+ type='GridSample',
226
+ grid_size=0.02,
227
+ hash_type='fnv',
228
+ mode='train',
229
+ return_grid_coord=True),
230
+ dict(type='CenterShift', apply_z=False),
231
+ dict(type='NormalizeColor'),
232
+ dict(type='ToTensor'),
233
+ dict(
234
+ type='Collect',
235
+ keys=('coord', 'grid_coord', 'segment'),
236
+ feat_keys=('color', 'normal'))
237
+ ],
238
+ test_mode=False),
239
+ test=dict(
240
+ type='ScanNet200Dataset',
241
+ split='val',
242
+ data_root='data/scannet',
243
+ transform=[
244
+ dict(type='CenterShift', apply_z=True),
245
+ dict(type='NormalizeColor')
246
+ ],
247
+ test_mode=True,
248
+ test_cfg=dict(
249
+ voxelize=dict(
250
+ type='GridSample',
251
+ grid_size=0.02,
252
+ hash_type='fnv',
253
+ mode='test',
254
+ keys=('coord', 'color', 'normal'),
255
+ return_grid_coord=True),
256
+ crop=None,
257
+ post_transform=[
258
+ dict(type='CenterShift', apply_z=False),
259
+ dict(type='ToTensor'),
260
+ dict(
261
+ type='Collect',
262
+ keys=('coord', 'grid_coord', 'index'),
263
+ feat_keys=('color', 'normal'))
264
+ ],
265
+ aug_transform=[[{
266
+ 'type': 'RandomRotateTargetAngle',
267
+ 'angle': [0],
268
+ 'axis': 'z',
269
+ 'center': [0, 0, 0],
270
+ 'p': 1
271
+ }],
272
+ [{
273
+ 'type': 'RandomRotateTargetAngle',
274
+ 'angle': [0.5],
275
+ 'axis': 'z',
276
+ 'center': [0, 0, 0],
277
+ 'p': 1
278
+ }],
279
+ [{
280
+ 'type': 'RandomRotateTargetAngle',
281
+ 'angle': [1],
282
+ 'axis': 'z',
283
+ 'center': [0, 0, 0],
284
+ 'p': 1
285
+ }],
286
+ [{
287
+ 'type': 'RandomRotateTargetAngle',
288
+ 'angle': [1.5],
289
+ 'axis': 'z',
290
+ 'center': [0, 0, 0],
291
+ 'p': 1
292
+ }],
293
+ [{
294
+ 'type': 'RandomRotateTargetAngle',
295
+ 'angle': [0],
296
+ 'axis': 'z',
297
+ 'center': [0, 0, 0],
298
+ 'p': 1
299
+ }, {
300
+ 'type': 'RandomScale',
301
+ 'scale': [0.95, 0.95]
302
+ }],
303
+ [{
304
+ 'type': 'RandomRotateTargetAngle',
305
+ 'angle': [0.5],
306
+ 'axis': 'z',
307
+ 'center': [0, 0, 0],
308
+ 'p': 1
309
+ }, {
310
+ 'type': 'RandomScale',
311
+ 'scale': [0.95, 0.95]
312
+ }],
313
+ [{
314
+ 'type': 'RandomRotateTargetAngle',
315
+ 'angle': [1],
316
+ 'axis': 'z',
317
+ 'center': [0, 0, 0],
318
+ 'p': 1
319
+ }, {
320
+ 'type': 'RandomScale',
321
+ 'scale': [0.95, 0.95]
322
+ }],
323
+ [{
324
+ 'type': 'RandomRotateTargetAngle',
325
+ 'angle': [1.5],
326
+ 'axis': 'z',
327
+ 'center': [0, 0, 0],
328
+ 'p': 1
329
+ }, {
330
+ 'type': 'RandomScale',
331
+ 'scale': [0.95, 0.95]
332
+ }],
333
+ [{
334
+ 'type': 'RandomRotateTargetAngle',
335
+ 'angle': [0],
336
+ 'axis': 'z',
337
+ 'center': [0, 0, 0],
338
+ 'p': 1
339
+ }, {
340
+ 'type': 'RandomScale',
341
+ 'scale': [1.05, 1.05]
342
+ }],
343
+ [{
344
+ 'type': 'RandomRotateTargetAngle',
345
+ 'angle': [0.5],
346
+ 'axis': 'z',
347
+ 'center': [0, 0, 0],
348
+ 'p': 1
349
+ }, {
350
+ 'type': 'RandomScale',
351
+ 'scale': [1.05, 1.05]
352
+ }],
353
+ [{
354
+ 'type': 'RandomRotateTargetAngle',
355
+ 'angle': [1],
356
+ 'axis': 'z',
357
+ 'center': [0, 0, 0],
358
+ 'p': 1
359
+ }, {
360
+ 'type': 'RandomScale',
361
+ 'scale': [1.05, 1.05]
362
+ }],
363
+ [{
364
+ 'type': 'RandomRotateTargetAngle',
365
+ 'angle': [1.5],
366
+ 'axis': 'z',
367
+ 'center': [0, 0, 0],
368
+ 'p': 1
369
+ }, {
370
+ 'type': 'RandomScale',
371
+ 'scale': [1.05, 1.05]
372
+ }], [{
373
+ 'type': 'RandomFlip',
374
+ 'p': 1
375
+ }]])))
scannet200-semseg-pt-v3m1-0-base/events.out.tfevents.1703049688.scannet200-semseg-pt-v3m1-0-base ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:698d33e2e356f7bd59b996e01a2253f3c34b5cbfaf96ea62adf0ddcee4269341
3
+ size 7830420
scannet200-semseg-pt-v3m1-0-base/model/model_best.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:426355a950a4c86abdd191a464bc3598241a23601fde17b42f08d536a08fab88
3
+ size 554758440
scannet200-semseg-pt-v3m1-0-base/model/model_last.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:212755e4debd8fe1b04c8acb64c353eb2450346b23b500b8cba1c471c14e737a
3
+ size 554758440
scannet200-semseg-pt-v3m1-0-base/train.log ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3bda295e03c9d08880bb35475e91434c92973e1d447866f490e19e63bbbf5fb7
3
+ size 16866313
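Note on the three-line entries above: the `.pth`, `train.log`, and `events.out.tfevents.*` files are stored as Git LFS pointers (`version`, `oid`, `size`), so the diff shows only the pointer text, not the payload. Below is a minimal sketch of reading such a pointer, using the `model_best.pth` values from this commit. The function name `parse_lfs_pointer` is hypothetical, and the parser handles only the three keys that appear here.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by a single space.
        key, _, value = line.partition(' ')
        fields[key] = value
    # The oid is written as "<hash algorithm>:<hex digest>".
    algo, _, digest = fields['oid'].partition(':')
    return {
        'version': fields['version'],
        'hash_algo': algo,
        'digest': digest,
        'size': int(fields['size']),  # payload size in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:426355a950a4c86abdd191a464bc3598241a23601fde17b42f08d536a08fab88
size 554758440
"""

info = parse_lfs_pointer(pointer)
print(f"{info['hash_algo']} payload, {info['size'] / 2**20:.1f} MiB")
```

By this reading the checkpoint is about 529 MiB. `model_best.pth` and `model_last.pth` report the same size but carry different digests.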
waymo-semseg-pt-v3m1-0-base/config.py ADDED
@@ -0,0 +1,217 @@
1
+ weight = None
2
+ resume = False
3
+ evaluate = True
4
+ test_only = False
5
+ seed = 2311533
6
+ save_path = 'exp/waymo/semseg-pt-v3m1-0-base'
7
+ num_worker = 16
8
+ batch_size = 12
9
+ batch_size_val = None
10
+ batch_size_test = None
11
+ epoch = 50
12
+ eval_epoch = 50
13
+ sync_bn = False
14
+ enable_amp = True
15
+ empty_cache = False
16
+ find_unused_parameters = False
17
+ mix_prob = 0.8
18
+ param_dicts = [dict(keyword='block', lr=0.0002)]
19
+ hooks = [
20
+ dict(type='CheckpointLoader'),
21
+ dict(type='IterationTimer', warmup_iter=2),
22
+ dict(type='InformationWriter'),
23
+ dict(type='SemSegEvaluator'),
24
+ dict(type='CheckpointSaver', save_freq=None),
25
+ dict(type='PreciseEvaluator', test_last=False)
26
+ ]
27
+ train = dict(type='DefaultTrainer')
28
+ test = dict(type='SemSegTester', verbose=True)
29
+ model = dict(
30
+ type='DefaultSegmentorV2',
31
+ num_classes=22,
32
+ backbone_out_channels=64,
33
+ backbone=dict(
34
+ type='PT-v3m1',
35
+ in_channels=4,
36
+ order=['z', 'z-trans', 'hilbert', 'hilbert-trans'],
37
+ stride=(2, 2, 2, 2),
38
+ enc_depths=(2, 2, 2, 6, 2),
39
+ enc_channels=(32, 64, 128, 256, 512),
40
+ enc_num_head=(2, 4, 8, 16, 32),
41
+ enc_patch_size=(1024, 1024, 1024, 1024, 1024),
42
+ dec_depths=(2, 2, 2, 2),
43
+ dec_channels=(64, 64, 128, 256),
44
+ dec_num_head=(4, 4, 8, 16),
45
+ dec_patch_size=(1024, 1024, 1024, 1024),
46
+ mlp_ratio=4,
47
+ qkv_bias=True,
48
+ qk_scale=None,
49
+ attn_drop=0.0,
50
+ proj_drop=0.0,
51
+ drop_path=0.3,
52
+ shuffle_orders=True,
53
+ pre_norm=True,
54
+ enable_rpe=False,
55
+ enable_flash=True,
56
+ upcast_attention=False,
57
+ upcast_softmax=False,
58
+ cls_mode=False,
59
+ pdnorm_bn=False,
60
+ pdnorm_ln=False,
61
+ pdnorm_decouple=True,
62
+ pdnorm_adaptive=False,
63
+ pdnorm_affine=True,
64
+ pdnorm_conditions=('nuScenes', 'SemanticKITTI', 'Waymo')),
65
+ criteria=[
66
+ dict(type='CrossEntropyLoss', loss_weight=1.0, ignore_index=-1),
67
+ dict(
68
+ type='LovaszLoss',
69
+ mode='multiclass',
70
+ loss_weight=1.0,
71
+ ignore_index=-1)
72
+ ])
73
+ optimizer = dict(type='AdamW', lr=0.002, weight_decay=0.005)
74
+ scheduler = dict(
75
+ type='OneCycleLR',
76
+ max_lr=[0.002, 0.0002],
77
+ pct_start=0.04,
78
+ anneal_strategy='cos',
79
+ div_factor=10.0,
80
+ final_div_factor=100.0)
81
+ dataset_type = 'WaymoDataset'
82
+ data_root = 'data/waymo'
83
+ ignore_index = -1
84
+ names = [
85
+ 'Car', 'Truck', 'Bus', 'Other Vehicle', 'Motorcyclist', 'Bicyclist',
86
+ 'Pedestrian', 'Sign', 'Traffic Light', 'Pole', 'Construction Cone',
87
+ 'Bicycle', 'Motorcycle', 'Building', 'Vegetation', 'Tree Trunk', 'Curb',
88
+ 'Road', 'Lane Marker', 'Other Ground', 'Walkable', 'Sidewalk'
89
+ ]
90
+ data = dict(
91
+ num_classes=22,
92
+ ignore_index=-1,
93
+ names=[
94
+ 'Car', 'Truck', 'Bus', 'Other Vehicle', 'Motorcyclist', 'Bicyclist',
95
+ 'Pedestrian', 'Sign', 'Traffic Light', 'Pole', 'Construction Cone',
96
+ 'Bicycle', 'Motorcycle', 'Building', 'Vegetation', 'Tree Trunk',
97
+ 'Curb', 'Road', 'Lane Marker', 'Other Ground', 'Walkable', 'Sidewalk'
98
+ ],
99
+ train=dict(
100
+ type='WaymoDataset',
101
+ split='training',
102
+ data_root='data/waymo',
103
+ transform=[
104
+ dict(
105
+ type='RandomRotate',
106
+ angle=[-1, 1],
107
+ axis='z',
108
+ center=[0, 0, 0],
109
+ p=0.5),
110
+ dict(
111
+ type='PointClip',
112
+ point_cloud_range=(-75.2, -75.2, -4, 75.2, 75.2, 2)),
113
+ dict(type='RandomScale', scale=[0.9, 1.1]),
114
+ dict(type='RandomFlip', p=0.5),
115
+ dict(type='RandomJitter', sigma=0.005, clip=0.02),
116
+ dict(
117
+ type='GridSample',
118
+ grid_size=0.05,
119
+ hash_type='fnv',
120
+ mode='train',
121
+ keys=('coord', 'strength', 'segment'),
122
+ return_grid_coord=True),
123
+ dict(type='ToTensor'),
124
+ dict(
125
+ type='Collect',
126
+ keys=('coord', 'grid_coord', 'segment'),
127
+ feat_keys=('coord', 'strength'))
128
+ ],
129
+ test_mode=False,
130
+ ignore_index=-1,
131
+ loop=1),
132
+ val=dict(
133
+ type='WaymoDataset',
134
+ split='validation',
135
+ data_root='data/waymo',
136
+ transform=[
137
+ dict(
138
+ type='PointClip',
139
+ point_cloud_range=(-75.2, -75.2, -4, 75.2, 75.2, 2)),
140
+ dict(
141
+ type='GridSample',
142
+ grid_size=0.05,
143
+ hash_type='fnv',
144
+ mode='train',
145
+ keys=('coord', 'strength', 'segment'),
146
+ return_grid_coord=True),
147
+ dict(type='ToTensor'),
148
+ dict(
149
+ type='Collect',
150
+ keys=('coord', 'grid_coord', 'segment'),
151
+ feat_keys=('coord', 'strength'))
152
+ ],
153
+ test_mode=False,
154
+ ignore_index=-1),
155
+ test=dict(
156
+ type='WaymoDataset',
157
+ split='validation',
158
+ data_root='data/waymo',
159
+ transform=[
160
+ dict(
161
+ type='PointClip',
162
+ point_cloud_range=(-75.2, -75.2, -4, 75.2, 75.2, 2)),
163
+ dict(type='Copy', keys_dict=dict(segment='origin_segment')),
164
+ dict(
165
+ type='GridSample',
166
+ grid_size=0.025,
167
+ hash_type='fnv',
168
+ mode='train',
169
+ keys=('coord', 'strength', 'segment'),
170
+ return_inverse=True)
171
+ ],
172
+ test_mode=True,
173
+ test_cfg=dict(
174
+ voxelize=dict(
175
+ type='GridSample',
176
+ grid_size=0.05,
177
+ hash_type='fnv',
178
+ mode='test',
179
+ return_grid_coord=True,
180
+ keys=('coord', 'strength')),
181
+ crop=None,
182
+ post_transform=[
183
+ dict(type='ToTensor'),
184
+ dict(
185
+ type='Collect',
186
+ keys=('coord', 'grid_coord', 'index'),
187
+ feat_keys=('coord', 'strength'))
188
+ ],
189
+ aug_transform=[[{
190
+ 'type': 'RandomRotateTargetAngle',
191
+ 'angle': [0],
192
+ 'axis': 'z',
193
+ 'center': [0, 0, 0],
194
+ 'p': 1
195
+ }],
196
+ [{
197
+ 'type': 'RandomRotateTargetAngle',
198
+ 'angle': [0.5],
199
+ 'axis': 'z',
200
+ 'center': [0, 0, 0],
201
+ 'p': 1
202
+ }],
203
+ [{
204
+ 'type': 'RandomRotateTargetAngle',
205
+ 'angle': [1],
206
+ 'axis': 'z',
207
+ 'center': [0, 0, 0],
208
+ 'p': 1
209
+ }],
210
+ [{
211
+ 'type': 'RandomRotateTargetAngle',
212
+ 'angle': [1.5],
213
+ 'axis': 'z',
214
+ 'center': [0, 0, 0],
215
+ 'p': 1
216
+ }]]),
217
+ ignore_index=-1))
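Note on the `scheduler` block in the config above: with `max_lr=0.002`, `div_factor=10.0`, and `final_div_factor=100.0`, a one-cycle schedule starts at `max_lr / div_factor`, peaks at `max_lr`, and ends at `max_lr / (div_factor * final_div_factor)`. The sketch below reproduces the two-phase cosine shape in plain Python, following my understanding of PyTorch's `OneCycleLR` with `three_phase=False`. The function name `one_cycle_lr` and the value `total_steps=1000` are illustrative assumptions; the real step count depends on dataset size, batch size, and epochs. The second entry in `max_lr` (0.0002) appears to correspond to the `param_dicts` group matched by the keyword `'block'`, which this sketch does not model.

```python
import math


def one_cycle_lr(step, total_steps, max_lr, pct_start, div_factor, final_div_factor):
    """Learning rate at `step` for a two-phase, cosine-annealed one-cycle schedule."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1  # last step of the warm-up phase
    last_step = total_steps - 1

    def cos_anneal(start, end, pct):
        # Cosine interpolation from `start` (pct=0) to `end` (pct=1).
        return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)

    if step <= warmup_end:
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    pct = (step - warmup_end) / (last_step - warmup_end)
    return cos_anneal(max_lr, min_lr, pct)


# Values from the Waymo config; total_steps is a made-up illustration.
cfg = dict(total_steps=1000, max_lr=0.002, pct_start=0.04,
           div_factor=10.0, final_div_factor=100.0)
for step in (0, 39, 999):
    print(step, one_cycle_lr(step, **cfg))
```

With these settings the warm-up occupies the first 4% of steps, and the remaining 96% anneal down to the minimum rate.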
waymo-semseg-pt-v3m1-0-base/events.out.tfevents.1708353865.waymo-semseg-pt-v3m1-0-base ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2daba840e4090c8bf1ca66cacce791e1a1550d083e726239320f18daac44adf8
3
+ size 9651320
waymo-semseg-pt-v3m1-0-base/train.log ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:324e68325525a91d5075ae5a67554276f4454dbade0ea1f8e5d032207fbc68f8
3
+ size 29365755