diff --git a/README.md b/README.md
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..4f65397694faa1ab3a13b9ab8e5b740cfa92b5ca 100644
--- a/README.md
+++ b/README.md
@@ -0,0 +1,108 @@
+
+
+## Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
+
+
+
+
+
+
+
+
+
+
+
+
+
+### News
+
+* **30 May 2024**: [Open-YOLO 3D](https://arxiv.org/abs/2406.02548) released on arXiv. 📝
+* **30 May 2024**: Code released. 💻
+
+### Abstract
+
+Recent works on open-vocabulary 3D instance segmentation show strong promise, but at the cost of slow inference and high computation requirements. This cost stems largely from their heavy reliance on 3D CLIP features, which require computationally expensive 2D foundation models like Segment Anything (SAM) and CLIP for multi-view aggregation into 3D. As a consequence, this hampers their applicability in many real-world applications that require both fast and accurate predictions. To this end, we propose a fast yet accurate open-vocabulary 3D instance segmentation approach, named Open-YOLO 3D, that effectively leverages only 2D object detection from multi-view RGB images for open-vocabulary 3D instance segmentation.
+We address this task by generating class-agnostic 3D masks for objects in the scene and associating them with text prompts.
+We observe that the projection of class-agnostic 3D point cloud instances already holds instance information; using SAM on top of it therefore mostly adds redundancy that unnecessarily increases inference time.
+We empirically find that matching text prompts to 3D masks with a 2D object detector is both faster and more accurate. We validate Open-YOLO 3D on two benchmarks, ScanNet200 and Replica,
+under two scenarios: (i) with ground-truth masks, where labels are required for given object proposals, and (ii) with class-agnostic 3D proposals generated from a 3D proposal network. Our Open-YOLO 3D achieves state-of-the-art performance on both datasets while obtaining up to 16x speedup compared to the best existing method in the literature. On the ScanNet200 val set, Open-YOLO 3D achieves a mean average precision (mAP) of 24.7% while operating at 22 seconds per scene.
+
+### Qualitative results
+
+
+
+
+
+
+
+## Installation guide
+
+Kindly check the [Installation guide](./docs/Installation.md) on how to set up the Conda environment and download the checkpoints, the pre-computed class-agnostic masks, and the ground-truth masks.
+
+## Data Preparation
+
+Kindly check the [Data Preparation guide](./docs/Data_prep.md) on how to prepare the ScanNet200 and Replica datasets.
+
+## Results reproducibility
+
+Kindly use the pre-computed class-agnostic masks we share to reproduce the exact numbers reported in the paper.
+
+**Reproduce the results of ScanNet200 with pre-computed masks (using Mask3D)**
+```
+python run_evaluation.py --dataset_name scannet200 --path_to_3d_masks "./output/scannet200/scannet200_masks"
+```
+**Reproduce the results of ScanNet200 with oracle 3D masks (ground truth 3D masks)**
+```
+python run_evaluation.py --dataset_name scannet200 --path_to_3d_masks "./output/scannet200/scannet200_ground_truth_masks" --is_gt
+```
+**Reproduce the results of Replica with pre-computed masks (using Mask3D)**
+```
+python run_evaluation.py --dataset_name replica --path_to_3d_masks "./output/replica/replica_masks"
+```
+**Reproduce the results of Replica with oracle 3D masks (ground truth 3D masks)**
+```
+python run_evaluation.py --dataset_name replica --path_to_3d_masks "./output/replica/replica_ground_truth_masks" --is_gt
+```
+
+You can evaluate without our 3D class-agnostic masks, but this may lead to variability in results, since components such as farthest point sampling introduce randomness into Mask3D's predictions. For results consistent with those reported in the paper, we recommend using our pre-computed masks.
+
+**Reproduce the results of Replica or ScanNet200 without using our pre-computed masks**
+```
+python run_evaluation.py --dataset_name $DATASET_NAME
+```
+
+## Single scene inference
+
+```python
+from utils import OpenYolo3D
+
+openyolo3d = OpenYolo3D("$(pwd)/pretrained/config.yaml")  # initialize the model; define the text prompts in the config
+prediction = openyolo3d.predict("$(pwd)/data/replica/office0", 6553.5)  # predict instance masks and labels (takes around 20 seconds in total)
+openyolo3d.save_output_as_ply("$(pwd)/sample/output.ply", True)  # save a .ply file for visualization (e.g., with MeshLab)
+```
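+
+As a quick programmatic check of the saved output (a minimal sketch, assuming `open3d` is installed; MeshLab works equally well for interactive inspection):
+
+```python
+import open3d as o3d
+
+# load the colorized prediction saved by save_output_as_ply above
+pcd = o3d.io.read_point_cloud("sample/output.ply")
+print(pcd)  # reports how many points were loaded
+o3d.visualization.draw_geometries([pcd])  # opens an interactive viewer
+```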
+
+## Acknowledgments
+We would like to thank the authors of Mask3D and YOLO-World for their work, on which our model builds.
+
+
+## BibTeX :pray:
+```
+@misc{boudjoghra2024openyolo,
+ title={Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation},
+ author={Mohamed El Amine Boudjoghra and Angela Dai and Jean Lahoud and Hisham Cholakkal and Rao Muhammad Anwer and Salman Khan and Fahad Shahbaz Khan},
+ year={2024},
+ eprint={2406.02548},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV}
+}
+```
+
diff --git a/environment.yml b/environment.yml
new file mode 100644
index 0000000000000000000000000000000000000000..1ce639c16688b6ced1b83624da4a391a43ccf30e
--- /dev/null
+++ b/environment.yml
@@ -0,0 +1,216 @@
+name: openyolo3d
+channels:
+ - anaconda
+ - defaults
+dependencies:
+ - _libgcc_mutex=0.1=main
+ - _openmp_mutex=5.1=1_gnu
+ - blas=1.0=openblas
+ - boltons=23.0.0=py310h06a4308_0
+ - brotlipy=0.7.0=py310h7f8727e_1002
+ - bzip2=1.0.8=h7b6447c_0
+ - ca-certificates=2023.01.10=h06a4308_0
+ - certifi=2022.12.7=py310h06a4308_0
+ - cffi=1.15.1=py310h5eee18b_3
+ - charset-normalizer=2.0.4=pyhd3eb1b0_0
+ - conda=23.3.1=py310h06a4308_0
+ - conda-content-trust=0.1.3=py310h06a4308_0
+ - conda-package-handling=2.0.2=py310h06a4308_0
+ - conda-package-streaming=0.7.0=py310h06a4308_0
+ - cryptography=39.0.1=py310h9ce1e76_0
+ - idna=3.4=py310h06a4308_0
+ - jsonpatch=1.32=pyhd3eb1b0_0
+ - jsonpointer=2.1=pyhd3eb1b0_0
+ - ld_impl_linux-64=2.38=h1181459_1
+ - libffi=3.4.2=h6a678d5_6
+ - libgcc-ng=11.2.0=h1234567_1
+ - libgfortran-ng=11.2.0=h00389a5_1
+ - libgfortran5=11.2.0=h1234567_1
+ - libgomp=11.2.0=h1234567_1
+ - libopenblas=0.3.21=h043d6bf_0
+ - libstdcxx-ng=11.2.0=h1234567_1
+ - libuuid=1.41.5=h5eee18b_0
+ - ncurses=6.4=h6a678d5_0
+ - nomkl=3.0=0
+ - openblas-devel=0.3.21=h06a4308_0
+ - openssl=1.1.1s=h7f8727e_0
+ - packaging=23.0=py310h06a4308_0
+ - pluggy=1.0.0=py310h06a4308_1
+ - pycosat=0.6.4=py310h5eee18b_0
+ - pycparser=2.21=pyhd3eb1b0_0
+ - pyopenssl=23.0.0=py310h06a4308_0
+ - pysocks=1.7.1=py310h06a4308_0
+ - python=3.10.9=h7a1cb2a_0
+ - readline=8.2=h5eee18b_0
+ - requests=2.28.1=py310h06a4308_1
+ - ruamel.yaml=0.17.21=py310h5eee18b_0
+ - ruamel.yaml.clib=0.2.6=py310h5eee18b_1
+ - setuptools=65.6.3=py310h06a4308_0
+ - six=1.16.0=pyhd3eb1b0_1
+ - sqlite=3.41.2=h5eee18b_0
+ - tk=8.6.12=h1ccaba5_0
+ - toolz=0.12.0=py310h06a4308_0
+ - tqdm=4.65.0=py310h2f386ee_0
+ - urllib3=1.26.15=py310h06a4308_0
+ - wheel=0.37.1=pyhd3eb1b0_0
+ - xz=5.2.10=h5eee18b_1
+ - zlib=1.2.13=h5eee18b_0
+ - zstandard=0.19.0=py310h5eee18b_0
+ - pip
+ - pip:
+ - absl-py==1.4.0
+ - addict==2.4.0
+ - aiohttp==3.8.4
+ - aiosignal==1.3.1
+ # - albumentations==1.2.1 #manual
+ - antlr4-python3-runtime==4.8
+ - anyio==3.6.2
+ - appdirs==1.4.4
+ - asttokens==2.2.1
+ - async-timeout==4.0.2
+ - attrs==23.1.0
+ - backcall==0.2.0
+ - black==21.4b2
+ - cachetools==5.3.0
+ - click==8.1.3
+ - cloudpickle==2.1.0
+ - comm==0.1.3
+ - configargparse==1.5.3
+ - contourpy==1.0.7
+ - cycler==0.11.0
+ - dash==2.9.3
+ - dash-core-components==2.0.0
+ - dash-html-components==2.0.0
+ - dash-table==5.0.0
+ - debugpy==1.6.7
+ - decorator==5.1.1
+ # - detectron2==0.6
+ - docker-pycreds==0.4.0
+ - executing==1.2.0
+ - fastapi==0.95.1
+ - fastjsonschema==2.16.3
+ - fire==0.4.0
+ - flake8==6.0.0
+ - flask==2.2.3
+ - fonttools==4.39.3
+ - frozenlist==1.3.3
+ - fsspec==2023.4.0
+ # - fvcore==0.1.5.post20220512 #manual
+ - gitdb==4.0.10
+ - gitpython==3.1.31
+ - google-auth==2.17.3
+ - google-auth-oauthlib==1.0.0
+ - grpcio==1.54.0
+ - h11==0.14.0
+ - hydra-core==1.0.5
+ - imageio==2.21.1
+ - importlib-metadata==3.10.1
+ - iopath==0.1.10
+ - ipykernel==6.22.0
+ - ipython==8.12.0
+ - ipywidgets==8.0.6
+ - itsdangerous==2.1.2
+ - jedi==0.18.2
+ - jinja2==3.1.2
+ - joblib==1.2.0
+ - jsonschema==4.17.3
+ - jupyter-client==8.2.0
+ - jupyter-core==5.3.0
+ - jupyterlab-widgets==3.0.7
+ - kiwisolver==1.4.4
+ - lazy-loader==0.2
+ - loguru==0.6.0
+ - markdown==3.4.3
+ - markupsafe==2.1.2
+ - matplotlib==3.7.1
+ - matplotlib-inline==0.1.6
+ # - minkowskiengine==0.5.4
+ - multidict==6.0.4
+ - mypy-extensions==1.0.0
+ - natsort==8.3.1
+ - nbformat==5.7.0
+ - nest-asyncio==1.5.6
+ - networkx==3.1
+ - ninja==1.10.2.3
+ - numpy==1.24.2
+ - oauthlib==3.2.2
+ # - omegaconf==2.0.6 #manual
+ # - open3d==0.17.0 #manual
+ - opencv-python-headless==4.7.0.72
+ - pandas==2.0.0
+ - parso==0.8.3
+ - pathspec==0.11.1
+ - pathtools==0.1.2
+ - pexpect==4.8.0
+ - pickleshare==0.7.5
+ - pillow==9.5.0
+ - pip==23.1
+ - platformdirs==3.2.0
+ - plotly==5.14.1
+ - plyfile==0.7.4
+ # - pointnet2==0.0.0
+ - portalocker==2.7.0
+ - prompt-toolkit==3.0.38
+ - protobuf==4.22.3
+ - psutil==5.9.5
+ - ptyprocess==0.7.0
+ - pure-eval==0.2.2
+ - pyasn1==0.5.0
+ - pyasn1-modules==0.3.0
+ - pycocotools==2.0.4
+ - pydantic==1.10.7
+ - pydeprecate==0.3.2
+ - pygments==2.15.1
+ - pyparsing==3.0.9
+ - pyquaternion==0.9.9
+ - pyrsistent==0.19.3
+ - python-dateutil==2.8.2
+ - python-dotenv==0.20.0
+ - python-multipart==0.0.6
+ # - pytorch-lightning==1.7.2
+ - pytz==2023.3
+ - pyviz3d==0.2.28
+ - pywavelets==1.4.1
+ - pyyaml==5.3.1
+ - pyzmq==25.0.2
+ - qudida==0.0.4
+ - regex==2023.3.23
+ - requests-oauthlib==1.3.1
+ - rsa==4.9
+ - scikit-image==0.20.0
+ - scikit-learn==1.1.2
+ - scipy==1.9.0
+ - sentry-sdk==1.20.0
+ - setproctitle==1.3.2
+ - smmap==5.0.0
+ - sniffio==1.3.0
+ - stack-data==0.6.2
+ - starlette==0.26.1
+ - tabulate==0.9.0
+ - tenacity==8.2.2
+ - tensorboard==2.12.2
+ - tensorboard-data-server==0.7.0
+ - tensorboard-plugin-wit==1.8.1
+ - termcolor==2.2.0
+ - threadpoolctl==3.1.0
+ - tifffile==2023.4.12
+ - toml==0.10.2
+ # - torch==1.12.1+cu113
+ # - torch-scatter==2.1.1
+ # - torchmetrics==0.11.4
+ # - torchvision==0.13.1+cu113
+ - tornado==6.3
+ - traitlets==5.9.0
+ - trimesh==3.14.0
+ - typing-extensions==4.5.0
+ - tzdata==2023.3
+ - uvicorn==0.21.1
+ - volumentations==0.1.8
+ - wandb==0.15.0
+ - wcwidth==0.2.6
+ - werkzeug==2.2.3
+ - widgetsnbextension==4.0.7
+ - yacs==0.1.8
+ - yarl==1.8.2
+ - zipp==3.15.0
+prefix: /opt/conda
diff --git a/models/Mask3D/LICENSE b/models/Mask3D/LICENSE
new file mode 100644
index 0000000000000000000000000000000000000000..e619d905e048f45390e27e6fc2d93b6e96f1ea3b
--- /dev/null
+++ b/models/Mask3D/LICENSE
@@ -0,0 +1,22 @@
+MIT License
+
+Copyright (c) 2022
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
diff --git a/models/Mask3D/MANIFEST.in b/models/Mask3D/MANIFEST.in
new file mode 100644
index 0000000000000000000000000000000000000000..9ead0b59b546d425aeac6e46dba4278ef87eb3a7
--- /dev/null
+++ b/models/Mask3D/MANIFEST.in
@@ -0,0 +1 @@
+recursive-include mask3d/conf *.yaml
\ No newline at end of file
diff --git a/models/Mask3D/README.md b/models/Mask3D/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..e02d4a639970a937a899274858682e85b2a33de8
--- /dev/null
+++ b/models/Mask3D/README.md
@@ -0,0 +1,289 @@
+# Packaged version of Mask3D to be used in LabelMaker
+
+## Installation
+
+```
+# Some users experienced issues on Ubuntu with an AMD CPU
+# Install libopenblas-dev (issue #115, thanks WindWing)
+# sudo apt-get install libopenblas-dev
+
+export TORCH_CUDA_ARCH_LIST="6.0 6.1 6.2 7.0 7.2 7.5 8.0 8.6"
+
+conda env create -f environment.yml
+
+conda activate mask3d_cuda113
+
+pip3 install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
+pip3 install torch-scatter -f https://data.pyg.org/whl/torch-1.12.1+cu113.html
+pip3 install 'git+https://github.com/facebookresearch/detectron2.git@710e7795d0eeadf9def0e7ef957eea13532e34cf' --no-deps
+
+mkdir third_party
+cd third_party
+
+git clone --recursive "https://github.com/NVIDIA/MinkowskiEngine"
+cd MinkowskiEngine
+git checkout 02fc608bea4c0549b0a7b00ca1bf15dee4a0b228
+python setup.py install --force_cuda --blas=openblas
+
+cd ..
+git clone https://github.com/ScanNet/ScanNet.git
+cd ScanNet/Segmentator
+git checkout 3e5726500896748521a6ceb81271b0f5b2c0e7d2
+make
+
+cd ../../pointnet2
+python setup.py install
+
+cd ../../
+pip3 install pytorch-lightning==1.7.2
+
+pip install .
+
+```
+
+To use the model in your code, you need to download a checkpoint from the list below.
+Afterwards, the basic model can be used as follows:
+
+
+```python
+from mask3d import get_model
+
+model = get_model(checkpoint_path='checkpoints/scannet200/scannet200_benchmark.ckpt')
+```
+
+
+Here is a minimal example, assuming you have a point cloud in the folder `data`.
+
+```python
+
+import torch
+from mask3d import get_model, load_mesh, prepare_data, map_output_to_pointcloud, save_colorized_mesh
+
+model = get_model('checkpoints/scannet200/scannet200_benchmark.ckpt')
+model.eval()
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model.to(device)
+
+# load input data
+pointcloud_file = 'data/pcl.ply'
+mesh = load_mesh(pointcloud_file)
+
+# prepare data
+data, points, colors, features, unique_map, inverse_map = prepare_data(mesh, device)
+
+# run model
+with torch.no_grad():
+ outputs = model(data, raw_coordinates=features)
+
+# map output to point cloud
+labels = map_output_to_pointcloud(mesh, outputs, inverse_map)
+
+# save colorized mesh
+save_colorized_mesh(mesh, labels, 'data/pcl_labelled.ply', colormap='scannet200')
+```
+
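+Note that in the packaged build shipped in this repository (see `models/Mask3D/build/lib/mask3d/__init__.py`), `map_output_to_pointcloud` returns a `(masks, confidences)` tuple rather than a per-point label array. A minimal sketch of filtering instances by score under that assumption (the 0.8 threshold is illustrative):
+
+```python
+masks, confidences = labels  # (num_instances, num_points) boolean masks and per-instance scores
+keep = confidences > 0.8     # hypothetical confidence threshold
+print(f"keeping {int(keep.sum())} of {len(confidences)} instances")
+```
+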
+So far, only ScanNet200 checkpoints are supported. We are working on the ScanNet checkpoints.
+
+# Original Information
+
+## Mask3D: Mask Transformer for 3D Instance Segmentation
+
+
+Jonas Schult<sup>1</sup>, Francis Engelmann<sup>2,3</sup>, Alexander Hermans<sup>1</sup>, Or Litany<sup>4</sup>, Siyu Tang<sup>3</sup>, Bastian Leibe<sup>1</sup>
+
+<sup>1</sup>RWTH Aachen University, <sup>2</sup>ETH AI Center, <sup>3</sup>ETH Zurich, <sup>4</sup>NVIDIA
+
+Mask3D predicts accurate 3D semantic instances, achieving state-of-the-art performance on ScanNet, ScanNet200, S3DIS and STPLS3D.
+
+[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mask3d-for-3d-semantic-instance-segmentation/3d-instance-segmentation-on-scannetv2)](https://paperswithcode.com/sota/3d-instance-segmentation-on-scannetv2?p=mask3d-for-3d-semantic-instance-segmentation)
+[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mask3d-for-3d-semantic-instance-segmentation/3d-instance-segmentation-on-scannet200)](https://paperswithcode.com/sota/3d-instance-segmentation-on-scannet200?p=mask3d-for-3d-semantic-instance-segmentation)
+[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mask3d-for-3d-semantic-instance-segmentation/3d-instance-segmentation-on-s3dis)](https://paperswithcode.com/sota/3d-instance-segmentation-on-s3dis?p=mask3d-for-3d-semantic-instance-segmentation)
+[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mask3d-for-3d-semantic-instance-segmentation/3d-instance-segmentation-on-stpls3d)](https://paperswithcode.com/sota/3d-instance-segmentation-on-stpls3d?p=mask3d-for-3d-semantic-instance-segmentation)
+
+
+
+
+
+![teaser](./docs/teaser.jpg)
+
+
+
+
+[[Project Webpage](https://jonasschult.github.io/Mask3D/)]
+[[Paper](https://arxiv.org/abs/2210.03105)]
+[[Demo](https://francisengelmann.github.io/mask3d/)]
+
+
+## News
+
+* **17. January 2023**: Mask3D is accepted at ICRA 2023. :fire:
+* **14. October 2022**: STPLS3D support added.
+* **10. October 2022**: Mask3D ranks 2nd on the [STPLS3D Challenge](https://codalab.lisn.upsaclay.fr/competitions/4646#results) hosted by the [Urban3D Workshop](https://urban3dchallenge.github.io/) at ECCV 2022.
+* **6. October 2022**: [Mask3D preprint](https://arxiv.org/abs/2210.03105) released on arXiv.
+* **25. September 2022**: Code released.
+
+## Code structure
+We adapt the codebase of [Mix3D](https://github.com/kumuji/mix3d), which provides a highly modularized framework for 3D semantic segmentation based on the MinkowskiEngine.
+
+```
+├── mix3d
+│ ├── main_instance_segmentation.py <- the main file
+│ ├── conf <- hydra configuration files
+│ ├── datasets
+│ │ ├── preprocessing <- folder with preprocessing scripts
+│ │ ├── semseg.py <- indoor dataset
+│ │ └── utils.py
+│ ├── models <- Mask3D modules
+│ ├── trainer
+│ │ ├── __init__.py
+│ │ └── trainer.py <- train loop
+│ └── utils
+├── data
+│ ├── processed <- folder for preprocessed datasets
+│ └── raw <- folder for raw datasets
+├── scripts <- train scripts
+├── docs
+├── README.md
+└── saved <- folder that stores models and logs
+```
+
+### Dependencies :memo:
+The main dependencies of the project are the following:
+```yaml
+python: 3.10.9
+cuda: 11.3
+```
+You can set up a conda environment as follows:
+```
+# Some users experienced issues on Ubuntu with an AMD CPU
+# Install libopenblas-dev (issue #115, thanks WindWing)
+# sudo apt-get install libopenblas-dev
+
+export TORCH_CUDA_ARCH_LIST="6.0 6.1 6.2 7.0 7.2 7.5 8.0 8.6"
+
+conda env create -f environment.yml
+
+conda activate mask3d_cuda113
+
+pip3 install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
+pip3 install torch-scatter -f https://data.pyg.org/whl/torch-1.12.1+cu113.html
+pip3 install 'git+https://github.com/facebookresearch/detectron2.git@710e7795d0eeadf9def0e7ef957eea13532e34cf' --no-deps
+
+mkdir third_party
+cd third_party
+
+git clone --recursive "https://github.com/NVIDIA/MinkowskiEngine"
+cd MinkowskiEngine
+git checkout 02fc608bea4c0549b0a7b00ca1bf15dee4a0b228
+python setup.py install --force_cuda --blas=openblas
+
+cd ..
+git clone https://github.com/ScanNet/ScanNet.git
+cd ScanNet/Segmentator
+git checkout 3e5726500896748521a6ceb81271b0f5b2c0e7d2
+make
+
+cd ../../pointnet2
+python setup.py install
+
+cd ../../
+pip3 install pytorch-lightning==1.7.2
+```
+
+### Data preprocessing :hammer:
+After installing the dependencies, we preprocess the datasets.
+
+#### ScanNet / ScanNet200
+First, we apply Felzenszwalb and Huttenlocher's graph-based image segmentation algorithm to the test scenes, using the default parameters.
+Please refer to the [original repository](https://github.com/ScanNet/ScanNet/tree/master/Segmentator) for details.
+Put the resulting segmentations in `./data/raw/scannet_test_segments`.
+```
+python -m datasets.preprocessing.scannet_preprocessing preprocess \
+--data_dir="PATH_TO_RAW_SCANNET_DATASET" \
+--save_dir="data/processed/scannet" \
+--git_repo="PATH_TO_SCANNET_GIT_REPO" \
+--scannet200=false/true
+```
+
+#### S3DIS
+The S3DIS dataset contains some small bugs, which we initially fixed manually. We will soon release a preprocessing script that directly preprocesses the original dataset. For the time being, please follow the instructions [here](https://github.com/JonasSchult/Mask3D/issues/8#issuecomment-1279535948) to fix the dataset manually. Afterwards, call the preprocessing script as follows:
+
+```
+python -m datasets.preprocessing.s3dis_preprocessing preprocess \
+--data_dir="PATH_TO_Stanford3dDataset_v1.2" \
+--save_dir="data/processed/s3dis"
+```
+
+#### STPLS3D
+```
+python -m datasets.preprocessing.stpls3d_preprocessing preprocess \
+--data_dir="PATH_TO_STPLS3D" \
+--save_dir="data/processed/stpls3d"
+```
+
+### Training and testing :train2:
+Train Mask3D on the ScanNet dataset:
+```bash
+python main_instance_segmentation.py
+```
+Please refer to the [config scripts](https://github.com/JonasSchult/Mask3D/tree/main/scripts) (for example [here](https://github.com/JonasSchult/Mask3D/blob/main/scripts/scannet/scannet_val.sh#L15)) for detailed instructions on how to reproduce our results.
+In the simplest case, the inference command looks as follows:
+```bash
+python main_instance_segmentation.py \
+general.checkpoint='PATH_TO_CHECKPOINT.ckpt' \
+general.train_mode=false
+```
+
+## Trained checkpoints :floppy_disk:
+We provide detailed scores and network configurations with trained checkpoints.
+
+### [S3DIS](http://buildingparser.stanford.edu/dataset.html) (pretrained on ScanNet train+val)
+Following PointGroup, HAIS and SoftGroup, we finetune a model pretrained on ScanNet ([config](./scripts/scannet/scannet_pretrain_for_s3dis.sh) and [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/s3dis/scannet_pretrained/scannet_pretrained.ckpt)).
+| Dataset | AP | AP_50 | AP_25 | Config | Checkpoint :floppy_disk: | Scores :chart_with_upwards_trend: | Visualizations :telescope:
+|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
+| Area 1 | 69.3 | 81.9 | 87.7 | [config](scripts/s3dis/s3dis_pretrained.sh) | [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/s3dis/scannet_pretrained/area1_scannet_pretrained.ckpt) | [scores](./docs/detailed_scores/s3dis/scannet_pretrained/s3dis_area1_scannet_pretrained.txt) | [visualizations](https://omnomnom.vision.rwth-aachen.de/data/mask3d/visualizations/s3dis/scannet_pretrained/area_1/)
+| Area 2 | 44.0 | 59.5 | 66.5 | [config](scripts/s3dis/s3dis_pretrained.sh) | [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/s3dis/scannet_pretrained/area2_scannet_pretrained.ckpt) | [scores](./docs/detailed_scores/s3dis/scannet_pretrained/s3dis_area2_scannet_pretrained.txt) | [visualizations](https://omnomnom.vision.rwth-aachen.de/data/mask3d/visualizations/s3dis/scannet_pretrained/area_2/)
+| Area 3 | 73.4 | 83.2 | 88.2 | [config](scripts/s3dis/s3dis_pretrained.sh) | [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/s3dis/scannet_pretrained/area3_scannet_pretrained.ckpt) | [scores](./docs/detailed_scores/s3dis/scannet_pretrained/s3dis_area3_scannet_pretrained.txt) | [visualizations](https://omnomnom.vision.rwth-aachen.de/data/mask3d/visualizations/s3dis/scannet_pretrained/area_3/)
+| Area 4 | 58.0 | 69.5 | 74.9 | [config](scripts/s3dis/s3dis_pretrained.sh) | [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/s3dis/scannet_pretrained/area4_scannet_pretrained.ckpt) | [scores](./docs/detailed_scores/s3dis/scannet_pretrained/s3dis_area4_scannet_pretrained.txt) | [visualizations](https://omnomnom.vision.rwth-aachen.de/data/mask3d/visualizations/s3dis/scannet_pretrained/area_4/)
+| Area 5 | 57.8 | 71.9 | 77.2 | [config](scripts/s3dis/s3dis_pretrained.sh) | [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/s3dis/scannet_pretrained/area5_scannet_pretrained.ckpt) | [scores](./docs/detailed_scores/s3dis/scannet_pretrained/s3dis_area5_scannet_pretrained.txt) | [visualizations](https://omnomnom.vision.rwth-aachen.de/data/mask3d/visualizations/s3dis/scannet_pretrained/area_5/)
+| Area 6 | 68.4 | 79.9 | 85.2 | [config](scripts/s3dis/s3dis_pretrained.sh) | [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/s3dis/scannet_pretrained/area6_scannet_pretrained.ckpt) | [scores](./docs/detailed_scores/s3dis/scannet_pretrained/s3dis_area6_scannet_pretrained.txt) | [visualizations](https://omnomnom.vision.rwth-aachen.de/data/mask3d/visualizations/s3dis/scannet_pretrained/area_6/)
+
+### [S3DIS](http://buildingparser.stanford.edu/dataset.html) (from scratch)
+
+| Dataset | AP | AP_50 | AP_25 | Config | Checkpoint :floppy_disk: | Scores :chart_with_upwards_trend: | Visualizations :telescope:
+|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
+| Area 1 | 74.1 | 85.1 | 89.6 | [config](scripts/s3dis/s3dis_from_scratch.sh) | [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/s3dis/from_scratch/area1_from_scratch.ckpt) | [scores](./docs/detailed_scores/s3dis/from_scratch/s3dis_area1_from_scratch.txt) | [visualizations](https://omnomnom.vision.rwth-aachen.de/data/mask3d/visualizations/s3dis/from_scratch/area_1/)
+| Area 2 | 44.9 | 57.1 | 67.9 | [config](scripts/s3dis/s3dis_from_scratch.sh) | [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/s3dis/from_scratch/area2_from_scratch.ckpt) | [scores](./docs/detailed_scores/s3dis/from_scratch/s3dis_area2_from_scratch.txt) | [visualizations](https://omnomnom.vision.rwth-aachen.de/data/mask3d/visualizations/s3dis/from_scratch/area_2/)
+| Area 3 | 74.4 | 84.4 | 88.1 | [config](scripts/s3dis/s3dis_from_scratch.sh) | [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/s3dis/from_scratch/area3_from_scratch.ckpt) | [scores](./docs/detailed_scores/s3dis/from_scratch/s3dis_area3_from_scratch.txt) | [visualizations](https://omnomnom.vision.rwth-aachen.de/data/mask3d/visualizations/s3dis/from_scratch/area_3/)
+| Area 4 | 63.8 | 74.7 | 81.1 | [config](scripts/s3dis/s3dis_from_scratch.sh) | [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/s3dis/from_scratch/area4_from_scratch.ckpt) | [scores](./docs/detailed_scores/s3dis/from_scratch/s3dis_area4_from_scratch.txt) | [visualizations](https://omnomnom.vision.rwth-aachen.de/data/mask3d/visualizations/s3dis/from_scratch/area_4/)
+| Area 5 | 56.6 | 68.4 | 75.2 | [config](scripts/s3dis/s3dis_from_scratch.sh) | [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/s3dis/from_scratch/area5_from_scratch.ckpt) | [scores](./docs/detailed_scores/s3dis/from_scratch/s3dis_area5_from_scratch.txt) | [visualizations](https://omnomnom.vision.rwth-aachen.de/data/mask3d/visualizations/s3dis/from_scratch/area_5/)
+| Area 6 | 73.3 | 83.4 | 87.8 | [config](scripts/s3dis/s3dis_from_scratch.sh) | [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/s3dis/from_scratch/area6_from_scratch.ckpt) | [scores](./docs/detailed_scores/s3dis/from_scratch/s3dis_area6_from_scratch.txt) | [visualizations](https://omnomnom.vision.rwth-aachen.de/data/mask3d/visualizations/s3dis/from_scratch/area_6/)
+
+### [ScanNet v2](https://kaldir.vc.in.tum.de/scannet_benchmark/semantic_instance_3d?metric=ap)
+
+| Dataset | AP | AP_50 | AP_25 | Config | Checkpoint :floppy_disk: | Scores :chart_with_upwards_trend: | Visualizations :telescope:
+|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
+| ScanNet val | 55.2 | 73.7 | 83.5 | [config](scripts/scannet/scannet_val.sh) | [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/scannet/scannet_val.ckpt) | [scores](./docs/detailed_scores/scannet_val.txt) | [visualizations](https://omnomnom.vision.rwth-aachen.de/data/mask3d/visualizations/scannet/val/)
+| ScanNet test | 56.6 | 78.0 | 87.0 | [config](scripts/scannet/scannet_benchmark.sh) | [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/scannet/scannet_benchmark.ckpt) | [scores](http://kaldir.vc.in.tum.de/scannet_benchmark/result_details?id=1081) | [visualizations](https://omnomnom.vision.rwth-aachen.de/data/mask3d/visualizations/scannet/test/)
+
+### [ScanNet 200](https://kaldir.vc.in.tum.de/scannet_benchmark/scannet200_semantic_instance_3d)
+
+| Dataset | AP | AP_50 | AP_25 | Config | Checkpoint :floppy_disk: | Scores :chart_with_upwards_trend: | Visualizations :telescope:
+|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
+| ScanNet200 val | 27.4 | 37.0 | 42.3 | [config](scripts/scannet200/scannet200_val.sh) | [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/scannet200/scannet200_val.ckpt) | [scores](./docs/detailed_scores/scannet200_val.txt) | [visualizations](https://omnomnom.vision.rwth-aachen.de/data/mask3d/visualizations/scannet200/val/)
+| ScanNet200 test | 27.8 | 38.8 | 44.5 | [config](scripts/scannet200/scannet200_benchmark.sh) | [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/scannet200/scannet200_benchmark.ckpt) | [scores](https://kaldir.vc.in.tum.de/scannet_benchmark/result_details?id=1242) | [visualizations](https://omnomnom.vision.rwth-aachen.de/data/mask3d/visualizations/scannet200/test/)
+
+### [STPLS3D](https://www.stpls3d.com/)
+
+| Dataset | AP | AP_50 | AP_25 | Config | Checkpoint :floppy_disk: | Scores :chart_with_upwards_trend: | Visualizations :telescope:
+|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
+| STPLS3D val | 57.3 | 74.3 | 81.6 | [config](scripts/stpls3d/stpls3d_val.sh) | [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/stpls3d/stpls3d_val.ckpt) | [scores](./docs/detailed_scores/stpls3d.txt) | [visualizations](https://omnomnom.vision.rwth-aachen.de/data/mask3d/visualizations/stpls3d/)
+| STPLS3D test | 63.4 | 79.2 | 85.6 | [config](scripts/stpls3d/stpls3d_benchmark.sh) | [checkpoint](https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/stpls3d/stpls3d_benchmark.zip) | [scores](https://codalab.lisn.upsaclay.fr/competitions/4646#results) | visualizations
+
+## BibTeX :pray:
+```
+@article{Schult23ICRA,
+ title = {{Mask3D: Mask Transformer for 3D Semantic Instance Segmentation}},
+ author = {Schult, Jonas and Engelmann, Francis and Hermans, Alexander and Litany, Or and Tang, Siyu and Leibe, Bastian},
+ booktitle = {{International Conference on Robotics and Automation (ICRA)}},
+ year = {2023}
+}
+```
diff --git a/models/Mask3D/__init__.py b/models/Mask3D/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/build/lib/mask3d/__init__.py b/models/Mask3D/build/lib/mask3d/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..0b01a17620598f366cfa55c36a48609b1f0075f6
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/__init__.py
@@ -0,0 +1,216 @@
+import hydra
+import torch
+
+from mask3d.models.mask3d import Mask3D
+from mask3d.utils.utils import (
+ load_checkpoint_with_missing_or_exsessive_keys,
+ load_backbone_checkpoint_with_missing_or_exsessive_keys,
+)
+
+class InstanceSegmentation(torch.nn.Module):
+ def __init__(self, cfg):
+ super().__init__()
+ self.model = hydra.utils.instantiate(cfg.model)
+
+
+ def forward(self, x, raw_coordinates=None, point2segment=None):
+ return self.model(x, raw_coordinates=raw_coordinates, point2segment=point2segment)
+
+
+from omegaconf import OmegaConf, DictConfig
+import hydra
+from hydra.core.global_hydra import GlobalHydra
+from hydra.experimental import initialize, compose
+
+# imports for input loading
+import albumentations as A
+import MinkowskiEngine as ME
+import numpy as np
+import open3d as o3d
+
+# imports for output
+from mask3d.datasets.scannet200.scannet200_constants import (VALID_CLASS_IDS_20, VALID_CLASS_IDS_200, SCANNET_COLOR_MAP_20, SCANNET_COLOR_MAP_200)
+
+def get_model(checkpoint_path=None, dataset_name="scannet200"):
+
+
+ # Initialize the directory with config files
+ with initialize(config_path="conf"):
+ # Compose a configuration
+ cfg = compose(config_name="config_base_instance_segmentation.yaml")
+
+ cfg.general.checkpoint = checkpoint_path
+
+    # would be nice to avoid this hardcoding below
+ # dataset_name = checkpoint_path.split('/')[-1].split('_')[0]
+ if dataset_name == 'scannet200':
+ cfg.general.num_targets = 201
+ cfg.general.train_mode = False
+ cfg.general.eval_on_segments = True
+ cfg.general.topk_per_image = 300
+ cfg.general.use_dbscan = True
+ cfg.general.dbscan_eps = 0.95
+ cfg.general.export_threshold = 0.001
+
+ # # data
+ cfg.data.num_labels = 200
+ cfg.data.test_mode = "validation"
+
+ # # model
+ cfg.model.num_queries = 150
+
+ if dataset_name == 'scannet':
+ cfg.general.num_targets = 19
+ cfg.general.train_mode = False
+ cfg.general.eval_on_segments = True
+ cfg.general.topk_per_image = 300
+ cfg.general.use_dbscan = True
+ cfg.general.dbscan_eps = 0.95
+ cfg.general.export_threshold = 0.001
+
+ # # data
+ cfg.data.num_labels = 20
+ cfg.data.test_mode = "test"
+
+ # # model
+ cfg.model.num_queries = 150
+
+ #TODO: this has to be fixed and discussed with Jonas
+ # cfg.model.scene_min = -3.
+ # cfg.model.scene_max = 3.
+
+ # # Initialize the Hydra context
+ # hydra.core.global_hydra.GlobalHydra.instance().clear()
+ # hydra.initialize(config_path="conf")
+
+ # Load the configuration
+ # cfg = hydra.compose(config_name="config_base_instance_segmentation.yaml")
+ model = InstanceSegmentation(cfg)
+
+ if cfg.general.backbone_checkpoint is not None:
+ cfg, model = load_backbone_checkpoint_with_missing_or_exsessive_keys(
+ cfg, model
+ )
+ if cfg.general.checkpoint is not None:
+ cfg, model = load_checkpoint_with_missing_or_exsessive_keys(cfg, model)
+
+ return model
+
+
+def load_mesh(pcl_file):
+
+ # load point cloud
+ input_mesh_path = pcl_file
+ mesh = o3d.io.read_triangle_mesh(input_mesh_path)
+ return mesh
+
+def prepare_data(mesh, device):
+
+ # normalization for point cloud features
+ color_mean = (0.47793125906962, 0.4303257521323044, 0.3749598901421883)
+ color_std = (0.2834475483823543, 0.27566157565723015, 0.27018971370874995)
+ normalize_color = A.Normalize(mean=color_mean, std=color_std)
+
+
+ points = np.asarray(mesh.vertices)
+ colors = np.asarray(mesh.vertex_colors)
+ colors = colors * 255.
+
+ pseudo_image = colors.astype(np.uint8)[np.newaxis, :, :]
+ colors = np.squeeze(normalize_color(image=pseudo_image)["image"])
+
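+    # quantize coordinates to a 2 cm voxel grid before sparse quantization
+    # (2 cm is the voxel size Mask3D uses for ScanNet-style indoor scenes)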
+ coords = np.floor(points / 0.02)
+ _, _, unique_map, inverse_map = ME.utils.sparse_quantize(
+ coordinates=coords,
+ features=colors,
+ return_index=True,
+ return_inverse=True,
+ )
+
+ sample_coordinates = coords[unique_map]
+ coordinates = [torch.from_numpy(sample_coordinates).int()]
+ sample_features = colors[unique_map]
+ features = [torch.from_numpy(sample_features).float()]
+
+ coordinates, _ = ME.utils.sparse_collate(coords=coordinates, feats=features)
+ features = torch.cat(features, dim=0)
+ data = ME.SparseTensor(
+ coordinates=coordinates,
+ features=features,
+ device=device,
+ )
+
+
+ return data, points, colors, features, unique_map, inverse_map
+
+
+def map_output_to_pointcloud(mesh,
+ outputs,
+ inverse_map):
+
+ # parse predictions
+ logits = outputs["pred_logits"]
+ masks = outputs["pred_masks"]
+
+ # reformat predictions
+ logits = logits[0]
+ masks = masks[0]
+
+ labels = []
+ confidences = []
+ masks_binary = []
+
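+    # For each instance query: softmax over the class logits gives the label and its
+    # confidence; sigmoid over the mask logits gives a per-point heatmap, which is
+    # thresholded at 0.5 and whose mean foreground probability weights the final score.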
+ for i in range(len(logits)):
+ p_labels = torch.softmax(logits[i], dim=-1)
+ p_masks = torch.sigmoid(masks[:, i])
+ l = torch.argmax(p_labels, dim=-1)
+ c_label = torch.max(p_labels)
+ m = p_masks > 0.5
+ c_m = p_masks[m].sum() / (m.sum() + 1e-8)
+ c = c_label * c_m
+ labels.append(l.item())
+ confidences.append(c.item())
+ masks_binary.append(m[inverse_map]) # mapping the mask back to the original point cloud
+ return (torch.stack(masks_binary), torch.tensor(confidences))
+
+def save_colorized_mesh(mesh, labels_mapped, output_file, colormap='scannet'):
+
+ # colorize mesh
+ colors = np.zeros((len(mesh.vertices), 3))
+ for li in np.unique(labels_mapped):
+ if colormap == 'scannet':
+ raise ValueError('Not implemented yet')
+ elif colormap == 'scannet200':
+ v_li = VALID_CLASS_IDS_200[int(li)]
+ colors[(labels_mapped == li)[:, 0], :] = SCANNET_COLOR_MAP_200[v_li]
+ else:
+ raise ValueError('Unknown colormap - not supported')
+
+ colors = colors / 255.
+ mesh.vertex_colors = o3d.utility.Vector3dVector(colors)
+ o3d.io.write_triangle_mesh(output_file, mesh)
+
+if __name__ == '__main__':
+
+ model = get_model('checkpoints/scannet200/scannet200_benchmark.ckpt')
+ model.eval()
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ model.to(device)
+
+ # load input data
+ pointcloud_file = 'data/pcl.ply'
+ mesh = load_mesh(pointcloud_file)
+
+ # prepare data
+ data, points, colors, features, unique_map, inverse_map = prepare_data(mesh, device)
+
+ # run model
+ with torch.no_grad():
+ outputs = model(data, raw_coordinates=features)
+
+ # map output to point cloud
+ labels = map_output_to_pointcloud(mesh, outputs, inverse_map)
+
+ # save colorized mesh
+ save_colorized_mesh(mesh, labels, 'data/pcl_labelled.ply', colormap='scannet200')
+
\ No newline at end of file
diff --git a/models/Mask3D/build/lib/mask3d/benchmark/__init__.py b/models/Mask3D/build/lib/mask3d/benchmark/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/build/lib/mask3d/benchmark/evaluate_semantic_instance.py b/models/Mask3D/build/lib/mask3d/benchmark/evaluate_semantic_instance.py
new file mode 100644
index 0000000000000000000000000000000000000000..242cb87a09b5c69a0d967217a2cd97706197a63d
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/benchmark/evaluate_semantic_instance.py
@@ -0,0 +1,1141 @@
+# Evaluates semantic instance task
+# Adapted from the CityScapes evaluation: https://github.com/mcordts/cityscapesScripts/tree/master/cityscapesscripts/evaluation
+# Input:
+# - path to .txt prediction files
+# - path to .txt ground truth files
+# - output file to write results to
+# Each .txt prediction file look like:
+# [(pred0) rel. path to pred. mask over verts as .txt] [(pred0) label id] [(pred0) confidence]
+# [(pred1) rel. path to pred. mask over verts as .txt] [(pred1) label id] [(pred1) confidence]
+# [(pred2) rel. path to pred. mask over verts as .txt] [(pred2) label id] [(pred2) confidence]
+# ...
+#
+# NOTE: The prediction files must live in the root of the given prediction path.
+# Predicted mask .txt files must live in a subfolder.
+# Additionally, filenames must not contain spaces.
+# The relative paths to predicted masks must contain one integer per line,
+# where each line corresponds to vertices in the *_vh_clean_2.ply (in that order).
+# Non-zero integers indicate part of the predicted instance.
+# The label ids specify the class of the corresponding mask.
+# Confidence is a float confidence score of the mask.
+#
+# Note that only the valid classes are used for evaluation,
+# i.e., any ground truth label not in the valid label set
+# is ignored in the evaluation.
+#
+# example usage: evaluate_semantic_instance.py --scan_path [path to scan data] --output_file [output file]
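+#
+# For example, a prediction file for a scene might contain lines like the following
+# (paths, label ids, and confidences are illustrative only):
+#   predicted_masks/scene0707_00_000.txt 10 0.912
+#   predicted_masks/scene0707_00_001.txt 36 0.854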
+
+# python imports
+import math
+import os, sys, argparse
+import inspect
+from copy import deepcopy
+from uuid import uuid4
+
+import torch
+
+try:
+    import numpy as np
+except ImportError:
+    print("Failed to import numpy package.")
+    sys.exit(-1)
+
+from scipy import stats
+
+# currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
+# parentdir = os.path.dirname(currentdir)
+# sys.path.insert(0,parentdir)
+import benchmark.util as util
+import benchmark.util_3d as util_3d
+
+# parser = argparse.ArgumentParser()
+# parser.add_argument('--gt_path', default='', help='path to directory of gt .txt files')
+# parser.add_argument('--output_file', default='', help='output file [default: ./semantic_instance_evaluation.txt]')
+# opt = parser.parse_args()
+
+# if opt.output_file == '':
+# opt.output_file = os.path.join(os.getcwd(), 'semantic_instance_evaluation.txt')
+
+
+# ---------- Label info ---------- #
+CLASS_LABELS = [
+ "cabinet",
+ "bed",
+ "chair",
+ "sofa",
+ "table",
+ "door",
+ "window",
+ "bookshelf",
+ "picture",
+ "counter",
+ "desk",
+ "curtain",
+ "refrigerator",
+ "shower curtain",
+ "toilet",
+ "sink",
+ "bathtub",
+ "otherfurniture",
+]
+VALID_CLASS_IDS = np.array(
+ [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 24, 28, 33, 34, 36, 39]
+)
+ID_TO_LABEL = {}
+LABEL_TO_ID = {}
+for i in range(len(VALID_CLASS_IDS)):
+ LABEL_TO_ID[CLASS_LABELS[i]] = VALID_CLASS_IDS[i]
+ ID_TO_LABEL[VALID_CLASS_IDS[i]] = CLASS_LABELS[i]
+# ---------- Evaluation params ---------- #
+# overlaps for evaluation
+opt = {}
+opt["overlaps"] = np.append(np.arange(0.5, 0.95, 0.05), 0.25)
+# minimum region size for evaluation [verts]
+opt["min_region_sizes"] = np.array([100]) # 100 for s3dis, scannet
+# distance thresholds [m]
+opt["distance_threshes"] = np.array([float("inf")])
+# distance confidences
+opt["distance_confs"] = np.array([-float("inf")])
+
+
+def evaluate_matches(matches):
+ overlaps = opt["overlaps"]
+ min_region_sizes = [opt["min_region_sizes"][0]]
+ dist_threshes = [opt["distance_threshes"][0]]
+ dist_confs = [opt["distance_confs"][0]]
+
+ # results: class x overlap
+ ap = np.zeros(
+ (len(dist_threshes), len(CLASS_LABELS), len(overlaps)), float
+ )
+ for di, (min_region_size, distance_thresh, distance_conf) in enumerate(
+ zip(min_region_sizes, dist_threshes, dist_confs)
+ ):
+ for oi, overlap_th in enumerate(overlaps):
+ pred_visited = {}
+ for m in matches:
+ for p in matches[m]["pred"]:
+ for label_name in CLASS_LABELS:
+ for p in matches[m]["pred"][label_name]:
+ if "uuid" in p:
+ pred_visited[p["uuid"]] = False
+ for li, label_name in enumerate(CLASS_LABELS):
+ y_true = np.empty(0)
+ y_score = np.empty(0)
+ hard_false_negatives = 0
+ has_gt = False
+ has_pred = False
+ for m in matches:
+ pred_instances = matches[m]["pred"][label_name]
+ gt_instances = matches[m]["gt"][label_name]
+ # filter groups in ground truth
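+                    # (ScanNet-style ids encode semantic_label * 1000 + instance_id,
+                    # so ids below 1000 denote group/unscored regions)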
+ gt_instances = [
+ gt
+ for gt in gt_instances
+ if gt["instance_id"] >= 1000
+ and gt["vert_count"] >= min_region_size
+ and gt["med_dist"] <= distance_thresh
+ and gt["dist_conf"] >= distance_conf
+ ]
+ if gt_instances:
+ has_gt = True
+ if pred_instances:
+ has_pred = True
+
+ cur_true = np.ones(len(gt_instances))
+ cur_score = np.ones(len(gt_instances)) * (-float("inf"))
+ cur_match = np.zeros(len(gt_instances), dtype=bool)
+ # collect matches
+ for (gti, gt) in enumerate(gt_instances):
+ found_match = False
+ num_pred = len(gt["matched_pred"])
+ for pred in gt["matched_pred"]:
+ # greedy assignments
+ if pred_visited[pred["uuid"]]:
+ continue
+ overlap = float(pred["intersection"]) / (
+ gt["vert_count"]
+ + pred["vert_count"]
+ - pred["intersection"]
+ )
+ if overlap > overlap_th:
+ confidence = pred["confidence"]
+ # if already have a prediction for this gt,
+ # the prediction with the lower score is automatically a false positive
+ if cur_match[gti]:
+ max_score = max(cur_score[gti], confidence)
+ min_score = min(cur_score[gti], confidence)
+ cur_score[gti] = max_score
+ # append false positive
+ cur_true = np.append(cur_true, 0)
+ cur_score = np.append(cur_score, min_score)
+ cur_match = np.append(cur_match, True)
+ # otherwise set score
+ else:
+ found_match = True
+ cur_match[gti] = True
+ cur_score[gti] = confidence
+ pred_visited[pred["uuid"]] = True
+ if not found_match:
+ hard_false_negatives += 1
+ # remove non-matched ground truth instances
+ cur_true = cur_true[cur_match == True]
+ cur_score = cur_score[cur_match == True]
+
+ # collect non-matched predictions as false positive
+ for pred in pred_instances:
+ found_gt = False
+ for gt in pred["matched_gt"]:
+ overlap = float(gt["intersection"]) / (
+ gt["vert_count"]
+ + pred["vert_count"]
+ - gt["intersection"]
+ )
+ if overlap > overlap_th:
+ found_gt = True
+ break
+ if not found_gt:
+ num_ignore = pred["void_intersection"]
+ for gt in pred["matched_gt"]:
+ # group?
+ if gt["instance_id"] < 1000:
+ num_ignore += gt["intersection"]
+ # small ground truth instances
+ if (
+ gt["vert_count"] < min_region_size
+ or gt["med_dist"] > distance_thresh
+ or gt["dist_conf"] < distance_conf
+ ):
+ num_ignore += gt["intersection"]
+ proportion_ignore = (
+ float(num_ignore) / pred["vert_count"]
+ )
+ # if not ignored append false positive
+ if proportion_ignore <= overlap_th:
+ cur_true = np.append(cur_true, 0)
+ confidence = pred["confidence"]
+ cur_score = np.append(cur_score, confidence)
+
+ # append to overall results
+ y_true = np.append(y_true, cur_true)
+ y_score = np.append(y_score, cur_score)
+
+ # compute average precision
+ if has_gt and has_pred:
+ # compute precision recall curve first
+
+ # sorting and cumsum
+ score_arg_sort = np.argsort(y_score)
+ y_score_sorted = y_score[score_arg_sort]
+ y_true_sorted = y_true[score_arg_sort]
+ y_true_sorted_cumsum = np.cumsum(y_true_sorted)
+
+ # unique thresholds
+ (thresholds, unique_indices) = np.unique(
+ y_score_sorted, return_index=True
+ )
+ num_prec_recall = len(unique_indices) + 1
+
+ # prepare precision recall
+ num_examples = len(y_score_sorted)
+ # https://github.com/ScanNet/ScanNet/pull/26
+ # all predictions are non-matched but also all of them are ignored and not counted as FP
+ # y_true_sorted_cumsum is empty
+ # num_true_examples = y_true_sorted_cumsum[-1]
+ num_true_examples = (
+ y_true_sorted_cumsum[-1]
+ if len(y_true_sorted_cumsum) > 0
+ else 0
+ )
+ precision = np.zeros(num_prec_recall)
+ recall = np.zeros(num_prec_recall)
+
+ # deal with the first point
+ y_true_sorted_cumsum = np.append(y_true_sorted_cumsum, 0)
+ # deal with remaining
+ for idx_res, idx_scores in enumerate(unique_indices):
+ cumsum = y_true_sorted_cumsum[idx_scores - 1]
+ tp = num_true_examples - cumsum
+ fp = num_examples - idx_scores - tp
+ fn = cumsum + hard_false_negatives
+ p = float(tp) / (tp + fp)
+ r = float(tp) / (tp + fn)
+ precision[idx_res] = p
+ recall[idx_res] = r
+
+ # first point in curve is artificial
+ precision[-1] = 1.0
+ recall[-1] = 0.0
+
+ # compute average of precision-recall curve
+ recall_for_conv = np.copy(recall)
+ recall_for_conv = np.append(
+ recall_for_conv[0], recall_for_conv
+ )
+ recall_for_conv = np.append(recall_for_conv, 0.0)
+
+ stepWidths = np.convolve(
+ recall_for_conv, [-0.5, 0, 0.5], "valid"
+ )
+ # integrate is now simply a dot product
+ ap_current = np.dot(precision, stepWidths)
+
+ elif has_gt:
+ ap_current = 0.0
+ else:
+ ap_current = float("nan")
+ ap[di, li, oi] = ap_current
+ return ap
+
+
+def compute_averages(aps):
+ d_inf = 0
+ o50 = np.where(np.isclose(opt["overlaps"], 0.5))
+ o25 = np.where(np.isclose(opt["overlaps"], 0.25))
+ oAllBut25 = np.where(np.logical_not(np.isclose(opt["overlaps"], 0.25)))
+ avg_dict = {}
+ # avg_dict['all_ap'] = np.nanmean(aps[ d_inf,:,: ])
+ avg_dict["all_ap"] = np.nanmean(aps[d_inf, :, oAllBut25])
+ avg_dict["all_ap_50%"] = np.nanmean(aps[d_inf, :, o50])
+ avg_dict["all_ap_25%"] = np.nanmean(aps[d_inf, :, o25])
+ avg_dict["classes"] = {}
+ for (li, label_name) in enumerate(CLASS_LABELS):
+ avg_dict["classes"][label_name] = {}
+ # avg_dict["classes"][label_name]["ap"] = np.average(aps[ d_inf,li, :])
+ avg_dict["classes"][label_name]["ap"] = np.average(
+ aps[d_inf, li, oAllBut25]
+ )
+ avg_dict["classes"][label_name]["ap50%"] = np.average(
+ aps[d_inf, li, o50]
+ )
+ avg_dict["classes"][label_name]["ap25%"] = np.average(
+ aps[d_inf, li, o25]
+ )
+ return avg_dict
+
+
+def make_pred_info(pred: dict):
+    # pred: {'pred_scores': (num_queries,), 'pred_classes': (num_queries,), 'pred_masks': (N, num_queries)}
+ pred_info = {}
+ assert (
+ pred["pred_classes"].shape[0]
+ == pred["pred_scores"].shape[0]
+ == pred["pred_masks"].shape[1]
+ )
+ for i in range(len(pred["pred_classes"])):
+ info = {}
+ info["label_id"] = pred["pred_classes"][i]
+ info["conf"] = pred["pred_scores"][i]
+ info["mask"] = pred["pred_masks"][:, i]
+ pred_info[uuid4()] = info # we later need to identify these objects
+ return pred_info
+
+
+def assign_instances_for_scan(pred: dict, gt_file: str):
+ pred_info = make_pred_info(pred)
+ try:
+ gt_ids = util_3d.load_ids(gt_file)
+ except Exception as e:
+ util.print_error("unable to load " + gt_file + ": " + str(e))
+
+ # get gt instances
+ gt_instances = util_3d.get_instances(
+ gt_ids, VALID_CLASS_IDS, CLASS_LABELS, ID_TO_LABEL
+ )
+ # associate
+ gt2pred = deepcopy(gt_instances)
+ for label in gt2pred:
+ for gt in gt2pred[label]:
+ gt["matched_pred"] = []
+ pred2gt = {}
+ for label in CLASS_LABELS:
+ pred2gt[label] = []
+ num_pred_instances = 0
+ # mask of void labels in the groundtruth
+ bool_void = np.logical_not(np.in1d(gt_ids // 1000, VALID_CLASS_IDS))
+ # go thru all prediction masks
+ for uuid in pred_info:
+ label_id = int(pred_info[uuid]["label_id"])
+ conf = pred_info[uuid]["conf"]
+        if label_id not in ID_TO_LABEL:
+ continue
+ label_name = ID_TO_LABEL[label_id]
+ # read the mask
+ pred_mask = pred_info[uuid]["mask"]
+ assert len(pred_mask) == len(gt_ids)
+ # convert to binary
+ pred_mask = np.not_equal(pred_mask, 0)
+ num = np.count_nonzero(pred_mask)
+        if num < opt["min_region_sizes"][0]:
+            continue  # skip masks smaller than the minimum region size
+
+ pred_instance = {}
+ pred_instance["uuid"] = uuid
+ pred_instance["pred_id"] = num_pred_instances
+ pred_instance["label_id"] = label_id
+ pred_instance["vert_count"] = num
+ pred_instance["confidence"] = conf
+ pred_instance["void_intersection"] = np.count_nonzero(
+ np.logical_and(bool_void, pred_mask)
+ )
+
+ # matched gt instances
+ matched_gt = []
+ # go thru all gt instances with matching label
+ for (gt_num, gt_inst) in enumerate(gt2pred[label_name]):
+ intersection = np.count_nonzero(
+ np.logical_and(gt_ids == gt_inst["instance_id"], pred_mask)
+ )
+ if intersection > 0:
+ gt_copy = gt_inst.copy()
+ pred_copy = pred_instance.copy()
+ gt_copy["intersection"] = intersection
+ pred_copy["intersection"] = intersection
+ matched_gt.append(gt_copy)
+ gt2pred[label_name][gt_num]["matched_pred"].append(pred_copy)
+ pred_instance["matched_gt"] = matched_gt
+ num_pred_instances += 1
+ pred2gt[label_name].append(pred_instance)
+
+ return gt2pred, pred2gt
+
+
+def print_results(avgs):
+ sep = ""
+ col1 = ":"
+ lineLen = 64
+
+ print("")
+ print("#" * lineLen)
+ line = ""
+ line += "{:<15}".format("what") + sep + col1
+ line += "{:>15}".format("AP") + sep
+ line += "{:>15}".format("AP_50%") + sep
+ line += "{:>15}".format("AP_25%") + sep
+ print(line)
+ print("#" * lineLen)
+
+ for (li, label_name) in enumerate(CLASS_LABELS):
+ ap_avg = avgs["classes"][label_name]["ap"]
+ ap_50o = avgs["classes"][label_name]["ap50%"]
+ ap_25o = avgs["classes"][label_name]["ap25%"]
+ line = "{:<15}".format(label_name) + sep + col1
+ line += sep + "{:>15.3f}".format(ap_avg) + sep
+ line += sep + "{:>15.3f}".format(ap_50o) + sep
+ line += sep + "{:>15.3f}".format(ap_25o) + sep
+ print(line)
+
+ all_ap_avg = avgs["all_ap"]
+ all_ap_50o = avgs["all_ap_50%"]
+ all_ap_25o = avgs["all_ap_25%"]
+
+ print("-" * lineLen)
+ line = "{:<15}".format("average") + sep + col1
+ line += "{:>15.3f}".format(all_ap_avg) + sep
+ line += "{:>15.3f}".format(all_ap_50o) + sep
+ line += "{:>15.3f}".format(all_ap_25o) + sep
+ print(line)
+ print("")
+
+
+def write_result_file(avgs, filename):
+ _SPLITTER = ","
+ with open(filename, "w") as f:
+ f.write(
+ _SPLITTER.join(["class", "class id", "ap", "ap50", "ap25"]) + "\n"
+ )
+ for i in range(len(VALID_CLASS_IDS)):
+ class_name = CLASS_LABELS[i]
+ class_id = VALID_CLASS_IDS[i]
+ ap = avgs["classes"][class_name]["ap"]
+ ap50 = avgs["classes"][class_name]["ap50%"]
+ ap25 = avgs["classes"][class_name]["ap25%"]
+ f.write(
+ _SPLITTER.join(
+ [str(x) for x in [class_name, class_id, ap, ap50, ap25]]
+ )
+ + "\n"
+ )
+
+
+def evaluate(
+ preds: dict, gt_path: str, output_file: str, dataset: str = "scannet"
+):
+ global CLASS_LABELS
+ global VALID_CLASS_IDS
+ global ID_TO_LABEL
+ global LABEL_TO_ID
+ global opt
+
+ if dataset == "stpls3d":
+ # global CLASS_LABELS
+ # global VALID_CLASS_IDS
+ # global ID_TO_LABEL
+ # global LABEL_TO_ID
+
+ opt["min_region_sizes"] = np.array([10])
+
+ CLASS_LABELS = [
+ "Build",
+ "LowVeg",
+ "MediumVeg",
+ "HighVeg",
+ "Vehicle",
+ "Truck",
+ "Aircraft",
+ "MilitaryVeh",
+ "Bike",
+ "Motorcycle",
+ "LightPole",
+ "StreetSign",
+ "Clutter",
+ "Fence",
+ ]
+ VALID_CLASS_IDS = np.array(
+ [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
+ )
+
+ ID_TO_LABEL = {}
+ LABEL_TO_ID = {}
+ for i in range(len(VALID_CLASS_IDS)):
+ LABEL_TO_ID[CLASS_LABELS[i]] = VALID_CLASS_IDS[i]
+ ID_TO_LABEL[VALID_CLASS_IDS[i]] = CLASS_LABELS[i]
+
+ if dataset == "s3dis":
+ # global CLASS_LABELS
+ # global VALID_CLASS_IDS
+ # global ID_TO_LABEL
+ # global LABEL_TO_ID
+
+ CLASS_LABELS = [
+ "ceiling",
+ "floor",
+ "wall",
+ "beam",
+ "column",
+ "window",
+ "door",
+ "table",
+ "chair",
+ "sofa",
+ "bookcase",
+ "board",
+ "clutter",
+ ]
+ VALID_CLASS_IDS = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])
+ ID_TO_LABEL = {}
+ LABEL_TO_ID = {}
+ for i in range(len(VALID_CLASS_IDS)):
+ LABEL_TO_ID[CLASS_LABELS[i]] = VALID_CLASS_IDS[i]
+ ID_TO_LABEL[VALID_CLASS_IDS[i]] = CLASS_LABELS[i]
+
+ if dataset == "scannet200":
+ CLASS_LABELS = (
+ "chair",
+ "table",
+ "door",
+ "couch",
+ "cabinet",
+ "shelf",
+ "desk",
+ "office chair",
+ "bed",
+ "pillow",
+ "sink",
+ "picture",
+ "window",
+ "toilet",
+ "bookshelf",
+ "monitor",
+ "curtain",
+ "book",
+ "armchair",
+ "coffee table",
+ "box",
+ "refrigerator",
+ "lamp",
+ "kitchen cabinet",
+ "towel",
+ "clothes",
+ "tv",
+ "nightstand",
+ "counter",
+ "dresser",
+ "stool",
+ "cushion",
+ "plant",
+ "ceiling",
+ "bathtub",
+ "end table",
+ "dining table",
+ "keyboard",
+ "bag",
+ "backpack",
+ "toilet paper",
+ "printer",
+ "tv stand",
+ "whiteboard",
+ "blanket",
+ "shower curtain",
+ "trash can",
+ "closet",
+ "stairs",
+ "microwave",
+ "stove",
+ "shoe",
+ "computer tower",
+ "bottle",
+ "bin",
+ "ottoman",
+ "bench",
+ "board",
+ "washing machine",
+ "mirror",
+ "copier",
+ "basket",
+ "sofa chair",
+ "file cabinet",
+ "fan",
+ "laptop",
+ "shower",
+ "paper",
+ "person",
+ "paper towel dispenser",
+ "oven",
+ "blinds",
+ "rack",
+ "plate",
+ "blackboard",
+ "piano",
+ "suitcase",
+ "rail",
+ "radiator",
+ "recycling bin",
+ "container",
+ "wardrobe",
+ "soap dispenser",
+ "telephone",
+ "bucket",
+ "clock",
+ "stand",
+ "light",
+ "laundry basket",
+ "pipe",
+ "clothes dryer",
+ "guitar",
+ "toilet paper holder",
+ "seat",
+ "speaker",
+ "column",
+ "bicycle",
+ "ladder",
+ "bathroom stall",
+ "shower wall",
+ "cup",
+ "jacket",
+ "storage bin",
+ "coffee maker",
+ "dishwasher",
+ "paper towel roll",
+ "machine",
+ "mat",
+ "windowsill",
+ "bar",
+ "toaster",
+ "bulletin board",
+ "ironing board",
+ "fireplace",
+ "soap dish",
+ "kitchen counter",
+ "doorframe",
+ "toilet paper dispenser",
+ "mini fridge",
+ "fire extinguisher",
+ "ball",
+ "hat",
+ "shower curtain rod",
+ "water cooler",
+ "paper cutter",
+ "tray",
+ "shower door",
+ "pillar",
+ "ledge",
+ "toaster oven",
+ "mouse",
+ "toilet seat cover dispenser",
+ "furniture",
+ "cart",
+ "storage container",
+ "scale",
+ "tissue box",
+ "light switch",
+ "crate",
+ "power outlet",
+ "decoration",
+ "sign",
+ "projector",
+ "closet door",
+ "vacuum cleaner",
+ "candle",
+ "plunger",
+ "stuffed animal",
+ "headphones",
+ "dish rack",
+ "broom",
+ "guitar case",
+ "range hood",
+ "dustpan",
+ "hair dryer",
+ "water bottle",
+ "handicap bar",
+ "purse",
+ "vent",
+ "shower floor",
+ "water pitcher",
+ "mailbox",
+ "bowl",
+ "paper bag",
+ "alarm clock",
+ "music stand",
+ "projector screen",
+ "divider",
+ "laundry detergent",
+ "bathroom counter",
+ "object",
+ "bathroom vanity",
+ "closet wall",
+ "laundry hamper",
+ "bathroom stall door",
+ "ceiling light",
+ "trash bin",
+ "dumbbell",
+ "stair rail",
+ "tube",
+ "bathroom cabinet",
+ "cd case",
+ "closet rod",
+ "coffee kettle",
+ "structure",
+ "shower head",
+ "keyboard piano",
+ "case of water bottles",
+ "coat rack",
+ "storage organizer",
+ "folded chair",
+ "fire alarm",
+ "power strip",
+ "calendar",
+ "poster",
+ "potted plant",
+ "luggage",
+ "mattress",
+ )
+
+ VALID_CLASS_IDS = np.array(
+ (
+ 2,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 21,
+ 22,
+ 23,
+ 24,
+ 26,
+ 27,
+ 28,
+ 29,
+ 31,
+ 32,
+ 33,
+ 34,
+ 35,
+ 36,
+ 38,
+ 39,
+ 40,
+ 41,
+ 42,
+ 44,
+ 45,
+ 46,
+ 47,
+ 48,
+ 49,
+ 50,
+ 51,
+ 52,
+ 54,
+ 55,
+ 56,
+ 57,
+ 58,
+ 59,
+ 62,
+ 63,
+ 64,
+ 65,
+ 66,
+ 67,
+ 68,
+ 69,
+ 70,
+ 71,
+ 72,
+ 73,
+ 74,
+ 75,
+ 76,
+ 77,
+ 78,
+ 79,
+ 80,
+ 82,
+ 84,
+ 86,
+ 87,
+ 88,
+ 89,
+ 90,
+ 93,
+ 95,
+ 96,
+ 97,
+ 98,
+ 99,
+ 100,
+ 101,
+ 102,
+ 103,
+ 104,
+ 105,
+ 106,
+ 107,
+ 110,
+ 112,
+ 115,
+ 116,
+ 118,
+ 120,
+ 121,
+ 122,
+ 125,
+ 128,
+ 130,
+ 131,
+ 132,
+ 134,
+ 136,
+ 138,
+ 139,
+ 140,
+ 141,
+ 145,
+ 148,
+ 154,
+ 155,
+ 156,
+ 157,
+ 159,
+ 161,
+ 163,
+ 165,
+ 166,
+ 168,
+ 169,
+ 170,
+ 177,
+ 180,
+ 185,
+ 188,
+ 191,
+ 193,
+ 195,
+ 202,
+ 208,
+ 213,
+ 214,
+ 221,
+ 229,
+ 230,
+ 232,
+ 233,
+ 242,
+ 250,
+ 261,
+ 264,
+ 276,
+ 283,
+ 286,
+ 300,
+ 304,
+ 312,
+ 323,
+ 325,
+ 331,
+ 342,
+ 356,
+ 370,
+ 392,
+ 395,
+ 399,
+ 408,
+ 417,
+ 488,
+ 540,
+ 562,
+ 570,
+ 572,
+ 581,
+ 609,
+ 748,
+ 776,
+ 1156,
+ 1163,
+ 1164,
+ 1165,
+ 1166,
+ 1167,
+ 1168,
+ 1169,
+ 1170,
+ 1171,
+ 1172,
+ 1173,
+ 1174,
+ 1175,
+ 1176,
+ 1178,
+ 1179,
+ 1180,
+ 1181,
+ 1182,
+ 1183,
+ 1184,
+ 1185,
+ 1186,
+ 1187,
+ 1188,
+ 1189,
+ 1190,
+ 1191,
+ )
+ )
+
+ ID_TO_LABEL = {}
+ LABEL_TO_ID = {}
+    for label, class_id in zip(CLASS_LABELS, VALID_CLASS_IDS):
+        LABEL_TO_ID[label] = class_id
+        ID_TO_LABEL[class_id] = label
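+    # CLASS_LABELS and VALID_CLASS_IDS are index-aligned, so for example
+    # ID_TO_LABEL[VALID_CLASS_IDS[0]] == CLASS_LABELS[0].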
+
+ total_true = 0
+ total_seen = 0
+ NUM_CLASSES = len(VALID_CLASS_IDS)
+
+ true_positive_classes = np.zeros(NUM_CLASSES)
+ positive_classes = np.zeros(NUM_CLASSES)
+ gt_classes = np.zeros(NUM_CLASSES)
+
+ # precision & recall
+ total_gt_ins = np.zeros(NUM_CLASSES)
+    at = 0.5  # IoU threshold for counting a predicted instance as a true positive
+ tpsins = [[] for _ in range(NUM_CLASSES)]
+ fpsins = [[] for _ in range(NUM_CLASSES)]
+ # mucov and mwcov
+ all_mean_cov = [[] for _ in range(NUM_CLASSES)]
+ all_mean_weighted_cov = [[] for _ in range(NUM_CLASSES)]
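+    # mUCov: per class, the average of the best IoU each GT instance reaches
+    # with any prediction; mWCov: the same average weighted by GT instance size.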
+
+ print("evaluating", len(preds), "scans...")
+ matches = {}
+ for i, (k, v) in enumerate(preds.items()):
+ gt_file = os.path.join(gt_path, k + ".txt")
+ if not os.path.isfile(gt_file):
+ util.print_error(
+ "Scan {} does not match any gt file".format(k), user_fault=True
+ )
+
+ if dataset == "s3dis":
+ gt_ids = util_3d.load_ids(gt_file)
+ gt_sem = (gt_ids // 1000) - 1
+ gt_ins = gt_ids - (gt_ids // 1000) * 1000
+
+            # pred_sem = v['pred_classes'] - 1
+            pred_sem = np.zeros(v["pred_masks"].shape[0], dtype=np.int64)
+            pred_ins = np.zeros(v["pred_masks"].shape[0], dtype=np.int64)
+
+            # reverse order so that lower-index proposals take precedence
+            # wherever masks overlap
+            for inst_id in reversed(range(v["pred_masks"].shape[1])):
+ point_ids = np.argwhere(v["pred_masks"][:, inst_id] == 1.0)[
+ :, 0
+ ]
+ pred_ins[point_ids] = inst_id + 1
+ pred_sem[point_ids] = v["pred_classes"][inst_id] - 1
+
+ # semantic acc
+ total_true += np.sum(pred_sem == gt_sem)
+ total_seen += pred_sem.shape[0]
+
+            # TODO: parallelize this
+            # point-wise semantic mIoU; the per-point loop below is kept for
+            # reference, the vectorized np.unique version after it is equivalent
+            """
+ for j in range(gt_sem.shape[0]):
+ gt_l = int(gt_sem[j])
+ pred_l = int(pred_sem[j])
+ gt_classes[gt_l] += 1
+ positive_classes[pred_l] += 1
+ true_positive_classes[gt_l] += int(gt_l == pred_l)
+ """
+
+ uniq, counts = np.unique(pred_sem, return_counts=True)
+ positive_classes[uniq] += counts
+
+ uniq, counts = np.unique(gt_sem, return_counts=True)
+ gt_classes[uniq] += counts
+
+ uniq, counts = np.unique(
+ gt_sem[pred_sem == gt_sem], return_counts=True
+ )
+ true_positive_classes[uniq] += counts
+
+ # instance
+ un = np.unique(pred_ins)
+ pts_in_pred = [[] for _ in range(NUM_CLASSES)]
+ for ig, g in enumerate(un): # each object in prediction
+ if g == -1:
+ continue
+ tmp = pred_ins == g
+ sem_seg_i = int(stats.mode(pred_sem[tmp])[0])
+ pts_in_pred[sem_seg_i] += [tmp]
+
+ un = np.unique(gt_ins)
+ pts_in_gt = [[] for _ in range(NUM_CLASSES)]
+ for ig, g in enumerate(un):
+ tmp = gt_ins == g
+ sem_seg_i = int(stats.mode(gt_sem[tmp])[0])
+ pts_in_gt[sem_seg_i] += [tmp]
+
+ # instance mucov & mwcov
+ for i_sem in range(NUM_CLASSES):
+ sum_cov = 0
+ mean_cov = 0
+ mean_weighted_cov = 0
+ num_gt_point = 0
+ for ig, ins_gt in enumerate(pts_in_gt[i_sem]):
+ ovmax = 0.0
+ num_ins_gt_point = np.sum(ins_gt)
+ num_gt_point += num_ins_gt_point
+ for ip, ins_pred in enumerate(pts_in_pred[i_sem]):
+ union = ins_pred | ins_gt
+ intersect = ins_pred & ins_gt
+ iou = float(np.sum(intersect)) / np.sum(union)
+
+ if iou > ovmax:
+ ovmax = iou
+ ipmax = ip
+
+ sum_cov += ovmax
+ mean_weighted_cov += ovmax * num_ins_gt_point
+
+ if len(pts_in_gt[i_sem]) != 0:
+ mean_cov = sum_cov / len(pts_in_gt[i_sem])
+ all_mean_cov[i_sem].append(mean_cov)
+
+ mean_weighted_cov /= num_gt_point
+ all_mean_weighted_cov[i_sem].append(mean_weighted_cov)
+
+ if dataset == "s3dis":
+ # instance precision & recall
+ for i_sem in range(NUM_CLASSES):
+ tp = [0.0] * len(pts_in_pred[i_sem])
+ fp = [0.0] * len(pts_in_pred[i_sem])
+ gtflag = np.zeros(len(pts_in_gt[i_sem]))
+ total_gt_ins[i_sem] += len(pts_in_gt[i_sem])
+
+ for ip, ins_pred in enumerate(pts_in_pred[i_sem]):
+ ovmax = -1.0
+
+ for ig, ins_gt in enumerate(pts_in_gt[i_sem]):
+ union = ins_pred | ins_gt
+ intersect = ins_pred & ins_gt
+ iou = float(np.sum(intersect)) / np.sum(union)
+
+ if iou > ovmax:
+ ovmax = iou
+ igmax = ig
+
+                    if ovmax >= at:
+                        tp[ip] = 1  # true positive
+                    else:
+                        fp[ip] = 1  # false positive
+
+ tpsins[i_sem] += tp
+ fpsins[i_sem] += fp
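+                # tp/fp flags accumulate across all scans; per-class precision
+                # and recall are derived from them once the loop finishes.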
+
+ matches_key = os.path.abspath(gt_file)
+ # assign gt to predictions
+ gt2pred, pred2gt = assign_instances_for_scan(v, gt_file)
+ matches[matches_key] = {}
+ matches[matches_key]["gt"] = gt2pred
+ matches[matches_key]["pred"] = pred2gt
+ sys.stdout.write("\rscans processed: {}".format(i + 1))
+ sys.stdout.flush()
+ print("")
+ ap_scores = evaluate_matches(matches)
+ avgs = compute_averages(ap_scores)
+
+ # print
+ print_results(avgs)
+ write_result_file(avgs, output_file)
+
+ if dataset == "s3dis":
+ MUCov = np.zeros(NUM_CLASSES)
+ MWCov = np.zeros(NUM_CLASSES)
+ for i_sem in range(NUM_CLASSES):
+ MUCov[i_sem] = np.mean(all_mean_cov[i_sem])
+ MWCov[i_sem] = np.mean(all_mean_weighted_cov[i_sem])
+
+ precision = np.zeros(NUM_CLASSES)
+ recall = np.zeros(NUM_CLASSES)
+ for i_sem in range(NUM_CLASSES):
+            tp = np.asarray(tpsins[i_sem]).astype(np.float64)
+            fp = np.asarray(fpsins[i_sem]).astype(np.float64)
+ tp = np.sum(tp)
+ fp = np.sum(fp)
+ rec = tp / total_gt_ins[i_sem]
+ prec = tp / (tp + fp)
+
+ precision[i_sem] = prec
+ recall[i_sem] = rec
+
+ """
+ LOG_FOUT = open(os.path.join('results_a5.txt'), 'w')
+
+ def log_string(out_str):
+ LOG_FOUT.write(out_str + '\n')
+ LOG_FOUT.flush()
+ print(out_str)
+ """
+
+ return np.mean(precision), np.mean(recall)
+
+
+# TODO: remove this
+# import pandas as pd
+# def main():
+# print("!!! CLI is only for debugging purposes. use `evaluate()` instead.")
+# evaluate(pd.read_pickle("/globalwork/schult/saved_predictions.pkl"), opt.gt_path, opt.output_file)
+
+# if __name__ == '__main__':
+# main()
diff --git a/models/Mask3D/build/lib/mask3d/benchmark/util.py b/models/Mask3D/build/lib/mask3d/benchmark/util.py
new file mode 100644
index 0000000000000000000000000000000000000000..9a4224cd4f785c8a5a7cde490cf0f9999e61dbe7
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/benchmark/util.py
@@ -0,0 +1,128 @@
+import os, sys
+import csv
+
+try:
+    import numpy as np
+except ImportError:
+    print("Failed to import numpy package.")
+    sys.exit(-1)
+try:
+    import imageio
+except ImportError:
+    print("Please install the module 'imageio' for image processing, e.g.")
+    print("pip install imageio")
+    sys.exit(-1)
+
+# print an error message and quit
+def print_error(message, user_fault=False):
+ sys.stderr.write("ERROR: " + str(message) + "\n")
+ if user_fault:
+ sys.exit(2)
+ sys.exit(-1)
+
+
+# if string s represents an int
+def represents_int(s):
+ try:
+ int(s)
+ return True
+ except ValueError:
+ return False
+
+
+def read_label_mapping(
+ filename, label_from="raw_category", label_to="nyu40id"
+):
+ assert os.path.isfile(filename)
+ mapping = dict()
+ with open(filename) as csvfile:
+ reader = csv.DictReader(csvfile, delimiter="\t")
+ for row in reader:
+ mapping[row[label_from]] = int(row[label_to])
+    # if the keys represent ints, convert them (guard against an empty file)
+    if mapping and represents_int(list(mapping.keys())[0]):
+        mapping = {int(k): v for k, v in mapping.items()}
+ return mapping
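+
+# minimal usage sketch, assuming a ScanNet-style tab-separated label file
+# with "raw_category" and "nyu40id" columns:
+#   mapping = read_label_mapping("scannetv2-labels.combined.tsv")
+#   mapping["chair"]  # -> 5, the nyu40 id for "chair"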
+
+
+# input: scene_types.txt or scene_types_all.txt
+def read_scene_types_mapping(filename, remove_spaces=True):
+ assert os.path.isfile(filename)
+ mapping = dict()
+ lines = open(filename).read().splitlines()
+ lines = [line.split("\t") for line in lines]
+ if remove_spaces:
+ mapping = {x[1].strip(): int(x[0]) for x in lines}
+ else:
+ mapping = {x[1]: int(x[0]) for x in lines}
+ return mapping
+
+
+# color by label
+def visualize_label_image(filename, image):
+ height = image.shape[0]
+ width = image.shape[1]
+ vis_image = np.zeros([height, width, 3], dtype=np.uint8)
+ color_palette = create_color_palette()
+ for idx, color in enumerate(color_palette):
+ vis_image[image == idx] = color
+ imageio.imwrite(filename, vis_image)
+
+
+# color by different instances (mod length of color palette)
+def visualize_instance_image(filename, image):
+ height = image.shape[0]
+ width = image.shape[1]
+ vis_image = np.zeros([height, width, 3], dtype=np.uint8)
+ color_palette = create_color_palette()
+ instances = np.unique(image)
+ for idx, inst in enumerate(instances):
+ vis_image[image == inst] = color_palette[inst % len(color_palette)]
+ imageio.imwrite(filename, vis_image)
+
+
+# color palette for nyu40 labels
+def create_color_palette():
+ return [
+ (0, 0, 0),
+ (174, 199, 232), # wall
+ (152, 223, 138), # floor
+ (31, 119, 180), # cabinet
+ (255, 187, 120), # bed
+ (188, 189, 34), # chair
+ (140, 86, 75), # sofa
+ (255, 152, 150), # table
+ (214, 39, 40), # door
+ (197, 176, 213), # window
+ (148, 103, 189), # bookshelf
+ (196, 156, 148), # picture
+ (23, 190, 207), # counter
+ (178, 76, 76),
+ (247, 182, 210), # desk
+ (66, 188, 102),
+ (219, 219, 141), # curtain
+ (140, 57, 197),
+ (202, 185, 52),
+ (51, 176, 203),
+ (200, 54, 131),
+ (92, 193, 61),
+ (78, 71, 183),
+ (172, 114, 82),
+ (255, 127, 14), # refrigerator
+ (91, 163, 138),
+ (153, 98, 156),
+ (140, 153, 101),
+ (158, 218, 229), # shower curtain
+ (100, 125, 154),
+ (178, 127, 135),
+ (120, 185, 128),
+ (146, 111, 194),
+ (44, 160, 44), # toilet
+ (112, 128, 144), # sink
+ (96, 207, 209),
+ (227, 119, 194), # bathtub
+ (213, 92, 176),
+ (94, 106, 211),
+ (82, 84, 163), # otherfurn
+ (100, 85, 144),
+ ]
diff --git a/models/Mask3D/build/lib/mask3d/benchmark/util_3d.py b/models/Mask3D/build/lib/mask3d/benchmark/util_3d.py
new file mode 100644
index 0000000000000000000000000000000000000000..572064f3ca251563466ca6bfbe2c70dacdad205f
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/benchmark/util_3d.py
@@ -0,0 +1,177 @@
+import os, sys
+import json
+
+try:
+    import numpy as np
+except ImportError:
+    print("Failed to import numpy package.")
+    sys.exit(-1)
+
+try:
+    from plyfile import PlyData, PlyElement
+except ImportError:
+    print("Please install the module 'plyfile' for PLY i/o, e.g.")
+    print("pip install plyfile")
+    sys.exit(-1)
+
+import benchmark.util as util
+
+
+# matrix: 4x4 np array
+# points Nx3 np array
+def transform_points(matrix, points):
+ assert len(points.shape) == 2 and points.shape[1] == 3
+ num_points = points.shape[0]
+ p = np.concatenate([points, np.ones((num_points, 1))], axis=1)
+ p = np.matmul(matrix, np.transpose(p))
+ p = np.transpose(p)
+ p[:, :3] /= p[:, 3, None]
+ return p[:, :3]
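+
+# minimal sketch: a 4x4 identity matrix leaves points unchanged, e.g.
+#   pts = np.array([[1.0, 2.0, 3.0]])
+#   np.allclose(transform_points(np.eye(4), pts), pts)  # -> True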
+
+
+def export_ids(filename, ids):
+ with open(filename, "w") as f:
+ for id in ids:
+ f.write("%d\n" % id)
+
+
+def load_ids(filename):
+ ids = open(filename).read().splitlines()
+ ids = np.array(ids, dtype=np.int64)
+ return ids
+
+
+def read_mesh_vertices(filename):
+ assert os.path.isfile(filename)
+ with open(filename, "rb") as f:
+ plydata = PlyData.read(f)
+ num_verts = plydata["vertex"].count
+ vertices = np.zeros(shape=[num_verts, 3], dtype=np.float32)
+ vertices[:, 0] = plydata["vertex"].data["x"]
+ vertices[:, 1] = plydata["vertex"].data["y"]
+ vertices[:, 2] = plydata["vertex"].data["z"]
+ return vertices
+
+
+# export 3d instance labels for instance evaluation
+def export_instance_ids_for_eval(filename, label_ids, instance_ids):
+ assert label_ids.shape[0] == instance_ids.shape[0]
+ output_mask_path_relative = "pred_mask"
+ name = os.path.splitext(os.path.basename(filename))[0]
+ output_mask_path = os.path.join(
+ os.path.dirname(filename), output_mask_path_relative
+ )
+ if not os.path.isdir(output_mask_path):
+ os.mkdir(output_mask_path)
+ insts = np.unique(instance_ids)
+ zero_mask = np.zeros(shape=(instance_ids.shape[0]), dtype=np.int32)
+ with open(filename, "w") as f:
+ for idx, inst_id in enumerate(insts):
+ if inst_id == 0: # 0 -> no instance for this vertex
+ continue
+ output_mask_file = os.path.join(
+ output_mask_path_relative, name + "_" + str(idx) + ".txt"
+ )
+ loc = np.where(instance_ids == inst_id)
+ label_id = label_ids[loc[0][0]]
+ f.write("%s %d %f\n" % (output_mask_file, label_id, 1.0))
+ # write mask
+ mask = np.copy(zero_mask)
+ mask[loc[0]] = 1
+ export_ids(output_mask_file, mask)
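+
+# each line written above is "<relative mask path> <label_id> <confidence>",
+# the same format parsed back by read_instance_prediction_file below.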
+
+
+# ------------ Instance Utils ------------ #
+
+
+class Instance(object):
+ instance_id = 0
+ label_id = 0
+ vert_count = 0
+ med_dist = -1
+ dist_conf = 0.0
+
+ def __init__(self, mesh_vert_instances, instance_id):
+ if instance_id == -1:
+ return
+ self.instance_id = int(instance_id)
+ self.label_id = int(self.get_label_id(instance_id))
+ self.vert_count = int(
+ self.get_instance_verts(mesh_vert_instances, instance_id)
+ )
+
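+    # ground-truth ids pack the semantic label and the instance number into a
+    # single integer, label_id * 1000 + instance_number, hence the // 1000 split.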
+ def get_label_id(self, instance_id):
+ return int(instance_id // 1000)
+
+ def get_instance_verts(self, mesh_vert_instances, instance_id):
+ return (mesh_vert_instances == instance_id).sum()
+
+ def to_json(self):
+ return json.dumps(
+ self, default=lambda o: o.__dict__, sort_keys=True, indent=4
+ )
+
+    def to_dict(self):
+        d = {}
+        d["instance_id"] = self.instance_id
+        d["label_id"] = self.label_id
+        d["vert_count"] = self.vert_count
+        d["med_dist"] = self.med_dist
+        d["dist_conf"] = self.dist_conf
+        return d
+
+ def from_json(self, data):
+ self.instance_id = int(data["instance_id"])
+ self.label_id = int(data["label_id"])
+ self.vert_count = int(data["vert_count"])
+ if "med_dist" in data:
+ self.med_dist = float(data["med_dist"])
+ self.dist_conf = float(data["dist_conf"])
+
+ def __str__(self):
+ return "(" + str(self.instance_id) + ")"
+
+
+def read_instance_prediction_file(filename, pred_path):
+ lines = open(filename).read().splitlines()
+ instance_info = {}
+ abs_pred_path = os.path.abspath(pred_path)
+ for line in lines:
+ parts = line.split(" ")
+ if len(parts) != 3:
+ util.print_error(
+ "invalid instance prediction file. Expected (per line): [rel path prediction] [label id prediction] [confidence prediction]"
+ )
+ if os.path.isabs(parts[0]):
+ util.print_error(
+ "invalid instance prediction file. First entry in line must be a relative path"
+ )
+ mask_file = os.path.join(os.path.dirname(filename), parts[0])
+ mask_file = os.path.abspath(mask_file)
+ # check that mask_file lives inside prediction path
+ if os.path.commonprefix([mask_file, abs_pred_path]) != abs_pred_path:
+ util.print_error(
+ "predicted mask {} in prediction text file {} points outside of prediction path.".format(
+ mask_file, filename
+ )
+ )
+
+ info = {}
+ info["label_id"] = int(float(parts[1]))
+ info["conf"] = float(parts[2])
+ instance_info[mask_file] = info
+ return instance_info
+
+
+def get_instances(ids, class_ids, class_labels, id2label):
+ instances = {}
+ for label in class_labels:
+ instances[label] = []
+ instance_ids = np.unique(ids)
+ for id in instance_ids:
+ if id == 0:
+ continue
+ inst = Instance(ids, id)
+ if inst.label_id in class_ids:
+ instances[id2label[inst.label_id]].append(inst.to_dict())
+ return instances
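+
+# minimal usage sketch (scene filename hypothetical):
+#   gt_ids = load_ids("scene0011_00.txt")
+#   instances = get_instances(gt_ids, VALID_CLASS_IDS, CLASS_LABELS, ID_TO_LABEL)
+# yields a {label: [instance dicts]} map restricted to the valid class ids.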
diff --git a/models/Mask3D/build/lib/mask3d/conf/__init__.py b/models/Mask3D/build/lib/mask3d/conf/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/build/lib/mask3d/conf/augmentation/albumentations_aug.yaml b/models/Mask3D/build/lib/mask3d/conf/augmentation/albumentations_aug.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..006663b4be251bf0f41ac2f66f855ae3d59a2878
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/augmentation/albumentations_aug.yaml
@@ -0,0 +1,30 @@
+__version__: 0.4.5
+transform:
+ __class_fullname__: albumentations.core.composition.Compose
+ additional_targets: {}
+ bbox_params: null
+ keypoint_params: null
+ p: 1.0
+ transforms:
+ - __class_fullname__: albumentations.augmentations.transforms.RandomBrightnessContrast
+ always_apply: true
+ brightness_by_max: true
+ brightness_limit:
+ - -0.2
+ - 0.2
+ contrast_limit:
+ - -0.2
+ - 0.2
+ p: 0.5
+ - __class_fullname__: albumentations.augmentations.transforms.RGBShift
+ always_apply: true
+ b_shift_limit:
+ - -20
+ - 20
+ g_shift_limit:
+ - -20
+ - 20
+ p: 0.5
+ r_shift_limit:
+ - -20
+ - 20
diff --git a/models/Mask3D/build/lib/mask3d/conf/augmentation/volumentations_aug.yaml b/models/Mask3D/build/lib/mask3d/conf/augmentation/volumentations_aug.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..3b86407a2e735ad8dbba79f83746ceb79722aedf
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/augmentation/volumentations_aug.yaml
@@ -0,0 +1,53 @@
+# pi = 3.14159265358979
+# pi/2 = 1.57079632679489
+# pi/3 = 1.04719755119659
+# pi/6 = 0.52359877559829
+# pi/12 = 0.26179938779914
+# pi/24 = 0.13089969389957
+#
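+# rotation_limit values below are in radians: a full +/-pi yaw about the z axis
+# and +/-pi/24 tilts about the y and x axes.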
+__version__: 0.1.6
+transform:
+ __class_fullname__: volumentations.core.composition.Compose
+ additional_targets: {}
+ p: 1.0
+ transforms:
+ - __class_fullname__: volumentations.augmentations.transforms.Scale3d
+ always_apply: true
+ p: 0.5
+ scale_limit:
+ - - -0.1
+ - 0.1
+ - - -0.1
+ - 0.1
+ - - -0.1
+ - 0.1
+ - __class_fullname__: volumentations.augmentations.transforms.RotateAroundAxis3d
+ always_apply: true
+ axis:
+ - 0
+ - 0
+ - 1
+ p: 0.5
+ rotation_limit:
+ - -3.141592653589793
+ - 3.141592653589793
+ - __class_fullname__: volumentations.augmentations.transforms.RotateAroundAxis3d
+ always_apply: true
+ axis:
+ - 0
+ - 1
+ - 0
+ p: 0.5
+ rotation_limit:
+ - -0.13089969389957
+ - 0.13089969389957
+ - __class_fullname__: volumentations.augmentations.transforms.RotateAroundAxis3d
+ always_apply: true
+ axis:
+ - 1
+ - 0
+ - 0
+ p: 0.5
+ rotation_limit:
+ - -0.13089969389957
+ - 0.13089969389957
diff --git a/models/Mask3D/build/lib/mask3d/conf/callbacks/callbacks_instance_segmentation.yaml b/models/Mask3D/build/lib/mask3d/conf/callbacks/callbacks_instance_segmentation.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..7f0958eed35ea4317ddc3f2378dd66336472c0fa
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/callbacks/callbacks_instance_segmentation.yaml
@@ -0,0 +1,11 @@
+# @package _group_
+- _target_: pytorch_lightning.callbacks.ModelCheckpoint
+ monitor: val_mean_ap_50
+ save_last: true
+ save_top_k: 1
+ mode: max
+ dirpath: ${general.save_dir}
+ filename: "{epoch}-{val_mean_ap_50:.3f}"
+ every_n_epochs: 1
+
+- _target_: pytorch_lightning.callbacks.LearningRateMonitor
diff --git a/models/Mask3D/build/lib/mask3d/conf/config_base_instance_segmentation.yaml b/models/Mask3D/build/lib/mask3d/conf/config_base_instance_segmentation.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..61aeae0519bd308a58293d07ee902beb6a64ed5d
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/config_base_instance_segmentation.yaml
@@ -0,0 +1,75 @@
+general:
+ train_mode: true
+ task: "instance_segmentation"
+ seed: null
+ checkpoint: null
+ backbone_checkpoint: null
+ freeze_backbone: false # train only last layer
+ linear_probing_backbone: false
+ train_on_segments: false
+ eval_on_segments: false
+ filter_out_instances: false
+ save_visualizations: false
+ visualization_point_size: 20
+ decoder_id: -1
+ export: false
+ use_dbscan: false
+ ignore_class_threshold: 100
+ project_name: scannet
+ workspace: jonasschult
+ experiment_name: DEBUG_ABLATION
+ num_targets: 19
+ add_instance: true
+ dbscan_eps: 0.95
+ dbscan_min_points: 1
+
+
+ export_threshold: 0.0001
+
+ reps_per_epoch: 1
+
+ on_crops: false
+
+ scores_threshold: 0.0
+ iou_threshold: 1.0
+
+ area: 5
+
+ eval_inner_core: -1 # disabled
+
+ topk_per_image: 100
+
+ ignore_mask_idx: []
+
+ max_batch_size: 99999999
+
+ save_dir: saved/${general.experiment_name}
+ # time/commit/md5(config)_uuid
+ # time/experiment_id/version_uuid
+ # experiment_id: 1 # commit[:8], or unique from logger
+ # version: 1 # md5[:8] of config
+
+ gpus: 1
+
+defaults:
+ - data: indoor
+ - data/data_loaders: simple_loader
+ - data/datasets: scannet
+ - data/collation_functions: voxelize_collate
+ - logging: full
+ - model: mask3d
+ - metrics: miou
+ - optimizer: adamw
+ - scheduler: onecyclelr
+ - trainer: trainer600
+ - callbacks: callbacks_instance_segmentation
+ - matcher: hungarian_matcher
+ - loss: set_criterion
+
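+# the defaults list above composes the per-group YAMLs under conf/; entries can
+# be overridden on the Hydra command line, e.g. (script name may differ):
+#   python main_instance_segmentation.py data/datasets=scannet200 optimizer.lr=2e-4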
+hydra:
+ run:
+ dir: saved/hydra_logs/${now:%Y-%m-%d}/${now:%H-%M-%S}
+ sweep:
+ dir: saved/hydra_logs/${now:%Y-%m-%d}/${now:%H-%M-%S}
+ # dir: ${general.save_dir}
+ subdir: ${hydra.job.num}_${hydra.job.id}
diff --git a/models/Mask3D/build/lib/mask3d/conf/data/collation_functions/voxelize_collate.yaml b/models/Mask3D/build/lib/mask3d/conf/data/collation_functions/voxelize_collate.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..026552efb024e4e6fd90bf6bda9df283da2bf4c1
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/data/collation_functions/voxelize_collate.yaml
@@ -0,0 +1,42 @@
+# @package data
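+# ${...} values are Hydra interpolations, resolved from the composed config at
+# instantiation time.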
+
+train_collation:
+ _target_: mask3d.datasets.utils.VoxelizeCollate
+ ignore_label: ${data.ignore_label}
+ voxel_size: ${data.voxel_size}
+ mode: ${data.train_mode}
+ small_crops: false
+ very_small_crops: false
+ batch_instance: false
+ probing: ${general.linear_probing_backbone}
+ task: ${general.task}
+ ignore_class_threshold: ${general.ignore_class_threshold}
+ filter_out_classes: ${data.train_dataset.filter_out_classes}
+ label_offset: ${data.train_dataset.label_offset}
+ num_queries: ${model.num_queries}
+
+validation_collation:
+ _target_: mask3d.datasets.utils.VoxelizeCollate
+ ignore_label: ${data.ignore_label}
+ voxel_size: ${data.voxel_size}
+ mode: ${data.validation_mode}
+ batch_instance: false
+ probing: ${general.linear_probing_backbone}
+ task: ${general.task}
+ ignore_class_threshold: ${general.ignore_class_threshold}
+ filter_out_classes: ${data.validation_dataset.filter_out_classes}
+ label_offset: ${data.validation_dataset.label_offset}
+ num_queries: ${model.num_queries}
+
+test_collation:
+ _target_: mask3d.datasets.utils.VoxelizeCollate
+ ignore_label: ${data.ignore_label}
+ voxel_size: ${data.voxel_size}
+ mode: ${data.test_mode}
+ batch_instance: false
+ probing: ${general.linear_probing_backbone}
+ task: ${general.task}
+ ignore_class_threshold: ${general.ignore_class_threshold}
+ filter_out_classes: ${data.test_dataset.filter_out_classes}
+ label_offset: ${data.test_dataset.label_offset}
+ num_queries: ${model.num_queries}
\ No newline at end of file
diff --git a/models/Mask3D/build/lib/mask3d/conf/data/collation_functions/voxelize_collate_merge.yaml b/models/Mask3D/build/lib/mask3d/conf/data/collation_functions/voxelize_collate_merge.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..d5d3471d143ddfe999d8f3031e41ba6efce2e879
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/data/collation_functions/voxelize_collate_merge.yaml
@@ -0,0 +1,36 @@
+# @package data
+
+train_collation:
+ _target_: mask3d.datasets.utils.VoxelizeCollateMerge
+ ignore_label: ${data.ignore_label}
+ voxel_size: ${data.voxel_size}
+ mode: ${data.train_mode}
+ small_crops: false
+ very_small_crops: false
+ scenes: 2
+ batch_instance: false
+ make_one_pc_noise: false
+ place_nearby: false
+ place_far: false
+ proba: 1
+ probing: ${general.linear_probing_backbone}
+ include_ignore: ${general.include_ignore}
+ task: ${general.task}
+
+validation_collation:
+ _target_: mask3d.datasets.utils.VoxelizeCollate
+ ignore_label: ${data.ignore_label}
+ voxel_size: ${data.voxel_size}
+ mode: ${data.validation_mode}
+ probing: ${general.linear_probing_backbone}
+ include_ignore: ${general.include_ignore}
+ task: ${general.task}
+
+test_collation:
+ _target_: mask3d.datasets.utils.VoxelizeCollate
+ ignore_label: ${data.ignore_label}
+ voxel_size: ${data.voxel_size}
+ mode: ${data.test_mode}
+ probing: ${general.linear_probing_backbone}
+ include_ignore: ${general.include_ignore}
+ task: ${general.task}
diff --git a/models/Mask3D/build/lib/mask3d/conf/data/data_loaders/simple_loader.yaml b/models/Mask3D/build/lib/mask3d/conf/data/data_loaders/simple_loader.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..39996e14d769c2ba9341da582a1f7bf970fc7925
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/data/data_loaders/simple_loader.yaml
@@ -0,0 +1,22 @@
+# @package data
+
+train_dataloader:
+ _target_: torch.utils.data.DataLoader
+ shuffle: true
+ pin_memory: ${data.pin_memory}
+ num_workers: ${data.num_workers}
+ batch_size: ${data.batch_size}
+
+validation_dataloader:
+ _target_: torch.utils.data.DataLoader
+ shuffle: false
+ pin_memory: ${data.pin_memory}
+ num_workers: ${data.num_workers}
+ batch_size: ${data.test_batch_size}
+
+test_dataloader:
+ _target_: torch.utils.data.DataLoader
+ shuffle: false
+ pin_memory: ${data.pin_memory}
+ num_workers: ${data.num_workers}
+ batch_size: ${data.test_batch_size}
diff --git a/models/Mask3D/build/lib/mask3d/conf/data/data_loaders/simple_loader_save_memory.yaml b/models/Mask3D/build/lib/mask3d/conf/data/data_loaders/simple_loader_save_memory.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..b1b1b45d13167dc07357a13feb5a513dd71c9a2e
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/data/data_loaders/simple_loader_save_memory.yaml
@@ -0,0 +1,22 @@
+# @package data
+
+train_dataloader:
+ _target_: torch.utils.data.DataLoader
+ shuffle: true
+ pin_memory: ${data.pin_memory}
+ num_workers: ${data.num_workers}
+ batch_size: ${data.batch_size}
+
+validation_dataloader:
+ _target_: torch.utils.data.DataLoader
+ shuffle: false
+ pin_memory: ${data.pin_memory}
+ num_workers: 1
+ batch_size: ${data.test_batch_size}
+
+test_dataloader:
+ _target_: torch.utils.data.DataLoader
+ shuffle: false
+ pin_memory: ${data.pin_memory}
+ num_workers: 1
+ batch_size: ${data.test_batch_size}
diff --git a/models/Mask3D/build/lib/mask3d/conf/data/datasets/matterport.yaml b/models/Mask3D/build/lib/mask3d/conf/data/datasets/matterport.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..6728ab9eb26bc78f435237d9d7d61800b900735d
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/data/datasets/matterport.yaml
@@ -0,0 +1,48 @@
+# @package data
+train_dataset:
+ _target_: mix3d.datasets.semseg.SemanticSegmentationDataset
+ data_dir: data/processed/matterport
+ image_augmentations_path: mix3d/conf/augmentation/albumentations_aug.yaml
+ volume_augmentations_path: mix3d/conf/augmentation/volumentations_aug.yaml
+ label_db_filepath: data/processed/matterport/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.train_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+
+validation_dataset:
+ _target_: mix3d.datasets.semseg.SemanticSegmentationDataset
+ data_dir: data/processed/scannet
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/matterport/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.validation_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+
+test_dataset:
+ _target_: mix3d.datasets.semseg.SemanticSegmentationDataset
+ data_dir: data/processed/matterport
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/matterport/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.test_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
diff --git a/models/Mask3D/build/lib/mask3d/conf/data/datasets/matterport_scannet.yaml b/models/Mask3D/build/lib/mask3d/conf/data/datasets/matterport_scannet.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..df259ceaadfa68a90c2b8a60d7b74a958b30c79d
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/data/datasets/matterport_scannet.yaml
@@ -0,0 +1,50 @@
+# @package data
+train_dataset:
+ _target_: mix3d.datasets.semseg.SemanticSegmentationDataset
+ data_dir:
+ - data/processed/scannet
+ - data/processed/matterport
+ image_augmentations_path: mix3d/conf/augmentation/albumentations_aug.yaml
+ volume_augmentations_path: mix3d/conf/augmentation/volumentations_aug.yaml
+ label_db_filepath: data/processed/scannet/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.train_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+
+validation_dataset:
+ _target_: mix3d.datasets.semseg.SemanticSegmentationDataset
+ data_dir: data/processed/scannet
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/scannet/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.validation_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+
+test_dataset:
+ _target_: mix3d.datasets.semseg.SemanticSegmentationDataset
+ data_dir: data/processed/scannet
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/scannet/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.test_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
diff --git a/models/Mask3D/build/lib/mask3d/conf/data/datasets/rio.yaml b/models/Mask3D/build/lib/mask3d/conf/data/datasets/rio.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..1adfea36fea05b14a7fa95382677aee6144d1b4b
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/data/datasets/rio.yaml
@@ -0,0 +1,48 @@
+# @package data
+train_dataset:
+ _target_: mix3d.datasets.semseg.SemanticSegmentationDataset
+ data_dir: data/processed/rio
+ image_augmentations_path: mix3d/conf/augmentation/albumentations_aug.yaml
+ volume_augmentations_path: mix3d/conf/augmentation/volumentations_aug.yaml
+ label_db_filepath: data/processed/scannet/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.train_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+
+validation_dataset:
+ _target_: mix3d.datasets.semseg.SemanticSegmentationDataset
+ data_dir: data/processed/rio
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/scannet/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.validation_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+
+test_dataset:
+ _target_: mix3d.datasets.semseg.SemanticSegmentationDataset
+ data_dir: data/processed/rio
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/scannet/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.test_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
diff --git a/models/Mask3D/build/lib/mask3d/conf/data/datasets/s3dis.yaml b/models/Mask3D/build/lib/mask3d/conf/data/datasets/s3dis.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..2e1385416655514397d82737e1edc2d1a5997657
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/data/datasets/s3dis.yaml
@@ -0,0 +1,87 @@
+# @package data
+train_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "s3dis"
+ data_dir: data/processed/s3dis
+ image_augmentations_path: conf/augmentation/albumentations_aug.yaml
+ volume_augmentations_path: conf/augmentation/volumentations_aug.yaml
+ label_db_filepath: data/processed/s3dis/label_database.yaml
+ color_mean_std: data/processed/s3dis/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.train_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cache_data: ${data.cache_data}
+ # different augs experiments
+ instance_oversampling: 0.0
+ place_around_existing: False
+ point_per_cut: 0
+ max_cut_region: 0
+ flip_in_center: false
+ noise_rate: 0
+ resample_points: 0
+ cropping: ${data.cropping}
+ cropping_args: ${data.cropping_args}
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ cropping_v1: ${data.cropping_v1}
+ area: ${general.area}
+ filter_out_classes: []
+ label_offset: 0
+
+validation_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "s3dis"
+ data_dir: data/processed/s3dis
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/s3dis/label_database.yaml
+ color_mean_std: data/processed/s3dis/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.validation_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cache_data: ${data.cache_data}
+ cropping: false
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ cropping_v1: ${data.cropping_v1}
+ area: ${general.area}
+ filter_out_classes: []
+ label_offset: 0
+
+test_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "s3dis"
+ data_dir: data/processed/s3dis
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/s3dis/label_database.yaml
+ color_mean_std: data/processed/s3dis/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.test_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cache_data: ${data.cache_data}
+ cropping: false
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ cropping_v1: ${data.cropping_v1}
+ area: ${general.area}
+ filter_out_classes: []
+ label_offset: 0
diff --git a/models/Mask3D/build/lib/mask3d/conf/data/datasets/scannet.yaml b/models/Mask3D/build/lib/mask3d/conf/data/datasets/scannet.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..50f1c6c5998d8f3c6dae35ef508225dff4b0271f
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/data/datasets/scannet.yaml
@@ -0,0 +1,79 @@
+# @package data
+train_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "scannet"
+ data_dir: data/processed/scannet
+ image_augmentations_path: conf/augmentation/albumentations_aug.yaml
+ volume_augmentations_path: conf/augmentation/volumentations_aug.yaml
+ label_db_filepath: data/processed/scannet/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.train_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ # different augs experiments
+ instance_oversampling: 0.0
+ place_around_existing: false
+ point_per_cut: 0
+ max_cut_region: 0
+ flip_in_center: false
+ noise_rate: 0
+ resample_points: 0
+ add_unlabeled_pc: false
+ cropping: ${data.cropping}
+ cropping_args: ${data.cropping_args}
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ filter_out_classes: [0, 1]
+ label_offset: 2
+
+validation_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "scannet"
+ data_dir: data/processed/scannet
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/scannet/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.validation_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cropping: false
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ filter_out_classes: [0, 1]
+ label_offset: 2
+
+test_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "scannet"
+ data_dir: data/processed/scannet
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/scannet/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.test_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cropping: false
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ filter_out_classes: [0, 1]
+ label_offset: 2
diff --git a/models/Mask3D/build/lib/mask3d/conf/data/datasets/scannet200.yaml b/models/Mask3D/build/lib/mask3d/conf/data/datasets/scannet200.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..730a6ab9f1965004ec9828d1e8b2429005bef6f2
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/data/datasets/scannet200.yaml
@@ -0,0 +1,79 @@
+# @package data
+train_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "scannet200"
+  data_dir: /home/weders/scratch/scratch/scannetter/arkit/raw/  # adjust to your local ScanNet200 root
+ image_augmentations_path: conf/augmentation/albumentations_aug.yaml
+ volume_augmentations_path: conf/augmentation/volumentations_aug.yaml
+ # label_db_filepath: data/processed/scannet200/label_database.yaml
+ # color_mean_std: data/processed/scannet200/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.train_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ # different augs experiments
+ instance_oversampling: 0.0
+ place_around_existing: false
+ point_per_cut: 0
+ max_cut_region: 0
+ flip_in_center: false
+ noise_rate: 0
+ resample_points: 0
+ add_unlabeled_pc: false
+ cropping: ${data.cropping}
+ cropping_args: ${data.cropping_args}
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ filter_out_classes: [0, 2]
+ label_offset: 2
+
+validation_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "scannet200"
+ data_dir: /home/weders/scratch/scratch/scannetter/arkit/raw/
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ # label_db_filepath: data/processed/scannet200/label_database.yaml
+ # color_mean_std: data/processed/scannet200/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.validation_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cropping: false
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ filter_out_classes: [0, 2]
+ label_offset: 2
+
+test_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "scannet200"
+ data_dir: /home/weders/scratch/scratch/scannetter/arkit/raw/
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ # label_db_filepath: data/processed/scannet200/label_database.yaml
+ # color_mean_std: data/processed/scannet200/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.test_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cropping: false
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ filter_out_classes: [0, 2]
+ label_offset: 2
diff --git a/models/Mask3D/build/lib/mask3d/conf/data/datasets/semantic_kitti.yaml b/models/Mask3D/build/lib/mask3d/conf/data/datasets/semantic_kitti.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..9540ad610bd4a68d64369519d20e13009df9feda
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/data/datasets/semantic_kitti.yaml
@@ -0,0 +1,42 @@
+# @package data
+train_dataset:
+ _target_: mix3d.datasets.outdoor_semseg.LidarDataset
+ data_dir: data/processed/semantic_kitti
+ label_db_filepath: data/processed/semantic_kitti/label_database.yaml
+ mode: ${data.train_mode}
+ add_reflection: ${data.add_reflection}
+ add_distance: ${data.add_distance}
+ add_instance: ${data.add_instance}
+ num_labels: ${data.num_labels}
+ sweep: ${data.sweep}
+ data_percent: 1.0
+ ignore_label: ${data.ignore_label}
+ volume_augmentations_path: mix3d/conf/augmentation/volumentations_aug.yaml
+
+validation_dataset:
+ _target_: mix3d.datasets.outdoor_semseg.LidarDataset
+ data_dir: data/processed/semantic_kitti
+ label_db_filepath: data/processed/semantic_kitti/label_database.yaml
+ mode: ${data.validation_mode}
+ add_reflection: ${data.add_reflection}
+ add_distance: ${data.add_distance}
+ add_instance: ${data.add_instance}
+ num_labels: ${data.num_labels}
+ sweep: ${data.sweep}
+ data_percent: 1.0
+ ignore_label: ${data.ignore_label}
+ volume_augmentations_path: null
+
+test_dataset:
+ _target_: mix3d.datasets.outdoor_semseg.LidarDataset
+ data_dir: data/processed/semantic_kitti
+ label_db_filepath: data/processed/semantic_kitti/label_database.yaml
+ mode: ${data.test_mode}
+ add_reflection: ${data.add_reflection}
+ add_distance: ${data.add_distance}
+ add_instance: ${data.add_instance}
+ num_labels: ${data.num_labels}
+ sweep: ${data.sweep}
+ data_percent: 1.0
+ ignore_label: ${data.ignore_label}
+ volume_augmentations_path: null
diff --git a/models/Mask3D/build/lib/mask3d/conf/data/datasets/stpls3d.yaml b/models/Mask3D/build/lib/mask3d/conf/data/datasets/stpls3d.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..913667d4123a7edead9d948358ae25cf9f7b4bb1
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/data/datasets/stpls3d.yaml
@@ -0,0 +1,95 @@
+# @package data
+train_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "stpls3d"
+ data_dir: data/processed/stpls3d
+ image_augmentations_path: conf/augmentation/albumentations_aug.yaml
+ volume_augmentations_path: conf/augmentation/volumentations_aug.yaml
+ label_db_filepath: data/processed/stpls3d/label_database.yaml
+ color_mean_std: data/processed/stpls3d/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.train_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cache_data: ${data.cache_data}
+ # different augs experiments
+ instance_oversampling: 0.0
+ place_around_existing: False
+ point_per_cut: 0
+ max_cut_region: 0
+ flip_in_center: false
+ noise_rate: 0
+ resample_points: 0
+ cropping: ${data.cropping}
+ cropping_args: ${data.cropping_args}
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ cropping_v1: ${data.cropping_v1}
+ area: ${general.area}
+ reps_per_epoch: ${general.reps_per_epoch}
+ eval_inner_core: ${general.eval_inner_core}
+ filter_out_classes: [0]
+ label_offset: 1
+ is_elastic_distortion: true
+ color_drop: 0.0
+
+validation_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "stpls3d"
+ data_dir: data/processed/stpls3d
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/stpls3d/label_database.yaml
+ color_mean_std: data/processed/stpls3d/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.validation_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cache_data: ${data.cache_data}
+ cropping: false
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ cropping_v1: ${data.cropping_v1}
+ area: ${general.area}
+ on_crops: ${general.on_crops}
+ eval_inner_core: ${general.eval_inner_core}
+ filter_out_classes: [0]
+ label_offset: 1
+
+test_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "stpls3d"
+ data_dir: data/processed/stpls3d
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/stpls3d/label_database.yaml
+ color_mean_std: data/processed/stpls3d/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.test_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cache_data: ${data.cache_data}
+ cropping: false
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ cropping_v1: ${data.cropping_v1}
+ area: ${general.area}
+ on_crops: ${general.on_crops}
+ eval_inner_core: ${general.eval_inner_core}
+ filter_out_classes: [0]
+ label_offset: 1
diff --git a/models/Mask3D/build/lib/mask3d/conf/data/indoor.yaml b/models/Mask3D/build/lib/mask3d/conf/data/indoor.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..868c37ccfe901f14396b68a38eac47b42cb3e812
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/data/indoor.yaml
@@ -0,0 +1,43 @@
+# @package _group_
+
+# these parameters are inherited by datasets, data_loaders and collators
+# but they might be overwritten
+
+# splits
+train_mode: train
+validation_mode: validation
+test_mode: validation # set to "test" for the test split
+
+# dataset
+ignore_label: 255
+add_raw_coordinates: true # 3dim
+add_colors: true # 3dim
+add_normals: false # 3dim
+in_channels: 3 # in_channels = 3 * (add_normals + add_colors + add_raw_coordinates)
+num_labels: 20
+# num_labels: 41
+add_instance: ${general.add_instance}
+task: ${general.task}
+
+# data loader
+pin_memory: false
+num_workers: 4
+batch_size: 5
+test_batch_size: 1
+cache_data: false
+
+# collation
+voxel_size: 0.02
+
+reps_per_epoch: ${general.reps_per_epoch}
+
+cropping: false
+cropping_args:
+ min_points: 30000
+ aspect: 0.8
+ min_crop: 0.5
+ max_crop: 1.0
+
+crop_min_size: 20000
+crop_length: 6.0
+cropping_v1: true
\ No newline at end of file
diff --git a/models/Mask3D/build/lib/mask3d/conf/data/outdoor.yaml b/models/Mask3D/build/lib/mask3d/conf/data/outdoor.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..a77474f62d1cfb53f130160f641c65cb81a62956
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/data/outdoor.yaml
@@ -0,0 +1,26 @@
+# @package _group_
+
+# these parameters are inherited by datasets, data_loaders and collators
+# but they might be overwritten
+
+# splits
+train_mode: train
+validation_mode: validation
+test_mode: validation
+
+# dataset
+ignore_label: 255
+add_distance: true # 1dim
+add_reflection: true # 1dim
+in_channels: 2 # in_channels = add_distance + add_reflection
+num_labels: 19
+add_instance: false
+
+# data loader
+pin_memory: true
+num_workers: 4
+batch_size: 18
+sweep: 1
+
+# collation
+voxel_size: 0.15
diff --git a/models/Mask3D/build/lib/mask3d/conf/logging/base.yaml b/models/Mask3D/build/lib/mask3d/conf/logging/base.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..3d700a101ddf3d1e2c1a3cdea08190afff762a5b
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/logging/base.yaml
@@ -0,0 +1,10 @@
+# @package _group_
+- _target_: pytorch_lightning.loggers.NeptuneLogger
+ project_name: ${general.workspace}/${general.project_name}
+ experiment_name: ${general.experiment_name}
+ offline_mode: false
+
+- _target_: pytorch_lightning.loggers.CSVLogger
+ save_dir: ${general.save_dir}
+ name: ${general.experiment_id}
+ version: ${general.version}
diff --git a/models/Mask3D/build/lib/mask3d/conf/logging/full.yaml b/models/Mask3D/build/lib/mask3d/conf/logging/full.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..b434e94dc1f0889cf0829b5f89b8509717a3546c
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/logging/full.yaml
@@ -0,0 +1,8 @@
+# @package _group_
+- _target_: pytorch_lightning.loggers.WandbLogger
+ project: ${general.project_name}
+ name: ${general.experiment_name}
+ save_dir: ${general.save_dir}
+ entity: "schult"
+ resume: "allow"
+ id: ${general.experiment_name}
diff --git a/models/Mask3D/build/lib/mask3d/conf/logging/minimal.yaml b/models/Mask3D/build/lib/mask3d/conf/logging/minimal.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..b1c46e26fefedcec50d4fdc9fc77c187d60cf7b9
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/logging/minimal.yaml
@@ -0,0 +1,5 @@
+# @package _group_
+- _target_: pytorch_lightning.loggers.CSVLogger
+ save_dir: ${general.save_dir}
+ name: ${general.experiment_id}
+ version: ${general.version}
diff --git a/models/Mask3D/build/lib/mask3d/conf/logging/offline.yaml b/models/Mask3D/build/lib/mask3d/conf/logging/offline.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..914ad19142ca22c3778be709208323908460ebac
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/logging/offline.yaml
@@ -0,0 +1,10 @@
+# @package _group_
+- _target_: pytorch_lightning.loggers.TensorBoardLogger
+ name: ${general.experiment_id}
+ version: ${general.version}
+ save_dir: ${general.save_dir}
+
+- _target_: pytorch_lightning.loggers.CSVLogger
+ name: ${general.experiment_id}
+ version: ${general.version}
+ save_dir: ${general.save_dir}
\ No newline at end of file
diff --git a/models/Mask3D/build/lib/mask3d/conf/loss/cross_entropy.yaml b/models/Mask3D/build/lib/mask3d/conf/loss/cross_entropy.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..c000f40ad2ab40605c244e38243a6e0cc7933768
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/loss/cross_entropy.yaml
@@ -0,0 +1,3 @@
+# @package _group_
+_target_: torch.nn.CrossEntropyLoss
+ignore_index: ${data.ignore_label}
diff --git a/models/Mask3D/build/lib/mask3d/conf/loss/set_criterion.yaml b/models/Mask3D/build/lib/mask3d/conf/loss/set_criterion.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..3c04ba49ce1823c2d6e923a03ae0514490d463e9
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/loss/set_criterion.yaml
@@ -0,0 +1,11 @@
+# @package _group_
+_target_: mask3d.models.criterion.SetCriterion
+num_classes: ${general.num_targets}
+eos_coef: 0.1
+losses:
+ - "labels"
+ - "masks"
+num_points: ${matcher.num_points}
+oversample_ratio: 3.0
+importance_sample_ratio: 0.75
+class_weights: -1
diff --git a/models/Mask3D/build/lib/mask3d/conf/loss/set_criterion_custom_weights_1.yaml b/models/Mask3D/build/lib/mask3d/conf/loss/set_criterion_custom_weights_1.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..1d2c308e081c1ffa61beb13308b27e6ff753f0f4
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/loss/set_criterion_custom_weights_1.yaml
@@ -0,0 +1,11 @@
+# @package _group_
+_target_: mask3d.models.criterion.SetCriterion
+num_classes: ${general.num_targets}
+eos_coef: 0.1
+losses:
+ - "labels"
+ - "masks"
+num_points: ${matcher.num_points}
+oversample_ratio: 3.0
+importance_sample_ratio: 0.75
+class_weights: [1.0,1.5,10.0,1.0,1.0,1.0,1.0,1.0,10.0,10.0,1.0,10.0,1.0,1.0]
diff --git a/models/Mask3D/build/lib/mask3d/conf/matcher/hungarian_matcher.yaml b/models/Mask3D/build/lib/mask3d/conf/matcher/hungarian_matcher.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..47750b20906b6b40a131b702ba360e36ee4c8380
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/matcher/hungarian_matcher.yaml
@@ -0,0 +1,6 @@
+# @package _group_
+_target_: mask3d.models.matcher.HungarianMatcher
+cost_class: 2.
+cost_mask: 5.
+cost_dice: 2.
+num_points: -1
diff --git a/models/Mask3D/build/lib/mask3d/conf/metrics/miou.yaml b/models/Mask3D/build/lib/mask3d/conf/metrics/miou.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..68d1b61181d9615d7d6d7638261d119a4fc47074
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/metrics/miou.yaml
@@ -0,0 +1,4 @@
+# @package _group_
+_target_: mask3d.models.metrics.ConfusionMatrix
+num_classes: ${data.num_labels}
+ignore_label: ${data.ignore_label}
diff --git a/models/Mask3D/build/lib/mask3d/conf/model/mask3d.yaml b/models/Mask3D/build/lib/mask3d/conf/model/mask3d.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..95718d8710477650561e0ddd845688f50c868032
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/model/mask3d.yaml
@@ -0,0 +1,47 @@
+# @package _group_
+_target_: mask3d.models.Mask3D
+
+# transformer parameters
+hidden_dim: 128
+dim_feedforward: 1024
+num_queries: 100
+num_heads: 8
+num_decoders: 3
+dropout: 0.0
+pre_norm: false
+use_level_embed: false
+normalize_pos_enc: true
+positional_encoding_type: "fourier"
+gauss_scale: 1.0
+hlevels: [0,1,2,3]
+
+# queries
+non_parametric_queries: true
+random_query_both: false
+random_normal: false
+random_queries: false
+use_np_features: false
+
+# sampling
+sample_sizes: [200, 800, 3200, 12800, 51200]
+max_sample_size: false # false means sampling is activated
+
+shared_decoder: true
+num_classes: ${general.num_targets}
+train_on_segments: ${general.train_on_segments}
+scatter_type: "mean"
+
+voxel_size: ${data.voxel_size}
+
+config:
+ backbone:
+ _target_: mask3d.models.Res16UNet34C
+ config:
+ dialations: [ 1, 1, 1, 1 ]
+ conv1_kernel_size: 5
+ bn_momentum: 0.02
+ # depends on normals, color, raw_coordinates
+ # varies from 3 to 9
+ in_channels: ${data.in_channels}
+ out_channels: ${data.num_labels}
+ out_fpn: true
diff --git a/models/Mask3D/build/lib/mask3d/conf/optimizer/adamw.yaml b/models/Mask3D/build/lib/mask3d/conf/optimizer/adamw.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..4b4020d1ddd1444c94ea5bfbe1281c485fca587e
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/optimizer/adamw.yaml
@@ -0,0 +1,3 @@
+# @package _group_
+_target_: torch.optim.AdamW
+lr: 0.0001
\ No newline at end of file
diff --git a/models/Mask3D/build/lib/mask3d/conf/optimizer/adamw_lower.yaml b/models/Mask3D/build/lib/mask3d/conf/optimizer/adamw_lower.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..7e42f091a0d5dd03b66ab1dcec8b81d78a692af9
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/optimizer/adamw_lower.yaml
@@ -0,0 +1,3 @@
+# @package _group_
+_target_: torch.optim.AdamW
+lr: 0.005
diff --git a/models/Mask3D/build/lib/mask3d/conf/scheduler/exponentiallr.yaml b/models/Mask3D/build/lib/mask3d/conf/scheduler/exponentiallr.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..dc5224083670b286d75fda46304560dbcca3aecb
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/scheduler/exponentiallr.yaml
@@ -0,0 +1,11 @@
+# @package _group_
+
+scheduler:
+ _target_: torch.optim.lr_scheduler.ExponentialLR
+ gamma: 0.99999
+ last_epoch: -1 # ${trainer.max_epochs}
+ # need to set to number because of tensorboard logger
+ # steps_per_epoch: -1
+
+pytorch_lightning_params:
+ interval: step
diff --git a/models/Mask3D/build/lib/mask3d/conf/scheduler/lambdalr.yaml b/models/Mask3D/build/lib/mask3d/conf/scheduler/lambdalr.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..b63f6f4333e98931ce22f1a38829de0ef51a3719
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/scheduler/lambdalr.yaml
@@ -0,0 +1,8 @@
+# @package _group_
+
+scheduler:
+ _target_: torch.optim.lr_scheduler.StepLR
+ step_size: 99999
+
+pytorch_lightning_params:
+ interval: epoch
diff --git a/models/Mask3D/build/lib/mask3d/conf/scheduler/onecyclelr.yaml b/models/Mask3D/build/lib/mask3d/conf/scheduler/onecyclelr.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..c788877193d7366c21088cf9fefb77e4f62ef4d9
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/scheduler/onecyclelr.yaml
@@ -0,0 +1,11 @@
+# @package _group_
+
+scheduler:
+ _target_: torch.optim.lr_scheduler.OneCycleLR
+ max_lr: ${optimizer.lr}
+ epochs: ${trainer.max_epochs}
+ # need to set to number because of tensorboard logger
+ steps_per_epoch: -1
+
+pytorch_lightning_params:
+ interval: step
diff --git a/models/Mask3D/build/lib/mask3d/conf/trainer/trainer.yaml b/models/Mask3D/build/lib/mask3d/conf/trainer/trainer.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..f436300f9ca6bbbe96ca6c1b4c7e8eeffe35fabd
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/trainer/trainer.yaml
@@ -0,0 +1,7 @@
+# @package _group_
+deterministic: false
+max_epochs: 1000
+min_epochs: 1
+resume_from_checkpoint: null
+check_val_every_n_epoch: 50
+num_sanity_val_steps: -1
diff --git a/models/Mask3D/build/lib/mask3d/conf/trainer/trainer600.yaml b/models/Mask3D/build/lib/mask3d/conf/trainer/trainer600.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..dc9f00295aafe3431d1c0e7ca50dbc29559ea134
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/conf/trainer/trainer600.yaml
@@ -0,0 +1,7 @@
+# @package _group_
+deterministic: false
+max_epochs: 601
+min_epochs: 1
+resume_from_checkpoint: null
+check_val_every_n_epoch: 50
+num_sanity_val_steps: 2
diff --git a/models/Mask3D/build/lib/mask3d/datasets/__init__.py b/models/Mask3D/build/lib/mask3d/datasets/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/build/lib/mask3d/datasets/outdoor_semseg.py b/models/Mask3D/build/lib/mask3d/datasets/outdoor_semseg.py
new file mode 100644
index 0000000000000000000000000000000000000000..4592a6eda45c1a7626530eb19c42c267496749df
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/datasets/outdoor_semseg.py
@@ -0,0 +1,206 @@
+import logging
+from pathlib import Path
+from typing import List, Optional, Union, Tuple
+from random import random
+
+import numpy as np
+import volumentations as V
+import yaml
+from torch.utils.data import Dataset
+
+logger = logging.getLogger(__name__)
+
+
+class LidarDataset(Dataset):
+ def __init__(
+ self,
+ data_dir: Optional[
+ Union[str, Tuple[str]]
+ ] = "data/processed/semantic_kitti",
+ label_db_filepath: Optional[
+ str
+ ] = "./data/processed/semantic_kitti/label_database.yaml",
+ mode: Optional[str] = "train",
+ add_reflection: Optional[bool] = True,
+ add_distance: Optional[bool] = False,
+ add_instance: Optional[bool] = True,
+ num_labels: Optional[int] = -1,
+ data_percent: Optional[float] = 1.0,
+ ignore_label: Optional[Union[int, List[int]]] = 255,
+ volume_augmentations_path: Optional[str] = None,
+ sweep: Optional[int] = 1,
+ ):
+ self.mode = mode
+ self.data_dir = data_dir
+        if isinstance(data_dir, str):
+ self.data_dir = [self.data_dir]
+ self.ignore_label = ignore_label
+ self.add_instance = add_instance
+ self.add_distance = add_distance
+ self.add_reflection = add_reflection
+
+ # loading database files
+ self._data = []
+ for database_path in self.data_dir:
+ database_path = Path(database_path)
+ if not (database_path / f"{mode}_database.yaml").exists():
+ print(f"generate {database_path}/{mode}_database.yaml first")
+ exit()
+ self._data.extend(
+ self._load_yaml(database_path / f"{mode}_database.yaml")
+ )
+
+ labels = self._load_yaml(Path(label_db_filepath))
+ self._labels = self._select_correct_labels(labels, num_labels)
+
+ # augmentations
+ self.volume_augmentations = V.NoOp()
+ if volume_augmentations_path is not None:
+ self.volume_augmentations = V.load(
+ volume_augmentations_path, data_format="yaml"
+ )
+
+        # group consecutive scans by scene, then chunk each scene into sweeps
+ data = [[]]
+ last_scene = self._data[0]["scene"]
+ for x in self._data:
+ if x["scene"] == last_scene:
+ data[-1].append(x)
+ else:
+ last_scene = x["scene"]
+ data.append([x])
+ for i in range(len(data)):
+ data[i] = list(self.chunks(data[i], sweep))
+ self._data = [val for sublist in data for val in sublist]
+
+ if data_percent < 1.0:
+ self._data = self._data[: int(len(self._data) * data_percent)]
+
+ @staticmethod
+ def chunks(lst, n):
+ """Yield successive n-sized chunks from lst."""
+ for i in range(0, len(lst), n):
+ yield lst[i : i + n]
+
+ def __len__(self):
+ return len(self.data)
+
+ def __getitem__(self, idx: int):
+ points = []
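+        # merge all scans of this sweep group into one cloud, mapping each
+        # scan into a common frame with its pose (rotate, then translate)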
+ for sweep in self.data[idx]:
+ points.append(np.load(sweep["filepath"]))
+ # rotate
+ points[-1][:, :3] = (
+ points[-1][:, :3] @ np.array(sweep["pose"])[:3, :3]
+ )
+ # translate
+ points[-1][:, :3] += np.array(sweep["pose"])[:3, 3]
+ points = np.vstack(points)
+
+ coordinates, features, labels = (
+ points[:, :3],
+ points[:, 3:-2],
+ points[:, -2:],
+ )
+
+ if not self.add_reflection:
+            features = np.ones((len(coordinates), 1))
+
+ if self.add_distance:
+ center_coordinate = coordinates.mean(0)
+ features = np.hstack(
+ (
+ features,
+ np.linalg.norm(coordinates - center_coordinate, axis=1)[
+ :, np.newaxis
+ ],
+ )
+ )
+
+ # volume and image augmentations for train
+ if "train" in self.mode:
+ coordinates -= coordinates.mean(0)
+            if random() < 0.5:
+ coordinates += (
+ np.random.uniform(coordinates.min(0), coordinates.max(0))
+ / 2
+ )
+ aug = self.volume_augmentations(
+ points=coordinates,
+ features=features,
+ labels=labels,
+ )
+ coordinates, features, labels = (
+ aug["points"],
+ aug["features"],
+ aug["labels"],
+ )
+
+ # prepare labels and map from 0 to 20(40)
+ labels = labels.astype(np.int32)
+ if labels.size > 0:
+ labels[:, 0] = self._remap_from_zero(labels[:, 0])
+ if not self.add_instance:
+ # taking only first column, which is segmentation label, not instance
+ labels = labels[:, 0].flatten()
+
+ return coordinates, features, labels
+
+ @property
+ def data(self):
+ """database file containing information about preproscessed dataset"""
+ return self._data
+
+ @property
+ def label_info(self):
+ """database file containing information labels used by dataset"""
+ return self._labels
+
+ @staticmethod
+ def _load_yaml(filepath):
+ with open(filepath) as f:
+ file = yaml.safe_load(f)
+ return file
+
+ def _select_correct_labels(self, labels, num_labels):
+ number_of_validation_labels = 0
+ number_of_all_labels = 0
+        for k, v in labels.items():
+ number_of_all_labels += 1
+ if v["validation"]:
+ number_of_validation_labels += 1
+
+ if num_labels == number_of_all_labels:
+ return labels
+ elif num_labels == number_of_validation_labels:
+ valid_labels = dict()
+            for k, v in labels.items():
+ if v["validation"]:
+ valid_labels.update({k: v})
+ return valid_labels
+ else:
+ msg = f"""not available number labels, select from:
+ {number_of_validation_labels}, {number_of_all_labels}"""
+ raise ValueError(msg)
+
+ def _remap_from_zero(self, labels):
+ labels[
+ ~np.isin(labels, list(self.label_info.keys()))
+ ] = self.ignore_label
+ # remap to the range from 0
+ for i, k in enumerate(self.label_info.keys()):
+ labels[labels == k] = i
+ return labels
+
+ def _remap_model_output(self, output):
+ output = np.array(output)
+ output_remapped = output.copy()
+ for i, k in enumerate(self.label_info.keys()):
+ output_remapped[output == i] = k
+ return output_remapped
diff --git a/models/Mask3D/build/lib/mask3d/datasets/preprocessing/__init__.py b/models/Mask3D/build/lib/mask3d/datasets/preprocessing/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/build/lib/mask3d/datasets/preprocessing/arkitscenes_preprocessing.py b/models/Mask3D/build/lib/mask3d/datasets/preprocessing/arkitscenes_preprocessing.py
new file mode 100644
index 0000000000000000000000000000000000000000..2f222dc27e73eedab1e1d82b14c1573ce632af7c
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/datasets/preprocessing/arkitscenes_preprocessing.py
@@ -0,0 +1,116 @@
+import re
+from pathlib import Path
+import numpy as np
+import pandas as pd
+from fire import Fire
+from natsort import natsorted
+from loguru import logger
+import os
+
+from datasets.preprocessing.base_preprocessing import BasePreprocessing
+from utils.point_cloud_utils import load_ply_with_normals
+
+from datasets.scannet200.scannet200_constants import (
+ VALID_CLASS_IDS_200,
+ SCANNET_COLOR_MAP_200,
+ CLASS_LABELS_200,
+)
+
+
+class ARKitScenesPreprocessing(BasePreprocessing):
+ def __init__(
+ self,
+ data_dir: str = "/home/weders/scratch/scratch/scannetter/arkit/raw",
+ save_dir: str = "/home/weders/scratch/scratch/scannetter/arkit/raw",
+ modes: tuple = ('Validation', ),
+ n_jobs: int = 1,
+ git_repo: str = "./data/raw/scannet/ScanNet",
+ mesh_file: str="mesh_tsdf.ply",
+ scannet200: bool = False,
+ ):
+ super().__init__(data_dir, save_dir, modes, n_jobs)
+
+ self.scannet200 = scannet200
+ git_repo = Path(git_repo)
+ for mode in self.modes:
+ scenes = os.listdir(os.path.join(data_dir, mode))
+ scans_folder = "scans_test" if mode == "test" else "scans"
+ filepaths = []
+ for scene in scenes:
+ if os.path.exists(os.path.join(data_dir, mode, scene, mesh_file)):
+ filepaths.append(
+ self.data_dir
+ / mode
+ / scene
+ / mesh_file)
+ self.files[mode] = natsorted(filepaths)
+
+ def process_file(self, filepath, mode):
+ """process_file.
+
+        Note that PLY files were used to obtain the segmentation labels.
+
+ Args:
+ filepath: path to the main ply file
+ mode: train, test or validation
+
+ Returns:
+ filebase: info about file
+ """
+ scene = int(filepath.parent.name)
+ print(scene)
+ filebase = {
+ "filepath": filepath,
+ "scene": scene,
+ "sub_scene": scene,
+ "raw_filepath": str(filepath),
+ "file_len": -1,
+ }
+ # reading both files and checking that they are fitting
+ coords, features, _ = load_ply_with_normals(filepath)
+ file_len = len(coords)
+ filebase["file_len"] = file_len
+ points = np.hstack((coords, features))
+
+ print(features.shape)
+
+ points = np.concatenate((points, np.zeros((file_len, 4))), axis=1) # adding segment and label fake columns
+
+        processed_filepath = self.save_dir / mode / "data_mask3d.npy"
+ if not processed_filepath.parent.exists():
+ processed_filepath.parent.mkdir(parents=True, exist_ok=True)
+ np.save(processed_filepath, points.astype(np.float32))
+ filebase["filepath"] = str(processed_filepath)
+
+ return filebase
+
+ @logger.catch
+ def fix_bugs_in_labels(self):
+ if not self.scannet200:
+ logger.add(self.save_dir / "fixed_bugs_in_labels.log")
+ found_wrong_labels = {
+ tuple([270, 0]): 50,
+ tuple([270, 2]): 50,
+ tuple([384, 0]): 149,
+ }
+ for scene, wrong_label in found_wrong_labels.items():
+ scene, sub_scene = scene
+ bug_file = (
+ self.save_dir / "train" / f"{scene:04}_{sub_scene:02}.npy"
+ )
+ points = np.load(bug_file)
+ bug_mask = points[:, -1] != wrong_label
+ points = points[bug_mask]
+ np.save(bug_file, points)
+ logger.info(f"Fixed {bug_file}")
+
+ def _parse_scene_subscene(self, name):
+ scene_match = re.match(r"scene(\d{4})_(\d{2})", name)
+ print(scene_match)
+ return int(scene_match.group(1)), int(scene_match.group(2))
+
+
+if __name__ == "__main__":
+ Fire(ARKitScenesPreprocessing)
\ No newline at end of file
diff --git a/models/Mask3D/build/lib/mask3d/datasets/preprocessing/base_preprocessing.py b/models/Mask3D/build/lib/mask3d/datasets/preprocessing/base_preprocessing.py
new file mode 100644
index 0000000000000000000000000000000000000000..a17fd4f89aca0d16d27b1bd10c9f40b3e40a6e61
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/datasets/preprocessing/base_preprocessing.py
@@ -0,0 +1,204 @@
+import os
+import sys
+import re
+import yaml
+import json
+import multiprocessing
+from pathlib import Path
+from hashlib import md5
+
+import numpy as np
+from fire import Fire
+from tqdm import tqdm
+from joblib import Parallel, delayed
+from loguru import logger
+
+
+class BasePreprocessing:
+ def __init__(
+ self,
+ data_dir: str = "./data/raw/",
+ save_dir: str = "./data/processed/",
+ modes: tuple = ("train", "validation", "test"),
+ n_jobs: int = -1,
+ ):
+ self.data_dir = Path(data_dir)
+ self.save_dir = Path(save_dir)
+ self.n_jobs = n_jobs
+ self.modes = modes
+
+ if not self.data_dir.exists():
+ logger.error("data folder doesn't exist")
+ raise FileNotFoundError
+        if not self.save_dir.exists():
+ self.save_dir.mkdir(parents=True, exist_ok=True)
+
+ self.files = {}
+ for data_type in self.modes:
+ self.files.update({data_type: []})
+
+ @logger.catch
+ def preprocess(self):
+ self.n_jobs = (
+ multiprocessing.cpu_count() if self.n_jobs == -1 else self.n_jobs
+ )
+ for mode in self.modes:
+ database = []
+ logger.info(f"Tasks for {mode}: {len(self.files[mode])}")
+ parallel_results = Parallel(n_jobs=self.n_jobs, verbose=10)(
+ delayed(self.process_file)(file, mode)
+ for file in self.files[mode]
+ )
+ for filebase in parallel_results:
+ database.append(filebase)
+ self.save_database(database, mode)
+ # self.fix_bugs_in_labels()
+ # self.joint_database()
+ # self.compute_color_mean_std(
+ # train_database_path=(self.save_dir / "train_database.yaml")
+ # )
+
+ def preprocess_sequential(self):
+ for mode in self.modes:
+ database = []
+ for filepath in tqdm(self.files[mode], unit="file"):
+ filebase = self.process_file(filepath, mode)
+ database.append(filebase)
+ self.save_database(database, mode)
+ self.fix_bugs_in_labels()
+ self.joint_database()
+ self.compute_color_mean_std(
+ train_database_path=(self.save_dir / "train_database.yaml")
+ )
+
+ def process_file(self, filepath, mode):
+ """process_file.
+
+ Args:
+ filepath: path to the main file
+ mode: typically train, test or validation
+
+ Returns:
+ filebase: info about file
+ """
+ raise NotImplementedError
+
+ def make_instance_database_sequential(
+ self,
+ train_database_path: str = "./data/processed/train_database.yaml",
+ mode="instance",
+ ):
+ train_database = self._load_yaml(train_database_path)
+ instance_database = []
+ for sample in tqdm(train_database):
+ instance_database.append(self.extract_instance_from_file(sample))
+ self.save_database(instance_database, mode=mode)
+
+ @logger.catch
+ def make_instance_database(
+ self,
+ train_database_path: str = "./data/processed/train_database.yaml",
+ mode="instance",
+ ):
+ self.n_jobs = (
+ multiprocessing.cpu_count() if self.n_jobs == -1 else self.n_jobs
+ )
+ train_database = self._load_yaml(train_database_path)
+ instance_database = []
+ logger.info(f"Files in database: {len(train_database)}")
+ parallel_results = Parallel(n_jobs=self.n_jobs, verbose=10)(
+ delayed(self.extract_instance_from_file)(sample)
+ for sample in train_database
+ )
+ for filebase in parallel_results:
+ instance_database.append(filebase)
+ self.save_database(instance_database, mode=mode)
+
+ def extract_instance_from_file(self, sample_from_database):
+ points = np.load(sample_from_database["filepath"])
+ labels = points[:, -2:]
+ file_instances = []
+ for instance_id in np.unique(labels[:, 1]):
+ occupied_indices = np.isin(labels[:, 1], instance_id)
+ instance_points = points[occupied_indices].copy()
+ instance_classes = (
+ np.unique(instance_points[:, 9]).astype(int).tolist()
+ )
+
+ hash_string = str(sample_from_database["filepath"]) + str(
+ instance_id
+ )
+ hash_string = md5(hash_string.encode("utf-8")).hexdigest()
+ instance_filepath = (
+ self.save_dir / "instances" / f"{hash_string}.npy"
+ )
+ instance = {
+ "classes": instance_classes,
+ "instance_filepath": str(instance_filepath),
+ "instance_size": len(instance_points),
+ "original_file": str(sample_from_database["filepath"]),
+ }
+ if not instance_filepath.parent.exists():
+ instance_filepath.parent.mkdir(parents=True, exist_ok=True)
+ np.save(instance_filepath, instance_points.astype(np.float32))
+ file_instances.append(instance)
+ return file_instances
+
+ def fix_bugs_in_labels(self):
+ pass
+
+ def compute_color_mean_std(
+ self,
+ train_database_path: str = "./data/processed/train_database.yaml",
+ ):
+ pass
+
+ def save_database(self, database, mode):
+ for element in database:
+ self._dict_to_yaml(element)
+ self._save_yaml(self.save_dir / (mode + "_database.yaml"), database)
+
+ def joint_database(self, train_modes=["train", "validation"]):
+ joint_db = []
+ for mode in train_modes:
+ joint_db.extend(
+ self._load_yaml(self.save_dir / (mode + "_database.yaml"))
+ )
+ self._save_yaml(
+ self.save_dir / "train_validation_database.yaml", joint_db
+ )
+
+ @classmethod
+ def _read_json(cls, path):
+ with open(path) as f:
+ file = json.load(f)
+ return file
+
+ @classmethod
+ def _save_yaml(cls, path, file):
+ with open(path, "w") as f:
+ yaml.safe_dump(
+ file, f, default_style=None, default_flow_style=False
+ )
+
+ @classmethod
+ def _dict_to_yaml(cls, dictionary):
+ if not isinstance(dictionary, dict):
+ return
+ for k, v in dictionary.items():
+ if isinstance(v, dict):
+ cls._dict_to_yaml(v)
+ if isinstance(v, np.ndarray):
+ dictionary[k] = v.tolist()
+ if isinstance(v, Path):
+ dictionary[k] = str(v)
+
+ @classmethod
+ def _load_yaml(cls, filepath):
+ with open(filepath) as f:
+ file = yaml.safe_load(f)
+ return file
+
+
+if __name__ == "__main__":
+ Fire(BasePreprocessing)
diff --git a/models/Mask3D/build/lib/mask3d/datasets/preprocessing/s3dis_preprocessing.py b/models/Mask3D/build/lib/mask3d/datasets/preprocessing/s3dis_preprocessing.py
new file mode 100644
index 0000000000000000000000000000000000000000..3e7ff4967ca9dc22248c6863b41f7b652687ae98
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/datasets/preprocessing/s3dis_preprocessing.py
@@ -0,0 +1,282 @@
+import os
+import re
+
+import numpy as np
+from fire import Fire
+from loguru import logger
+from natsort import natsorted
+
+from datasets.preprocessing.base_preprocessing import BasePreprocessing
+
+
+class S3DISPreprocessing(BasePreprocessing):
+ def __init__(
+ self,
+ data_dir: str = "./data/raw/s3dis",
+ save_dir: str = "./data/processed/s3dis",
+ modes: tuple = (
+ "Area_1",
+ "Area_2",
+ "Area_3",
+ "Area_4",
+ "Area_5",
+ "Area_6",
+ ),
+ n_jobs: int = -1,
+ ):
+ super().__init__(data_dir, save_dir, modes, n_jobs)
+
+ self.class_map = {
+ "ceiling": 0,
+ "floor": 1,
+ "wall": 2,
+ "beam": 3,
+ "column": 4,
+ "window": 5,
+ "door": 6,
+ "table": 7,
+ "chair": 8,
+ "sofa": 9,
+ "bookcase": 10,
+ "board": 11,
+ "clutter": 12,
+ "stairs": 12, # stairs are also mapped to clutter
+ }
+
+ self.color_map = [
+ [0, 255, 0], # ceiling
+ [0, 0, 255], # floor
+ [0, 255, 255], # wall
+ [255, 255, 0], # beam
+ [255, 0, 255], # column
+ [100, 100, 255], # window
+ [200, 200, 100], # door
+ [170, 120, 200], # table
+ [255, 0, 0], # chair
+ [200, 100, 100], # sofa
+ [10, 200, 100], # bookcase
+ [200, 200, 200], # board
+            [50, 50, 50],  # clutter
+        ]
+
+ self.create_label_database()
+
+ for mode in self.modes:
+ filepaths = []
+ for scene_path in [
+ f.path for f in os.scandir(self.data_dir / mode) if f.is_dir()
+ ]:
+ filepaths.append(scene_path)
+ self.files[mode] = natsorted(filepaths)
+
+ def create_label_database(self):
+ label_database = dict()
+ for class_name, class_id in self.class_map.items():
+ label_database[class_id] = {
+ "color": self.color_map[class_id],
+ "name": class_name,
+ "validation": True,
+ }
+
+ self._save_yaml(self.save_dir / "label_database.yaml", label_database)
+ return label_database
+
+ def _buf_count_newlines_gen(self, fname):
+ def _make_gen(reader):
+ while True:
+ b = reader(2**16)
+ if not b:
+ break
+ yield b
+
+ with open(fname, "rb") as f:
+ count = sum(buf.count(b"\n") for buf in _make_gen(f.raw.read))
+ return count
+
+ def process_file(self, filepath, mode):
+ """process_file.
+
+        Note: segmentation labels are read from the per-instance annotation .txt files.
+
+        Args:
+            filepath: path to the scene directory
+ mode: train, test or validation
+
+ Returns:
+ filebase: info about file
+ """
+ filebase = {
+ "filepath": filepath,
+ "scene": filepath.split("/")[-1],
+ "area": mode,
+ "raw_filepath": str(filepath),
+ "file_len": -1,
+ }
+
+ scene_name = filepath.split("/")[-1]
+ instance_counter = 0
+ scene_points = []
+ for instance in [
+ f
+ for f in os.scandir(
+ self.data_dir / mode / scene_name / "Annotations"
+ )
+ if f.name.endswith(".txt")
+ ]:
+ instance_class = self.class_map[instance.name.split("_")[0]]
+ instance_points = np.loadtxt(instance.path)
+
+ instance_normals = np.ones((instance_points.shape[0], 3))
+ instance_class = np.array(instance_class).repeat(
+ instance_points.shape[0]
+ )[..., None]
+ instance_id = np.array(instance_counter).repeat(
+ instance_points.shape[0]
+ )[..., None]
+
+ instance_points = np.hstack(
+ (
+ instance_points,
+ instance_normals,
+ instance_class,
+ instance_id,
+ )
+ )
+
+ scene_points.append(instance_points)
+ instance_counter += 1
+
+ points = np.vstack(scene_points)
+
+ pcd_size = self._buf_count_newlines_gen(f"{filepath}/{scene_name}.txt")
+ if points.shape[0] != pcd_size:
+ print(f"FILE SIZE DOES NOT MATCH FOR {filepath}/{scene_name}.txt")
+ print(f"({points.shape[0]} vs. {pcd_size})")
+
+ filebase["raw_segmentation_filepath"] = ""
+
+ # add segment id as additional feature (DUMMY)
+ points = np.hstack((points, np.ones(points.shape[0])[..., None]))
+ points[:, [9, 10, -1]] = points[
+ :, [-1, 9, 10]
+ ] # move segments after RGB
+
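+        # benchmark-style gt encoding: (semantic_id + 1) * 1000 + instance_id + 1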
+ gt_data = (points[:, -2] + 1) * 1000 + points[:, -1] + 1
+
+ file_len = len(points)
+ filebase["file_len"] = file_len
+
+ processed_filepath = self.save_dir / mode / f"{scene_name}.npy"
+ if not processed_filepath.parent.exists():
+ processed_filepath.parent.mkdir(parents=True, exist_ok=True)
+ np.save(processed_filepath, points.astype(np.float32))
+ filebase["filepath"] = str(processed_filepath)
+
+ processed_gt_filepath = (
+ self.save_dir / "instance_gt" / mode / f"{scene_name}.txt"
+ )
+ if not processed_gt_filepath.parent.exists():
+ processed_gt_filepath.parent.mkdir(parents=True, exist_ok=True)
+ np.savetxt(processed_gt_filepath, gt_data.astype(np.int32), fmt="%d")
+ filebase["instance_gt_filepath"] = str(processed_gt_filepath)
+
+ filebase["color_mean"] = [
+ float((points[:, 3] / 255).mean()),
+ float((points[:, 4] / 255).mean()),
+ float((points[:, 5] / 255).mean()),
+ ]
+ filebase["color_std"] = [
+ float(((points[:, 3] / 255) ** 2).mean()),
+ float(((points[:, 4] / 255) ** 2).mean()),
+ float(((points[:, 5] / 255) ** 2).mean()),
+ ]
+ return filebase
+
+ def compute_color_mean_std(self, train_database_path: str = ""):
+ area_database_paths = [
+ f
+ for f in os.scandir(self.save_dir)
+ if f.name.startswith("Area_") and f.name.endswith(".yaml")
+ ]
+
+ for database_path in area_database_paths:
+ database = self._load_yaml(database_path.path)
+ color_mean, color_std = [], []
+ for sample in database:
+ color_std.append(sample["color_std"])
+ color_mean.append(sample["color_mean"])
+
+ color_mean = np.array(color_mean).mean(axis=0)
+ color_std = np.sqrt(
+ np.array(color_std).mean(axis=0) - color_mean**2
+ )
+ feats_mean_std = {
+ "mean": [float(each) for each in color_mean],
+ "std": [float(each) for each in color_std],
+ }
+ self._save_yaml(
+ self.save_dir / f"{database_path.name}_color_mean_std.yaml",
+ feats_mean_std,
+ )
+
+ for database_path in area_database_paths:
+ all_mean, all_std = [], []
+ for let_out_path in area_database_paths:
+ if database_path == let_out_path:
+ continue
+
+ database = self._load_yaml(let_out_path.path)
+ for sample in database:
+ all_std.append(sample["color_std"])
+ all_mean.append(sample["color_mean"])
+
+ all_color_mean = np.array(all_mean).mean(axis=0)
+ all_color_std = np.sqrt(
+ np.array(all_std).mean(axis=0) - all_color_mean**2
+ )
+ feats_mean_std = {
+ "mean": [float(each) for each in all_color_mean],
+ "std": [float(each) for each in all_color_std],
+ }
+ file_path = database_path.name.replace("_database.yaml", "")
+ self._save_yaml(
+ self.save_dir / f"{file_path}_color_mean_std.yaml",
+ feats_mean_std,
+ )
+
+ @logger.catch
+ def fix_bugs_in_labels(self):
+ pass
+
+ def joint_database(
+ self,
+ train_modes=(
+ "Area_1",
+ "Area_2",
+ "Area_3",
+ "Area_4",
+ "Area_5",
+ "Area_6",
+ ),
+ ):
+ for mode in train_modes:
+ joint_db = []
+ for let_out in train_modes:
+ if mode == let_out:
+ continue
+ joint_db.extend(
+ self._load_yaml(
+ self.save_dir / (let_out + "_database.yaml")
+ )
+ )
+ self._save_yaml(
+ self.save_dir / f"train_{mode}_database.yaml", joint_db
+ )
+
+ def _parse_scene_subscene(self, name):
+ scene_match = re.match(r"scene(\d{4})_(\d{2})", name)
+ return int(scene_match.group(1)), int(scene_match.group(2))
+
+
+if __name__ == "__main__":
+ Fire(S3DISPreprocessing)
diff --git a/models/Mask3D/build/lib/mask3d/datasets/preprocessing/scannet_preprocessing.py b/models/Mask3D/build/lib/mask3d/datasets/preprocessing/scannet_preprocessing.py
new file mode 100644
index 0000000000000000000000000000000000000000..5a981864612e04930b04c9c0df8aaa6e2d9249a3
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/datasets/preprocessing/scannet_preprocessing.py
@@ -0,0 +1,296 @@
+import re
+from pathlib import Path
+import numpy as np
+import pandas as pd
+from fire import Fire
+from natsort import natsorted
+from loguru import logger
+
+from datasets.preprocessing.base_preprocessing import BasePreprocessing
+from utils.point_cloud_utils import load_ply_with_normals
+
+from datasets.scannet200.scannet200_constants import (
+ VALID_CLASS_IDS_200,
+ SCANNET_COLOR_MAP_200,
+ CLASS_LABELS_200,
+)
+
+
+class ScannetPreprocessing(BasePreprocessing):
+ def __init__(
+ self,
+ data_dir: str = "./data/raw/scannet/scannet",
+ save_dir: str = "./data/processed/scannet",
+ modes: tuple = ("train", "validation", "test"),
+ n_jobs: int = -1,
+ git_repo: str = "./data/raw/scannet/ScanNet",
+ scannet200: bool = False,
+ ):
+ super().__init__(data_dir, save_dir, modes, n_jobs)
+
+ self.scannet200 = scannet200
+
+ if self.scannet200:
+ self.labels_pd = pd.read_csv(
+ self.data_dir / "scannetv2-labels.combined.tsv",
+ sep="\t",
+ header=0,
+ )
+
+ git_repo = Path(git_repo)
+ self.create_label_database(git_repo)
+ for mode in self.modes:
+ trainval_split_dir = git_repo / "Tasks" / "Benchmark"
+ scannet_special_mode = "val" if mode == "validation" else mode
+ with open(
+ trainval_split_dir / (f"scannetv2_{scannet_special_mode}.txt")
+ ) as f:
+ # -1 because the last one is always empty
+ split_file = f.read().split("\n")[:-1]
+
+ scans_folder = "scans_test" if mode == "test" else "scans"
+ filepaths = []
+ for scene in split_file:
+ filepaths.append(
+ self.data_dir
+ / scans_folder
+ / scene
+ / (scene + "_vh_clean_2.ply")
+ )
+ self.files[mode] = natsorted(filepaths)
+
+ def create_label_database(self, git_repo):
+ if self.scannet200:
+ label_database = {}
+ for row_id, class_id in enumerate(VALID_CLASS_IDS_200):
+ label_database[class_id] = {
+ "color": SCANNET_COLOR_MAP_200[class_id],
+ "name": CLASS_LABELS_200[row_id],
+ "validation": True,
+ }
+ self._save_yaml(
+ self.save_dir / "label_database.yaml", label_database
+ )
+ return label_database
+ else:
+ if (self.save_dir / "label_database.yaml").exists():
+ return self._load_yaml(self.save_dir / "label_database.yaml")
+ df = pd.read_csv(
+ self.data_dir / "scannetv2-labels.combined.tsv", sep="\t"
+ )
+ df = (
+ df[~df[["nyu40class", "nyu40id"]].duplicated()][
+ ["nyu40class", "nyu40id"]
+ ]
+ .set_index("nyu40id")
+ .sort_index()[["nyu40class"]]
+ .rename(columns={"nyu40class": "name"})
+ .replace(" ", "_", regex=True)
+ )
+ df = pd.DataFrame([{"name": "empty"}]).append(df)
+ df["validation"] = False
+
+ with open(
+ git_repo
+ / "Tasks"
+ / "Benchmark"
+ / "classes_SemVoxLabel-nyu40id.txt"
+ ) as f:
+ for_validation = f.read().split("\n")
+ for category in for_validation:
+ index = int(re.split(" +", category)[0])
+ df.loc[index, "validation"] = True
+
+ # doing this hack because otherwise I will have to install imageio
+ with open(git_repo / "BenchmarkScripts" / "util.py") as f:
+ util = f.read()
+ color_list = eval("[" + util.split("return [\n")[1])
+
+ df["color"] = color_list
+
+ label_database = df.to_dict("index")
+ self._save_yaml(
+ self.save_dir / "label_database.yaml", label_database
+ )
+ return label_database
+
+ def process_file(self, filepath, mode):
+ """process_file.
+
+        Note that PLY files were used to obtain the segmentation labels.
+
+ Args:
+ filepath: path to the main ply file
+ mode: train, test or validation
+
+ Returns:
+ filebase: info about file
+ """
+ scene, sub_scene = self._parse_scene_subscene(filepath.name)
+ filebase = {
+ "filepath": filepath,
+ "scene": scene,
+ "sub_scene": sub_scene,
+ "raw_filepath": str(filepath),
+ "file_len": -1,
+ }
+ # reading both files and checking that they are fitting
+ coords, features, _ = load_ply_with_normals(filepath)
+ file_len = len(coords)
+ filebase["file_len"] = file_len
+ points = np.hstack((coords, features))
+
+ if mode in ["train", "validation"]:
+ # getting scene information
+ description_filepath = Path(
+ filepath
+ ).parent / filepath.name.replace("_vh_clean_2.ply", ".txt")
+ with open(description_filepath) as f:
+ scene_type = f.read().split("\n")[:-1]
+ scene_type = scene_type[-1].split(" = ")[1]
+ filebase["scene_type"] = scene_type
+ filebase["raw_description_filepath"] = description_filepath
+
+ # getting instance info
+ instance_info_filepath = next(
+ Path(filepath).parent.glob("*.aggregation.json")
+ )
+ segment_indexes_filepath = next(
+ Path(filepath).parent.glob("*[0-9].segs.json")
+ )
+ instance_db = self._read_json(instance_info_filepath)
+ segments = self._read_json(segment_indexes_filepath)
+ segments = np.array(segments["segIndices"])
+ filebase["raw_instance_filepath"] = instance_info_filepath
+ filebase["raw_segmentation_filepath"] = segment_indexes_filepath
+
+ # add segment id as additional feature
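+            # return_inverse compacts raw segment ids to a dense 0..N-1 range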
+ segment_ids = np.unique(segments, return_inverse=True)[1]
+ points = np.hstack((points, segment_ids[..., None]))
+
+ # reading labels file
+ label_filepath = filepath.parent / filepath.name.replace(
+ ".ply", ".labels.ply"
+ )
+ filebase["raw_label_filepath"] = label_filepath
+ label_coords, label_colors, labels = load_ply_with_normals(
+ label_filepath
+ )
+ if not np.allclose(coords, label_coords):
+ raise ValueError("files doesn't have same coordinates")
+
+ # adding instance label
+ labels = labels[:, np.newaxis]
+ empty_instance_label = np.full(labels.shape, -1)
+ labels = np.hstack((labels, empty_instance_label))
+ for instance in instance_db["segGroups"]:
+ segments_occupied = np.array(instance["segments"])
+ occupied_indices = np.isin(segments, segments_occupied)
+ labels[occupied_indices, 1] = instance["id"]
+
+ if self.scannet200:
+ label200 = instance["label"]
+ # Map the category name to id
+ label_ids = self.labels_pd[
+ self.labels_pd["raw_category"] == label200
+ ]["id"]
+ label_id = (
+ int(label_ids.iloc[0]) if len(label_ids) > 0 else 0
+ )
+ labels[occupied_indices, 0] = label_id
+ points = np.hstack((points, labels))
+
+ # gt_data = (points[:, -2] + 1) * 1000 + points[:, -1] + 1
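+            # ScanNet benchmark encoding: semantic_id * 1000 + instance_id + 1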
+ gt_data = points[:, -2] * 1000 + points[:, -1] + 1
+ else:
+ segments_test = "../../data/raw/scannet_test_segments"
+ segment_indexes_filepath = filepath.name.replace(
+ ".ply", ".0.010000.segs.json"
+ )
+ segments = self._read_json(
+ f"{segments_test}/{segment_indexes_filepath}"
+ )
+ segments = np.array(segments["segIndices"])
+ # add segment id as additional feature
+ segment_ids = np.unique(segments, return_inverse=True)[1]
+ points = np.hstack((points, segment_ids[..., None]))
+
+ processed_filepath = (
+ self.save_dir / mode / f"{scene:04}_{sub_scene:02}.npy"
+ )
+ if not processed_filepath.parent.exists():
+ processed_filepath.parent.mkdir(parents=True, exist_ok=True)
+ np.save(processed_filepath, points.astype(np.float32))
+ filebase["filepath"] = str(processed_filepath)
+
+ if mode == "test":
+ return filebase
+
+ processed_gt_filepath = (
+ self.save_dir
+ / "instance_gt"
+ / mode
+ / f"scene{scene:04}_{sub_scene:02}.txt"
+ )
+ if not processed_gt_filepath.parent.exists():
+ processed_gt_filepath.parent.mkdir(parents=True, exist_ok=True)
+ np.savetxt(processed_gt_filepath, gt_data.astype(np.int32), fmt="%d")
+ filebase["instance_gt_filepath"] = str(processed_gt_filepath)
+
+ filebase["color_mean"] = [
+ float((features[:, 0] / 255).mean()),
+ float((features[:, 1] / 255).mean()),
+ float((features[:, 2] / 255).mean()),
+ ]
+ filebase["color_std"] = [
+ float(((features[:, 0] / 255) ** 2).mean()),
+ float(((features[:, 1] / 255) ** 2).mean()),
+ float(((features[:, 2] / 255) ** 2).mean()),
+ ]
+ return filebase
+
+ def compute_color_mean_std(
+ self,
+ train_database_path: str = "./data/processed/scannet/train_database.yaml",
+ ):
+ train_database = self._load_yaml(train_database_path)
+ color_mean, color_std = [], []
+ for sample in train_database:
+ color_std.append(sample["color_std"])
+ color_mean.append(sample["color_mean"])
+
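+        # per-scene "color_std" entries hold E[x^2], so std = sqrt(E[x^2] - mean^2)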
+ color_mean = np.array(color_mean).mean(axis=0)
+ color_std = np.sqrt(np.array(color_std).mean(axis=0) - color_mean**2)
+ feats_mean_std = {
+ "mean": [float(each) for each in color_mean],
+ "std": [float(each) for each in color_std],
+ }
+ self._save_yaml(self.save_dir / "color_mean_std.yaml", feats_mean_std)
+
+ @logger.catch
+ def fix_bugs_in_labels(self):
+ if not self.scannet200:
+ logger.add(self.save_dir / "fixed_bugs_in_labels.log")
+ found_wrong_labels = {
+ tuple([270, 0]): 50,
+ tuple([270, 2]): 50,
+ tuple([384, 0]): 149,
+ }
+ for scene, wrong_label in found_wrong_labels.items():
+ scene, sub_scene = scene
+ bug_file = (
+ self.save_dir / "train" / f"{scene:04}_{sub_scene:02}.npy"
+ )
+ points = np.load(bug_file)
+ bug_mask = points[:, -1] != wrong_label
+ points = points[bug_mask]
+ np.save(bug_file, points)
+ logger.info(f"Fixed {bug_file}")
+
+ def _parse_scene_subscene(self, name):
+ scene_match = re.match(r"scene(\d{4})_(\d{2})", name)
+ return int(scene_match.group(1)), int(scene_match.group(2))
+
+
+if __name__ == "__main__":
+ Fire(ScannetPreprocessing)
diff --git a/models/Mask3D/build/lib/mask3d/datasets/preprocessing/semantic_kitti_preprocessing.py b/models/Mask3D/build/lib/mask3d/datasets/preprocessing/semantic_kitti_preprocessing.py
new file mode 100644
index 0000000000000000000000000000000000000000..d483e535435cca026588c3177cfe368fad99596b
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/datasets/preprocessing/semantic_kitti_preprocessing.py
@@ -0,0 +1,181 @@
+import re
+from pathlib import Path
+from hashlib import md5
+from natsort import natsorted
+
+import numpy as np
+from fire import Fire
+
+from datasets.preprocessing.base_preprocessing import BasePreprocessing
+
+
+class SemanticKittiPreprocessing(BasePreprocessing):
+ def __init__(
+ self,
+ data_dir: str = "./data/raw/semantic_kitti",
+ save_dir: str = "./data/processed/semantic_kitti",
+ modes: tuple = ("train", "validation", "test"),
+ n_jobs: int = -1,
+ git_repo: str = "./data/raw/semantic-kitti-api",
+ ):
+ super().__init__(data_dir, save_dir, modes, n_jobs)
+
+ git_repo = Path(git_repo)
+ self.create_label_database(git_repo / "config" / "semantic-kitti.yaml")
+ self.config = self._load_yaml(
+ git_repo / "config" / "semantic-kitti.yaml"
+ )
+ self.pose = dict()
+
+ for mode in self.modes:
+ scene_mode = "valid" if mode == "validation" else mode
+ self.pose[mode] = dict()
+ for scene in sorted(self.config["split"][scene_mode]):
+ filepaths = list(
+ self.data_dir.glob(f"*/{scene:02}/velodyne/*bin")
+ )
+ filepaths = [str(file) for file in filepaths]
+ self.files[mode].extend(natsorted(filepaths))
+ calibration = parse_calibration(
+ Path(filepaths[0]).parent.parent / "calib.txt"
+ )
+ self.pose[mode].update(
+ {
+ scene: parse_poses(
+ Path(filepaths[0]).parent.parent / "poses.txt",
+ calibration,
+ ),
+ }
+ )
+
+ def create_label_database(self, config_file):
+ if (self.save_dir / "label_database.yaml").exists():
+ return self._load_yaml(self.save_dir / "label_database.yaml")
+ config = self._load_yaml(config_file)
+ label_database = {}
+ for key, old_key in config["learning_map_inv"].items():
+ label_database.update(
+ {
+ key: {
+ "name": config["labels"][old_key],
+ # bgr -> rgb
+ "color": config["color_map"][old_key][::-1],
+ "validation": not config["learning_ignore"][key],
+ }
+ }
+ )
+
+ self._save_yaml(self.save_dir / "label_database.yaml", label_database)
+ return label_database
+
+ def process_file(self, filepath, mode):
+ """process_file.
+
+ Args:
+ filepath: path to the main ply file
+ mode: train, test
+
+ Returns:
+ filebase: info about file
+ """
+ scene, sub_scene = re.search(r"(\d{2}).*(\d{6})", filepath).group(1, 2)
+ filebase = {
+ "filepath": filepath,
+ "scene": int(scene),
+ "sub_scene": int(sub_scene),
+ "file_len": -1,
+ "pose": self.pose[mode][int(scene)][int(sub_scene)].tolist(),
+ }
+
+ points = np.fromfile(filepath, dtype=np.float32).reshape(-1, 4)
+ file_len = len(points)
+ filebase["file_len"] = file_len
+
+ if mode in ["train", "validation"]:
+ # getting label info
+ label_filepath = filepath.replace("velodyne", "labels").replace(
+ "bin", "label"
+ )
+ filebase["label_filepath"] = label_filepath
+ label = np.fromfile(label_filepath, dtype=np.uint32).astype(
+ np.int32
+ )
+            if points.shape[0] != label.shape[0]:
+ raise ValueError("Files do not have same length")
+ semantic_label = label & 0xFFFF
+ instance_label = label >> 16
+
+ semantic_label_copy = semantic_label.copy()
+ for label in np.unique(semantic_label):
+ semantic_label[semantic_label_copy == label] = self.config[
+ "learning_map"
+ ][label]
+
+ label = np.hstack(
+ (semantic_label[:, np.newaxis], instance_label[:, np.newaxis])
+ )
+ points = np.hstack((points, label))
+
+ processed_filepath = self.save_dir / mode / f"{scene}_{sub_scene}.npy"
+ if not processed_filepath.parent.exists():
+ processed_filepath.parent.mkdir(parents=True, exist_ok=True)
+ np.save(processed_filepath, points.astype(np.float32))
+ filebase["filepath"] = str(processed_filepath)
+
+ return filebase
+
+
+def parse_calibration(filename):
+ """read calibration file with given filename
+ Returns
+ -------
+ dict
+ Calibration matrices as 4x4 numpy arrays.
+ """
+ calib = {}
+
+ with open(filename) as calib_file:
+ for line in calib_file:
+ key, content = line.strip().split(":")
+ values = [float(v) for v in content.strip().split()]
+
+ pose = np.zeros((4, 4))
+ pose[0, 0:4] = values[0:4]
+ pose[1, 0:4] = values[4:8]
+ pose[2, 0:4] = values[8:12]
+ pose[3, 3] = 1.0
+
+ calib[key] = pose
+ return calib
+
+
+def parse_poses(filename, calibration):
+ """read poses file with per-scan poses from given filename
+ Returns
+ -------
+ list
+ list of poses as 4x4 numpy arrays.
+ """
+
+ poses = []
+
+ Tr = calibration["Tr"]
+ Tr_inv = np.linalg.inv(Tr)
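+    # map each pose into the sensor frame below: Tr^-1 @ pose @ Tr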
+
+ with open(filename) as file:
+ for line in file:
+ values = [float(v) for v in line.strip().split()]
+
+ pose = np.zeros((4, 4))
+ pose[0, 0:4] = values[0:4]
+ pose[1, 0:4] = values[4:8]
+ pose[2, 0:4] = values[8:12]
+ pose[3, 3] = 1.0
+
+ poses.append(np.matmul(Tr_inv, np.matmul(pose, Tr)))
+
+ return poses
+
+
+if __name__ == "__main__":
+ Fire(SemanticKittiPreprocessing)
diff --git a/models/Mask3D/build/lib/mask3d/datasets/preprocessing/stpls3d_preprocessing.py b/models/Mask3D/build/lib/mask3d/datasets/preprocessing/stpls3d_preprocessing.py
new file mode 100644
index 0000000000000000000000000000000000000000..63ed5bff5d52e656f4bad2f853e5973b433871bd
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/datasets/preprocessing/stpls3d_preprocessing.py
@@ -0,0 +1,291 @@
+import re
+import os
+import numpy as np
+from fire import Fire
+from natsort import natsorted
+from loguru import logger
+import pandas as pd
+
+from datasets.preprocessing.base_preprocessing import BasePreprocessing
+
+
+class STPLS3DPreprocessing(BasePreprocessing):
+ def __init__(
+ self,
+ data_dir: str = "../../data/raw/stpls3d",
+ save_dir: str = "../../data/processed/stpls3d",
+ modes: tuple = ("train", "validation", "test"),
+ n_jobs: int = -1,
+ ):
+ super().__init__(data_dir, save_dir, modes, n_jobs)
+
+ # https://github.com/meidachen/STPLS3D/blob/main/HAIS/STPLS3DInstanceSegmentationChallenge_Codalab_Evaluate.py#L31
+ CLASS_LABELS = [
+ "Build",
+ "LowVeg",
+ "MediumVeg",
+ "HighVeg",
+ "Vehicle",
+ "Truck",
+ "Aircraft",
+ "MilitaryVeh",
+ "Bike",
+ "Motorcycle",
+ "LightPole",
+ "StreetSign",
+ "Clutter",
+ "Fence",
+ ]
+ VALID_CLASS_IDS = np.array(
+ [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
+ )
+
+ self.class_map = {
+ "Ground": 0,
+ "Build": 1,
+ "LowVeg": 2,
+ "MediumVeg": 3,
+ "HighVeg": 4,
+ "Vehicle": 5,
+ "Truck": 6,
+ "Aircraft": 7,
+ "MilitaryVeh": 8,
+ "Bike": 9,
+ "Motorcycle": 10,
+ "LightPole": 11,
+ "StreetSign": 12,
+ "Clutter": 13,
+ "Fence": 14,
+ }
+
+ self.color_map = [
+ [0, 255, 0], # Ground
+ [0, 0, 255], # Build
+ [0, 255, 255], # LowVeg
+ [255, 255, 0], # MediumVeg
+            [255, 0, 255],  # HighVeg
+            [100, 100, 255],  # Vehicle
+            [200, 200, 100],  # Truck
+            [170, 120, 200],  # Aircraft
+            [255, 0, 0],  # MilitaryVeh
+            [200, 100, 100],  # Bike
+            [10, 200, 100],  # Motorcycle
+            [200, 200, 200],  # LightPole
+            [50, 50, 50],  # StreetSign
+            [60, 130, 60],  # Clutter
+            [130, 30, 60],  # Fence
+        ]
+
+ self.create_label_database()
+
+ for mode in self.modes:
+ filepaths = []
+ for scene_path in [
+ f.path for f in os.scandir(self.data_dir / mode)
+ ]:
+ filepaths.append(scene_path)
+ self.files[mode] = natsorted(filepaths)
+
+ def create_label_database(self):
+ label_database = dict()
+ for class_name, class_id in self.class_map.items():
+ label_database[class_id] = {
+ "color": self.color_map[class_id],
+ "name": class_name,
+ "validation": True,
+ }
+
+ self._save_yaml(self.save_dir / "label_database.yaml", label_database)
+ return label_database
+
+ def process_file(self, filepath, mode):
+ """process_file.
+
+        Note: semantic and instance labels are read directly from the .txt point cloud files.
+
+        Args:
+            filepath: path to the scene .txt file
+ mode: train, test or validation
+
+ Returns:
+ filebase: info about file
+ """
+ filebase = {
+ "filepath": filepath,
+ "scene": filepath.split("/")[-1],
+ "raw_filepath": str(filepath),
+ "file_len": -1,
+ }
+
+ points = pd.read_csv(filepath, header=None).values
+
+ filebase["raw_segmentation_filepath"] = ""
+
+ # add segment id as additional feature (DUMMY)
+ if mode in ["train", "validation"]:
+ points = np.hstack(
+ (
+ points,
+ np.ones(points.shape[0])[..., None], # normal 1
+ np.ones(points.shape[0])[..., None], # normal 2
+ np.ones(points.shape[0])[..., None], # normal 3
+                    np.ones(points.shape[0])[..., None],  # segment id
+                )
+            )
+ else:
+ # we need to add dummies for semantics and instances
+ points = np.hstack(
+ (
+ points,
+ np.ones(points.shape[0])[..., None], # semantic class
+ np.ones(points.shape[0])[..., None], # instance id
+ np.ones(points.shape[0])[..., None], # normal 1
+ np.ones(points.shape[0])[..., None], # normal 2
+ np.ones(points.shape[0])[..., None], # normal 3
+                    np.ones(points.shape[0])[..., None],  # segment id
+                )
+            )
+
+ points = points[
+ :, [0, 1, 2, 3, 4, 5, 8, 9, 10, 11, 6, 7]
+ ] # move segments after RGB
+
+ # move point clouds to be in positive range (important for split pointcloud function)
+ points[:, :3] = points[:, :3] - points[:, :3].min(0)
+
+ points = points.astype(np.float32)
+
+ if mode == "test":
+ points = points[:, :-2]
+ else:
+ points[
+ points[:, -1] == -100.0, -1
+ ] = -1 # -1 indicates "no instance"
+
+ file_len = len(points)
+ filebase["file_len"] = file_len
+
+ processed_filepath = (
+ self.save_dir
+ / mode
+ / f"{filebase['scene'].replace('.txt', '')}.npy"
+ )
+ if not processed_filepath.parent.exists():
+ processed_filepath.parent.mkdir(parents=True, exist_ok=True)
+ np.save(processed_filepath, points.astype(np.float32))
+ filebase["filepath"] = str(processed_filepath)
+
+ if mode in ["validation", "test"]:
+ blocks = self.splitPointCloud(points)
+
+ filebase["instance_gt_filepath"] = []
+ filebase["filepath_crop"] = []
+ for block_id, block in enumerate(blocks):
+ if len(block) > 10000:
+ if mode == "validation":
+ new_instance_ids = np.unique(
+ block[:, -1], return_inverse=True
+ )[1]
+
+ assert new_instance_ids.shape[0] == block.shape[0]
+ # == 0 means -1 == no instance
+ # new_instance_ids[new_instance_ids == 0]
+ assert (
+ new_instance_ids.max() < 1000
+ ), "we cannot encode when there are more than 999 instances in a block"
+
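+                        # encode gt as semantic_id * 1000 + instance_id for evaluation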
+ gt_data = (block[:, -2]) * 1000 + new_instance_ids
+
+ processed_gt_filepath = (
+ self.save_dir
+ / "instance_gt"
+ / mode
+ / f"{filebase['scene'].replace('.txt', '')}_{block_id}.txt"
+ )
+ if not processed_gt_filepath.parent.exists():
+ processed_gt_filepath.parent.mkdir(
+ parents=True, exist_ok=True
+ )
+ np.savetxt(
+ processed_gt_filepath,
+ gt_data.astype(np.int32),
+ fmt="%d",
+ )
+ filebase["instance_gt_filepath"].append(
+ str(processed_gt_filepath)
+ )
+
+ processed_filepath = (
+ self.save_dir
+ / mode
+ / f"{filebase['scene'].replace('.txt', '')}_{block_id}.npy"
+ )
+ if not processed_filepath.parent.exists():
+ processed_filepath.parent.mkdir(
+ parents=True, exist_ok=True
+ )
+ np.save(processed_filepath, block.astype(np.float32))
+ filebase["filepath_crop"].append(str(processed_filepath))
+ else:
+ print("block was smaller than 1000 points")
+ assert False
+
+ filebase["color_mean"] = [
+ float((points[:, 3] / 255).mean()),
+ float((points[:, 4] / 255).mean()),
+ float((points[:, 5] / 255).mean()),
+ ]
+ filebase["color_std"] = [
+ float(((points[:, 3] / 255) ** 2).mean()),
+ float(((points[:, 4] / 255) ** 2).mean()),
+ float(((points[:, 5] / 255) ** 2).mean()),
+ ]
+ return filebase
+
+ def compute_color_mean_std(
+ self,
+ train_database_path: str = "./data/processed/stpls3d/train_database.yaml",
+ ):
+ train_database = self._load_yaml(train_database_path)
+ color_mean, color_std = [], []
+ for sample in train_database:
+ color_std.append(sample["color_std"])
+ color_mean.append(sample["color_mean"])
+
+ color_mean = np.array(color_mean).mean(axis=0)
+ color_std = np.sqrt(np.array(color_std).mean(axis=0) - color_mean**2)
+ feats_mean_std = {
+ "mean": [float(each) for each in color_mean],
+ "std": [float(each) for each in color_std],
+ }
+ self._save_yaml(self.save_dir / "color_mean_std.yaml", feats_mean_std)
+
+ def splitPointCloud(self, cloud, size=50.0, stride=50):
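+        # tile the XY extent into size x size blocks laid out on a regular stride grid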
+ limitMax = np.amax(cloud[:, 0:3], axis=0)
+ width = int(np.ceil((limitMax[0] - size) / stride)) + 1
+ depth = int(np.ceil((limitMax[1] - size) / stride)) + 1
+ cells = [
+ (x * stride, y * stride)
+ for x in range(width)
+ for y in range(depth)
+ ]
+ blocks = []
+ for (x, y) in cells:
+ xcond = (cloud[:, 0] <= x + size) & (cloud[:, 0] >= x)
+ ycond = (cloud[:, 1] <= y + size) & (cloud[:, 1] >= y)
+ cond = xcond & ycond
+ block = cloud[cond, :]
+ blocks.append(block)
+ return blocks
+
+ @logger.catch
+ def fix_bugs_in_labels(self):
+ pass
+
+ def _parse_scene_subscene(self, name):
+ scene_match = re.match(r"scene(\d{4})_(\d{2})", name)
+ return int(scene_match.group(1)), int(scene_match.group(2))
+
+
+if __name__ == "__main__":
+ Fire(STPLS3DPreprocessing)
diff --git a/models/Mask3D/build/lib/mask3d/datasets/random_cuboid.py b/models/Mask3D/build/lib/mask3d/datasets/random_cuboid.py
new file mode 100644
index 0000000000000000000000000000000000000000..334b87ecadbd9cbee2979d462532fb4a479b280f
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/datasets/random_cuboid.py
@@ -0,0 +1,96 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+import numpy as np
+import torch
+
+
+def check_aspect(crop_range, aspect_min):
+ xy_aspect = np.min(crop_range[:2]) / np.max(crop_range[:2])
+ xz_aspect = np.min(crop_range[[0, 2]]) / np.max(crop_range[[0, 2]])
+ yz_aspect = np.min(crop_range[1:]) / np.max(crop_range[1:])
+ return (
+ (xy_aspect >= aspect_min)
+ or (xz_aspect >= aspect_min)
+ or (yz_aspect >= aspect_min)
+ )
+
+
+class RandomCuboid(object):
+ """
+ RandomCuboid augmentation from DepthContrast [https://arxiv.org/abs/2101.02691]
+ We slightly modify this operation to account for object detection.
+ This augmentation randomly crops a cuboid from the input and
+ ensures that the cropped cuboid contains at least one bounding box
+ """
+
+ def __init__(
+ self,
+ min_points,
+ # aspect=0.8,
+ crop_length=6.0,
+ version1=True,
+ ):
+ # self.aspect = aspect
+ self.crop_length = crop_length
+ self.min_points = min_points
+ self.version1 = version1
+
+ def __call__(self, point_cloud):
+ if point_cloud.shape[0] < self.min_points:
+ print("too small pcd")
+            return np.ones(point_cloud.shape[0], dtype=bool)
+
+ range_xyz = np.max(point_cloud[:, :2], axis=0) - np.min(
+ point_cloud[:, :2], axis=0
+ )
+
+ for _ in range(100):
+ # crop_range = self.min_crop + np.random.rand(3) * (
+ # self.max_crop - self.min_crop
+ # )
+ # crop_range[-1] = 999.
+ # if not check_aspect(crop_range, self.aspect):
+ # continue
+
+ sample_center = point_cloud[:, :2].min(axis=0) + range_xyz / 2
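+            # jitter the crop center around the XY midpoint of the cloud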
+
+ if self.version1:
+ offset_x = np.random.uniform(
+ -range_xyz[0] / 4, range_xyz[0] / 4
+ )
+ offset_y = np.random.uniform(
+ -range_xyz[1] / 4, range_xyz[1] / 4
+ )
+ else:
+ offset_x = np.random.uniform(
+ -(range_xyz[0] / 2) + self.crop_length / 4,
+ +(range_xyz[0] / 2) - self.crop_length / 4,
+ )
+ offset_y = np.random.uniform(
+ -(range_xyz[1] / 2) + self.crop_length / 4,
+ +(range_xyz[1] / 2) - self.crop_length / 4,
+ )
+
+ sample_center[0] = sample_center[0] + offset_x
+ sample_center[1] = sample_center[1] + offset_y
+
+ min_xy = sample_center - self.crop_length / 2
+ max_xy = sample_center + self.crop_length / 2
+
+ upper_idx = (
+ np.sum((point_cloud[:, :2] <= max_xy).astype(np.int32), 1) == 2
+ )
+ lower_idx = (
+ np.sum((point_cloud[:, :2] >= min_xy).astype(np.int32), 1) == 2
+ )
+
+ new_pointidx = (upper_idx) & (lower_idx)
+
+ if np.sum(new_pointidx) < self.min_points:
+ print("TOO SMALL")
+ continue
+
+ return new_pointidx
+
+ # fallback
+ print("FALLBACK")
+    return np.ones(point_cloud.shape[0], dtype=bool)
diff --git a/models/Mask3D/build/lib/mask3d/datasets/scannet200/__init__.py b/models/Mask3D/build/lib/mask3d/datasets/scannet200/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/build/lib/mask3d/datasets/scannet200/scannet200_constants.py b/models/Mask3D/build/lib/mask3d/datasets/scannet200/scannet200_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..1d921407068335b82ad10af912d7e9d715dbd6ca
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/datasets/scannet200/scannet200_constants.py
@@ -0,0 +1,704 @@
+### ScanNet Benchmark constants ###
+VALID_CLASS_IDS_20 = (
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 14,
+ 16,
+ 24,
+ 28,
+ 33,
+ 34,
+ 36,
+ 39,
+)
+
+CLASS_LABELS_20 = (
+ "wall",
+ "floor",
+ "cabinet",
+ "bed",
+ "chair",
+ "sofa",
+ "table",
+ "door",
+ "window",
+ "bookshelf",
+ "picture",
+ "counter",
+ "desk",
+ "curtain",
+ "refrigerator",
+ "shower curtain",
+ "toilet",
+ "sink",
+ "bathtub",
+ "otherfurniture",
+)
+
+SCANNET_COLOR_MAP_20 = {
+ 0: (0.0, 0.0, 0.0),
+ 1: (174.0, 199.0, 232.0),
+ 2: (152.0, 223.0, 138.0),
+ 3: (31.0, 119.0, 180.0),
+ 4: (255.0, 187.0, 120.0),
+ 5: (188.0, 189.0, 34.0),
+ 6: (140.0, 86.0, 75.0),
+ 7: (255.0, 152.0, 150.0),
+ 8: (214.0, 39.0, 40.0),
+ 9: (197.0, 176.0, 213.0),
+ 10: (148.0, 103.0, 189.0),
+ 11: (196.0, 156.0, 148.0),
+ 12: (23.0, 190.0, 207.0),
+ 14: (247.0, 182.0, 210.0),
+ 15: (66.0, 188.0, 102.0),
+ 16: (219.0, 219.0, 141.0),
+ 17: (140.0, 57.0, 197.0),
+ 18: (202.0, 185.0, 52.0),
+ 19: (51.0, 176.0, 203.0),
+ 20: (200.0, 54.0, 131.0),
+ 21: (92.0, 193.0, 61.0),
+ 22: (78.0, 71.0, 183.0),
+ 23: (172.0, 114.0, 82.0),
+ 24: (255.0, 127.0, 14.0),
+ 25: (91.0, 163.0, 138.0),
+ 26: (153.0, 98.0, 156.0),
+ 27: (140.0, 153.0, 101.0),
+ 28: (158.0, 218.0, 229.0),
+ 29: (100.0, 125.0, 154.0),
+ 30: (178.0, 127.0, 135.0),
+ 32: (146.0, 111.0, 194.0),
+ 33: (44.0, 160.0, 44.0),
+ 34: (112.0, 128.0, 144.0),
+ 35: (96.0, 207.0, 209.0),
+ 36: (227.0, 119.0, 194.0),
+ 37: (213.0, 92.0, 176.0),
+ 38: (94.0, 106.0, 211.0),
+ 39: (82.0, 84.0, 163.0),
+ 40: (100.0, 85.0, 144.0),
+}
+
+### ScanNet200 Benchmark constants ###
+VALID_CLASS_IDS_200 = (
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 21,
+ 22,
+ 23,
+ 24,
+ 26,
+ 27,
+ 28,
+ 29,
+ 31,
+ 32,
+ 33,
+ 34,
+ 35,
+ 36,
+ 38,
+ 39,
+ 40,
+ 41,
+ 42,
+ 44,
+ 45,
+ 46,
+ 47,
+ 48,
+ 49,
+ 50,
+ 51,
+ 52,
+ 54,
+ 55,
+ 56,
+ 57,
+ 58,
+ 59,
+ 62,
+ 63,
+ 64,
+ 65,
+ 66,
+ 67,
+ 68,
+ 69,
+ 70,
+ 71,
+ 72,
+ 73,
+ 74,
+ 75,
+ 76,
+ 77,
+ 78,
+ 79,
+ 80,
+ 82,
+ 84,
+ 86,
+ 87,
+ 88,
+ 89,
+ 90,
+ 93,
+ 95,
+ 96,
+ 97,
+ 98,
+ 99,
+ 100,
+ 101,
+ 102,
+ 103,
+ 104,
+ 105,
+ 106,
+ 107,
+ 110,
+ 112,
+ 115,
+ 116,
+ 118,
+ 120,
+ 121,
+ 122,
+ 125,
+ 128,
+ 130,
+ 131,
+ 132,
+ 134,
+ 136,
+ 138,
+ 139,
+ 140,
+ 141,
+ 145,
+ 148,
+ 154,
+ 155,
+ 156,
+ 157,
+ 159,
+ 161,
+ 163,
+ 165,
+ 166,
+ 168,
+ 169,
+ 170,
+ 177,
+ 180,
+ 185,
+ 188,
+ 191,
+ 193,
+ 195,
+ 202,
+ 208,
+ 213,
+ 214,
+ 221,
+ 229,
+ 230,
+ 232,
+ 233,
+ 242,
+ 250,
+ 261,
+ 264,
+ 276,
+ 283,
+ 286,
+ 300,
+ 304,
+ 312,
+ 323,
+ 325,
+ 331,
+ 342,
+ 356,
+ 370,
+ 392,
+ 395,
+ 399,
+ 408,
+ 417,
+ 488,
+ 540,
+ 562,
+ 570,
+ 572,
+ 581,
+ 609,
+ 748,
+ 776,
+ 1156,
+ 1163,
+ 1164,
+ 1165,
+ 1166,
+ 1167,
+ 1168,
+ 1169,
+ 1170,
+ 1171,
+ 1172,
+ 1173,
+ 1174,
+ 1175,
+ 1176,
+ 1178,
+ 1179,
+ 1180,
+ 1181,
+ 1182,
+ 1183,
+ 1184,
+ 1185,
+ 1186,
+ 1187,
+ 1188,
+ 1189,
+ 1190,
+ 1191,
+)
+
+CLASS_LABELS_200 = (
+ "wall",
+ "chair",
+ "floor",
+ "table",
+ "door",
+ "couch",
+ "cabinet",
+ "shelf",
+ "desk",
+ "office chair",
+ "bed",
+ "pillow",
+ "sink",
+ "picture",
+ "window",
+ "toilet",
+ "bookshelf",
+ "monitor",
+ "curtain",
+ "book",
+ "armchair",
+ "coffee table",
+ "box",
+ "refrigerator",
+ "lamp",
+ "kitchen cabinet",
+ "towel",
+ "clothes",
+ "tv",
+ "nightstand",
+ "counter",
+ "dresser",
+ "stool",
+ "cushion",
+ "plant",
+ "ceiling",
+ "bathtub",
+ "end table",
+ "dining table",
+ "keyboard",
+ "bag",
+ "backpack",
+ "toilet paper",
+ "printer",
+ "tv stand",
+ "whiteboard",
+ "blanket",
+ "shower curtain",
+ "trash can",
+ "closet",
+ "stairs",
+ "microwave",
+ "stove",
+ "shoe",
+ "computer tower",
+ "bottle",
+ "bin",
+ "ottoman",
+ "bench",
+ "board",
+ "washing machine",
+ "mirror",
+ "copier",
+ "basket",
+ "sofa chair",
+ "file cabinet",
+ "fan",
+ "laptop",
+ "shower",
+ "paper",
+ "person",
+ "paper towel dispenser",
+ "oven",
+ "blinds",
+ "rack",
+ "plate",
+ "blackboard",
+ "piano",
+ "suitcase",
+ "rail",
+ "radiator",
+ "recycling bin",
+ "container",
+ "wardrobe",
+ "soap dispenser",
+ "telephone",
+ "bucket",
+ "clock",
+ "stand",
+ "light",
+ "laundry basket",
+ "pipe",
+ "clothes dryer",
+ "guitar",
+ "toilet paper holder",
+ "seat",
+ "speaker",
+ "column",
+ "bicycle",
+ "ladder",
+ "bathroom stall",
+ "shower wall",
+ "cup",
+ "jacket",
+ "storage bin",
+ "coffee maker",
+ "dishwasher",
+ "paper towel roll",
+ "machine",
+ "mat",
+ "windowsill",
+ "bar",
+ "toaster",
+ "bulletin board",
+ "ironing board",
+ "fireplace",
+ "soap dish",
+ "kitchen counter",
+ "doorframe",
+ "toilet paper dispenser",
+ "mini fridge",
+ "fire extinguisher",
+ "ball",
+ "hat",
+ "shower curtain rod",
+ "water cooler",
+ "paper cutter",
+ "tray",
+ "shower door",
+ "pillar",
+ "ledge",
+ "toaster oven",
+ "mouse",
+ "toilet seat cover dispenser",
+ "furniture",
+ "cart",
+ "storage container",
+ "scale",
+ "tissue box",
+ "light switch",
+ "crate",
+ "power outlet",
+ "decoration",
+ "sign",
+ "projector",
+ "closet door",
+ "vacuum cleaner",
+ "candle",
+ "plunger",
+ "stuffed animal",
+ "headphones",
+ "dish rack",
+ "broom",
+ "guitar case",
+ "range hood",
+ "dustpan",
+ "hair dryer",
+ "water bottle",
+ "handicap bar",
+ "purse",
+ "vent",
+ "shower floor",
+ "water pitcher",
+ "mailbox",
+ "bowl",
+ "paper bag",
+ "alarm clock",
+ "music stand",
+ "projector screen",
+ "divider",
+ "laundry detergent",
+ "bathroom counter",
+ "object",
+ "bathroom vanity",
+ "closet wall",
+ "laundry hamper",
+ "bathroom stall door",
+ "ceiling light",
+ "trash bin",
+ "dumbbell",
+ "stair rail",
+ "tube",
+ "bathroom cabinet",
+ "cd case",
+ "closet rod",
+ "coffee kettle",
+ "structure",
+ "shower head",
+ "keyboard piano",
+ "case of water bottles",
+ "coat rack",
+ "storage organizer",
+ "folded chair",
+ "fire alarm",
+ "power strip",
+ "calendar",
+ "poster",
+ "potted plant",
+ "luggage",
+ "mattress",
+)
+
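+# Maps raw ScanNet200 category ids to RGB colors in [0, 255] for visualization;
+# id 0 (black) presumably marks unlabeled points.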
+SCANNET_COLOR_MAP_200 = {
+ 0: (0.0, 0.0, 0.0),
+ 1: (174.0, 199.0, 232.0),
+ 2: (188.0, 189.0, 34.0),
+ 3: (152.0, 223.0, 138.0),
+ 4: (255.0, 152.0, 150.0),
+ 5: (214.0, 39.0, 40.0),
+ 6: (91.0, 135.0, 229.0),
+ 7: (31.0, 119.0, 180.0),
+ 8: (229.0, 91.0, 104.0),
+ 9: (247.0, 182.0, 210.0),
+ 10: (91.0, 229.0, 110.0),
+ 11: (255.0, 187.0, 120.0),
+ 13: (141.0, 91.0, 229.0),
+ 14: (112.0, 128.0, 144.0),
+ 15: (196.0, 156.0, 148.0),
+ 16: (197.0, 176.0, 213.0),
+ 17: (44.0, 160.0, 44.0),
+ 18: (148.0, 103.0, 189.0),
+ 19: (229.0, 91.0, 223.0),
+ 21: (219.0, 219.0, 141.0),
+ 22: (192.0, 229.0, 91.0),
+ 23: (88.0, 218.0, 137.0),
+ 24: (58.0, 98.0, 137.0),
+ 26: (177.0, 82.0, 239.0),
+ 27: (255.0, 127.0, 14.0),
+ 28: (237.0, 204.0, 37.0),
+ 29: (41.0, 206.0, 32.0),
+ 31: (62.0, 143.0, 148.0),
+ 32: (34.0, 14.0, 130.0),
+ 33: (143.0, 45.0, 115.0),
+ 34: (137.0, 63.0, 14.0),
+ 35: (23.0, 190.0, 207.0),
+ 36: (16.0, 212.0, 139.0),
+ 38: (90.0, 119.0, 201.0),
+ 39: (125.0, 30.0, 141.0),
+ 40: (150.0, 53.0, 56.0),
+ 41: (186.0, 197.0, 62.0),
+ 42: (227.0, 119.0, 194.0),
+ 44: (38.0, 100.0, 128.0),
+ 45: (120.0, 31.0, 243.0),
+ 46: (154.0, 59.0, 103.0),
+ 47: (169.0, 137.0, 78.0),
+ 48: (143.0, 245.0, 111.0),
+ 49: (37.0, 230.0, 205.0),
+ 50: (14.0, 16.0, 155.0),
+ 51: (196.0, 51.0, 182.0),
+ 52: (237.0, 80.0, 38.0),
+ 54: (138.0, 175.0, 62.0),
+ 55: (158.0, 218.0, 229.0),
+ 56: (38.0, 96.0, 167.0),
+ 57: (190.0, 77.0, 246.0),
+ 58: (208.0, 49.0, 84.0),
+ 59: (208.0, 193.0, 72.0),
+ 62: (55.0, 220.0, 57.0),
+ 63: (10.0, 125.0, 140.0),
+ 64: (76.0, 38.0, 202.0),
+ 65: (191.0, 28.0, 135.0),
+ 66: (211.0, 120.0, 42.0),
+ 67: (118.0, 174.0, 76.0),
+ 68: (17.0, 242.0, 171.0),
+ 69: (20.0, 65.0, 247.0),
+ 70: (208.0, 61.0, 222.0),
+ 71: (162.0, 62.0, 60.0),
+ 72: (210.0, 235.0, 62.0),
+ 73: (45.0, 152.0, 72.0),
+ 74: (35.0, 107.0, 149.0),
+ 75: (160.0, 89.0, 237.0),
+ 76: (227.0, 56.0, 125.0),
+ 77: (169.0, 143.0, 81.0),
+ 78: (42.0, 143.0, 20.0),
+ 79: (25.0, 160.0, 151.0),
+ 80: (82.0, 75.0, 227.0),
+ 82: (253.0, 59.0, 222.0),
+ 84: (240.0, 130.0, 89.0),
+ 86: (123.0, 172.0, 47.0),
+ 87: (71.0, 194.0, 133.0),
+ 88: (24.0, 94.0, 205.0),
+ 89: (134.0, 16.0, 179.0),
+ 90: (159.0, 32.0, 52.0),
+ 93: (213.0, 208.0, 88.0),
+ 95: (64.0, 158.0, 70.0),
+ 96: (18.0, 163.0, 194.0),
+ 97: (65.0, 29.0, 153.0),
+ 98: (177.0, 10.0, 109.0),
+ 99: (152.0, 83.0, 7.0),
+ 100: (83.0, 175.0, 30.0),
+ 101: (18.0, 199.0, 153.0),
+ 102: (61.0, 81.0, 208.0),
+ 103: (213.0, 85.0, 216.0),
+ 104: (170.0, 53.0, 42.0),
+ 105: (161.0, 192.0, 38.0),
+ 106: (23.0, 241.0, 91.0),
+ 107: (12.0, 103.0, 170.0),
+ 110: (151.0, 41.0, 245.0),
+ 112: (133.0, 51.0, 80.0),
+ 115: (184.0, 162.0, 91.0),
+ 116: (50.0, 138.0, 38.0),
+ 118: (31.0, 237.0, 236.0),
+ 120: (39.0, 19.0, 208.0),
+ 121: (223.0, 27.0, 180.0),
+ 122: (254.0, 141.0, 85.0),
+ 125: (97.0, 144.0, 39.0),
+ 128: (106.0, 231.0, 176.0),
+ 130: (12.0, 61.0, 162.0),
+ 131: (124.0, 66.0, 140.0),
+ 132: (137.0, 66.0, 73.0),
+ 134: (250.0, 253.0, 26.0),
+ 136: (55.0, 191.0, 73.0),
+ 138: (60.0, 126.0, 146.0),
+ 139: (153.0, 108.0, 234.0),
+ 140: (184.0, 58.0, 125.0),
+ 141: (135.0, 84.0, 14.0),
+ 145: (139.0, 248.0, 91.0),
+ 148: (53.0, 200.0, 172.0),
+ 154: (63.0, 69.0, 134.0),
+ 155: (190.0, 75.0, 186.0),
+ 156: (127.0, 63.0, 52.0),
+ 157: (141.0, 182.0, 25.0),
+ 159: (56.0, 144.0, 89.0),
+ 161: (64.0, 160.0, 250.0),
+ 163: (182.0, 86.0, 245.0),
+ 165: (139.0, 18.0, 53.0),
+ 166: (134.0, 120.0, 54.0),
+ 168: (49.0, 165.0, 42.0),
+ 169: (51.0, 128.0, 133.0),
+ 170: (44.0, 21.0, 163.0),
+ 177: (232.0, 93.0, 193.0),
+ 180: (176.0, 102.0, 54.0),
+ 185: (116.0, 217.0, 17.0),
+ 188: (54.0, 209.0, 150.0),
+ 191: (60.0, 99.0, 204.0),
+ 193: (129.0, 43.0, 144.0),
+ 195: (252.0, 100.0, 106.0),
+ 202: (187.0, 196.0, 73.0),
+ 208: (13.0, 158.0, 40.0),
+ 213: (52.0, 122.0, 152.0),
+ 214: (128.0, 76.0, 202.0),
+ 221: (187.0, 50.0, 115.0),
+ 229: (180.0, 141.0, 71.0),
+ 230: (77.0, 208.0, 35.0),
+ 232: (72.0, 183.0, 168.0),
+ 233: (97.0, 99.0, 203.0),
+ 242: (172.0, 22.0, 158.0),
+ 250: (155.0, 64.0, 40.0),
+ 261: (118.0, 159.0, 30.0),
+ 264: (69.0, 252.0, 148.0),
+ 276: (45.0, 103.0, 173.0),
+ 283: (111.0, 38.0, 149.0),
+ 286: (184.0, 9.0, 49.0),
+ 300: (188.0, 174.0, 67.0),
+ 304: (53.0, 206.0, 53.0),
+ 312: (97.0, 235.0, 252.0),
+ 323: (66.0, 32.0, 182.0),
+ 325: (236.0, 114.0, 195.0),
+ 331: (241.0, 154.0, 83.0),
+ 342: (133.0, 240.0, 52.0),
+ 356: (16.0, 205.0, 144.0),
+ 370: (75.0, 101.0, 198.0),
+ 392: (237.0, 95.0, 251.0),
+ 395: (191.0, 52.0, 49.0),
+ 399: (227.0, 254.0, 54.0),
+ 408: (49.0, 206.0, 87.0),
+ 417: (48.0, 113.0, 150.0),
+ 488: (125.0, 73.0, 182.0),
+ 540: (229.0, 32.0, 114.0),
+ 562: (158.0, 119.0, 28.0),
+ 570: (60.0, 205.0, 27.0),
+ 572: (18.0, 215.0, 201.0),
+ 581: (79.0, 76.0, 153.0),
+ 609: (134.0, 13.0, 116.0),
+ 748: (192.0, 97.0, 63.0),
+ 776: (108.0, 163.0, 18.0),
+ 1156: (95.0, 220.0, 156.0),
+ 1163: (98.0, 141.0, 208.0),
+ 1164: (144.0, 19.0, 193.0),
+ 1165: (166.0, 36.0, 57.0),
+ 1166: (212.0, 202.0, 34.0),
+ 1167: (23.0, 206.0, 34.0),
+ 1168: (91.0, 211.0, 236.0),
+ 1169: (79.0, 55.0, 137.0),
+ 1170: (182.0, 19.0, 117.0),
+ 1171: (134.0, 76.0, 14.0),
+ 1172: (87.0, 185.0, 28.0),
+ 1173: (82.0, 224.0, 187.0),
+ 1174: (92.0, 110.0, 214.0),
+ 1175: (168.0, 80.0, 171.0),
+ 1176: (197.0, 63.0, 51.0),
+ 1178: (175.0, 199.0, 77.0),
+ 1179: (62.0, 180.0, 98.0),
+ 1180: (8.0, 91.0, 150.0),
+ 1181: (77.0, 15.0, 130.0),
+ 1182: (154.0, 65.0, 96.0),
+ 1183: (197.0, 152.0, 11.0),
+ 1184: (59.0, 155.0, 45.0),
+ 1185: (12.0, 147.0, 145.0),
+ 1186: (54.0, 35.0, 219.0),
+ 1187: (210.0, 73.0, 181.0),
+ 1188: (221.0, 124.0, 77.0),
+ 1189: (149.0, 214.0, 66.0),
+ 1190: (72.0, 185.0, 134.0),
+ 1191: (42.0, 94.0, 198.0),
+}
+
+### Non-object categories for instance segmentation ###
+VALID_PANOPTIC_IDS = (1, 3)
+
+CLASS_LABELS_PANOPTIC = ("wall", "floor")
diff --git a/models/Mask3D/build/lib/mask3d/datasets/scannet200/scannet200_splits.py b/models/Mask3D/build/lib/mask3d/datasets/scannet200/scannet200_splits.py
new file mode 100644
index 0000000000000000000000000000000000000000..3a5585f70319d1eb061669bd82bbf3d64d0bca7b
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/datasets/scannet200/scannet200_splits.py
@@ -0,0 +1,625 @@
+### This file contains the HEAD - COMMON - TAIL split categories for ScanNet 200
+
+HEAD_CATS_SCANNET_200 = [
+ "tv stand",
+ "curtain",
+ "blinds",
+ "shower curtain",
+ "bookshelf",
+ "tv",
+ "kitchen cabinet",
+ "pillow",
+ "lamp",
+ "dresser",
+ "monitor",
+ "object",
+ "ceiling",
+ "board",
+ "stove",
+ "closet wall",
+ "couch",
+ "office chair",
+ "kitchen counter",
+ "shower",
+ "closet",
+ "doorframe",
+ "sofa chair",
+ "mailbox",
+ "nightstand",
+ "washing machine",
+ "picture",
+ "book",
+ "sink",
+ "recycling bin",
+ "table",
+ "backpack",
+ "shower wall",
+ "toilet",
+ "copier",
+ "counter",
+ "stool",
+ "refrigerator",
+ "window",
+ "file cabinet",
+ "chair",
+ "wall",
+ "plant",
+ "coffee table",
+ "stairs",
+ "armchair",
+ "cabinet",
+ "bathroom vanity",
+ "bathroom stall",
+ "mirror",
+ "blackboard",
+ "trash can",
+ "stair rail",
+ "box",
+ "towel",
+ "door",
+ "clothes",
+ "whiteboard",
+ "bed",
+ "floor",
+ "bathtub",
+ "desk",
+ "wardrobe",
+ "clothes dryer",
+ "radiator",
+ "shelf",
+]
+COMMON_CATS_SCANNET_200 = [
+ "cushion",
+ "end table",
+ "dining table",
+ "keyboard",
+ "bag",
+ "toilet paper",
+ "printer",
+ "blanket",
+ "microwave",
+ "shoe",
+ "computer tower",
+ "bottle",
+ "bin",
+ "ottoman",
+ "bench",
+ "basket",
+ "fan",
+ "laptop",
+ "person",
+ "paper towel dispenser",
+ "oven",
+ "rack",
+ "piano",
+ "suitcase",
+ "rail",
+ "container",
+ "telephone",
+ "stand",
+ "light",
+ "laundry basket",
+ "pipe",
+ "seat",
+ "column",
+ "bicycle",
+ "ladder",
+ "jacket",
+ "storage bin",
+ "coffee maker",
+ "dishwasher",
+ "machine",
+ "mat",
+ "windowsill",
+ "bulletin board",
+ "fireplace",
+ "mini fridge",
+ "water cooler",
+ "shower door",
+ "pillar",
+ "ledge",
+ "furniture",
+ "cart",
+ "decoration",
+ "closet door",
+ "vacuum cleaner",
+ "dish rack",
+ "range hood",
+ "projector screen",
+ "divider",
+ "bathroom counter",
+ "laundry hamper",
+ "bathroom stall door",
+ "ceiling light",
+ "trash bin",
+ "bathroom cabinet",
+ "structure",
+ "storage organizer",
+ "potted plant",
+ "mattress",
+]
+TAIL_CATS_SCANNET_200 = [
+ "paper",
+ "plate",
+ "soap dispenser",
+ "bucket",
+ "clock",
+ "guitar",
+ "toilet paper holder",
+ "speaker",
+ "cup",
+ "paper towel roll",
+ "bar",
+ "toaster",
+ "ironing board",
+ "soap dish",
+ "toilet paper dispenser",
+ "fire extinguisher",
+ "ball",
+ "hat",
+ "shower curtain rod",
+ "paper cutter",
+ "tray",
+ "toaster oven",
+ "mouse",
+ "toilet seat cover dispenser",
+ "storage container",
+ "scale",
+ "tissue box",
+ "light switch",
+ "crate",
+ "power outlet",
+ "sign",
+ "projector",
+ "candle",
+ "plunger",
+ "stuffed animal",
+ "headphones",
+ "broom",
+ "guitar case",
+ "dustpan",
+ "hair dryer",
+ "water bottle",
+ "handicap bar",
+ "purse",
+ "vent",
+ "shower floor",
+ "water pitcher",
+ "bowl",
+ "paper bag",
+ "alarm clock",
+ "music stand",
+ "laundry detergent",
+ "dumbbell",
+ "tube",
+ "cd case",
+ "closet rod",
+ "coffee kettle",
+ "shower head",
+ "keyboard piano",
+ "case of water bottles",
+ "coat rack",
+ "folded chair",
+ "fire alarm",
+ "power strip",
+ "calendar",
+ "poster",
+ "luggage",
+]
+
+
+### Given the different sizes of the official train and val sets, not all ScanNet200 categories are present in the validation set.
+### Below we list the categories (with labels and IDs) present in both the train and validation sets, followed by the remaining categories that appear in train but not in val.
+### We don't evaluate on unseen validation categories in this benchmark.
+
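+### Note: the names in the blocks below appear swapped relative to their contents;
+### the *_CLASS_IDS_* tuples hold category names while the *_LABELS_* tuples hold
+### numeric ids. They are kept unchanged so existing imports keep working.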
+VALID_CLASS_IDS_200_VALIDATION = (
+ "wall",
+ "chair",
+ "floor",
+ "table",
+ "door",
+ "couch",
+ "cabinet",
+ "shelf",
+ "desk",
+ "office chair",
+ "bed",
+ "pillow",
+ "sink",
+ "picture",
+ "window",
+ "toilet",
+ "bookshelf",
+ "monitor",
+ "curtain",
+ "book",
+ "armchair",
+ "coffee table",
+ "box",
+ "refrigerator",
+ "lamp",
+ "kitchen cabinet",
+ "towel",
+ "clothes",
+ "tv",
+ "nightstand",
+ "counter",
+ "dresser",
+ "stool",
+ "cushion",
+ "plant",
+ "ceiling",
+ "bathtub",
+ "end table",
+ "dining table",
+ "keyboard",
+ "bag",
+ "backpack",
+ "toilet paper",
+ "printer",
+ "tv stand",
+ "whiteboard",
+ "blanket",
+ "shower curtain",
+ "trash can",
+ "closet",
+ "stairs",
+ "microwave",
+ "stove",
+ "shoe",
+ "computer tower",
+ "bottle",
+ "bin",
+ "ottoman",
+ "bench",
+ "board",
+ "washing machine",
+ "mirror",
+ "copier",
+ "basket",
+ "sofa chair",
+ "file cabinet",
+ "fan",
+ "laptop",
+ "shower",
+ "paper",
+ "person",
+ "paper towel dispenser",
+ "oven",
+ "blinds",
+ "rack",
+ "plate",
+ "blackboard",
+ "piano",
+ "suitcase",
+ "rail",
+ "radiator",
+ "recycling bin",
+ "container",
+ "wardrobe",
+ "soap dispenser",
+ "telephone",
+ "bucket",
+ "clock",
+ "stand",
+ "light",
+ "laundry basket",
+ "pipe",
+ "clothes dryer",
+ "guitar",
+ "toilet paper holder",
+ "seat",
+ "speaker",
+ "column",
+ "ladder",
+ "bathroom stall",
+ "shower wall",
+ "cup",
+ "jacket",
+ "storage bin",
+ "coffee maker",
+ "dishwasher",
+ "paper towel roll",
+ "machine",
+ "mat",
+ "windowsill",
+ "bar",
+ "toaster",
+ "bulletin board",
+ "ironing board",
+ "fireplace",
+ "soap dish",
+ "kitchen counter",
+ "doorframe",
+ "toilet paper dispenser",
+ "mini fridge",
+ "fire extinguisher",
+ "ball",
+ "hat",
+ "shower curtain rod",
+ "water cooler",
+ "paper cutter",
+ "tray",
+ "shower door",
+ "pillar",
+ "ledge",
+ "toaster oven",
+ "mouse",
+ "toilet seat cover dispenser",
+ "furniture",
+ "cart",
+ "scale",
+ "tissue box",
+ "light switch",
+ "crate",
+ "power outlet",
+ "decoration",
+ "sign",
+ "projector",
+ "closet door",
+ "vacuum cleaner",
+ "plunger",
+ "stuffed animal",
+ "headphones",
+ "dish rack",
+ "broom",
+ "range hood",
+ "dustpan",
+ "hair dryer",
+ "water bottle",
+ "handicap bar",
+ "vent",
+ "shower floor",
+ "water pitcher",
+ "mailbox",
+ "bowl",
+ "paper bag",
+ "projector screen",
+ "divider",
+ "laundry detergent",
+ "bathroom counter",
+ "object",
+ "bathroom vanity",
+ "closet wall",
+ "laundry hamper",
+ "bathroom stall door",
+ "ceiling light",
+ "trash bin",
+ "dumbbell",
+ "stair rail",
+ "tube",
+ "bathroom cabinet",
+ "closet rod",
+ "coffee kettle",
+ "shower head",
+ "keyboard piano",
+ "case of water bottles",
+ "coat rack",
+ "folded chair",
+ "fire alarm",
+ "power strip",
+ "calendar",
+ "poster",
+ "potted plant",
+ "mattress",
+)
+
+CLASS_LABELS_200_VALIDATION = (
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 21,
+ 22,
+ 23,
+ 24,
+ 26,
+ 27,
+ 28,
+ 29,
+ 31,
+ 32,
+ 33,
+ 34,
+ 35,
+ 36,
+ 38,
+ 39,
+ 40,
+ 41,
+ 42,
+ 44,
+ 45,
+ 46,
+ 47,
+ 48,
+ 49,
+ 50,
+ 51,
+ 52,
+ 54,
+ 55,
+ 56,
+ 57,
+ 58,
+ 59,
+ 62,
+ 63,
+ 64,
+ 65,
+ 66,
+ 67,
+ 68,
+ 69,
+ 70,
+ 71,
+ 72,
+ 73,
+ 74,
+ 75,
+ 76,
+ 77,
+ 78,
+ 79,
+ 80,
+ 82,
+ 84,
+ 86,
+ 87,
+ 88,
+ 89,
+ 90,
+ 93,
+ 95,
+ 96,
+ 97,
+ 98,
+ 99,
+ 100,
+ 101,
+ 102,
+ 103,
+ 104,
+ 105,
+ 106,
+ 107,
+ 110,
+ 112,
+ 115,
+ 116,
+ 118,
+ 120,
+ 122,
+ 125,
+ 128,
+ 130,
+ 131,
+ 132,
+ 134,
+ 136,
+ 138,
+ 139,
+ 140,
+ 141,
+ 145,
+ 148,
+ 154,
+ 155,
+ 156,
+ 157,
+ 159,
+ 161,
+ 163,
+ 165,
+ 166,
+ 168,
+ 169,
+ 170,
+ 177,
+ 180,
+ 185,
+ 188,
+ 191,
+ 193,
+ 195,
+ 202,
+ 208,
+ 213,
+ 214,
+ 229,
+ 230,
+ 232,
+ 233,
+ 242,
+ 250,
+ 261,
+ 264,
+ 276,
+ 283,
+ 300,
+ 304,
+ 312,
+ 323,
+ 325,
+ 342,
+ 356,
+ 370,
+ 392,
+ 395,
+ 408,
+ 417,
+ 488,
+ 540,
+ 562,
+ 570,
+ 609,
+ 748,
+ 776,
+ 1156,
+ 1163,
+ 1164,
+ 1165,
+ 1166,
+ 1167,
+ 1168,
+ 1169,
+ 1170,
+ 1171,
+ 1172,
+ 1173,
+ 1175,
+ 1176,
+ 1179,
+ 1180,
+ 1181,
+ 1182,
+ 1184,
+ 1185,
+ 1186,
+ 1187,
+ 1188,
+ 1189,
+ 1191,
+)
+
+VALID_CLASS_IDS_200_TRAIN_ONLY = (
+ "bicycle",
+ "storage container",
+ "candle",
+ "guitar case",
+ "purse",
+ "alarm clock",
+ "music stand",
+ "cd case",
+ "structure",
+ "storage organizer",
+ "luggage",
+)
+
+CLASS_LABELS_200_TRAIN_ONLY = (
+ 121,
+ 221,
+ 286,
+ 331,
+ 399,
+ 572,
+ 581,
+ 1174,
+ 1178,
+ 1183,
+ 1190,
+)
diff --git a/models/Mask3D/build/lib/mask3d/datasets/semseg.py b/models/Mask3D/build/lib/mask3d/datasets/semseg.py
new file mode 100644
index 0000000000000000000000000000000000000000..a848b1a20e4690971bf16790fcea00ade84441c0
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/datasets/semseg.py
@@ -0,0 +1,993 @@
+import logging
+from itertools import product
+from pathlib import Path
+from random import random, sample, uniform
+from typing import List, Optional, Tuple, Union
+from random import choice
+from copy import deepcopy
+from random import randrange
+
+
+import numpy
+import torch
+from datasets.random_cuboid import RandomCuboid
+
+import albumentations as A
+import numpy as np
+import scipy.ndimage
+import scipy.interpolate
+import volumentations as V
+import yaml
+
+# from yaml import CLoader as Loader
+from torch.utils.data import Dataset
+from datasets.scannet200.scannet200_constants import (
+ SCANNET_COLOR_MAP_200,
+ SCANNET_COLOR_MAP_20,
+)
+
+logger = logging.getLogger(__name__)
+
+
+class SemanticSegmentationDataset(Dataset):
+ """Docstring for SemanticSegmentationDataset."""
+
+ def __init__(
+ self,
+ dataset_name="scannet",
+ data_dir: Optional[Union[str, Tuple[str]]] = "data/processed/scannet",
+ label_db_filepath: Optional[
+ str
+ ] = "configs/scannet_preprocessing/label_database.yaml",
+ # mean std values from scannet
+ color_mean_std: Optional[Union[str, Tuple[Tuple[float]]]] = (
+ (0.47793125906962, 0.4303257521323044, 0.3749598901421883),
+ (0.2834475483823543, 0.27566157565723015, 0.27018971370874995),
+ ),
+ mode: Optional[str] = "train",
+ add_colors: Optional[bool] = True,
+ add_normals: Optional[bool] = True,
+ add_raw_coordinates: Optional[bool] = False,
+ add_instance: Optional[bool] = False,
+ num_labels: Optional[int] = -1,
+ data_percent: Optional[float] = 1.0,
+ ignore_label: Optional[Union[int, Tuple[int]]] = 255,
+ volume_augmentations_path: Optional[str] = None,
+ image_augmentations_path: Optional[str] = None,
+ instance_oversampling=0,
+ place_around_existing=False,
+ max_cut_region=0,
+ point_per_cut=100,
+ flip_in_center=False,
+ noise_rate=0.0,
+ resample_points=0.0,
+ cache_data=False,
+ add_unlabeled_pc=False,
+ task="instance_segmentation",
+ cropping=False,
+ cropping_args=None,
+ is_tta=False,
+ crop_min_size=20000,
+ crop_length=6.0,
+ cropping_v1=True,
+ reps_per_epoch=1,
+ area=-1,
+ on_crops=False,
+ eval_inner_core=-1,
+ filter_out_classes=[],
+ label_offset=0,
+ add_clip=False,
+ is_elastic_distortion=True,
+ color_drop=0.0,
+ ):
+ assert task in [
+ "instance_segmentation",
+ "semantic_segmentation",
+ ], "unknown task"
+
+ self.add_clip = add_clip
+ self.dataset_name = dataset_name
+ self.is_elastic_distortion = is_elastic_distortion
+ self.color_drop = color_drop
+
+ if self.dataset_name == "scannet":
+ self.color_map = SCANNET_COLOR_MAP_20
+ self.color_map[255] = (255, 255, 255)
+ elif self.dataset_name == "stpls3d":
+ self.color_map = {
+ 0: [0, 255, 0], # Ground
+ 1: [0, 0, 255], # Build
+ 2: [0, 255, 255], # LowVeg
+ 3: [255, 255, 0], # MediumVeg
+ 4: [255, 0, 255], # HiVeg
+ 5: [100, 100, 255], # Vehicle
+ 6: [200, 200, 100], # Truck
+ 7: [170, 120, 200], # Aircraft
+ 8: [255, 0, 0], # MilitaryVec
+ 9: [200, 100, 100], # Bike
+ 10: [10, 200, 100], # Motorcycle
+ 11: [200, 200, 200], # LightPole
+ 12: [50, 50, 50], # StreetSign
+ 13: [60, 130, 60], # Clutter
+ 14: [130, 30, 60],
+ } # Fence
+ elif self.dataset_name == "scannet200":
+ self.color_map = SCANNET_COLOR_MAP_200
+ elif self.dataset_name == "s3dis":
+ self.color_map = {
+ 0: [0, 255, 0], # ceiling
+ 1: [0, 0, 255], # floor
+ 2: [0, 255, 255], # wall
+ 3: [255, 255, 0], # beam
+ 4: [255, 0, 255], # column
+ 5: [100, 100, 255], # window
+ 6: [200, 200, 100], # door
+ 7: [170, 120, 200], # table
+ 8: [255, 0, 0], # chair
+ 9: [200, 100, 100], # sofa
+ 10: [10, 200, 100], # bookcase
+ 11: [200, 200, 200], # board
+ 12: [50, 50, 50], # clutter
+ }
+ else:
+ assert False, "dataset not known"
+
+ self.task = task
+
+ self.filter_out_classes = filter_out_classes
+ self.label_offset = label_offset
+
+ self.area = area
+ self.eval_inner_core = eval_inner_core
+
+ self.reps_per_epoch = reps_per_epoch
+
+ self.cropping = cropping
+ self.cropping_args = cropping_args
+ self.is_tta = is_tta
+ self.on_crops = on_crops
+
+ self.crop_min_size = crop_min_size
+ self.crop_length = crop_length
+
+ self.version1 = cropping_v1
+
+ self.random_cuboid = RandomCuboid(
+ self.crop_min_size,
+ crop_length=self.crop_length,
+ version1=self.version1,
+ )
+
+ self.mode = mode
+ self.data_dir = data_dir
+ self.add_unlabeled_pc = add_unlabeled_pc
+ if add_unlabeled_pc:
+ self.other_database = self._load_yaml(
+ Path(data_dir).parent / "matterport" / "train_database.yaml"
+ )
+ if type(data_dir) == str:
+ self.data_dir = [self.data_dir]
+ self.ignore_label = ignore_label
+ self.add_colors = add_colors
+ self.add_normals = add_normals
+ self.add_instance = add_instance
+ self.add_raw_coordinates = add_raw_coordinates
+ self.instance_oversampling = instance_oversampling
+ self.place_around_existing = place_around_existing
+ self.max_cut_region = max_cut_region
+ self.point_per_cut = point_per_cut
+ self.flip_in_center = flip_in_center
+ self.noise_rate = noise_rate
+ self.resample_points = resample_points
+
+ # loading database files
+ self._data = []
+ for database_path in self.data_dir:
+ database_path = Path(database_path)
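+ # NOTE: mode is hard-coded to 'Validation' below, so only the
+ # validation database is loaded regardless of self.mode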
+ mode = 'Validation'
+ if self.dataset_name != "s3dis":
+ if not (database_path / f"{mode}_database.yaml").exists():
+ print(
+ f"generate {database_path}/{mode}_database.yaml first"
+ )
+ exit()
+ self._data.extend(
+ self._load_yaml(database_path / f"{mode}_database.yaml")
+ )
+ else:
+ # mode_s3dis = f"Area_{self.area}"
+ mode_s3dis = "Validation"
+ if self.mode == "train":
+ mode_s3dis = "train_" + mode_s3dis
+ if not (
+ database_path / f"{mode_s3dis}_database.yaml"
+ ).exists():
+ print(
+ f"generate {database_path}/{mode_s3dis}_database.yaml first"
+ )
+ exit()
+ self._data.extend(
+ self._load_yaml(
+ database_path / f"{mode_s3dis}_database.yaml"
+ )
+ )
+ if data_percent < 1.0:
+ self._data = sample(
+ self._data, int(len(self._data) * data_percent)
+ )
+ # labels = self._load_yaml(Path(label_db_filepath))
+
+ # if working only on classes for validation - discard others
+ # self._labels = self._select_correct_labels(labels, num_labels)
+
+ if instance_oversampling > 0:
+ self.instance_data = self._load_yaml(
+ Path(label_db_filepath).parent / "instance_database.yaml"
+ )
+
+ # normalize color channels
+ if self.dataset_name == "s3dis":
+ color_mean_std = color_mean_std.replace(
+ "color_mean_std.yaml", f"Area_{self.area}_color_mean_std.yaml"
+ )
+
+ if Path(str(color_mean_std)).exists():
+ color_mean_std = self._load_yaml(color_mean_std)
+ color_mean, color_std = (
+ tuple(color_mean_std["mean"]),
+ tuple(color_mean_std["std"]),
+ )
+ elif len(color_mean_std[0]) == 3 and len(color_mean_std[1]) == 3:
+ color_mean, color_std = color_mean_std[0], color_mean_std[1]
+ else:
+ logger.error(
+ "pass mean and std as tuple of tuples, or as an .yaml file"
+ )
+
+ # augmentations
+ self.volume_augmentations = V.NoOp()
+ if (volume_augmentations_path is not None) and (
+ volume_augmentations_path != "none"
+ ):
+ self.volume_augmentations = V.load(
+ Path(volume_augmentations_path), data_format="yaml"
+ )
+ self.image_augmentations = A.NoOp()
+ if (image_augmentations_path is not None) and (
+ image_augmentations_path != "none"
+ ):
+ self.image_augmentations = A.load(
+ Path(image_augmentations_path), data_format="yaml"
+ )
+ # mandatory color augmentation
+ if add_colors:
+ self.normalize_color = A.Normalize(mean=color_mean, std=color_std)
+
+ self.cache_data = cache_data
+ # new_data = []
+ if self.cache_data:
+ new_data = []
+ for i in range(len(self._data)):
+ self._data[i]["data"] = np.load(
+ self.data[i]["filepath"].replace("../../", "")
+ )
+ if self.on_crops:
+ if self.eval_inner_core == -1:
+ for block_id, block in enumerate(
+ self.splitPointCloud(self._data[i]["data"])
+ ):
+ if len(block) > 10000:
+ new_data.append(
+ {
+ "instance_gt_filepath": self._data[i][
+ "instance_gt_filepath"
+ ][block_id]
+ if len(
+ self._data[i][
+ "instance_gt_filepath"
+ ]
+ )
+ > 0
+ else list(),
+ "scene": f"{self._data[i]['scene'].replace('.txt', '')}_{block_id}.txt",
+ "raw_filepath": f"{self.data[i]['filepath'].replace('.npy', '')}_{block_id}",
+ "data": block,
+ }
+ )
+ else:
+ assert False
+ else:
+ conds_inner, blocks_outer = self.splitPointCloud(
+ self._data[i]["data"],
+ size=self.crop_length,
+ inner_core=self.eval_inner_core,
+ )
+
+ for block_id in range(len(conds_inner)):
+ cond_inner = conds_inner[block_id]
+ block_outer = blocks_outer[block_id]
+
+ if cond_inner.sum() > 10000:
+ new_data.append(
+ {
+ "instance_gt_filepath": self._data[i][
+ "instance_gt_filepath"
+ ][block_id]
+ if len(
+ self._data[i][
+ "instance_gt_filepath"
+ ]
+ )
+ > 0
+ else list(),
+ "scene": f"{self._data[i]['scene'].replace('.txt', '')}_{block_id}.txt",
+ "raw_filepath": f"{self.data[i]['filepath'].replace('.npy', '')}_{block_id}",
+ "data": block_outer,
+ "cond_inner": cond_inner,
+ }
+ )
+ else:
+ assert False
+
+ if self.on_crops:
+ self._data = new_data
+ # new_data.append(np.load(self.data[i]["filepath"].replace("../../", "")))
+ # self._data = new_data
+
+ def splitPointCloud(self, cloud, size=50.0, stride=50, inner_core=-1):
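+ """Split a cloud into square xy blocks of side `size` placed on a grid with
+ the given stride. With inner_core == -1 a list of point blocks is returned;
+ otherwise each cell yields the inner-core point mask and its outer block."""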
+ if inner_core == -1:
+ limitMax = np.amax(cloud[:, 0:3], axis=0)
+ width = int(np.ceil((limitMax[0] - size) / stride)) + 1
+ depth = int(np.ceil((limitMax[1] - size) / stride)) + 1
+ cells = [
+ (x * stride, y * stride)
+ for x in range(width)
+ for y in range(depth)
+ ]
+ blocks = []
+ for (x, y) in cells:
+ xcond = (cloud[:, 0] <= x + size) & (cloud[:, 0] >= x)
+ ycond = (cloud[:, 1] <= y + size) & (cloud[:, 1] >= y)
+ cond = xcond & ycond
+ block = cloud[cond, :]
+ blocks.append(block)
+ return blocks
+ else:
+ limitMax = np.amax(cloud[:, 0:3], axis=0)
+ width = int(np.ceil((limitMax[0] - inner_core) / stride)) + 1
+ depth = int(np.ceil((limitMax[1] - inner_core) / stride)) + 1
+ cells = [
+ (x * stride, y * stride)
+ for x in range(width)
+ for y in range(depth)
+ ]
+ blocks_outer = []
+ conds_inner = []
+ for (x, y) in cells:
+ xcond_outer = (
+ cloud[:, 0] <= x + inner_core / 2.0 + size / 2
+ ) & (cloud[:, 0] >= x + inner_core / 2.0 - size / 2)
+ ycond_outer = (
+ cloud[:, 1] <= y + inner_core / 2.0 + size / 2
+ ) & (cloud[:, 1] >= y + inner_core / 2.0 - size / 2)
+
+ cond_outer = xcond_outer & ycond_outer
+ block_outer = cloud[cond_outer, :]
+
+ xcond_inner = (block_outer[:, 0] <= x + inner_core) & (
+ block_outer[:, 0] >= x
+ )
+ ycond_inner = (block_outer[:, 1] <= y + inner_core) & (
+ block_outer[:, 1] >= y
+ )
+
+ cond_inner = xcond_inner & ycond_inner
+
+ conds_inner.append(cond_inner)
+ blocks_outer.append(block_outer)
+ return conds_inner, blocks_outer
+
+ def map2color(self, labels):
+ output_colors = list()
+
+ for label in labels:
+ output_colors.append(self.color_map[label])
+
+ return torch.tensor(output_colors)
+
+ def __len__(self):
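+ # with test-time augmentation, every scene is sampled 5 times per epoch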
+ if self.is_tta:
+ return 5 * len(self.data)
+ else:
+ return self.reps_per_epoch * len(self.data)
+
+ def __getitem__(self, idx: int):
+ idx = idx % len(self.data)
+ if self.is_tta:
+ idx = idx % len(self.data)
+
+ if self.cache_data:
+ points = self.data[idx]["data"]
+ else:
+ assert not self.on_crops, "you need caching if on crops"
+ points = np.load(self.data[idx]["filepath"].replace("../../", ""))
+
+ if "train" in self.mode and self.dataset_name in ["s3dis", "stpls3d"]:
+ inds = self.random_cuboid(points)
+ points = points[inds]
+
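+ # preprocessed point arrays are laid out as
+ # [x y z | r g b | nx ny nz | segment id | semantic label, instance label]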
+ coordinates, color, normals, segments, labels = (
+ points[:, :3],
+ points[:, 3:6],
+ points[:, 6:9],
+ points[:, 9],
+ points[:, 10:12],
+ )
+
+ raw_coordinates = coordinates.copy()
+ raw_color = color
+ raw_normals = normals
+
+ if not self.add_colors:
+ color = np.ones((len(color), 3))
+
+ # volume and image augmentations for train
+ if "train" in self.mode or self.is_tta:
+ if self.cropping:
+ new_idx = self.random_cuboid(
+ coordinates,
+ labels[:, 1],
+ self._remap_from_zero(labels[:, 0].copy()),
+ )
+
+ coordinates = coordinates[new_idx]
+ color = color[new_idx]
+ labels = labels[new_idx]
+ segments = segments[new_idx]
+ raw_color = raw_color[new_idx]
+ raw_normals = raw_normals[new_idx]
+ normals = normals[new_idx]
+ points = points[new_idx]
+
+ coordinates -= coordinates.mean(0)
+
+ try:
+ coordinates += (
+ np.random.uniform(coordinates.min(0), coordinates.max(0))
+ / 2
+ )
+ except OverflowError as err:
+ print(coordinates)
+ print(coordinates.shape)
+ raise err
+
+ if self.instance_oversampling > 0.0:
+ (
+ coordinates,
+ color,
+ normals,
+ labels,
+ ) = self.augment_individual_instance(
+ coordinates,
+ color,
+ normals,
+ labels,
+ self.instance_oversampling,
+ )
+
+ if self.flip_in_center:
+ coordinates = flip_in_center(coordinates)
+
+ for i in (0, 1):
+ if random() < 0.5:
+ coord_max = np.max(points[:, i])
+ coordinates[:, i] = coord_max - coordinates[:, i]
+
+ if random() < 0.95:
+ if self.is_elastic_distortion:
+ for granularity, magnitude in ((0.2, 0.4), (0.8, 1.6)):
+ coordinates = elastic_distortion(
+ coordinates, granularity, magnitude
+ )
+ aug = self.volume_augmentations(
+ points=coordinates,
+ normals=normals,
+ features=color,
+ labels=labels,
+ )
+ coordinates, color, normals, labels = (
+ aug["points"],
+ aug["features"],
+ aug["normals"],
+ aug["labels"],
+ )
+ pseudo_image = color.astype(np.uint8)[np.newaxis, :, :]
+ color = np.squeeze(
+ self.image_augmentations(image=pseudo_image)["image"]
+ )
+
+ if self.point_per_cut != 0:
+ number_of_cuts = int(len(coordinates) / self.point_per_cut)
+ for _ in range(number_of_cuts):
+ size_of_cut = np.random.uniform(0.05, self.max_cut_region)
+ # not wall, floor or empty
+ point = choice(coordinates)
+ x_min = point[0] - size_of_cut
+ x_max = x_min + size_of_cut
+ y_min = point[1] - size_of_cut
+ y_max = y_min + size_of_cut
+ z_min = point[2] - size_of_cut
+ z_max = z_min + size_of_cut
+ indexes = crop(
+ coordinates, x_min, y_min, z_min, x_max, y_max, z_max
+ )
+ coordinates, normals, color, labels = (
+ coordinates[~indexes],
+ normals[~indexes],
+ color[~indexes],
+ labels[~indexes],
+ )
+
+ # if self.noise_rate > 0:
+ # coordinates, color, normals, labels = random_points(
+ # coordinates,
+ # color,
+ # normals,
+ # labels,
+ # self.noise_rate,
+ # self.ignore_label,
+ # )
+
+ if (self.resample_points > 0) or (self.noise_rate > 0):
+ coordinates, color, normals, labels = random_around_points(
+ coordinates,
+ color,
+ normals,
+ labels,
+ self.resample_points,
+ self.noise_rate,
+ self.ignore_label,
+ )
+
+ if self.add_unlabeled_pc:
+ if random() < 0.8:
+ new_points = np.load(
+ self.other_database[
+ np.random.randint(0, len(self.other_database) - 1)
+ ]["filepath"]
+ )
+ (
+ unlabeled_coords,
+ unlabeled_color,
+ unlabeled_normals,
+ unlabeled_labels,
+ ) = (
+ new_points[:, :3],
+ new_points[:, 3:6],
+ new_points[:, 6:9],
+ new_points[:, 9:],
+ )
+ unlabeled_coords -= unlabeled_coords.mean(0)
+ unlabeled_coords += (
+ np.random.uniform(
+ unlabeled_coords.min(0), unlabeled_coords.max(0)
+ )
+ / 2
+ )
+
+ aug = self.volume_augmentations(
+ points=unlabeled_coords,
+ normals=unlabeled_normals,
+ features=unlabeled_color,
+ labels=unlabeled_labels,
+ )
+ (
+ unlabeled_coords,
+ unlabeled_color,
+ unlabeled_normals,
+ unlabeled_labels,
+ ) = (
+ aug["points"],
+ aug["features"],
+ aug["normals"],
+ aug["labels"],
+ )
+ pseudo_image = unlabeled_color.astype(np.uint8)[
+ np.newaxis, :, :
+ ]
+ unlabeled_color = np.squeeze(
+ self.image_augmentations(image=pseudo_image)["image"]
+ )
+
+ coordinates = np.concatenate(
+ (coordinates, unlabeled_coords)
+ )
+ color = np.concatenate((color, unlabeled_color))
+ normals = np.concatenate((normals, unlabeled_normals))
+ labels = np.concatenate(
+ (
+ labels,
+ np.full_like(unlabeled_labels, self.ignore_label),
+ )
+ )
+
+ if random() < self.color_drop:
+ color[:] = 255
+
+ # normalize color information
+ pseudo_image = color.astype(np.uint8)[np.newaxis, :, :]
+ color = np.squeeze(self.normalize_color(image=pseudo_image)["image"])
+
+ # prepare labels and map from 0 to 20(40)
+ labels = labels.astype(np.int32)
+ # if labels.size > 0:
+ # labels[:, 0] = self._remap_from_zero(labels[:, 0])
+ # if not self.add_instance:
+ # # taking only first column, which is segmentation label, not instance
+ # labels = labels[:, 0].flatten()[..., None]
+
+ labels = np.hstack((labels, segments[..., None].astype(np.int32)))
+
+ features = color
+ if self.add_normals:
+ features = np.hstack((features, normals))
+ if self.add_raw_coordinates:
+ if len(features.shape) == 1:
+ features = np.hstack((features[None, ...], coordinates))
+ else:
+ features = np.hstack((features, coordinates))
+
+ # if self.task != "semantic_segmentation":
+ if self.data[idx]["raw_filepath"].split("/")[-2] in [
+ "scene0636_00",
+ "scene0154_00",
+ ]:
+ return self.__getitem__(0)
+
+ if self.dataset_name == "s3dis":
+ return (
+ coordinates,
+ features,
+ labels,
+ self.data[idx]["area"] + "_" + self.data[idx]["scene"],
+ raw_color,
+ raw_normals,
+ raw_coordinates,
+ idx,
+ )
+ if self.dataset_name == "stpls3d":
+ if labels.shape[1] != 1: # only segments --> test set!
+ if np.unique(labels[:, -2]).shape[0] < 2:
+ print("NO INSTANCES")
+ return self.__getitem__(0)
+ return (
+ coordinates,
+ features,
+ labels,
+ self.data[idx]["scene"],
+ raw_color,
+ raw_normals,
+ raw_coordinates,
+ idx,
+ )
+ else:
+ return (
+ coordinates,
+ features,
+ labels,
+ self.data[idx]["raw_filepath"].split("/")[-2],
+ raw_color,
+ raw_normals,
+ raw_coordinates,
+ idx,
+ )
+
+ @property
+ def data(self):
+ """database file containing information about preproscessed dataset"""
+ return self._data
+
+ @property
+ def label_info(self):
+ """database file containing information labels used by dataset"""
+ return self._labels
+
+ @staticmethod
+ def _load_yaml(filepath):
+ with open(filepath) as f:
+ file = yaml.load(f, Loader=yaml.FullLoader)
+ return file
+
+ def _select_correct_labels(self, labels, num_labels):
+ number_of_validation_labels = 0
+ number_of_all_labels = 0
+ for (
+ k,
+ v,
+ ) in labels.items():
+ number_of_all_labels += 1
+ if v["validation"]:
+ number_of_validation_labels += 1
+
+ if num_labels == number_of_all_labels:
+ return labels
+ elif num_labels == number_of_validation_labels:
+ valid_labels = dict()
+ for (
+ k,
+ v,
+ ) in labels.items():
+ if v["validation"]:
+ valid_labels.update({k: v})
+ return valid_labels
+ else:
+ msg = f"""not available number labels, select from:
+ {number_of_validation_labels}, {number_of_all_labels}"""
+ raise ValueError(msg)
+
+ def _remap_from_zero(self, labels):
+ labels[
+ ~np.isin(labels, list(self.label_info.keys()))
+ ] = self.ignore_label
+ # remap to the range from 0
+ for i, k in enumerate(self.label_info.keys()):
+ labels[labels == k] = i
+ return labels
+
+ def _remap_model_output(self, output):
+ output = np.array(output)
+ output_remapped = output.copy()
+ for i, k in enumerate(self.label_info.keys()):
+ output_remapped[output == i] = k
+ return output_remapped
+
+ def augment_individual_instance(
+ self, coordinates, color, normals, labels, oversampling=1.0
+ ):
+ max_instance = int(len(np.unique(labels[:, 1])))
+ # randomly selecting half of non-zero instances
+ for instance in range(0, int(max_instance * oversampling)):
+ if self.place_around_existing:
+ center = choice(
+ coordinates[
+ labels[:, 1] == choice(np.unique(labels[:, 1]))
+ ]
+ )
+ else:
+ center = np.array(
+ [uniform(-5, 5), uniform(-5, 5), uniform(-0.5, 2)]
+ )
+ instance = choice(choice(self.instance_data))
+ instance = np.load(instance["instance_filepath"])
+ # center the sampled instance at the chosen location
+ instance[:, :3] = (
+ instance[:, :3] - instance[:, :3].mean(axis=0) + center
+ )
+ max_instance = max_instance + 1
+ instance[:, -1] = max_instance
+ aug = V.Compose(
+ [
+ V.Scale3d(),
+ V.RotateAroundAxis3d(
+ rotation_limit=np.pi / 24, axis=(1, 0, 0)
+ ),
+ V.RotateAroundAxis3d(
+ rotation_limit=np.pi / 24, axis=(0, 1, 0)
+ ),
+ V.RotateAroundAxis3d(rotation_limit=np.pi, axis=(0, 0, 1)),
+ ]
+ )(
+ points=instance[:, :3],
+ features=instance[:, 3:6],
+ normals=instance[:, 6:9],
+ labels=instance[:, 9:],
+ )
+ coordinates = np.concatenate((coordinates, aug["points"]))
+ color = np.concatenate((color, aug["features"]))
+ normals = np.concatenate((normals, aug["normals"]))
+ labels = np.concatenate((labels, aug["labels"]))
+
+ return coordinates, color, normals, labels
+
+
+def elastic_distortion(pointcloud, granularity, magnitude):
+ """Apply elastic distortion on sparse coordinate space.
+
+ pointcloud: numpy array of (number of points, at least 3 spatial dims)
+ granularity: size of the noise grid (in the same scale [m/cm] as the voxel grid)
+ magnitude: noise multiplier
+ """
+ blurx = np.ones((3, 1, 1, 1)).astype("float32") / 3
+ blury = np.ones((1, 3, 1, 1)).astype("float32") / 3
+ blurz = np.ones((1, 1, 3, 1)).astype("float32") / 3
+ coords = pointcloud[:, :3]
+ coords_min = coords.min(0)
+
+ # Create Gaussian noise tensor of the size given by granularity.
+ noise_dim = ((coords - coords_min).max(0) // granularity).astype(int) + 3
+ noise = np.random.randn(*noise_dim, 3).astype(np.float32)
+
+ # Smoothing.
+ for _ in range(2):
+ noise = scipy.ndimage.convolve(
+ noise, blurx, mode="constant", cval=0
+ )
+ noise = scipy.ndimage.convolve(
+ noise, blury, mode="constant", cval=0
+ )
+ noise = scipy.ndimage.convolve(
+ noise, blurz, mode="constant", cval=0
+ )
+
+ # Trilinear interpolate noise filters for each spatial dimensions.
+ ax = [
+ np.linspace(d_min, d_max, d)
+ for d_min, d_max, d in zip(
+ coords_min - granularity,
+ coords_min + granularity * (noise_dim - 2),
+ noise_dim,
+ )
+ ]
+ interp = scipy.interpolate.RegularGridInterpolator(
+ ax, noise, bounds_error=0, fill_value=0
+ )
+ pointcloud[:, :3] = coords + interp(coords) * magnitude
+ return pointcloud
+
+
+def crop(points, x_min, y_min, z_min, x_max, y_max, z_max):
+ if x_max <= x_min or y_max <= y_min or z_max <= z_min:
+ raise ValueError(
+ "We should have x_min < x_max and y_min < y_max and z_min < z_max. But we got"
+ " (x_min = {x_min}, y_min = {y_min}, z_min = {z_min},"
+ " x_max = {x_max}, y_max = {y_max}, z_max = {z_max})".format(
+ x_min=x_min,
+ x_max=x_max,
+ y_min=y_min,
+ y_max=y_max,
+ z_min=z_min,
+ z_max=z_max,
+ )
+ )
+ inds = np.all(
+ [
+ (points[:, 0] >= x_min),
+ (points[:, 0] < x_max),
+ (points[:, 1] >= y_min),
+ (points[:, 1] < y_max),
+ (points[:, 2] >= z_min),
+ (points[:, 2] < z_max),
+ ],
+ axis=0,
+ )
+ return inds
+
+
+def flip_in_center(coordinates):
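+ """Augmentation that recenters the cloud, then mirrors each xy-quadrant and
+ shifts it back into place."""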
+ # moving coordinates to center
+ coordinates -= coordinates.mean(0)
+ aug = V.Compose(
+ [
+ V.Flip3d(axis=(0, 1, 0), always_apply=True),
+ V.Flip3d(axis=(1, 0, 0), always_apply=True),
+ ]
+ )
+
+ first_crop = coordinates[:, 0] > 0
+ first_crop &= coordinates[:, 1] > 0
+ # x -y
+ second_crop = coordinates[:, 0] > 0
+ second_crop &= coordinates[:, 1] < 0
+ # -x y
+ third_crop = coordinates[:, 0] < 0
+ third_crop &= coordinates[:, 1] > 0
+ # -x -y
+ fourth_crop = coordinates[:, 0] < 0
+ fourth_crop &= coordinates[:, 1] < 0
+
+ if first_crop.sum() > 1:  # number of points in the quadrant, not array size
+ coordinates[first_crop] = aug(points=coordinates[first_crop])["points"]
+ if second_crop.sum() > 1:
+ minimum = coordinates[second_crop].min(0)
+ minimum[2] = 0
+ minimum[0] = 0
+ coordinates[second_crop] = aug(points=coordinates[second_crop])[
+ "points"
+ ]
+ coordinates[second_crop] += minimum
+ if third_crop.sum() > 1:
+ minimum = coordinates[third_crop].min(0)
+ minimum[2] = 0
+ minimum[1] = 0
+ coordinates[third_crop] = aug(points=coordinates[third_crop])["points"]
+ coordinates[third_crop] += minimum
+ if fourth_crop.sum() > 1:
+ minimum = coordinates[fourth_crop].min(0)
+ minimum[2] = 0
+ coordinates[fourth_crop] = aug(points=coordinates[fourth_crop])[
+ "points"
+ ]
+ coordinates[fourth_crop] += minimum
+
+ return coordinates
+
+
+def random_around_points(
+ coordinates,
+ color,
+ normals,
+ labels,
+ rate=0.2,
+ noise_rate=0,
+ ignore_label=255,
+):
+ coord_indexes = sample(
+ list(range(len(coordinates))), k=int(len(coordinates) * rate)
+ )
+ noisy_coordinates = deepcopy(coordinates[coord_indexes])
+ noisy_coordinates += np.random.uniform(
+ -0.2 - noise_rate, 0.2 + noise_rate, size=noisy_coordinates.shape
+ )
+
+ if noise_rate > 0:
+ noisy_color = np.random.randint(0, 255, size=noisy_coordinates.shape)
+ noisy_normals = np.random.rand(*noisy_coordinates.shape) * 2 - 1
+ noisy_labels = np.full(labels[coord_indexes].shape, ignore_label)
+
+ coordinates = np.vstack((coordinates, noisy_coordinates))
+ color = np.vstack((color, noisy_color))
+ normals = np.vstack((normals, noisy_normals))
+ labels = np.vstack((labels, noisy_labels))
+ else:
+ noisy_color = deepcopy(color[coord_indexes])
+ noisy_normals = deepcopy(normals[coord_indexes])
+ noisy_labels = deepcopy(labels[coord_indexes])
+
+ coordinates = np.vstack((coordinates, noisy_coordinates))
+ color = np.vstack((color, noisy_color))
+ normals = np.vstack((normals, noisy_normals))
+ labels = np.vstack((labels, noisy_labels))
+
+ return coordinates, color, normals, labels
+
+
+def random_points(
+ coordinates, color, normals, labels, noise_rate=0.6, ignore_label=255
+):
+ max_boundary = coordinates.max(0) + 0.1
+ min_boundary = coordinates.min(0) - 0.1
+
+ noisy_coordinates = int(
+ (max(max_boundary) - min(min_boundary)) / noise_rate
+ )
+
+ noisy_coordinates = np.array(
+ list(
+ product(
+ np.linspace(
+ min_boundary[0], max_boundary[0], noisy_coordinates
+ ),
+ np.linspace(
+ min_boundary[1], max_boundary[1], noisy_coordinates
+ ),
+ np.linspace(
+ min_boundary[2], max_boundary[2], noisy_coordinates
+ ),
+ )
+ )
+ )
+ noisy_coordinates += np.random.uniform(
+ -noise_rate, noise_rate, size=noisy_coordinates.shape
+ )
+
+ noisy_color = np.random.randint(0, 255, size=noisy_coordinates.shape)
+ noisy_normals = np.random.rand(*noisy_coordinates.shape) * 2 - 1
+ noisy_labels = np.full(
+ (noisy_coordinates.shape[0], labels.shape[1]), ignore_label
+ )
+
+ coordinates = np.vstack((coordinates, noisy_coordinates))
+ color = np.vstack((color, noisy_color))
+ normals = np.vstack((normals, noisy_normals))
+ labels = np.vstack((labels, noisy_labels))
+ return coordinates, color, normals, labels
diff --git a/models/Mask3D/build/lib/mask3d/datasets/utils.py b/models/Mask3D/build/lib/mask3d/datasets/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..46d8dd7e112f9722e2af65a76f24191600764a00
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/datasets/utils.py
@@ -0,0 +1,639 @@
+import MinkowskiEngine as ME
+import numpy as np
+import torch
+from random import random
+
+
+class VoxelizeCollate:
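+ """Collate callable that voxelizes a batch of point-cloud samples for sparse convolutions."""
+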
+ def __init__(
+ self,
+ ignore_label=255,
+ voxel_size=1,
+ mode="test",
+ small_crops=False,
+ very_small_crops=False,
+ batch_instance=False,
+ probing=False,
+ task="instance_segmentation",
+ ignore_class_threshold=100,
+ filter_out_classes=[],
+ label_offset=0,
+ num_queries=None,
+ ):
+ assert task in [
+ "instance_segmentation",
+ "semantic_segmentation",
+ ], "task not known"
+ self.task = task
+ self.filter_out_classes = filter_out_classes
+ self.label_offset = label_offset
+ self.voxel_size = voxel_size
+ self.ignore_label = ignore_label
+ self.mode = mode
+ self.batch_instance = batch_instance
+ self.small_crops = small_crops
+ self.very_small_crops = very_small_crops
+ self.probing = probing
+ self.ignore_class_threshold = ignore_class_threshold
+
+ self.num_queries = num_queries
+
+ def __call__(self, batch):
+ if ("train" in self.mode) and (
+ self.small_crops or self.very_small_crops
+ ):
+ batch = make_crops(batch)
+ if ("train" in self.mode) and self.very_small_crops:
+ batch = make_crops(batch)
+ return voxelize(
+ batch,
+ self.ignore_label,
+ self.voxel_size,
+ self.probing,
+ self.mode,
+ task=self.task,
+ ignore_class_threshold=self.ignore_class_threshold,
+ filter_out_classes=self.filter_out_classes,
+ label_offset=self.label_offset,
+ num_queries=self.num_queries,
+ )
+
+
+class VoxelizeCollateMerge:
+ def __init__(
+ self,
+ ignore_label=255,
+ voxel_size=1,
+ mode="test",
+ scenes=2,
+ small_crops=False,
+ very_small_crops=False,
+ batch_instance=False,
+ make_one_pc_noise=False,
+ place_nearby=False,
+ place_far=False,
+ proba=1,
+ probing=False,
+ task="instance_segmentation",
+ ):
+ assert task in [
+ "instance_segmentation",
+ "semantic_segmentation",
+ ], "task not known"
+ self.task = task
+ self.mode = mode
+ self.scenes = scenes
+ self.small_crops = small_crops
+ self.very_small_crops = very_small_crops
+ self.ignore_label = ignore_label
+ self.voxel_size = voxel_size
+ self.batch_instance = batch_instance
+ self.make_one_pc_noise = make_one_pc_noise
+ self.place_nearby = place_nearby
+ self.place_far = place_far
+ self.proba = proba
+ self.probing = probing
+
+ def __call__(self, batch):
+ if (
+ ("train" in self.mode)
+ and (not self.make_one_pc_noise)
+ and (self.proba > random())
+ ):
+ if self.small_crops or self.very_small_crops:
+ batch = make_crops(batch)
+ if self.very_small_crops:
+ batch = make_crops(batch)
+ if self.batch_instance:
+ batch = batch_instances(batch)
+ new_batch = []
+ for i in range(0, len(batch), self.scenes):
+ batch_coordinates = []
+ batch_features = []
+ batch_labels = []
+
+ batch_filenames = ""
+ batch_raw_color = []
+ batch_raw_normals = []
+
+ offset_instance_id = 0
+ offset_segment_id = 0
+
+ for j in range(min(len(batch[i:]), self.scenes)):
+ batch_coordinates.append(batch[i + j][0])
+ batch_features.append(batch[i + j][1])
+
+ if j == 0:
+ batch_filenames = batch[i + j][3]
+ else:
+ batch_filenames = (
+ batch_filenames + f"+{batch[i + j][3]}"
+ )
+
+ batch_raw_color.append(batch[i + j][4])
+ batch_raw_normals.append(batch[i + j][5])
+
+ # make instance ids and segment ids unique
+ # take care that -1 instances stay at -1
+ batch_labels.append(
+ batch[i + j][2]
+ + [0, offset_instance_id, offset_segment_id]
+ )
+ batch_labels[-1][batch[i + j][2][:, 1] == -1, 1] = -1
+
+ max_instance_id, max_segment_id = batch[i + j][2].max(
+ axis=0
+ )[1:]
+ offset_segment_id = offset_segment_id + max_segment_id + 1
+ offset_instance_id = (
+ offset_instance_id + max_instance_id + 1
+ )
+
+ if (len(batch_coordinates) == 2) and self.place_nearby:
+ border = batch_coordinates[0][:, 0].max()
+ border -= batch_coordinates[1][:, 0].min()
+ batch_coordinates[1][:, 0] += border
+ elif (len(batch_coordinates) == 2) and self.place_far:
+ batch_coordinates[1] += (
+ np.random.uniform((-10, -10, -10), (10, 10, 10)) * 200
+ )
+ new_batch.append(
+ (
+ np.vstack(batch_coordinates),
+ np.vstack(batch_features),
+ np.concatenate(batch_labels),
+ batch_filenames,
+ np.vstack(batch_raw_color),
+ np.vstack(batch_raw_normals),
+ )
+ )
+ # TODO WHAT ABOUT POINT2SEGMENT AND SO ON ...
+ batch = new_batch
+ elif ("train" in self.mode) and self.make_one_pc_noise:
+ new_batch = []
+ for i in range(0, len(batch), 2):
+ if (i + 1) < len(batch):
+ new_batch.append(
+ [
+ np.vstack((batch[i][0], batch[i + 1][0])),
+ np.vstack((batch[i][1], batch[i + 1][1])),
+ np.concatenate(
+ (
+ batch[i][2],
+ np.full_like(
+ batch[i + 1][2], self.ignore_label
+ ),
+ )
+ ),
+ ]
+ )
+ new_batch.append(
+ [
+ np.vstack((batch[i][0], batch[i + 1][0])),
+ np.vstack((batch[i][1], batch[i + 1][1])),
+ np.concatenate(
+ (
+ np.full_like(
+ batch[i][2], self.ignore_label
+ ),
+ batch[i + 1][2],
+ )
+ ),
+ ]
+ )
+ else:
+ new_batch.append([batch[i][0], batch[i][1], batch[i][2]])
+ batch = new_batch
+ # return voxelize(batch, self.ignore_label, self.voxel_size, self.probing, self.mode)
+ return voxelize(
+ batch,
+ self.ignore_label,
+ self.voxel_size,
+ self.probing,
+ self.mode,
+ task=self.task,
+ )
+
+
+def batch_instances(batch):
+ new_batch = []
+ for sample in batch:
+ for instance_id in np.unique(sample[2][:, 1]):
+ new_batch.append(
+ (
+ sample[0][sample[2][:, 1] == instance_id],
+ sample[1][sample[2][:, 1] == instance_id],
+ sample[2][sample[2][:, 1] == instance_id][:, 0],
+ ),
+ )
+ return new_batch
+
+
+def voxelize(
+ batch,
+ ignore_label,
+ voxel_size,
+ probing,
+ mode,
+ task,
+ # defaults let VoxelizeCollateMerge call voxelize without these arguments
+ ignore_class_threshold=100,
+ filter_out_classes=[],
+ label_offset=0,
+ num_queries=None,
+):
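+ # Quantize each sample onto the voxel grid with MinkowskiEngine, keep the
+ # inverse maps for projecting predictions back to full resolution, and build
+ # the per-instance training targets.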
+ (
+ coordinates,
+ features,
+ labels,
+ original_labels,
+ inverse_maps,
+ original_colors,
+ original_normals,
+ original_coordinates,
+ idx,
+ ) = ([], [], [], [], [], [], [], [], [])
+ voxelization_dict = {
+ "ignore_label": ignore_label,
+ # "quantization_size": self.voxel_size,
+ "return_index": True,
+ "return_inverse": True,
+ }
+
+ full_res_coords = []
+
+ for sample in batch:
+ idx.append(sample[7])
+ original_coordinates.append(sample[6])
+ original_labels.append(sample[2])
+ full_res_coords.append(sample[0])
+ original_colors.append(sample[4])
+ original_normals.append(sample[5])
+
+ coords = np.floor(sample[0] / voxel_size)
+ voxelization_dict.update(
+ {
+ "coordinates": torch.from_numpy(coords).to("cpu").contiguous(),
+ "features": sample[1],
+ }
+ )
+
+ # maybe this change (_, _, ...) is not necessary and we can directly get out
+ # the sample coordinates?
+ _, _, unique_map, inverse_map = ME.utils.sparse_quantize(
+ **voxelization_dict
+ )
+ inverse_maps.append(inverse_map)
+
+ sample_coordinates = coords[unique_map]
+ coordinates.append(torch.from_numpy(sample_coordinates).int())
+ sample_features = sample[1][unique_map]
+ features.append(torch.from_numpy(sample_features).float())
+ if len(sample[2]) > 0:
+ sample_labels = sample[2][unique_map]
+ labels.append(torch.from_numpy(sample_labels).long())
+
+ # Concatenate all lists
+ input_dict = {"coords": coordinates, "feats": features}
+ if len(labels) > 0:
+ input_dict["labels"] = labels
+ coordinates, features, labels = ME.utils.sparse_collate(**input_dict)
+ else:
+ coordinates, features = ME.utils.sparse_collate(**input_dict)
+ labels = torch.Tensor([])
+
+ if probing:
+ return (
+ NoGpu(
+ coordinates,
+ features,
+ original_labels,
+ inverse_maps,
+ ),
+ labels,
+ )
+
+ if mode == "test":
+ for i in range(len(input_dict["labels"])):
+ _, ret_index, ret_inv = np.unique(
+ input_dict["labels"][i][:, 0],
+ return_index=True,
+ return_inverse=True,
+ )
+ input_dict["labels"][i][:, 0] = torch.from_numpy(ret_inv)
+ # input_dict["segment2label"].append(input_dict["labels"][i][ret_index][:, :-1])
+ else:
+ input_dict["segment2label"] = []
+
+ if "labels" in input_dict:
+ for i in range(len(input_dict["labels"])):
+ # TODO BIGGER CHANGE CHECK!!!
+ _, ret_index, ret_inv = np.unique(
+ input_dict["labels"][i][:, -1],
+ return_index=True,
+ return_inverse=True,
+ )
+ input_dict["labels"][i][:, -1] = torch.from_numpy(ret_inv)
+ input_dict["segment2label"].append(
+ input_dict["labels"][i][ret_index][:, :-1]
+ )
+
+ if "labels" in input_dict:
+ list_labels = input_dict["labels"]
+
+ target = []
+ target_full = []
+
+ if len(list_labels[0].shape) == 1:
+ for batch_id in range(len(list_labels)):
+ label_ids = list_labels[batch_id].unique()
+ if 255 in label_ids:
+ label_ids = label_ids[:-1]
+
+ target.append(
+ {
+ "labels": label_ids,
+ "masks": list_labels[batch_id]
+ == label_ids.unsqueeze(1),
+ }
+ )
+ else:
+ if mode == "test":
+ for i in range(len(input_dict["labels"])):
+ target.append(
+ {"point2segment": input_dict["labels"][i][:, 0]}
+ )
+ target_full.append(
+ {
+ "point2segment": torch.from_numpy(
+ original_labels[i][:, 0]
+ ).long()
+ }
+ )
+ else:
+ target = get_instance_masks(
+ list_labels,
+ list_segments=input_dict["segment2label"],
+ task=task,
+ ignore_class_threshold=ignore_class_threshold,
+ filter_out_classes=filter_out_classes,
+ label_offset=label_offset,
+ )
+ for i in range(len(target)):
+ target[i]["point2segment"] = input_dict["labels"][i][:, 2]
+ if "train" not in mode:
+ target_full = get_instance_masks(
+ [torch.from_numpy(l) for l in original_labels],
+ task=task,
+ ignore_class_threshold=ignore_class_threshold,
+ filter_out_classes=filter_out_classes,
+ label_offset=label_offset,
+ )
+ for i in range(len(target_full)):
+ target_full[i]["point2segment"] = torch.from_numpy(
+ original_labels[i][:, 2]
+ ).long()
+ else:
+ target = []
+ target_full = []
+ coordinates = []
+ features = []
+
+ if "train" not in mode:
+ return (
+ NoGpu(
+ coordinates,
+ features,
+ original_labels,
+ inverse_maps,
+ full_res_coords,
+ target_full,
+ original_colors,
+ original_normals,
+ original_coordinates,
+ idx,
+ ),
+ target,
+ [sample[3] for sample in batch],
+ )
+ else:
+ return (
+ NoGpu(
+ coordinates,
+ features,
+ original_labels,
+ inverse_maps,
+ full_res_coords,
+ ),
+ target,
+ [sample[3] for sample in batch],
+ )
+
+
+def get_instance_masks(
+ list_labels,
+ task,
+ list_segments=None,
+ ignore_class_threshold=100,
+ filter_out_classes=[],
+ label_offset=0,
+):
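+ """Turn per-point (semantic, instance, segment) labels into per-instance binary
+ masks; returns one dict per batch element with "labels", "masks" and, when
+ segment ids are provided, "segment_mask"."""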
+ target = []
+
+ for batch_id in range(len(list_labels)):
+ label_ids = []
+ masks = []
+ segment_masks = []
+ instance_ids = list_labels[batch_id][:, 1].unique()
+
+ for instance_id in instance_ids:
+ if instance_id == -1:
+ continue
+
+ # TODO is it possible that an ignore class (255) is an instance???
+ # instance == -1 ???
+ tmp = list_labels[batch_id][
+ list_labels[batch_id][:, 1] == instance_id
+ ]
+ label_id = tmp[0, 0]
+
+ if (
+ label_id in filter_out_classes
+ ): # floor, wall and undefined (255) are filtered out
+ continue
+
+ if (
+ 255 in filter_out_classes
+ and label_id.item() == 255
+ and tmp.shape[0] < ignore_class_threshold
+ ):
+ continue
+
+ label_ids.append(label_id)
+ masks.append(list_labels[batch_id][:, 1] == instance_id)
+
+ if list_segments:
+ segment_mask = torch.zeros(
+ list_segments[batch_id].shape[0]
+ ).bool()
+ segment_mask[
+ list_labels[batch_id][
+ list_labels[batch_id][:, 1] == instance_id
+ ][:, 2].unique()
+ ] = True
+ segment_masks.append(segment_mask)
+
+ if len(label_ids) == 0:
+ return list()
+
+ label_ids = torch.stack(label_ids)
+ masks = torch.stack(masks)
+ if list_segments:
+ segment_masks = torch.stack(segment_masks)
+
+ if task == "semantic_segmentation":
+ new_label_ids = []
+ new_masks = []
+ new_segment_masks = []
+ for label_id in label_ids.unique():
+ masking = label_ids == label_id
+
+ new_label_ids.append(label_id)
+ new_masks.append(masks[masking, :].sum(dim=0).bool())
+
+ if list_segments:
+ new_segment_masks.append(
+ segment_masks[masking, :].sum(dim=0).bool()
+ )
+
+ label_ids = torch.stack(new_label_ids)
+ masks = torch.stack(new_masks)
+
+ if list_segments:
+ segment_masks = torch.stack(new_segment_masks)
+
+ target.append(
+ {
+ "labels": label_ids,
+ "masks": masks,
+ "segment_mask": segment_masks,
+ }
+ )
+ else:
+ target.append({"labels": label_ids, "masks": masks})
+ else:
+ l = torch.clamp(label_ids - label_offset, min=0)
+
+ if list_segments:
+ target.append(
+ {
+ "labels": l,
+ "masks": masks,
+ "segment_mask": segment_masks,
+ }
+ )
+ else:
+ target.append({"labels": l, "masks": masks})
+ return target
+
+
+def make_crops(batch):
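+ """Split every scene in the batch into four xy-quadrant crops and recenter each crop."""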
+ new_batch = []
+ # detupling
+ for scene in batch:
+ new_batch.append([scene[0], scene[1], scene[2]])
+ batch = new_batch
+ new_batch = []
+ for scene in batch:
+ # move to center for better quadrant split
+ scene[0][:, :3] -= scene[0][:, :3].mean(0)
+
+ # BUGFIX - ensure there is always a point in every quadrant
+ scene[0] = np.vstack(
+ (
+ scene[0],
+ np.array(
+ [
+ [0.1, 0.1, 0.1],
+ [0.1, -0.1, 0.1],
+ [-0.1, 0.1, 0.1],
+ [-0.1, -0.1, 0.1],
+ ]
+ ),
+ )
+ )
+ scene[1] = np.vstack((scene[1], np.zeros((4, scene[1].shape[1]))))
+ scene[2] = np.concatenate(
+ (scene[2], np.full_like((scene[2]), 255)[:4])
+ )
+
+ crop = scene[0][:, 0] > 0
+ crop &= scene[0][:, 1] > 0
+ if crop.sum() > 1:  # number of points in the quadrant, not array size
+ new_batch.append([scene[0][crop], scene[1][crop], scene[2][crop]])
+
+ crop = scene[0][:, 0] > 0
+ crop &= scene[0][:, 1] < 0
+ if crop.sum() > 1:
+ new_batch.append([scene[0][crop], scene[1][crop], scene[2][crop]])
+
+ crop = scene[0][:, 0] < 0
+ crop &= scene[0][:, 1] > 0
+ if crop.sum() > 1:
+ new_batch.append([scene[0][crop], scene[1][crop], scene[2][crop]])
+
+ crop = scene[0][:, 0] < 0
+ crop &= scene[0][:, 1] < 0
+ if crop.sum() > 1:
+ new_batch.append([scene[0][crop], scene[1][crop], scene[2][crop]])
+
+ # moving all of them to center
+ for i in range(len(new_batch)):
+ new_batch[i][0][:, :3] -= new_batch[i][0][:, :3].mean(0)
+ return new_batch
+
+
+class NoGpu:
+ def __init__(
+ self,
+ coordinates,
+ features,
+ original_labels=None,
+ inverse_maps=None,
+ full_res_coords=None,
+ target_full=None,
+ original_colors=None,
+ original_normals=None,
+ original_coordinates=None,
+ idx=None,
+ ):
+ """helper class to prevent gpu loading on lightning"""
+ self.coordinates = coordinates
+ self.features = features
+ self.original_labels = original_labels
+ self.inverse_maps = inverse_maps
+ self.full_res_coords = full_res_coords
+ self.target_full = target_full
+ self.original_colors = original_colors
+ self.original_normals = original_normals
+ self.original_coordinates = original_coordinates
+ self.idx = idx
+
+
+class NoGpuMask:
+ def __init__(
+ self,
+ coordinates,
+ features,
+ original_labels=None,
+ inverse_maps=None,
+ masks=None,
+ labels=None,
+ ):
+ """helper class to prevent gpu loading on lightning"""
+ self.coordinates = coordinates
+ self.features = features
+ self.original_labels = original_labels
+ self.inverse_maps = inverse_maps
+
+ self.masks = masks
+ self.labels = labels
diff --git a/models/Mask3D/build/lib/mask3d/main_instance_segmentation.py b/models/Mask3D/build/lib/mask3d/main_instance_segmentation.py
new file mode 100644
index 0000000000000000000000000000000000000000..c2664673cb3a1fa16191e7baa82a50bbb8f5f195
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/main_instance_segmentation.py
@@ -0,0 +1,114 @@
+import logging
+import os
+from hashlib import md5
+from uuid import uuid4
+import hydra
+from dotenv import load_dotenv
+from omegaconf import DictConfig, OmegaConf
+from trainer.trainer import InstanceSegmentation, RegularCheckpointing
+from pytorch_lightning.callbacks import ModelCheckpoint
+from utils.utils import (
+ flatten_dict,
+ load_baseline_model,
+ load_checkpoint_with_missing_or_exsessive_keys,
+ load_backbone_checkpoint_with_missing_or_exsessive_keys,
+)
+from pytorch_lightning import Trainer, seed_everything
+
+
+def get_parameters(cfg: DictConfig):
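+ """Seed the run, resolve GPU settings, build the InstanceSegmentation model
+ and restore (backbone) checkpoints when configured."""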
+ logger = logging.getLogger(__name__)
+ load_dotenv(".env")
+
+ # parsing input parameters
+ seed_everything(cfg.general.seed)
+
+ # getting basic configuration
+ if cfg.general.get("gpus", None) is None:
+ cfg.general.gpus = os.environ.get("CUDA_VISIBLE_DEVICES", None)
+ loggers = []
+
+ # cfg.general.experiment_id = "0" # str(Repo("./").commit())[:8]
+ # params = flatten_dict(OmegaConf.to_container(cfg, resolve=True))
+
+ # create unique id for experiments that are run locally
+ # unique_id = "_" + str(uuid4())[:4]
+ # cfg.general.version = md5(str(params).encode("utf-8")).hexdigest()[:8] + unique_id
+
+ if not os.path.exists(cfg.general.save_dir):
+ os.makedirs(cfg.general.save_dir)
+ else:
+ print("EXPERIMENT ALREADY EXIST")
+ cfg["trainer"][
+ "resume_from_checkpoint"
+ ] = f"{cfg.general.save_dir}/last-epoch.ckpt"
+
+ for log in cfg.logging:
+ print(log)
+ # loggers.append(hydra.utils.instantiate(log))
+ # loggers[-1].log_hyperparams(
+ # flatten_dict(OmegaConf.to_container(cfg, resolve=True))
+ # )
+
+ model = InstanceSegmentation(cfg)
+ if cfg.general.backbone_checkpoint is not None:
+ cfg, model = load_backbone_checkpoint_with_missing_or_exsessive_keys(
+ cfg, model
+ )
+ if cfg.general.checkpoint is not None:
+ cfg, model = load_checkpoint_with_missing_or_exsessive_keys(cfg, model)
+
+ logger.info(flatten_dict(OmegaConf.to_container(cfg, resolve=True)))
+ return cfg, model, loggers
+
+
+@hydra.main(
+ config_path="conf", config_name="config_base_instance_segmentation.yaml"
+)
+def train(cfg: DictConfig):
+ os.chdir(hydra.utils.get_original_cwd())
+ cfg, model, loggers = get_parameters(cfg)
+ callbacks = []
+ for cb in cfg.callbacks:
+ callbacks.append(hydra.utils.instantiate(cb))
+
+ callbacks.append(RegularCheckpointing())
+
+ runner = Trainer(
+ logger=loggers,
+ gpus=cfg.general.gpus,
+ callbacks=callbacks,
+ weights_save_path=str(cfg.general.save_dir),
+ **cfg.trainer,
+ )
+ runner.fit(model)
+
+
+@hydra.main(
+ config_path="conf", config_name="config_base_instance_segmentation.yaml"
+)
+def test(cfg: DictConfig):
+ # because hydra wants to change dir for some reason
+ os.chdir(hydra.utils.get_original_cwd())
+ cfg, model, loggers = get_parameters(cfg)
+ runner = Trainer(
+ gpus=cfg.general.gpus,
+ logger=loggers,
+ weights_save_path=str(cfg.general.save_dir),
+ **cfg.trainer,
+ )
+ runner.test(model)
+
+
+@hydra.main(
+ config_path="conf", config_name="config_base_instance_segmentation.yaml"
+)
+def main(cfg: DictConfig):
+ if cfg["general"]["train_mode"]:
+ train(cfg)
+ else:
+ test(cfg)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/models/Mask3D/build/lib/mask3d/models/__init__.py b/models/Mask3D/build/lib/mask3d/models/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..b092c965bba4c734b49a7f4d2e3ab6fee8471d17
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/__init__.py
@@ -0,0 +1,44 @@
+import mask3d.models.resunet as resunet
+import mask3d.models.res16unet as res16unet
+from mask3d.models.res16unet import (
+ Res16UNet34C,
+ Res16UNet34A,
+ Res16UNet14A,
+ Res16UNet34D,
+ Res16UNet18D,
+ Res16UNet18B,
+ Custom30M,
+)
+from mask3d.models.mask3d import Mask3D
+
+MODELS = []
+
+
+def add_models(module):
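+ """Register every attribute of `module` whose name contains "Net"."""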
+ MODELS.extend([getattr(module, a) for a in dir(module) if "Net" in a])
+
+
+add_models(resunet)
+add_models(res16unet)
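+# note: the name `mask3d` below resolves to the submodule mask3d.models.mask3d,
+# which the import system attaches to this package (= this module's namespace)
+# as a side effect of the imports above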
+add_models(mask3d)
+
+
+def get_models():
+ """Returns a tuple of sample models."""
+ return MODELS
+
+
+def load_model(name):
+ """Creates and returns an instance of the model given its class name."""
+ # Find the model class from its name
+ all_models = get_models()
+ mdict = {model.__name__: model for model in all_models}
+ if name not in mdict:
+ print("Invalid model index. Options are:")
+ # Display a list of valid model names
+ for model in all_models:
+ print(f"\t* {model.__name__}")
+ return None
+ NetClass = mdict[name]
+
+ return NetClass
diff --git a/models/Mask3D/build/lib/mask3d/models/criterion.py b/models/Mask3D/build/lib/mask3d/models/criterion.py
new file mode 100644
index 0000000000000000000000000000000000000000..19ce8bc8ecf4a0be08ce91e45857412a8d55efba
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/criterion.py
@@ -0,0 +1,343 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+# Modified by Bowen Cheng from https://github.com/facebookresearch/detr/blob/master/models/detr.py
+# Modified for Mask3D
+"""
+MaskFormer criterion.
+"""
+
+import torch
+import torch.nn.functional as F
+from torch import nn
+
+from detectron2.utils.comm import get_world_size
+from detectron2.projects.point_rend.point_features import (
+ get_uncertain_point_coords_with_randomness,
+ point_sample,
+)
+
+from mask3d.models.misc import (
+ is_dist_avail_and_initialized,
+ nested_tensor_from_tensor_list,
+)
+
+
+def dice_loss(
+ inputs: torch.Tensor,
+ targets: torch.Tensor,
+ num_masks: float,
+):
+ """
+ Compute the DICE loss, similar to generalized IOU for masks
+ Args:
+ inputs: A float tensor of arbitrary shape.
+ The predictions for each example.
+ targets: A float tensor with the same shape as inputs. Stores the binary
+ classification label for each element in inputs
+ (0 for the negative class and 1 for the positive class).
+ """
+ inputs = inputs.sigmoid()
+ inputs = inputs.flatten(1)
+ numerator = 2 * (inputs * targets).sum(-1)
+ denominator = inputs.sum(-1) + targets.sum(-1)
+ loss = 1 - (numerator + 1) / (denominator + 1)
+ return loss.sum() / num_masks
+
+
+dice_loss_jit = torch.jit.script(dice_loss) # type: torch.jit.ScriptModule
+
+
+def sigmoid_ce_loss(
+ inputs: torch.Tensor,
+ targets: torch.Tensor,
+ num_masks: float,
+):
+ """
+ Args:
+ inputs: A float tensor of arbitrary shape.
+ The predictions for each example.
+ targets: A float tensor with the same shape as inputs. Stores the binary
+ classification label for each element in inputs
+ (0 for the negative class and 1 for the positive class).
+ Returns:
+ Loss tensor
+ """
+ loss = F.binary_cross_entropy_with_logits(
+ inputs, targets, reduction="none"
+ )
+
+ return loss.mean(1).sum() / num_masks
+
+
+sigmoid_ce_loss_jit = torch.jit.script(
+ sigmoid_ce_loss
+) # type: torch.jit.ScriptModule
+
+
+def calculate_uncertainty(logits):
+ """
+ We estimate uncertainty as the L1 distance between 0.0 and the logit prediction in 'logits' for the
+ foreground class in `classes`.
+ Args:
+ logits (Tensor): A tensor of shape (R, 1, ...), where R is the total number of predicted
+ masks in all images; the single channel holds the (class-agnostic) mask logits.
+ Returns:
+ scores (Tensor): A tensor of shape (R, 1, ...) that contains uncertainty scores with
+ the most uncertain locations having the highest uncertainty score.
+ """
+ assert logits.shape[1] == 1
+ gt_class_logits = logits.clone()
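+ # logits near 0 (sigmoid ~ 0.5) are the most uncertain; negating the absolute
+ # value gives those locations the highest score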
+ return -(torch.abs(gt_class_logits))
+
+
+class SetCriterion(nn.Module):
+ """This class computes the loss for DETR.
+ The process happens in two steps:
+ 1) we compute hungarian assignment between ground truth boxes and the outputs of the model
+ 2) we supervise each pair of matched ground-truth / prediction (supervise class and box)
+ """
+
+ def __init__(
+ self,
+ num_classes,
+ matcher,
+ weight_dict,
+ eos_coef,
+ losses,
+ num_points,
+ oversample_ratio,
+ importance_sample_ratio,
+ class_weights,
+ ):
+ """Create the criterion.
+ Parameters:
+ num_classes: number of object categories, omitting the special no-object category
+ matcher: module able to compute a matching between targets and proposals
+ weight_dict: dict containing as key the names of the losses and as values their relative weight.
+ eos_coef: relative classification weight applied to the no-object category
+ losses: list of all the losses to be applied. See get_loss for list of available losses.
+ """
+ super().__init__()
+ self.num_classes = num_classes - 1
+ self.class_weights = class_weights
+ self.matcher = matcher
+ self.weight_dict = weight_dict
+ self.eos_coef = eos_coef
+ self.losses = losses
+ empty_weight = torch.ones(self.num_classes + 1)
+ empty_weight[-1] = self.eos_coef
+
+ if self.class_weights != -1:
+ assert (
+ len(self.class_weights) == self.num_classes
+ ), "CLASS WEIGHTS DO NOT MATCH"
+ empty_weight[:-1] = torch.tensor(self.class_weights)
+
+ self.register_buffer("empty_weight", empty_weight)
+
+ # pointwise mask loss parameters
+ self.num_points = num_points
+ self.oversample_ratio = oversample_ratio
+ self.importance_sample_ratio = importance_sample_ratio
+
+ def loss_labels(self, outputs, targets, indices, num_masks, mask_type):
+ """Classification loss (NLL)
+ targets dicts must contain the key "labels" containing a tensor of dim [nb_target_boxes]
+ """
+ assert "pred_logits" in outputs
+ src_logits = outputs["pred_logits"].float()
+
+ idx = self._get_src_permutation_idx(indices)
+ target_classes_o = torch.cat(
+ [t["labels"][J] for t, (_, J) in zip(targets, indices)]
+ )
+ target_classes = torch.full(
+ src_logits.shape[:2],
+ self.num_classes,
+ dtype=torch.int64,
+ device=src_logits.device,
+ )
+ target_classes[idx] = target_classes_o
+
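+ # unmatched queries default to the no-object class (index num_classes), which
+ # empty_weight down-weights by eos_coef; targets labeled 253 are ignored outright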
+ loss_ce = F.cross_entropy(
+ src_logits.transpose(1, 2),
+ target_classes,
+ self.empty_weight,
+ ignore_index=253,
+ )
+ losses = {"loss_ce": loss_ce}
+ return losses
+
+ def loss_masks(self, outputs, targets, indices, num_masks, mask_type):
+ """Compute the losses related to the masks: the focal loss and the dice loss.
+ targets dicts must contain the key "masks" containing a tensor of dim [nb_target_boxes, h, w]
+ """
+ assert "pred_masks" in outputs
+
+ loss_masks = []
+ loss_dices = []
+
+ for batch_id, (map_id, target_id) in enumerate(indices):
+ map = outputs["pred_masks"][batch_id][:, map_id].T
+ target_mask = targets[batch_id][mask_type][target_id]
+
+ if self.num_points != -1:
+ point_idx = torch.randperm(
+ target_mask.shape[1], device=target_mask.device
+ )[: int(self.num_points * target_mask.shape[1])]
+ else:
+ # sample all points
+ point_idx = torch.arange(
+ target_mask.shape[1], device=target_mask.device
+ )
+
+ num_masks = target_mask.shape[0]
+ map = map[:, point_idx]
+ target_mask = target_mask[:, point_idx].float()
+
+ loss_masks.append(sigmoid_ce_loss_jit(map, target_mask, num_masks))
+ loss_dices.append(dice_loss_jit(map, target_mask, num_masks))
+ # del target_mask
+ return {
+ "loss_mask": torch.sum(torch.stack(loss_masks)),
+ "loss_dice": torch.sum(torch.stack(loss_dices)),
+ }
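+ # NOTE: the return above makes the PointRend-style point-sampling branch below
+ # unreachable; it is kept from the original 2D MaskFormer implementation (and
+ # would fail if reached, since it passes an extra mask_type argument to the
+ # jit-scripted losses)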
+
+ src_idx = self._get_src_permutation_idx(indices)
+ tgt_idx = self._get_tgt_permutation_idx(indices)
+ src_masks = outputs["pred_masks"]
+ src_masks = src_masks[src_idx]
+ masks = [t[mask_type] for t in targets]
+ # TODO use valid to mask invalid areas due to padding in loss
+ target_masks, valid = nested_tensor_from_tensor_list(masks).decompose()
+ target_masks = target_masks.to(src_masks)
+ target_masks = target_masks[tgt_idx]
+
+ # No need to upsample predictions as we are using normalized coordinates :)
+ # N x 1 x H x W
+ src_masks = src_masks[:, None]
+ target_masks = target_masks[:, None]
+
+ with torch.no_grad():
+ # sample point_coords
+ point_coords = get_uncertain_point_coords_with_randomness(
+ src_masks,
+ lambda logits: calculate_uncertainty(logits),
+ self.num_points,
+ self.oversample_ratio,
+ self.importance_sample_ratio,
+ )
+ # get gt labels
+ point_labels = point_sample(
+ target_masks,
+ point_coords,
+ align_corners=False,
+ ).squeeze(1)
+
+ point_logits = point_sample(
+ src_masks,
+ point_coords,
+ align_corners=False,
+ ).squeeze(1)
+
+ losses = {
+ "loss_mask": sigmoid_ce_loss_jit(
+ point_logits, point_labels, num_masks, mask_type
+ ),
+ "loss_dice": dice_loss_jit(
+ point_logits, point_labels, num_masks, mask_type
+ ),
+ }
+
+ del src_masks
+ del target_masks
+ return losses
+
+ def _get_src_permutation_idx(self, indices):
+ # permute predictions following indices
+ batch_idx = torch.cat(
+ [torch.full_like(src, i) for i, (src, _) in enumerate(indices)]
+ )
+ src_idx = torch.cat([src for (src, _) in indices])
+ return batch_idx, src_idx
+
+ def _get_tgt_permutation_idx(self, indices):
+ # permute targets following indices
+ batch_idx = torch.cat(
+ [torch.full_like(tgt, i) for i, (_, tgt) in enumerate(indices)]
+ )
+ tgt_idx = torch.cat([tgt for (_, tgt) in indices])
+ return batch_idx, tgt_idx
+
+ def get_loss(self, loss, outputs, targets, indices, num_masks, mask_type):
+ loss_map = {"labels": self.loss_labels, "masks": self.loss_masks}
+ assert loss in loss_map, f"do you really want to compute {loss} loss?"
+ return loss_map[loss](outputs, targets, indices, num_masks, mask_type)
+
+ def forward(self, outputs, targets, mask_type):
+ """This performs the loss computation.
+ Parameters:
+ outputs: dict of tensors, see the output specification of the model for the format
+ targets: list of dicts, such that len(targets) == batch_size.
+ The expected keys in each dict depends on the losses applied, see each loss' doc
+ """
+ outputs_without_aux = {
+ k: v for k, v in outputs.items() if k != "aux_outputs"
+ }
+
+ # Retrieve the matching between the outputs of the last layer and the targets
+ indices = self.matcher(outputs_without_aux, targets, mask_type)
+
+ # Compute the average number of target masks across all nodes, for normalization purposes
+ num_masks = sum(len(t["labels"]) for t in targets)
+ num_masks = torch.as_tensor(
+ [num_masks],
+ dtype=torch.float,
+ device=next(iter(outputs.values())).device,
+ )
+ if is_dist_avail_and_initialized():
+ torch.distributed.all_reduce(num_masks)
+ num_masks = torch.clamp(num_masks / get_world_size(), min=1).item()
+
+ # Compute all the requested losses
+ losses = {}
+ for loss in self.losses:
+ losses.update(
+ self.get_loss(
+ loss, outputs, targets, indices, num_masks, mask_type
+ )
+ )
+
+ # In case of auxiliary losses, we repeat this process with the output of each intermediate layer.
+ if "aux_outputs" in outputs:
+ for i, aux_outputs in enumerate(outputs["aux_outputs"]):
+ indices = self.matcher(aux_outputs, targets, mask_type)
+ for loss in self.losses:
+ l_dict = self.get_loss(
+ loss,
+ aux_outputs,
+ targets,
+ indices,
+ num_masks,
+ mask_type,
+ )
+ l_dict = {k + f"_{i}": v for k, v in l_dict.items()}
+ losses.update(l_dict)
+
+ return losses
+
+ def __repr__(self):
+ head = "Criterion " + self.__class__.__name__
+ body = [
+ "matcher: {}".format(self.matcher.__repr__(_repr_indent=8)),
+ "losses: {}".format(self.losses),
+ "weight_dict: {}".format(self.weight_dict),
+ "num_classes: {}".format(self.num_classes),
+ "eos_coef: {}".format(self.eos_coef),
+ "num_points: {}".format(self.num_points),
+ "oversample_ratio: {}".format(self.oversample_ratio),
+ "importance_sample_ratio: {}".format(self.importance_sample_ratio),
+ ]
+ _repr_indent = 4
+ lines = [head] + [" " * _repr_indent + line for line in body]
+ return "\n".join(lines)
diff --git a/models/Mask3D/build/lib/mask3d/models/mask3d.py b/models/Mask3D/build/lib/mask3d/models/mask3d.py
new file mode 100644
index 0000000000000000000000000000000000000000..0e09440cfacc68a961af8231f8205bf1daf6a134
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/mask3d.py
@@ -0,0 +1,870 @@
+import torch
+import hydra
+import torch.nn as nn
+import MinkowskiEngine.MinkowskiOps as me
+from MinkowskiEngine.MinkowskiPooling import MinkowskiAvgPooling
+import numpy as np
+from torch.nn import functional as F
+from mask3d.models.modules.common import conv
+from mask3d.models.position_embedding import PositionEmbeddingCoordsSine
+from mask3d.models.modules.helpers_3detr import GenericMLP
+from torch_scatter import scatter_mean, scatter_max, scatter_min
+from torch.cuda.amp import autocast
+
+from pointnet2.pointnet2_utils import furthest_point_sample
+
+
+class Mask3D(nn.Module):
+ def __init__(
+ self,
+ config,
+ hidden_dim,
+ num_queries,
+ num_heads,
+ dim_feedforward,
+ sample_sizes,
+ shared_decoder,
+ num_classes,
+ num_decoders,
+ dropout,
+ pre_norm,
+ positional_encoding_type,
+ non_parametric_queries,
+ train_on_segments,
+ normalize_pos_enc,
+ use_level_embed,
+ scatter_type,
+ hlevels,
+ use_np_features,
+ voxel_size,
+ max_sample_size,
+ random_queries,
+ gauss_scale,
+ random_query_both,
+ random_normal,
+ ):
+ super().__init__()
+ self.random_normal = random_normal
+ self.random_query_both = random_query_both
+ self.random_queries = random_queries
+ self.max_sample_size = max_sample_size
+ self.gauss_scale = gauss_scale
+ self.voxel_size = voxel_size
+ self.scatter_type = scatter_type
+ self.hlevels = hlevels
+ self.use_level_embed = use_level_embed
+ self.train_on_segments = train_on_segments
+ self.normalize_pos_enc = normalize_pos_enc
+ self.num_decoders = num_decoders
+ self.num_classes = num_classes
+ self.dropout = dropout
+ self.pre_norm = pre_norm
+ self.shared_decoder = shared_decoder
+ self.sample_sizes = sample_sizes
+ self.non_parametric_queries = non_parametric_queries
+ self.use_np_features = use_np_features
+ self.mask_dim = hidden_dim
+ self.num_heads = num_heads
+ self.num_queries = num_queries
+ self.pos_enc_type = positional_encoding_type
+
+ self.backbone = hydra.utils.instantiate(config.backbone)
+ self.num_levels = len(self.hlevels)
+ sizes = self.backbone.PLANES[-5:]
+
+ self.mask_features_head = conv(
+ self.backbone.PLANES[7],
+ self.mask_dim,
+ kernel_size=1,
+ stride=1,
+ bias=True,
+ D=3,
+ )
+
+ if self.scatter_type == "mean":
+ self.scatter_fn = scatter_mean
+ elif self.scatter_type == "max":
+ self.scatter_fn = lambda mask, p2s, dim: scatter_max(
+ mask, p2s, dim=dim
+ )[0]
+ else:
+ assert False, "Scatter function not known"
+
+ assert (
+ not use_np_features
+ ) or non_parametric_queries, "np features only with np queries"
+
+ if self.non_parametric_queries:
+ self.query_projection = GenericMLP(
+ input_dim=self.mask_dim,
+ hidden_dims=[self.mask_dim],
+ output_dim=self.mask_dim,
+ use_conv=True,
+ output_use_activation=True,
+ hidden_use_bias=True,
+ )
+
+ if self.use_np_features:
+ self.np_feature_projection = nn.Sequential(
+ nn.Linear(sizes[-1], hidden_dim),
+ nn.ReLU(),
+ nn.Linear(hidden_dim, hidden_dim),
+ )
+ elif self.random_query_both:
+ self.query_projection = GenericMLP(
+ input_dim=2 * self.mask_dim,
+ hidden_dims=[2 * self.mask_dim],
+ output_dim=2 * self.mask_dim,
+ use_conv=True,
+ output_use_activation=True,
+ hidden_use_bias=True,
+ )
+ else:
+ # PARAMETRIC QUERIES
+ # learnable query features
+ self.query_feat = nn.Embedding(num_queries, hidden_dim)
+ # learnable query p.e.
+ self.query_pos = nn.Embedding(num_queries, hidden_dim)
+
+ if self.use_level_embed:
+ # learnable scale-level embedding
+ self.level_embed = nn.Embedding(self.num_levels, hidden_dim)
+
+ self.mask_embed_head = nn.Sequential(
+ nn.Linear(hidden_dim, hidden_dim),
+ nn.ReLU(),
+ nn.Linear(hidden_dim, hidden_dim),
+ )
+
+ self.class_embed_head = nn.Linear(hidden_dim, self.num_classes)
+
+ if self.pos_enc_type == "legacy":
+ self.pos_enc = PositionalEncoding3D(channels=self.mask_dim)
+ elif self.pos_enc_type == "fourier":
+ self.pos_enc = PositionEmbeddingCoordsSine(
+ pos_type="fourier",
+ d_pos=self.mask_dim,
+ gauss_scale=self.gauss_scale,
+ normalize=self.normalize_pos_enc,
+ )
+ elif self.pos_enc_type == "sine":
+ self.pos_enc = PositionEmbeddingCoordsSine(
+ pos_type="sine",
+ d_pos=self.mask_dim,
+ normalize=self.normalize_pos_enc,
+ )
+ else:
+ assert False, "pos enc type not known"
+
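+ # stride-2 average pooling, reused both to build the coordinate pyramid in
+ # forward() and to pool attention masks down to coarser levels in mask_module()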
+ self.pooling = MinkowskiAvgPooling(
+ kernel_size=2, stride=2, dimension=3
+ )
+
+ self.masked_transformer_decoder = nn.ModuleList()
+ self.cross_attention = nn.ModuleList()
+ self.self_attention = nn.ModuleList()
+ self.ffn_attention = nn.ModuleList()
+ self.lin_squeeze = nn.ModuleList()
+
+ num_shared = self.num_decoders if not self.shared_decoder else 1
+
+ for _ in range(num_shared):
+ tmp_cross_attention = nn.ModuleList()
+ tmp_self_attention = nn.ModuleList()
+ tmp_ffn_attention = nn.ModuleList()
+ tmp_squeeze_attention = nn.ModuleList()
+ for i, hlevel in enumerate(self.hlevels):
+ tmp_cross_attention.append(
+ CrossAttentionLayer(
+ d_model=self.mask_dim,
+ nhead=self.num_heads,
+ dropout=self.dropout,
+ normalize_before=self.pre_norm,
+ )
+ )
+
+ tmp_squeeze_attention.append(
+ nn.Linear(sizes[hlevel], self.mask_dim)
+ )
+
+ tmp_self_attention.append(
+ SelfAttentionLayer(
+ d_model=self.mask_dim,
+ nhead=self.num_heads,
+ dropout=self.dropout,
+ normalize_before=self.pre_norm,
+ )
+ )
+
+ tmp_ffn_attention.append(
+ FFNLayer(
+ d_model=self.mask_dim,
+ dim_feedforward=dim_feedforward,
+ dropout=self.dropout,
+ normalize_before=self.pre_norm,
+ )
+ )
+
+ self.cross_attention.append(tmp_cross_attention)
+ self.self_attention.append(tmp_self_attention)
+ self.ffn_attention.append(tmp_ffn_attention)
+ self.lin_squeeze.append(tmp_squeeze_attention)
+
+ self.decoder_norm = nn.LayerNorm(hidden_dim)
+
+ def get_pos_encs(self, coords):
+ pos_encodings_pcd = []
+
+ for i in range(len(coords)):
+ pos_encodings_pcd.append([[]])
+ for coords_batch in coords[i].decomposed_features:
+ scene_min = coords_batch.min(dim=0)[0][None, ...]
+ scene_max = coords_batch.max(dim=0)[0][None, ...]
+
+ with autocast(enabled=False):
+ tmp = self.pos_enc(
+ coords_batch[None, ...].float(),
+ input_range=[scene_min, scene_max],
+ )
+
+ pos_encodings_pcd[-1][0].append(tmp.squeeze(0).permute((1, 0)))
+
+ return pos_encodings_pcd
+
+ def forward(
+ self, x, point2segment=None, raw_coordinates=None, is_eval=False
+ ):
+ pcd_features, aux = self.backbone(x)
+
+ batch_size = len(x.decomposed_coordinates)
+
+ with torch.no_grad():
+ coordinates = me.SparseTensor(
+ features=raw_coordinates,
+ coordinate_manager=aux[-1].coordinate_manager,
+ coordinate_map_key=aux[-1].coordinate_map_key,
+ device=aux[-1].device,
+ )
+
+ coords = [coordinates]
+ for _ in reversed(range(len(aux) - 1)):
+ coords.append(self.pooling(coords[-1]))
+
+ coords.reverse()
+
+ pos_encodings_pcd = self.get_pos_encs(coords)
+ mask_features = self.mask_features_head(pcd_features)
+ if point2segment is not None:
+ mask_segments = []
+ for i, mask_feature in enumerate(
+ mask_features.decomposed_features
+ ):
+ mask_segments.append(
+ self.scatter_fn(mask_feature, point2segment[i], dim=0)
+ )
+
+ sampled_coords = None
+
+ if self.non_parametric_queries:
+ fps_idx = [
+ furthest_point_sample(
+ x.decomposed_coordinates[i][None, ...].float(),
+ self.num_queries,
+ )
+ .squeeze(0)
+ .long()
+ for i in range(len(x.decomposed_coordinates))
+ ]
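+ # farthest point sampling selects num_queries well-spread points per scene;
+ # their coordinates seed the non-parametric query positional encodings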
+
+ sampled_coords = torch.stack(
+ [
+ coordinates.decomposed_features[i][fps_idx[i].long(), :]
+ for i in range(len(fps_idx))
+ ]
+ )
+
+ mins = torch.stack(
+ [
+ coordinates.decomposed_features[i].min(dim=0)[0]
+ for i in range(len(coordinates.decomposed_features))
+ ]
+ )
+ maxs = torch.stack(
+ [
+ coordinates.decomposed_features[i].max(dim=0)[0]
+ for i in range(len(coordinates.decomposed_features))
+ ]
+ )
+
+ query_pos = self.pos_enc(
+ sampled_coords.float(), input_range=[mins, maxs]
+ ) # Batch, Dim, queries
+ query_pos = self.query_projection(query_pos)
+
+ if not self.use_np_features:
+ queries = torch.zeros_like(query_pos).permute((0, 2, 1))
+ else:
+ queries = torch.stack(
+ [
+ pcd_features.decomposed_features[i][
+ fps_idx[i].long(), :
+ ]
+ for i in range(len(fps_idx))
+ ]
+ )
+ queries = self.np_feature_projection(queries)
+ query_pos = query_pos.permute((2, 0, 1))
+ elif self.random_queries:
+ query_pos = (
+ torch.rand(
+ batch_size,
+ self.mask_dim,
+ self.num_queries,
+ device=x.device,
+ )
+ - 0.5
+ )
+
+ queries = torch.zeros_like(query_pos).permute((0, 2, 1))
+ query_pos = query_pos.permute((2, 0, 1))
+ elif self.random_query_both:
+ if not self.random_normal:
+ query_pos_feat = (
+ torch.rand(
+ batch_size,
+ 2 * self.mask_dim,
+ self.num_queries,
+ device=x.device,
+ )
+ - 0.5
+ )
+ else:
+ query_pos_feat = torch.randn(
+ batch_size,
+ 2 * self.mask_dim,
+ self.num_queries,
+ device=x.device,
+ )
+
+ queries = query_pos_feat[:, : self.mask_dim, :].permute((0, 2, 1))
+ query_pos = query_pos_feat[:, self.mask_dim :, :].permute(
+ (2, 0, 1)
+ )
+ else:
+ # PARAMETRIC QUERIES
+ queries = self.query_feat.weight.unsqueeze(0).repeat(
+ batch_size, 1, 1
+ )
+ query_pos = self.query_pos.weight.unsqueeze(1).repeat(
+ 1, batch_size, 1
+ )
+
+ predictions_class = []
+ predictions_mask = []
+
+ for decoder_counter in range(self.num_decoders):
+ if self.shared_decoder:
+ decoder_counter = 0
+ for i, hlevel in enumerate(self.hlevels):
+ if point2segment is not None:
+ output_class, outputs_mask, attn_mask = self.mask_module(
+ queries,
+ mask_features,
+ mask_segments,
+ len(aux) - hlevel - 1,
+ ret_attn_mask=True,
+ point2segment=point2segment,
+ coords=coords,
+ )
+ else:
+ output_class, outputs_mask, attn_mask = self.mask_module(
+ queries,
+ mask_features,
+ None,
+ len(aux) - hlevel - 1,
+ ret_attn_mask=True,
+ point2segment=None,
+ coords=coords,
+ )
+
+ decomposed_aux = aux[hlevel].decomposed_features
+ decomposed_attn = attn_mask.decomposed_features
+
+ curr_sample_size = max(
+ [pcd.shape[0] for pcd in decomposed_aux]
+ )
+
+ if min([pcd.shape[0] for pcd in decomposed_aux]) == 1:
+ raise RuntimeError(
+ "only a single point gives nans in cross-attention"
+ )
+
+ if not (self.max_sample_size or is_eval):
+ curr_sample_size = min(
+ curr_sample_size, self.sample_sizes[hlevel]
+ )
+
+ rand_idx = []
+ mask_idx = []
+ for k in range(len(decomposed_aux)):
+ pcd_size = decomposed_aux[k].shape[0]
+ if pcd_size <= curr_sample_size:
+ # we do not need to sample
+ # take all points, pad the rest with zeros, and mask the padding
+ idx = torch.zeros(
+ curr_sample_size,
+ dtype=torch.long,
+ device=queries.device,
+ )
+
+ midx = torch.ones(
+ curr_sample_size,
+ dtype=torch.bool,
+ device=queries.device,
+ )
+
+ idx[:pcd_size] = torch.arange(
+ pcd_size, device=queries.device
+ )
+
+ midx[:pcd_size] = False # attend to first points
+ else:
+ # the pcd has more points than we want to sample
+ # take a subset (no padding or masking needed)
+ idx = torch.randperm(
+ decomposed_aux[k].shape[0], device=queries.device
+ )[:curr_sample_size]
+ midx = torch.zeros(
+ curr_sample_size,
+ dtype=torch.bool,
+ device=queries.device,
+ ) # attend to all
+
+ rand_idx.append(idx)
+ mask_idx.append(midx)
+
+ batched_aux = torch.stack(
+ [
+ decomposed_aux[k][rand_idx[k], :]
+ for k in range(len(rand_idx))
+ ]
+ )
+
+ batched_attn = torch.stack(
+ [
+ decomposed_attn[k][rand_idx[k], :]
+ for k in range(len(rand_idx))
+ ]
+ )
+
+ batched_pos_enc = torch.stack(
+ [
+ pos_encodings_pcd[hlevel][0][k][rand_idx[k], :]
+ for k in range(len(rand_idx))
+ ]
+ )
+
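+ # if a query is masked out at every sampled point, unmask it entirely;
+ # an all-True mask row would otherwise yield NaNs in the attention softmax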
+ batched_attn.permute((0, 2, 1))[
+ batched_attn.sum(1) == rand_idx[0].shape[0]
+ ] = False
+
+ m = torch.stack(mask_idx)
+ batched_attn = torch.logical_or(batched_attn, m[..., None])
+
+ src_pcd = self.lin_squeeze[decoder_counter][i](
+ batched_aux.permute((1, 0, 2))
+ )
+ if self.use_level_embed:
+ src_pcd += self.level_embed.weight[i]
+
+ output = self.cross_attention[decoder_counter][i](
+ queries.permute((1, 0, 2)),
+ src_pcd,
+ memory_mask=batched_attn.repeat_interleave(
+ self.num_heads, dim=0
+ ).permute((0, 2, 1)),
+ memory_key_padding_mask=None, # here we do not apply masking on padded region
+ pos=batched_pos_enc.permute((1, 0, 2)),
+ query_pos=query_pos,
+ )
+
+ output = self.self_attention[decoder_counter][i](
+ output,
+ tgt_mask=None,
+ tgt_key_padding_mask=None,
+ query_pos=query_pos,
+ )
+
+ # FFN
+ queries = self.ffn_attention[decoder_counter][i](
+ output
+ ).permute((1, 0, 2))
+
+ predictions_class.append(output_class)
+ predictions_mask.append(outputs_mask)
+
+ if point2segment is not None:
+ output_class, outputs_mask = self.mask_module(
+ queries,
+ mask_features,
+ mask_segments,
+ 0,
+ ret_attn_mask=False,
+ point2segment=point2segment,
+ coords=coords,
+ )
+ else:
+ output_class, outputs_mask = self.mask_module(
+ queries,
+ mask_features,
+ None,
+ 0,
+ ret_attn_mask=False,
+ point2segment=None,
+ coords=coords,
+ )
+ predictions_class.append(output_class)
+ predictions_mask.append(outputs_mask)
+
+ return {
+ "pred_logits": predictions_class[-1],
+ "pred_masks": predictions_mask[-1],
+ "aux_outputs": self._set_aux_loss(
+ predictions_class, predictions_mask
+ ),
+ "sampled_coords": sampled_coords.detach().cpu().numpy()
+ if sampled_coords is not None
+ else None,
+ "backbone_features": pcd_features,
+ }
+
+ def mask_module(
+ self,
+ query_feat,
+ mask_features,
+ mask_segments,
+ num_pooling_steps,
+ ret_attn_mask=True,
+ point2segment=None,
+ coords=None,
+ ):
+ query_feat = self.decoder_norm(query_feat)
+ mask_embed = self.mask_embed_head(query_feat)
+ outputs_class = self.class_embed_head(query_feat)
+
+ output_masks = []
+
+ if point2segment is not None:
+ output_segments = []
+ for i in range(len(mask_segments)):
+ output_segments.append(mask_segments[i] @ mask_embed[i].T)
+ output_masks.append(output_segments[-1][point2segment[i]])
+ else:
+ for i in range(mask_features.C[-1, 0] + 1):
+ output_masks.append(
+ mask_features.decomposed_features[i] @ mask_embed[i].T
+ )
+
+ output_masks = torch.cat(output_masks)
+ outputs_mask = me.SparseTensor(
+ features=output_masks,
+ coordinate_manager=mask_features.coordinate_manager,
+ coordinate_map_key=mask_features.coordinate_map_key,
+ )
+
+ if ret_attn_mask:
+ attn_mask = outputs_mask
+ for _ in range(num_pooling_steps):
+ attn_mask = self.pooling(attn_mask.float())
+
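+ # True marks points a query must NOT attend to (predicted foreground
+ # probability < 0.5), matching nn.MultiheadAttention's attn_mask convention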
+ attn_mask = me.SparseTensor(
+ features=(attn_mask.F.detach().sigmoid() < 0.5),
+ coordinate_manager=attn_mask.coordinate_manager,
+ coordinate_map_key=attn_mask.coordinate_map_key,
+ )
+
+ if point2segment is not None:
+ return outputs_class, output_segments, attn_mask
+ else:
+ return (
+ outputs_class,
+ outputs_mask.decomposed_features,
+ attn_mask,
+ )
+
+ if point2segment is not None:
+ return outputs_class, output_segments
+ else:
+ return outputs_class, outputs_mask.decomposed_features
+
+ @torch.jit.unused
+ def _set_aux_loss(self, outputs_class, outputs_seg_masks):
+ # this is a workaround to make torchscript happy, as torchscript
+ # doesn't support dictionary with non-homogeneous values, such
+ # as a dict having both a Tensor and a list.
+ return [
+ {"pred_logits": a, "pred_masks": b}
+ for a, b in zip(outputs_class[:-1], outputs_seg_masks[:-1])
+ ]
+
+
+class PositionalEncoding3D(nn.Module):
+ def __init__(self, channels):
+ """
+ :param channels: The last dimension of the tensor you want to apply pos emb to.
+ """
+ self.orig_ch = channels
+ super(PositionalEncoding3D, self).__init__()
+ channels = int(np.ceil(channels / 6) * 2)
+ if channels % 2:
+ channels += 1
+ self.channels = channels
+ inv_freq = 1.0 / (
+ 10000 ** (torch.arange(0, channels, 2).float() / channels)
+ )
+ self.register_buffer("inv_freq", inv_freq)
+
+ def forward(self, tensor, input_range=None):
+ """
+ :param tensor: A 3d tensor of size (batch_size, num_points, 3) holding xyz coordinates
+ :return: Positional encoding matrix of size (batch_size, channels, num_points)
+ """
+ pos_x, pos_y, pos_z = tensor[:, :, 0], tensor[:, :, 1], tensor[:, :, 2]
+ sin_inp_x = torch.einsum("bi,j->bij", pos_x, self.inv_freq)
+ sin_inp_y = torch.einsum("bi,j->bij", pos_y, self.inv_freq)
+ sin_inp_z = torch.einsum("bi,j->bij", pos_z, self.inv_freq)
+ emb_x = torch.cat((sin_inp_x.sin(), sin_inp_x.cos()), dim=-1)
+
+ emb_y = torch.cat((sin_inp_y.sin(), sin_inp_y.cos()), dim=-1)
+ emb_z = torch.cat((sin_inp_z.sin(), sin_inp_z.cos()), dim=-1)
+
+ emb = torch.cat((emb_x, emb_y, emb_z), dim=-1)
+ return emb[:, :, : self.orig_ch].permute((0, 2, 1))
+
+
+class SelfAttentionLayer(nn.Module):
+ def __init__(
+ self,
+ d_model,
+ nhead,
+ dropout=0.0,
+ activation="relu",
+ normalize_before=False,
+ ):
+ super().__init__()
+ self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
+
+ self.norm = nn.LayerNorm(d_model)
+ self.dropout = nn.Dropout(dropout)
+
+ self.activation = _get_activation_fn(activation)
+ self.normalize_before = normalize_before
+
+ self._reset_parameters()
+
+ def _reset_parameters(self):
+ for p in self.parameters():
+ if p.dim() > 1:
+ nn.init.xavier_uniform_(p)
+
+ def with_pos_embed(self, tensor, pos):
+ return tensor if pos is None else tensor + pos
+
+ def forward_post(
+ self, tgt, tgt_mask=None, tgt_key_padding_mask=None, query_pos=None
+ ):
+ q = k = self.with_pos_embed(tgt, query_pos)
+ tgt2 = self.self_attn(
+ q,
+ k,
+ value=tgt,
+ attn_mask=tgt_mask,
+ key_padding_mask=tgt_key_padding_mask,
+ )[0]
+ tgt = tgt + self.dropout(tgt2)
+ tgt = self.norm(tgt)
+
+ return tgt
+
+ def forward_pre(
+ self, tgt, tgt_mask=None, tgt_key_padding_mask=None, query_pos=None
+ ):
+ tgt2 = self.norm(tgt)
+ q = k = self.with_pos_embed(tgt2, query_pos)
+ tgt2 = self.self_attn(
+ q,
+ k,
+ value=tgt2,
+ attn_mask=tgt_mask,
+ key_padding_mask=tgt_key_padding_mask,
+ )[0]
+ tgt = tgt + self.dropout(tgt2)
+
+ return tgt
+
+ def forward(
+ self, tgt, tgt_mask=None, tgt_key_padding_mask=None, query_pos=None
+ ):
+ if self.normalize_before:
+ return self.forward_pre(
+ tgt, tgt_mask, tgt_key_padding_mask, query_pos
+ )
+ return self.forward_post(
+ tgt, tgt_mask, tgt_key_padding_mask, query_pos
+ )
+
+
+class CrossAttentionLayer(nn.Module):
+ def __init__(
+ self,
+ d_model,
+ nhead,
+ dropout=0.0,
+ activation="relu",
+ normalize_before=False,
+ ):
+ super().__init__()
+ self.multihead_attn = nn.MultiheadAttention(
+ d_model, nhead, dropout=dropout
+ )
+
+ self.norm = nn.LayerNorm(d_model)
+ self.dropout = nn.Dropout(dropout)
+
+ self.activation = _get_activation_fn(activation)
+ self.normalize_before = normalize_before
+
+ self._reset_parameters()
+
+ def _reset_parameters(self):
+ for p in self.parameters():
+ if p.dim() > 1:
+ nn.init.xavier_uniform_(p)
+
+ def with_pos_embed(self, tensor, pos):
+ return tensor if pos is None else tensor + pos
+
+ def forward_post(
+ self,
+ tgt,
+ memory,
+ memory_mask=None,
+ memory_key_padding_mask=None,
+ pos=None,
+ query_pos=None,
+ ):
+ tgt2 = self.multihead_attn(
+ query=self.with_pos_embed(tgt, query_pos),
+ key=self.with_pos_embed(memory, pos),
+ value=memory,
+ attn_mask=memory_mask,
+ key_padding_mask=memory_key_padding_mask,
+ )[0]
+ tgt = tgt + self.dropout(tgt2)
+ tgt = self.norm(tgt)
+
+ return tgt
+
+ def forward_pre(
+ self,
+ tgt,
+ memory,
+ memory_mask=None,
+ memory_key_padding_mask=None,
+ pos=None,
+ query_pos=None,
+ ):
+ tgt2 = self.norm(tgt)
+
+ tgt2 = self.multihead_attn(
+ query=self.with_pos_embed(tgt2, query_pos),
+ key=self.with_pos_embed(memory, pos),
+ value=memory,
+ attn_mask=memory_mask,
+ key_padding_mask=memory_key_padding_mask,
+ )[0]
+ tgt = tgt + self.dropout(tgt2)
+
+ return tgt
+
+ def forward(
+ self,
+ tgt,
+ memory,
+ memory_mask=None,
+ memory_key_padding_mask=None,
+ pos=None,
+ query_pos=None,
+ ):
+ if self.normalize_before:
+ return self.forward_pre(
+ tgt,
+ memory,
+ memory_mask,
+ memory_key_padding_mask,
+ pos,
+ query_pos,
+ )
+ return self.forward_post(
+ tgt, memory, memory_mask, memory_key_padding_mask, pos, query_pos
+ )
+
+
+class FFNLayer(nn.Module):
+ def __init__(
+ self,
+ d_model,
+ dim_feedforward=2048,
+ dropout=0.0,
+ activation="relu",
+ normalize_before=False,
+ ):
+ super().__init__()
+ # Implementation of Feedforward model
+ self.linear1 = nn.Linear(d_model, dim_feedforward)
+ self.dropout = nn.Dropout(dropout)
+ self.linear2 = nn.Linear(dim_feedforward, d_model)
+
+ self.norm = nn.LayerNorm(d_model)
+
+ self.activation = _get_activation_fn(activation)
+ self.normalize_before = normalize_before
+
+ self._reset_parameters()
+
+ def _reset_parameters(self):
+ for p in self.parameters():
+ if p.dim() > 1:
+ nn.init.xavier_uniform_(p)
+
+ def with_pos_embed(self, tensor, pos):
+ return tensor if pos is None else tensor + pos
+
+ def forward_post(self, tgt):
+ tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt))))
+ tgt = tgt + self.dropout(tgt2)
+ tgt = self.norm(tgt)
+ return tgt
+
+ def forward_pre(self, tgt):
+ tgt2 = self.norm(tgt)
+ tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt2))))
+ tgt = tgt + self.dropout(tgt2)
+ return tgt
+
+ def forward(self, tgt):
+ if self.normalize_before:
+ return self.forward_pre(tgt)
+ return self.forward_post(tgt)
+
+
+def _get_activation_fn(activation):
+ """Return an activation function given a string"""
+ if activation == "relu":
+ return F.relu
+ if activation == "gelu":
+ return F.gelu
+ if activation == "glu":
+ return F.glu
+ raise RuntimeError(f"activation should be relu/gelu, not {activation}.")
diff --git a/models/Mask3D/build/lib/mask3d/models/matcher.py b/models/Mask3D/build/lib/mask3d/models/matcher.py
new file mode 100644
index 0000000000000000000000000000000000000000..fc0e7a05bb76a078b1c3c3b9c877054e439b584c
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/matcher.py
@@ -0,0 +1,226 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+# Modified by Bowen Cheng from https://github.com/facebookresearch/detr/blob/master/models/matcher.py
+"""
+Modules to compute the matching cost and solve the corresponding LSAP.
+"""
+import torch
+import torch.nn.functional as F
+from scipy.optimize import linear_sum_assignment
+from torch import nn
+from torch.cuda.amp import autocast
+
+from detectron2.projects.point_rend.point_features import point_sample
+
+
+def batch_dice_loss(inputs: torch.Tensor, targets: torch.Tensor):
+ """
+ Compute the DICE loss, similar to generalized IOU for masks
+ Args:
+ inputs: A float tensor of arbitrary shape.
+ The predictions for each example.
+ targets: A float tensor with the same shape as inputs. Stores the binary
+ classification label for each element in inputs
+ (0 for the negative class and 1 for the positive class).
+ """
+ inputs = inputs.sigmoid()
+ inputs = inputs.flatten(1)
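+ # the einsum forms all pairwise prediction/target overlaps, so the result is an
+ # [num_preds, num_targets] cost matrix rather than a per-matched-pair loss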
+ numerator = 2 * torch.einsum("nc,mc->nm", inputs, targets)
+ denominator = inputs.sum(-1)[:, None] + targets.sum(-1)[None, :]
+ loss = 1 - (numerator + 1) / (denominator + 1)
+ return loss
+
+
+batch_dice_loss_jit = torch.jit.script(
+ batch_dice_loss
+) # type: torch.jit.ScriptModule
+
+
+def batch_sigmoid_ce_loss(inputs: torch.Tensor, targets: torch.Tensor):
+ """
+ Args:
+ inputs: A float tensor of arbitrary shape.
+ The predictions for each example.
+ targets: A float tensor with the same shape as inputs. Stores the binary
+ classification label for each element in inputs
+ (0 for the negative class and 1 for the positive class).
+ Returns:
+ Loss tensor
+ """
+ hw = inputs.shape[1]
+
+ pos = F.binary_cross_entropy_with_logits(
+ inputs, torch.ones_like(inputs), reduction="none"
+ )
+ neg = F.binary_cross_entropy_with_logits(
+ inputs, torch.zeros_like(inputs), reduction="none"
+ )
+
+ loss = torch.einsum("nc,mc->nm", pos, targets) + torch.einsum(
+ "nc,mc->nm", neg, (1 - targets)
+ )
+
+ return loss / hw
+
+
+batch_sigmoid_ce_loss_jit = torch.jit.script(
+ batch_sigmoid_ce_loss
+) # type: torch.jit.ScriptModule
+
+
+class HungarianMatcher(nn.Module):
+ """This class computes an assignment between the targets and the predictions of the network
+
+ For efficiency reasons, the targets don't include the no_object. Because of this, in general,
+ there are more predictions than targets. In this case, we do a 1-to-1 matching of the best predictions,
+ while the others are un-matched (and thus treated as non-objects).
+ """
+
+ def __init__(
+ self,
+ cost_class: float = 1,
+ cost_mask: float = 1,
+ cost_dice: float = 1,
+ num_points: int = 0,
+ ):
+ """Creates the matcher
+
+ Params:
+ cost_class: This is the relative weight of the classification error in the matching cost
+ cost_mask: This is the relative weight of the focal loss of the binary mask in the matching cost
+ cost_dice: This is the relative weight of the dice loss of the binary mask in the matching cost
+ """
+ super().__init__()
+ self.cost_class = cost_class
+ self.cost_mask = cost_mask
+ self.cost_dice = cost_dice
+
+ assert (
+ cost_class != 0 or cost_mask != 0 or cost_dice != 0
+ ), "all costs cant be 0"
+
+ self.num_points = num_points
+
+ @torch.no_grad()
+ def memory_efficient_forward(self, outputs, targets, mask_type):
+ """More memory-friendly matching"""
+ bs, num_queries = outputs["pred_logits"].shape[:2]
+
+ indices = []
+
+ # Iterate through batch size
+ for b in range(bs):
+
+ out_prob = outputs["pred_logits"][b].softmax(-1) # [num_queries, num_classes]
+ tgt_ids = targets[b]["labels"].clone()
+
+ # Compute the classification cost. Contrary to the loss, we don't use the NLL,
+ # but approximate it in 1 - proba[target class].
+ # The 1 is a constant that doesn't change the matching, so it can be omitted.
+ filter_ignore = tgt_ids == 253
+ tgt_ids[filter_ignore] = 0
+ cost_class = -out_prob[:, tgt_ids]
+ cost_class[:, filter_ignore] = -1.0 # for ignored classes, pretend a perfect match ;) TODO: better worst-class match?
+
+ out_mask = outputs["pred_masks"][b].T # [num_queries, H_pred, W_pred]
+ # gt masks are already padded when preparing target
+ tgt_mask = targets[b][mask_type].to(out_mask)
+
+ if self.num_points != -1:
+ point_idx = torch.randperm(
+ tgt_mask.shape[1], device=tgt_mask.device
+ )[: int(self.num_points * tgt_mask.shape[1])]
+ # point_idx = torch.randint(0, tgt_mask.shape[1], size=(self.num_points,), device=tgt_mask.device)
+ else:
+ # sample all points
+ point_idx = torch.arange(
+ tgt_mask.shape[1], device=tgt_mask.device
+ )
+
+ # out_mask = out_mask[:, None]
+ # tgt_mask = tgt_mask[:, None]
+ # all masks share the same set of points for efficient matching!
+ # point_coords = torch.rand(1, self.num_points, 2, device=out_mask.device)
+ # get gt labels
+ # tgt_mask = point_sample(
+ # tgt_mask,
+ # point_coords.repeat(tgt_mask.shape[0], 1, 1),
+ # align_corners=False,
+ # ).squeeze(1)
+
+ # out_mask = point_sample(
+ # out_mask,
+ # point_coords.repeat(out_mask.shape[0], 1, 1),
+ # align_corners=False,
+ # ).squeeze(1)
+
+ with autocast(enabled=False):
+ out_mask = out_mask.float()
+ tgt_mask = tgt_mask.float()
+ # Compute the focal loss between masks
+ cost_mask = batch_sigmoid_ce_loss_jit(
+ out_mask[:, point_idx], tgt_mask[:, point_idx]
+ )
+
+ # Compute the dice loss between masks
+ cost_dice = batch_dice_loss_jit(
+ out_mask[:, point_idx], tgt_mask[:, point_idx]
+ )
+
+ # Final cost matrix
+ C = (
+ self.cost_mask * cost_mask
+ + self.cost_class * cost_class
+ + self.cost_dice * cost_dice
+ )
+ C = C.reshape(num_queries, -1).cpu()
+
+ indices.append(linear_sum_assignment(C))
+
+ return [
+ (
+ torch.as_tensor(i, dtype=torch.int64),
+ torch.as_tensor(j, dtype=torch.int64),
+ )
+ for i, j in indices
+ ]
+
+ @torch.no_grad()
+ def forward(self, outputs, targets, mask_type):
+ """Performs the matching
+
+ Params:
+ outputs: This is a dict that contains at least these entries:
+ "pred_logits": Tensor of dim [batch_size, num_queries, num_classes] with the classification logits
+ "pred_masks": Tensor of dim [batch_size, num_queries, H_pred, W_pred] with the predicted masks
+
+ targets: This is a list of targets (len(targets) = batch_size), where each target is a dict containing:
+ "labels": Tensor of dim [num_target_boxes] (where num_target_boxes is the number of ground-truth
+ objects in the target) containing the class labels
+ "masks": Tensor of dim [num_target_boxes, H_gt, W_gt] containing the target masks
+
+ Returns:
+ A list of size batch_size, containing tuples of (index_i, index_j) where:
+ - index_i is the indices of the selected predictions (in order)
+ - index_j is the indices of the corresponding selected targets (in order)
+ For each batch element, it holds:
+ len(index_i) = len(index_j) = min(num_queries, num_target_boxes)
+ """
+ return self.memory_efficient_forward(outputs, targets, mask_type)
+
+ def __repr__(self, _repr_indent=4):
+ head = "Matcher " + self.__class__.__name__
+ body = [
+ "cost_class: {}".format(self.cost_class),
+ "cost_mask: {}".format(self.cost_mask),
+ "cost_dice: {}".format(self.cost_dice),
+ ]
+ lines = [head] + [" " * _repr_indent + line for line in body]
+ return "\n".join(lines)
diff --git a/models/Mask3D/build/lib/mask3d/models/metrics/__init__.py b/models/Mask3D/build/lib/mask3d/models/metrics/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..bd7538b5868b93e4192dbee9ca0da9e91323cf0f
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/metrics/__init__.py
@@ -0,0 +1,4 @@
+from .confusionmatrix import ConfusionMatrix
+from .metrics import IoU
+
+__all__ = ["ConfusionMatrix", "IoU"]
diff --git a/models/Mask3D/build/lib/mask3d/models/metrics/confusionmatrix.py b/models/Mask3D/build/lib/mask3d/models/metrics/confusionmatrix.py
new file mode 100644
index 0000000000000000000000000000000000000000..2d92f12595d26f76f3c26d18550b1b1486b837ff
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/metrics/confusionmatrix.py
@@ -0,0 +1,107 @@
+import numpy as np
+import torch
+
+
+class ConfusionMatrix:
+ """Constructs a confusion matrix for a multi-class classification problems.
+
+ Does not support multi-label, multi-class problems.
+
+ Keyword arguments:
+ - num_classes (int): number of classes in the classification problem.
+ - ignore_label (int or iterable): target label(s) excluded from the matrix.
+
+ Modified from: https://github.com/pytorch/tnt/blob/master/torchnet/meter/confusionmeter.py
+ """
+
+ def __init__(self, num_classes, ignore_label):
+ super().__init__()
+
+ self.conf = np.ndarray((num_classes, num_classes), dtype=np.int32)
+ self.ignore_label = ignore_label
+ self.num_classes = num_classes
+ self.reset()
+
+ def reset(self):
+ self.conf.fill(0)
+
+ def add(self, predicted, target):
+ """Computes the confusion matrix
+
+ The shape of the confusion matrix is K x K, where K is the number
+ of classes.
+
+ Keyword arguments:
+ - predicted (Tensor or numpy.ndarray): Can be an N x K tensor/array of
+ predicted scores obtained from the model for N examples and K classes,
+ or an N-tensor/array of integer values between 0 and K-1.
+ - target (Tensor or numpy.ndarray): Can be an N x K tensor/array of
+ ground-truth classes for N examples and K classes, or an N-tensor/array
+ of integer values between 0 and K-1.
+
+ """
+ # _, predicted = predicted.max(1)
+
+ # predicted = predicted.view(-1)
+ # target = target.view(-1)
+
+ # If target and/or predicted are tensors, convert them to numpy arrays
+ if torch.is_tensor(predicted):
+ predicted = predicted.cpu().numpy()
+ if torch.is_tensor(target):
+ target = target.cpu().numpy()
+ ind = ~np.isin(target, self.ignore_label)
+ predicted, target = predicted[ind], target[ind]
+
+ assert (
+ predicted.shape[0] == target.shape[0]
+ ), "number of targets and predicted outputs do not match"
+
+ if np.ndim(predicted) != 1:
+ assert (
+ predicted.shape[1] == self.num_classes
+ ), "number of predictions does not match size of confusion matrix"
+ predicted = np.argmax(predicted, 1)
+ else:
+ assert (predicted.max() < self.num_classes) and (
+ predicted.min() >= 0
+ ), "predicted values are not between 0 and k-1"
+
+ if np.ndim(target) != 1:
+ assert (
+ target.shape[1] == self.num_classes
+ ), "Onehot target does not match size of confusion matrix"
+ assert (target >= 0).all() and (
+ target <= 1
+ ).all(), "in one-hot encoding, target values should be 0 or 1"
+ assert (
+ target.sum(1) == 1
+ ).all(), "multi-label setting is not supported"
+ target = np.argmax(target, 1)
+ else:
+ assert (target.max() < self.num_classes) and (
+ target.min() >= 0
+ ), "target values are not between 0 and k-1"
+
+ # hack for bincounting 2 arrays together
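+ # e.g. with num_classes = 3, (target=1, predicted=2) maps to 1 * 3 + 2 = 5;
+ # reshaping the length-9 bincount to (3, 3) puts it at row 1 (target), col 2 (prediction)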
+ x = predicted + self.num_classes * target
+ bincount_2d = np.bincount(
+ x.astype(np.int32), minlength=self.num_classes**2
+ )
+ assert bincount_2d.size == self.num_classes**2
+ conf = bincount_2d.reshape((self.num_classes, self.num_classes))
+
+ self.conf += conf
+
+ def value(self, normalized=False):
+ """
+ Returns:
+ Confusion matrix of K rows and K columns, where rows correspond
+ to ground-truth targets and columns correspond to predicted
+ targets.
+ """
+ if normalized:
+ conf = self.conf.astype(np.float32)
+ return conf / conf.sum(1).clip(min=1e-12)[:, None]
+ return self.conf
diff --git a/models/Mask3D/build/lib/mask3d/models/metrics/metrics.py b/models/Mask3D/build/lib/mask3d/models/metrics/metrics.py
new file mode 100644
index 0000000000000000000000000000000000000000..f3f4b0ca4f7b0c5224ea242f459374a28485539f
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/metrics/metrics.py
@@ -0,0 +1,48 @@
+import numpy as np
+
+
+class IoU:
+ """Computes the intersection over union (IoU) per class and corresponding
+ mean (mIoU).
+
+ Intersection over union (IoU) is a common evaluation metric for semantic
+ segmentation. The predictions are first accumulated in a confusion matrix
+ and the IoU is computed from it as follows:
+
+ IoU = true_positive / (true_positive + false_positive + false_negative).
+
+ The class takes no constructor arguments; pass the accumulated confusion
+ matrix directly to `value`.
+
+ Modified from: https://github.com/pytorch/tnt/blob/master/torchnet/meter
+
+ """
+
+ def __init__(self):
+ super().__init__()
+
+ def value(self, conf_matrix):
+ """Computes the IoU and mean IoU.
+
+ The mean computation ignores NaN elements of the IoU array.
+
+ Returns:
+ Tuple: (IoU, mIoU). The first output is the per class IoU,
+ for K classes it's numpy.ndarray with K elements. The second output,
+ is the mean IoU.
+ """
+ true_positive = np.diag(conf_matrix)
+ false_positive = np.sum(conf_matrix, 0) - true_positive
+ false_negative = np.sum(conf_matrix, 1) - true_positive
+
+ # Just in case we get a division by 0, ignore/hide the error
+ with np.errstate(divide="ignore", invalid="ignore"):
+ iou = true_positive / (
+ true_positive + false_positive + false_negative
+ )
+
+ return iou
diff --git a/models/Mask3D/build/lib/mask3d/models/misc.py b/models/Mask3D/build/lib/mask3d/models/misc.py
new file mode 100644
index 0000000000000000000000000000000000000000..8416b62804fbc002bd02a457d896276bc307b070
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/misc.py
@@ -0,0 +1,119 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+# Modified by Bowen Cheng from https://github.com/facebookresearch/detr/blob/master/util/misc.py
+"""
+Misc functions, including distributed helpers.
+
+Mostly copy-paste from torchvision references.
+"""
+from typing import List, Optional
+
+import torch
+import torch.distributed as dist
+import torchvision
+from torch import Tensor
+
+
+def _max_by_axis(the_list):
+ # type: (List[List[int]]) -> List[int]
+ maxes = the_list[0]
+ for sublist in the_list[1:]:
+ for index, item in enumerate(sublist):
+ maxes[index] = max(maxes[index], item)
+ return maxes
+
+
+class NestedTensor(object):
+ def __init__(self, tensors, mask: Optional[Tensor]):
+ self.tensors = tensors
+ self.mask = mask
+
+ def to(self, device):
+ # type: (Device) -> NestedTensor # noqa
+ cast_tensor = self.tensors.to(device)
+ mask = self.mask
+ if mask is not None:
+ assert mask is not None
+ cast_mask = mask.to(device)
+ else:
+ cast_mask = None
+ return NestedTensor(cast_tensor, cast_mask)
+
+ def decompose(self):
+ return self.tensors, self.mask
+
+ def __repr__(self):
+ return str(self.tensors)
+
+
+def nested_tensor_from_tensor_list(tensor_list: List[Tensor]):
+ # TODO make this more general
+ if tensor_list[0].ndim == 3:
+ if torchvision._is_tracing():
+ # nested_tensor_from_tensor_list() does not export well to ONNX
+ # call _onnx_nested_tensor_from_tensor_list() instead
+ return _onnx_nested_tensor_from_tensor_list(tensor_list)
+
+ # TODO make it support different-sized images
+ max_size = _max_by_axis([list(img.shape) for img in tensor_list])
+ # min_size = tuple(min(s) for s in zip(*[img.shape for img in tensor_list]))
+ batch_shape = [len(tensor_list)] + max_size
+ b, c, h, w = batch_shape
+ dtype = tensor_list[0].dtype
+ device = tensor_list[0].device
+ tensor = torch.zeros(batch_shape, dtype=dtype, device=device)
+ mask = torch.ones((b, h, w), dtype=torch.bool, device=device)
+ for img, pad_img, m in zip(tensor_list, tensor, mask):
+ pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
+ m[: img.shape[1], : img.shape[2]] = False
+ else:
+ raise ValueError("not supported")
+ return NestedTensor(tensor, mask)
+
+
+# _onnx_nested_tensor_from_tensor_list() is an implementation of
+# nested_tensor_from_tensor_list() that is supported by ONNX tracing.
+@torch.jit.unused
+def _onnx_nested_tensor_from_tensor_list(
+ tensor_list: List[Tensor],
+) -> NestedTensor:
+ max_size = []
+ for i in range(tensor_list[0].dim()):
+ max_size_i = torch.max(
+ torch.stack([img.shape[i] for img in tensor_list]).to(
+ torch.float32
+ )
+ ).to(torch.int64)
+ max_size.append(max_size_i)
+ max_size = tuple(max_size)
+
+ # work around for
+ # pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
+ # m[: img.shape[1], :img.shape[2]] = False
+ # which is not yet supported in onnx
+ padded_imgs = []
+ padded_masks = []
+ for img in tensor_list:
+ padding = [(s1 - s2) for s1, s2 in zip(max_size, tuple(img.shape))]
+ padded_img = torch.nn.functional.pad(
+ img, (0, padding[2], 0, padding[1], 0, padding[0])
+ )
+ padded_imgs.append(padded_img)
+
+ m = torch.zeros_like(img[0], dtype=torch.int, device=img.device)
+ padded_mask = torch.nn.functional.pad(
+ m, (0, padding[2], 0, padding[1]), "constant", 1
+ )
+ padded_masks.append(padded_mask.to(torch.bool))
+
+ tensor = torch.stack(padded_imgs)
+ mask = torch.stack(padded_masks)
+
+ return NestedTensor(tensor, mask=mask)
+
+
+def is_dist_avail_and_initialized():
+ if not dist.is_available():
+ return False
+ if not dist.is_initialized():
+ return False
+ return True
diff --git a/models/Mask3D/build/lib/mask3d/models/model.py b/models/Mask3D/build/lib/mask3d/models/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..d167fa58358f2c1a7ca4a509e38c61906e9dd7ac
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/model.py
@@ -0,0 +1,27 @@
+from MinkowskiEngine import MinkowskiNetwork
+
+
+class Model(MinkowskiNetwork):
+ """
+ Base network for all sparse convnet
+
+ By default, all networks are segmentation networks.
+ """
+
+ OUT_PIXEL_DIST = -1
+
+ def __init__(self, in_channels, out_channels, config, D, **kwargs):
+ super().__init__(D)
+ self.in_channels = in_channels
+ self.out_channels = out_channels
+ self.config = config
+
+
+class HighDimensionalModel(Model):
+ """
+ Base network for all spatio (temporal) chromatic sparse convnet
+ """
+
+ def __init__(self, in_channels, out_channels, config, D, **kwargs):
+ assert D > 4, "Num dimension must be at least 5"
+ super().__init__(in_channels, out_channels, config, D, **kwargs)
diff --git a/models/Mask3D/build/lib/mask3d/models/modules/3detr_helpers.py b/models/Mask3D/build/lib/mask3d/models/modules/3detr_helpers.py
new file mode 100644
index 0000000000000000000000000000000000000000..2c3f7ea57c0266a9781cdfec9f59896d15750a9d
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/modules/3detr_helpers.py
@@ -0,0 +1,116 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+import torch.nn as nn
+from functools import partial
+import copy
+
+
+class BatchNormDim1Swap(nn.BatchNorm1d):
+ """
+ Used for nn.Transformer that uses a HW x N x C rep
+ """
+
+ def forward(self, x):
+ """
+ x: HW x N x C
+ permute to N x C x HW
+ Apply BN on C
+ permute back
+ """
+ hw, n, c = x.shape
+ x = x.permute(1, 2, 0)
+ x = super(BatchNormDim1Swap, self).forward(x)
+ # x: n x c x hw -> hw x n x c
+ x = x.permute(2, 0, 1)
+ return x
+
+
+NORM_DICT = {
+ "bn": BatchNormDim1Swap,
+ "bn1d": nn.BatchNorm1d,
+ "id": nn.Identity,
+ "ln": nn.LayerNorm,
+}
+
+ACTIVATION_DICT = {
+ "relu": nn.ReLU,
+ "gelu": nn.GELU,
+ "leakyrelu": partial(nn.LeakyReLU, negative_slope=0.1),
+}
+
+WEIGHT_INIT_DICT = {
+ "xavier_uniform": nn.init.xavier_uniform_,
+}
+
+
+class GenericMLP(nn.Module):
+ def __init__(
+ self,
+ input_dim,
+ hidden_dims,
+ output_dim,
+ norm_fn_name=None,
+ activation="relu",
+ use_conv=False,
+ dropout=None,
+ hidden_use_bias=False,
+ output_use_bias=True,
+ output_use_activation=False,
+ output_use_norm=False,
+ weight_init_name=None,
+ ):
+ super().__init__()
+ activation = ACTIVATION_DICT[activation]
+ norm = None
+ if norm_fn_name is not None:
+ norm = NORM_DICT[norm_fn_name]
+ if norm_fn_name == "ln" and use_conv:
+ norm = lambda x: nn.GroupNorm(1, x) # easier way to use LayerNorm
+
+ if dropout is not None:
+ if not isinstance(dropout, list):
+ dropout = [dropout for _ in range(len(hidden_dims))]
+
+ layers = []
+ prev_dim = input_dim
+ for idx, x in enumerate(hidden_dims):
+ if use_conv:
+ layer = nn.Conv1d(prev_dim, x, 1, bias=hidden_use_bias)
+ else:
+ layer = nn.Linear(prev_dim, x, bias=hidden_use_bias)
+ layers.append(layer)
+ if norm:
+ layers.append(norm(x))
+ layers.append(activation())
+ if dropout is not None:
+ layers.append(nn.Dropout(p=dropout[idx]))
+ prev_dim = x
+ if use_conv:
+ layer = nn.Conv1d(prev_dim, output_dim, 1, bias=output_use_bias)
+ else:
+ layer = nn.Linear(prev_dim, output_dim, bias=output_use_bias)
+ layers.append(layer)
+
+ if output_use_norm:
+ layers.append(norm(output_dim))
+
+ if output_use_activation:
+ layers.append(activation())
+
+ self.layers = nn.Sequential(*layers)
+
+ if weight_init_name is not None:
+ self.do_weight_init(weight_init_name)
+
+ def do_weight_init(self, weight_init_name):
+ func = WEIGHT_INIT_DICT[weight_init_name]
+ for (_, param) in self.named_parameters():
+ if param.dim() > 1: # skips batchnorm/layernorm
+ func(param)
+
+ def forward(self, x):
+ output = self.layers(x)
+ return output
+
+
+def get_clones(module, N):
+ return nn.ModuleList([copy.deepcopy(module) for i in range(N)])
diff --git a/models/Mask3D/build/lib/mask3d/models/modules/__init__.py b/models/Mask3D/build/lib/mask3d/models/modules/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/build/lib/mask3d/models/modules/common.py b/models/Mask3D/build/lib/mask3d/models/modules/common.py
new file mode 100644
index 0000000000000000000000000000000000000000..ae78b5b301cfd6ffcfc3417b543ebe2289602fb7
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/modules/common.py
@@ -0,0 +1,275 @@
+import sys
+
+if sys.version_info[:2] >= (3, 8):
+ from collections.abc import Sequence
+else:
+ from collections import Sequence
+
+from enum import Enum
+
+import torch.nn as nn
+import MinkowskiEngine as ME
+
+
+class NormType(Enum):
+ BATCH_NORM = 0
+ INSTANCE_NORM = 1
+ INSTANCE_BATCH_NORM = 2
+
+
+def get_norm(norm_type, n_channels, D, bn_momentum=0.1):
+ if norm_type == NormType.BATCH_NORM:
+ return ME.MinkowskiBatchNorm(n_channels, momentum=bn_momentum)
+ elif norm_type == NormType.INSTANCE_NORM:
+ return ME.MinkowskiInstanceNorm(n_channels)
+ elif norm_type == NormType.INSTANCE_BATCH_NORM:
+ return nn.Sequential(
+ ME.MinkowskiInstanceNorm(n_channels),
+ ME.MinkowskiBatchNorm(n_channels, momentum=bn_momentum),
+ )
+ else:
+ raise ValueError(f"Norm type: {norm_type} not supported")
+
+
+class ConvType(Enum):
+ """
+ Define the kernel region type
+ """
+
+ HYPERCUBE = 0, "HYPERCUBE"
+ SPATIAL_HYPERCUBE = 1, "SPATIAL_HYPERCUBE"
+ SPATIO_TEMPORAL_HYPERCUBE = 2, "SPATIO_TEMPORAL_HYPERCUBE"
+ HYPERCROSS = 3, "HYPERCROSS"
+ SPATIAL_HYPERCROSS = 4, "SPATIAL_HYPERCROSS"
+ SPATIO_TEMPORAL_HYPERCROSS = 5, "SPATIO_TEMPORAL_HYPERCROSS"
+    SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS = (
+        6,
+        "SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS",
+    )
+
+ def __new__(cls, value, name):
+ member = object.__new__(cls)
+ member._value_ = value
+ member.fullname = name
+ return member
+
+ def __int__(self):
+ return self.value
+
+
+# Convert the ConvType var to a RegionType var
+conv_to_region_type = {
+ # kernel_size = [k, k, k, 1]
+ ConvType.HYPERCUBE: ME.RegionType.HYPER_CUBE,
+ ConvType.SPATIAL_HYPERCUBE: ME.RegionType.HYPER_CUBE,
+ ConvType.SPATIO_TEMPORAL_HYPERCUBE: ME.RegionType.HYPER_CUBE,
+ ConvType.HYPERCROSS: ME.RegionType.HYPER_CROSS,
+ ConvType.SPATIAL_HYPERCROSS: ME.RegionType.HYPER_CROSS,
+ ConvType.SPATIO_TEMPORAL_HYPERCROSS: ME.RegionType.HYPER_CROSS,
+ ConvType.SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS: ME.RegionType.HYPER_CUBE, # JONAS CHANGE from HYBRID
+}
+
+# int_to_region_type = {m.value: m for m in ME.RegionType}
+int_to_region_type = {m: ME.RegionType(m) for m in range(3)}
+
+
+def convert_region_type(region_type):
+ """
+ Convert the integer region_type to the corresponding RegionType enum object.
+ """
+ return int_to_region_type[region_type]
+
+
+def convert_conv_type(conv_type, kernel_size, D):
+ assert isinstance(conv_type, ConvType), "conv_type must be of ConvType"
+ region_type = conv_to_region_type[conv_type]
+ axis_types = None
+ if conv_type == ConvType.SPATIAL_HYPERCUBE:
+ # No temporal convolution
+ if isinstance(kernel_size, Sequence):
+ kernel_size = kernel_size[:3]
+ else:
+ kernel_size = [
+ kernel_size,
+ ] * 3
+ if D == 4:
+ kernel_size.append(1)
+ elif conv_type == ConvType.SPATIO_TEMPORAL_HYPERCUBE:
+ # conv_type conversion already handled
+ assert D == 4
+ elif conv_type == ConvType.HYPERCUBE:
+ # conv_type conversion already handled
+ pass
+ elif conv_type == ConvType.SPATIAL_HYPERCROSS:
+ if isinstance(kernel_size, Sequence):
+ kernel_size = kernel_size[:3]
+ else:
+ kernel_size = [
+ kernel_size,
+ ] * 3
+ if D == 4:
+ kernel_size.append(1)
+ elif conv_type == ConvType.HYPERCROSS:
+ # conv_type conversion already handled
+ pass
+ elif conv_type == ConvType.SPATIO_TEMPORAL_HYPERCROSS:
+ # conv_type conversion already handled
+ assert D == 4
+ elif conv_type == ConvType.SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS:
+ # Define the CUBIC conv kernel for spatial dims and CROSS conv for temp dim
+ axis_types = [
+ ME.RegionType.HYPER_CUBE,
+ ] * 3
+ if D == 4:
+ axis_types.append(ME.RegionType.HYPER_CROSS)
+ return region_type, axis_types, kernel_size
+
+
+def conv(
+ in_planes,
+ out_planes,
+ kernel_size,
+ stride=1,
+ dilation=1,
+ bias=False,
+ conv_type=ConvType.HYPERCUBE,
+ D=-1,
+):
+ assert D > 0, "Dimension must be a positive integer"
+ region_type, axis_types, kernel_size = convert_conv_type(
+ conv_type, kernel_size, D
+ )
+ kernel_generator = ME.KernelGenerator(
+ kernel_size,
+ stride,
+ dilation,
+ region_type=region_type,
+ axis_types=None, # axis_types JONAS
+ dimension=D,
+ )
+
+ return ME.MinkowskiConvolution(
+ in_channels=in_planes,
+ out_channels=out_planes,
+ kernel_size=kernel_size,
+ stride=stride,
+ dilation=dilation,
+ bias=bias,
+ kernel_generator=kernel_generator,
+ dimension=D,
+ )
+
+
+def conv_tr(
+ in_planes,
+ out_planes,
+ kernel_size,
+ upsample_stride=1,
+ dilation=1,
+ bias=False,
+ conv_type=ConvType.HYPERCUBE,
+ D=-1,
+):
+ assert D > 0, "Dimension must be a positive integer"
+ region_type, axis_types, kernel_size = convert_conv_type(
+ conv_type, kernel_size, D
+ )
+ kernel_generator = ME.KernelGenerator(
+ kernel_size,
+ upsample_stride,
+ dilation,
+ region_type=region_type,
+ axis_types=axis_types,
+ dimension=D,
+ )
+
+ return ME.MinkowskiConvolutionTranspose(
+ in_channels=in_planes,
+ out_channels=out_planes,
+ kernel_size=kernel_size,
+ stride=upsample_stride,
+ dilation=dilation,
+ bias=bias,
+ kernel_generator=kernel_generator,
+ dimension=D,
+ )
+
+
+def avg_pool(
+ kernel_size,
+ stride=1,
+ dilation=1,
+ conv_type=ConvType.HYPERCUBE,
+ in_coords_key=None,
+ D=-1,
+):
+ assert D > 0, "Dimension must be a positive integer"
+ region_type, axis_types, kernel_size = convert_conv_type(
+ conv_type, kernel_size, D
+ )
+ kernel_generator = ME.KernelGenerator(
+ kernel_size,
+ stride,
+ dilation,
+ region_type=region_type,
+ axis_types=axis_types,
+ dimension=D,
+ )
+
+ return ME.MinkowskiAvgPooling(
+ kernel_size=kernel_size,
+ stride=stride,
+ dilation=dilation,
+ kernel_generator=kernel_generator,
+ dimension=D,
+ )
+
+
+def avg_unpool(
+ kernel_size, stride=1, dilation=1, conv_type=ConvType.HYPERCUBE, D=-1
+):
+ assert D > 0, "Dimension must be a positive integer"
+ region_type, axis_types, kernel_size = convert_conv_type(
+ conv_type, kernel_size, D
+ )
+ kernel_generator = ME.KernelGenerator(
+ kernel_size,
+ stride,
+ dilation,
+ region_type=region_type,
+ axis_types=axis_types,
+ dimension=D,
+ )
+
+ return ME.MinkowskiAvgUnpooling(
+ kernel_size=kernel_size,
+ stride=stride,
+ dilation=dilation,
+ kernel_generator=kernel_generator,
+ dimension=D,
+ )
+
+
+def sum_pool(
+ kernel_size, stride=1, dilation=1, conv_type=ConvType.HYPERCUBE, D=-1
+):
+ assert D > 0, "Dimension must be a positive integer"
+ region_type, axis_types, kernel_size = convert_conv_type(
+ conv_type, kernel_size, D
+ )
+ kernel_generator = ME.KernelGenerator(
+ kernel_size,
+ stride,
+ dilation,
+ region_type=region_type,
+ axis_types=axis_types,
+ dimension=D,
+ )
+
+ return ME.MinkowskiSumPooling(
+ kernel_size=kernel_size,
+ stride=stride,
+ dilation=dilation,
+ kernel_generator=kernel_generator,
+ dimension=D,
+ )
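A short sketch of how the `conv`/`get_norm` factories above compose (assumes MinkowskiEngine; the coordinates and shapes are illustrative):

```python
import torch
import MinkowskiEngine as ME

from mask3d.models.modules.common import ConvType, NormType, conv, get_norm

# A strided 3D sparse convolution followed by batch norm.
layer = conv(3, 16, kernel_size=3, stride=2,
             conv_type=ConvType.SPATIAL_HYPERCUBE, D=3)
norm = get_norm(NormType.BATCH_NORM, 16, D=3, bn_momentum=0.05)

coords = torch.IntTensor([[0, 0, 0, 0],   # (batch_index, x, y, z)
                          [0, 1, 0, 0],
                          [0, 0, 2, 1]])
feats = torch.rand(3, 3)
x = ME.SparseTensor(features=feats, coordinates=coords)
y = norm(layer(x))
print(y.F.shape)  # (num_output_voxels, 16)
```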
diff --git a/models/Mask3D/build/lib/mask3d/models/modules/helpers_3detr.py b/models/Mask3D/build/lib/mask3d/models/modules/helpers_3detr.py
new file mode 100644
index 0000000000000000000000000000000000000000..2c3f7ea57c0266a9781cdfec9f59896d15750a9d
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/modules/helpers_3detr.py
@@ -0,0 +1,116 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+import torch.nn as nn
+from functools import partial
+import copy
+
+
+class BatchNormDim1Swap(nn.BatchNorm1d):
+ """
+    Used with nn.Transformer, which uses an HW x N x C representation
+ """
+
+ def forward(self, x):
+ """
+ x: HW x N x C
+ permute to N x C x HW
+ Apply BN on C
+ permute back
+ """
+ hw, n, c = x.shape
+ x = x.permute(1, 2, 0)
+ x = super(BatchNormDim1Swap, self).forward(x)
+ # x: n x c x hw -> hw x n x c
+ x = x.permute(2, 0, 1)
+ return x
+
+
+NORM_DICT = {
+ "bn": BatchNormDim1Swap,
+ "bn1d": nn.BatchNorm1d,
+ "id": nn.Identity,
+ "ln": nn.LayerNorm,
+}
+
+ACTIVATION_DICT = {
+ "relu": nn.ReLU,
+ "gelu": nn.GELU,
+ "leakyrelu": partial(nn.LeakyReLU, negative_slope=0.1),
+}
+
+WEIGHT_INIT_DICT = {
+ "xavier_uniform": nn.init.xavier_uniform_,
+}
+
+
+class GenericMLP(nn.Module):
+ def __init__(
+ self,
+ input_dim,
+ hidden_dims,
+ output_dim,
+ norm_fn_name=None,
+ activation="relu",
+ use_conv=False,
+ dropout=None,
+ hidden_use_bias=False,
+ output_use_bias=True,
+ output_use_activation=False,
+ output_use_norm=False,
+ weight_init_name=None,
+ ):
+ super().__init__()
+ activation = ACTIVATION_DICT[activation]
+ norm = None
+ if norm_fn_name is not None:
+ norm = NORM_DICT[norm_fn_name]
+ if norm_fn_name == "ln" and use_conv:
+ norm = lambda x: nn.GroupNorm(1, x) # easier way to use LayerNorm
+
+ if dropout is not None:
+ if not isinstance(dropout, list):
+ dropout = [dropout for _ in range(len(hidden_dims))]
+
+ layers = []
+ prev_dim = input_dim
+ for idx, x in enumerate(hidden_dims):
+ if use_conv:
+ layer = nn.Conv1d(prev_dim, x, 1, bias=hidden_use_bias)
+ else:
+ layer = nn.Linear(prev_dim, x, bias=hidden_use_bias)
+ layers.append(layer)
+ if norm:
+ layers.append(norm(x))
+ layers.append(activation())
+ if dropout is not None:
+ layers.append(nn.Dropout(p=dropout[idx]))
+ prev_dim = x
+ if use_conv:
+ layer = nn.Conv1d(prev_dim, output_dim, 1, bias=output_use_bias)
+ else:
+ layer = nn.Linear(prev_dim, output_dim, bias=output_use_bias)
+ layers.append(layer)
+
+ if output_use_norm:
+ layers.append(norm(output_dim))
+
+ if output_use_activation:
+ layers.append(activation())
+
+ self.layers = nn.Sequential(*layers)
+
+ if weight_init_name is not None:
+ self.do_weight_init(weight_init_name)
+
+ def do_weight_init(self, weight_init_name):
+ func = WEIGHT_INIT_DICT[weight_init_name]
+ for (_, param) in self.named_parameters():
+ if param.dim() > 1: # skips batchnorm/layernorm
+ func(param)
+
+ def forward(self, x):
+ output = self.layers(x)
+ return output
+
+
+def get_clones(module, N):
+ return nn.ModuleList([copy.deepcopy(module) for i in range(N)])
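Note that this file duplicates `3detr_helpers.py` above; a module whose filename starts with a digit cannot be imported with a plain `import` statement, which is presumably why this importable copy exists. `GenericMLP` itself is pure PyTorch, so a usage sketch is directly runnable (the dimensions below are illustrative):

```python
import torch

from mask3d.models.modules.helpers_3detr import GenericMLP

mlp = GenericMLP(
    input_dim=256,
    hidden_dims=[256, 256],
    output_dim=128,
    norm_fn_name="bn1d",
    activation="relu",
    use_conv=True,                 # 1x1 Conv1d layers: input is (B, C, N)
    output_use_activation=True,
    weight_init_name="xavier_uniform",
)
x = torch.randn(2, 256, 1024)      # 2 scenes, 256 channels, 1024 points
print(mlp(x).shape)                # torch.Size([2, 128, 1024])
```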
diff --git a/models/Mask3D/build/lib/mask3d/models/modules/resnet_block.py b/models/Mask3D/build/lib/mask3d/models/modules/resnet_block.py
new file mode 100644
index 0000000000000000000000000000000000000000..ac16b72aa198964e343f57ad4f79193a22e830dc
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/modules/resnet_block.py
@@ -0,0 +1,157 @@
+import torch.nn as nn
+from MinkowskiEngine import MinkowskiReLU
+
+from mask3d.models.modules.common import ConvType, NormType, conv, get_norm
+
+
+class BasicBlockBase(nn.Module):
+ expansion = 1
+ NORM_TYPE = NormType.BATCH_NORM
+
+ def __init__(
+ self,
+ inplanes,
+ planes,
+ stride=1,
+ dilation=1,
+ downsample=None,
+ conv_type=ConvType.HYPERCUBE,
+ bn_momentum=0.1,
+ D=3,
+ ):
+ super().__init__()
+
+ self.conv1 = conv(
+ inplanes,
+ planes,
+ kernel_size=3,
+ stride=stride,
+ dilation=dilation,
+ conv_type=conv_type,
+ D=D,
+ )
+ self.norm1 = get_norm(
+ self.NORM_TYPE, planes, D, bn_momentum=bn_momentum
+ )
+ self.conv2 = conv(
+ planes,
+ planes,
+ kernel_size=3,
+ stride=1,
+ dilation=dilation,
+ bias=False,
+ conv_type=conv_type,
+ D=D,
+ )
+ self.norm2 = get_norm(
+ self.NORM_TYPE, planes, D, bn_momentum=bn_momentum
+ )
+ self.relu = MinkowskiReLU(inplace=True)
+ self.downsample = downsample
+
+ def forward(self, x):
+ residual = x
+
+ out = self.conv1(x)
+ out = self.norm1(out)
+ out = self.relu(out)
+
+ out = self.conv2(out)
+ out = self.norm2(out)
+
+ if self.downsample is not None:
+ residual = self.downsample(x)
+
+ out += residual
+ out = self.relu(out)
+
+ return out
+
+
+class BasicBlock(BasicBlockBase):
+ NORM_TYPE = NormType.BATCH_NORM
+
+
+class BasicBlockIN(BasicBlockBase):
+ NORM_TYPE = NormType.INSTANCE_NORM
+
+
+class BasicBlockINBN(BasicBlockBase):
+ NORM_TYPE = NormType.INSTANCE_BATCH_NORM
+
+
+class BottleneckBase(nn.Module):
+ expansion = 4
+ NORM_TYPE = NormType.BATCH_NORM
+
+ def __init__(
+ self,
+ inplanes,
+ planes,
+ stride=1,
+ dilation=1,
+ downsample=None,
+ conv_type=ConvType.HYPERCUBE,
+ bn_momentum=0.1,
+ D=3,
+ ):
+ super().__init__()
+ self.conv1 = conv(inplanes, planes, kernel_size=1, D=D)
+ self.norm1 = get_norm(
+ self.NORM_TYPE, planes, D, bn_momentum=bn_momentum
+ )
+
+ self.conv2 = conv(
+ planes,
+ planes,
+ kernel_size=3,
+ stride=stride,
+ dilation=dilation,
+ conv_type=conv_type,
+ D=D,
+ )
+ self.norm2 = get_norm(
+ self.NORM_TYPE, planes, D, bn_momentum=bn_momentum
+ )
+
+ self.conv3 = conv(planes, planes * self.expansion, kernel_size=1, D=D)
+ self.norm3 = get_norm(
+ self.NORM_TYPE, planes * self.expansion, D, bn_momentum=bn_momentum
+ )
+
+ self.relu = MinkowskiReLU(inplace=True)
+ self.downsample = downsample
+
+ def forward(self, x):
+ residual = x
+
+ out = self.conv1(x)
+ out = self.norm1(out)
+ out = self.relu(out)
+
+ out = self.conv2(out)
+ out = self.norm2(out)
+ out = self.relu(out)
+
+ out = self.conv3(out)
+ out = self.norm3(out)
+
+ if self.downsample is not None:
+ residual = self.downsample(x)
+
+ out += residual
+ out = self.relu(out)
+
+ return out
+
+
+class Bottleneck(BottleneckBase):
+ NORM_TYPE = NormType.BATCH_NORM
+
+
+class BottleneckIN(BottleneckBase):
+ NORM_TYPE = NormType.INSTANCE_NORM
+
+
+class BottleneckINBN(BottleneckBase):
+ NORM_TYPE = NormType.INSTANCE_BATCH_NORM
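As a sketch, a residual block whose channel count changes needs the matching `downsample` branch wired up by the caller; `_make_layer` in `resnet.py` below does this automatically, but it can be built by hand (assumes MinkowskiEngine):

```python
import torch.nn as nn

from mask3d.models.modules.common import NormType, conv, get_norm
from mask3d.models.modules.resnet_block import Bottleneck

# Bottleneck expands channels by 4, so the skip path must project 32 -> 64.
downsample = nn.Sequential(
    conv(32, 16 * Bottleneck.expansion, kernel_size=1, stride=1, D=3),
    get_norm(NormType.BATCH_NORM, 16 * Bottleneck.expansion, D=3),
)
block = Bottleneck(inplanes=32, planes=16, downsample=downsample, D=3)
```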
diff --git a/models/Mask3D/build/lib/mask3d/models/modules/senet_block.py b/models/Mask3D/build/lib/mask3d/models/modules/senet_block.py
new file mode 100644
index 0000000000000000000000000000000000000000..130082738505c79d5ecddb010595a5a66b9d8509
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/modules/senet_block.py
@@ -0,0 +1,138 @@
+import torch.nn as nn
+import MinkowskiEngine as ME
+
+from mask3d.models.modules.common import ConvType, NormType
+from mask3d.models.modules.resnet_block import BasicBlock, Bottleneck
+
+
+class SELayer(nn.Module):
+ def __init__(self, channel, reduction=16, D=-1):
+ # Global coords does not require coords_key
+ super().__init__()
+ self.fc = nn.Sequential(
+ ME.MinkowskiLinear(channel, channel // reduction),
+ ME.MinkowskiReLU(inplace=True),
+ ME.MinkowskiLinear(channel // reduction, channel),
+ ME.MinkowskiSigmoid(),
+ )
+ self.pooling = ME.MinkowskiGlobalPooling(dimension=D)
+ self.broadcast_mul = ME.MinkowskiBroadcastMultiplication(dimension=D)
+
+ def forward(self, x):
+ y = self.pooling(x)
+ y = self.fc(y)
+ return self.broadcast_mul(x, y)
+
+
+class SEBasicBlock(BasicBlock):
+ def __init__(
+ self,
+ inplanes,
+ planes,
+ stride=1,
+ dilation=1,
+ downsample=None,
+ conv_type=ConvType.HYPERCUBE,
+ reduction=16,
+ D=-1,
+ ):
+ super().__init__(
+ inplanes,
+ planes,
+ stride=stride,
+ dilation=dilation,
+ downsample=downsample,
+ conv_type=conv_type,
+ D=D,
+ )
+ self.se = SELayer(planes, reduction=reduction, D=D)
+
+ def forward(self, x):
+ residual = x
+
+ out = self.conv1(x)
+ out = self.norm1(out)
+ out = self.relu(out)
+
+ out = self.conv2(out)
+ out = self.norm2(out)
+ out = self.se(out)
+
+ if self.downsample is not None:
+ residual = self.downsample(x)
+
+ out += residual
+ out = self.relu(out)
+
+ return out
+
+
+class SEBasicBlockSN(SEBasicBlock):
+    # The original mix3d code set NormType.SPARSE_SWITCH_NORM here, a member
+    # that common.NormType does not define (accessing it raises AttributeError
+    # at import time). INSTANCE_BATCH_NORM is used as the closest defined
+    # fallback so the module stays importable.
+    NORM_TYPE = NormType.INSTANCE_BATCH_NORM
+
+
+class SEBasicBlockIN(SEBasicBlock):
+    # SPARSE_INSTANCE_NORM is likewise undefined; INSTANCE_NORM is the
+    # intended equivalent.
+    NORM_TYPE = NormType.INSTANCE_NORM
+
+
+class SEBasicBlockLN(SEBasicBlock):
+    # SPARSE_LAYER_NORM is likewise undefined, and NormType has no layer-norm
+    # member; BATCH_NORM is an importable placeholder, not a true layer norm.
+    NORM_TYPE = NormType.BATCH_NORM
+
+
+class SEBottleneck(Bottleneck):
+ def __init__(
+ self,
+ inplanes,
+ planes,
+ stride=1,
+ dilation=1,
+ downsample=None,
+ conv_type=ConvType.HYPERCUBE,
+ D=3,
+ reduction=16,
+ ):
+ super().__init__(
+ inplanes,
+ planes,
+ stride=stride,
+ dilation=dilation,
+ downsample=downsample,
+ conv_type=conv_type,
+ D=D,
+ )
+ self.se = SELayer(planes * self.expansion, reduction=reduction, D=D)
+
+ def forward(self, x):
+ residual = x
+
+ out = self.conv1(x)
+ out = self.norm1(out)
+ out = self.relu(out)
+
+ out = self.conv2(out)
+ out = self.norm2(out)
+ out = self.relu(out)
+
+ out = self.conv3(out)
+ out = self.norm3(out)
+ out = self.se(out)
+
+ if self.downsample is not None:
+ residual = self.downsample(x)
+
+ out += residual
+ out = self.relu(out)
+
+ return out
+
+
+class SEBottleneckSN(SEBottleneck):
+    # See the note on SEBasicBlockSN: the SPARSE_* norm types referenced by
+    # the original mix3d code are not defined in common.NormType, so defined
+    # fallbacks are used here to keep the module importable.
+    NORM_TYPE = NormType.INSTANCE_BATCH_NORM
+
+
+class SEBottleneckIN(SEBottleneck):
+    NORM_TYPE = NormType.INSTANCE_NORM
+
+
+class SEBottleneckLN(SEBottleneck):
+    NORM_TYPE = NormType.BATCH_NORM
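The sparse `SELayer` above follows the standard squeeze-and-excitation pattern: global pool, bottleneck MLP, sigmoid gate, broadcast multiply. For reference, a dense PyTorch equivalent of the same idea (a sketch, not repository code):

```python
import torch
import torch.nn as nn


class DenseSELayer(nn.Module):
    """Dense analogue of SELayer: global pool -> bottleneck MLP -> sigmoid gate."""

    def __init__(self, channel, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel),
            nn.Sigmoid(),
        )

    def forward(self, x):              # x: (B, C, N)
        y = x.mean(dim=-1)             # squeeze: global average over points
        y = self.fc(y)                 # excitation: per-channel gate in [0, 1]
        return x * y.unsqueeze(-1)     # broadcast multiply, as in SELayer


x = torch.randn(2, 64, 1024)
print(DenseSELayer(64)(x).shape)       # torch.Size([2, 64, 1024])
```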
diff --git a/models/Mask3D/build/lib/mask3d/models/position_embedding.py b/models/Mask3D/build/lib/mask3d/models/position_embedding.py
new file mode 100644
index 0000000000000000000000000000000000000000..70275f1610e1d3f5ec8d11d18d298b7877204b86
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/position_embedding.py
@@ -0,0 +1,179 @@
+# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
+"""
+Various positional encodings for the transformer.
+"""
+import math
+import torch
+from torch import nn
+import numpy as np
+
+# from utils.pc_util import shift_scale_points
+
+
+def shift_scale_points(pred_xyz, src_range, dst_range=None):
+ """
+ pred_xyz: B x N x 3
+ src_range: [[B x 3], [B x 3]] - min and max XYZ coords
+ dst_range: [[B x 3], [B x 3]] - min and max XYZ coords
+ """
+ if dst_range is None:
+ dst_range = [
+ torch.zeros(
+ (src_range[0].shape[0], 3), device=src_range[0].device
+ ),
+ torch.ones((src_range[0].shape[0], 3), device=src_range[0].device),
+ ]
+
+ if pred_xyz.ndim == 4:
+ src_range = [x[:, None] for x in src_range]
+ dst_range = [x[:, None] for x in dst_range]
+
+ assert src_range[0].shape[0] == pred_xyz.shape[0]
+ assert dst_range[0].shape[0] == pred_xyz.shape[0]
+ assert src_range[0].shape[-1] == pred_xyz.shape[-1]
+ assert src_range[0].shape == src_range[1].shape
+ assert dst_range[0].shape == dst_range[1].shape
+ assert src_range[0].shape == dst_range[1].shape
+
+ src_diff = src_range[1][:, None, :] - src_range[0][:, None, :]
+ dst_diff = dst_range[1][:, None, :] - dst_range[0][:, None, :]
+ prop_xyz = (
+ ((pred_xyz - src_range[0][:, None, :]) * dst_diff) / src_diff
+ ) + dst_range[0][:, None, :]
+ return prop_xyz
+
+
+class PositionEmbeddingCoordsSine(nn.Module):
+ def __init__(
+ self,
+ temperature=10000,
+ normalize=False,
+ scale=None,
+ pos_type="fourier",
+ d_pos=None,
+ d_in=3,
+ gauss_scale=1.0,
+ ):
+ super().__init__()
+ self.d_pos = d_pos
+ self.temperature = temperature
+ self.normalize = normalize
+ if scale is not None and normalize is False:
+ raise ValueError("normalize should be True if scale is passed")
+ if scale is None:
+ scale = 2 * math.pi
+ assert pos_type in ["sine", "fourier"]
+ self.pos_type = pos_type
+ self.scale = scale
+ if pos_type == "fourier":
+ assert d_pos is not None
+ assert d_pos % 2 == 0
+ # define a gaussian matrix input_ch -> output_ch
+ B = torch.empty((d_in, d_pos // 2)).normal_()
+ B *= gauss_scale
+ self.register_buffer("gauss_B", B)
+ self.d_pos = d_pos
+
+ def get_sine_embeddings(self, xyz, num_channels, input_range):
+ num_channels = self.d_pos
+ # clone coords so that shift/scale operations do not affect original tensor
+ orig_xyz = xyz
+ xyz = orig_xyz.clone()
+
+ ncoords = xyz.shape[1]
+ if self.normalize:
+ xyz = shift_scale_points(xyz, src_range=input_range)
+
+ ndim = num_channels // xyz.shape[2]
+ if ndim % 2 != 0:
+ ndim -= 1
+        # automatically handle remainder by assigning it to the first dim
+ rems = num_channels - (ndim * xyz.shape[2])
+
+ assert (
+ ndim % 2 == 0
+ ), f"Cannot handle odd sized ndim={ndim} where num_channels={num_channels} and xyz={xyz.shape}"
+
+ final_embeds = []
+ prev_dim = 0
+
+ for d in range(xyz.shape[2]):
+ cdim = ndim
+ if rems > 0:
+ # add remainder in increments of two to maintain even size
+ cdim += 2
+ rems -= 2
+
+ if cdim != prev_dim:
+ dim_t = torch.arange(
+ cdim, dtype=torch.float32, device=xyz.device
+ )
+ dim_t = self.temperature ** (2 * (dim_t // 2) / cdim)
+
+            # create batch x cdim x ncoords embedding
+ raw_pos = xyz[:, :, d]
+ if self.scale:
+ raw_pos *= self.scale
+ pos = raw_pos[:, :, None] / dim_t
+ pos = torch.stack(
+ (pos[:, :, 0::2].sin(), pos[:, :, 1::2].cos()), dim=3
+ ).flatten(2)
+ final_embeds.append(pos)
+ prev_dim = cdim
+
+ final_embeds = torch.cat(final_embeds, dim=2).permute(0, 2, 1)
+ return final_embeds
+
+ def get_fourier_embeddings(self, xyz, num_channels=None, input_range=None):
+ # Follows - https://people.eecs.berkeley.edu/~bmild/fourfeat/index.html
+
+ if num_channels is None:
+ num_channels = self.gauss_B.shape[1] * 2
+
+ bsize, npoints = xyz.shape[0], xyz.shape[1]
+ assert num_channels > 0 and num_channels % 2 == 0
+ d_in, max_d_out = self.gauss_B.shape[0], self.gauss_B.shape[1]
+ d_out = num_channels // 2
+ assert d_out <= max_d_out
+ assert d_in == xyz.shape[-1]
+
+ # clone coords so that shift/scale operations do not affect original tensor
+ orig_xyz = xyz
+ xyz = orig_xyz.clone()
+
+ ncoords = xyz.shape[1]
+ if self.normalize:
+ xyz = shift_scale_points(xyz, src_range=input_range)
+
+ xyz *= 2 * np.pi
+ xyz_proj = torch.mm(xyz.view(-1, d_in), self.gauss_B[:, :d_out]).view(
+ bsize, npoints, d_out
+ )
+ final_embeds = [xyz_proj.sin(), xyz_proj.cos()]
+
+ # return batch x d_pos x npoints embedding
+ final_embeds = torch.cat(final_embeds, dim=2).permute(0, 2, 1)
+ return final_embeds
+
+ def forward(self, xyz, num_channels=None, input_range=None):
+ assert isinstance(xyz, torch.Tensor)
+ assert xyz.ndim == 3
+ # xyz is batch x npoints x 3
+ if self.pos_type == "sine":
+ with torch.no_grad():
+ out = self.get_sine_embeddings(xyz, num_channels, input_range)
+ elif self.pos_type == "fourier":
+ with torch.no_grad():
+ out = self.get_fourier_embeddings(
+ xyz, num_channels, input_range
+ )
+ else:
+            raise ValueError(f"Unknown pos_type: {self.pos_type}")
+
+ return out
+
+ def extra_repr(self):
+ st = f"type={self.pos_type}, scale={self.scale}, normalize={self.normalize}"
+ if hasattr(self, "gauss_B"):
+ st += f", gaussB={self.gauss_B.shape}, gaussBsum={self.gauss_B.sum().item()}"
+ return st
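`PositionEmbeddingCoordsSine` is pure PyTorch, so the Fourier path can be exercised directly; `input_range` carries the per-batch min/max coordinates used by `shift_scale_points` for normalization (shapes below are illustrative):

```python
import torch

from mask3d.models.position_embedding import PositionEmbeddingCoordsSine

pe = PositionEmbeddingCoordsSine(pos_type="fourier", d_pos=128, normalize=True)
xyz = torch.rand(2, 1000, 3)                     # 2 clouds, 1000 points each
mins, maxs = xyz.min(dim=1).values, xyz.max(dim=1).values
emb = pe(xyz, num_channels=128, input_range=[mins, maxs])
print(emb.shape)                                 # torch.Size([2, 128, 1000])
```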
diff --git a/models/Mask3D/build/lib/mask3d/models/res16unet.py b/models/Mask3D/build/lib/mask3d/models/res16unet.py
new file mode 100644
index 0000000000000000000000000000000000000000..db771a6f12341b70d9e27e8f61efc2878b5d12c3
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/res16unet.py
@@ -0,0 +1,444 @@
+import MinkowskiEngine.MinkowskiOps as me
+from MinkowskiEngine import MinkowskiReLU
+
+from mask3d.models.resnet import ResNetBase, get_norm
+from mask3d.models.modules.common import ConvType, NormType, conv, conv_tr
+from mask3d.models.modules.resnet_block import BasicBlock, Bottleneck
+
+
+class Res16UNetBase(ResNetBase):
+ BLOCK = None
+ PLANES = (32, 64, 128, 256, 256, 256, 256, 256)
+ DILATIONS = (1, 1, 1, 1, 1, 1, 1, 1)
+ LAYERS = (2, 2, 2, 2, 2, 2, 2, 2)
+ INIT_DIM = 32
+ OUT_PIXEL_DIST = 1
+ NORM_TYPE = NormType.BATCH_NORM
+ NON_BLOCK_CONV_TYPE = ConvType.SPATIAL_HYPERCUBE
+ CONV_TYPE = ConvType.SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS
+
+ # To use the model, must call initialize_coords before forward pass.
+ # Once data is processed, call clear to reset the model before calling initialize_coords
+ def __init__(
+ self, in_channels, out_channels, config, D=3, out_fpn=False, **kwargs
+ ):
+ super().__init__(in_channels, out_channels, config, D)
+ self.out_fpn = out_fpn
+
+ def network_initialization(self, in_channels, out_channels, config, D):
+ # Setup net_metadata
+ dilations = self.DILATIONS
+ bn_momentum = config.bn_momentum
+
+ def space_n_time_m(n, m):
+ return n if D == 3 else [n, n, n, m]
+
+ if D == 4:
+ self.OUT_PIXEL_DIST = space_n_time_m(self.OUT_PIXEL_DIST, 1)
+
+        # Output of the first conv concatenated to conv6
+ self.inplanes = self.INIT_DIM
+ self.conv0p1s1 = conv(
+ in_channels,
+ self.inplanes,
+ kernel_size=space_n_time_m(config.conv1_kernel_size, 1),
+ stride=1,
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+
+ self.bn0 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+
+ self.conv1p1s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn1 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block1 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[0],
+ self.LAYERS[0],
+ dilation=dilations[0],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.conv2p2s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn2 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block2 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[1],
+ self.LAYERS[1],
+ dilation=dilations[1],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.conv3p4s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn3 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block3 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[2],
+ self.LAYERS[2],
+ dilation=dilations[2],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.conv4p8s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn4 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block4 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[3],
+ self.LAYERS[3],
+ dilation=dilations[3],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.convtr4p16s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[4],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr4 = get_norm(
+ self.NORM_TYPE, self.PLANES[4], D, bn_momentum=bn_momentum
+ )
+
+ self.inplanes = self.PLANES[4] + self.PLANES[2] * self.BLOCK.expansion
+ self.block5 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[4],
+ self.LAYERS[4],
+ dilation=dilations[4],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.convtr5p8s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[5],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr5 = get_norm(
+ self.NORM_TYPE, self.PLANES[5], D, bn_momentum=bn_momentum
+ )
+
+ self.inplanes = self.PLANES[5] + self.PLANES[1] * self.BLOCK.expansion
+ self.block6 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[5],
+ self.LAYERS[5],
+ dilation=dilations[5],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.convtr6p4s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[6],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr6 = get_norm(
+ self.NORM_TYPE, self.PLANES[6], D, bn_momentum=bn_momentum
+ )
+
+ self.inplanes = self.PLANES[6] + self.PLANES[0] * self.BLOCK.expansion
+ self.block7 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[6],
+ self.LAYERS[6],
+ dilation=dilations[6],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.convtr7p2s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[7],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr7 = get_norm(
+ self.NORM_TYPE, self.PLANES[7], D, bn_momentum=bn_momentum
+ )
+
+ self.inplanes = self.PLANES[7] + self.INIT_DIM
+ self.block8 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[7],
+ self.LAYERS[7],
+ dilation=dilations[7],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.final = conv(
+ self.PLANES[7],
+ out_channels,
+ kernel_size=1,
+ stride=1,
+ bias=True,
+ D=D,
+ )
+ self.relu = MinkowskiReLU(inplace=True)
+
+ def forward(self, x):
+ feature_maps = []
+
+ out = self.conv0p1s1(x)
+ out = self.bn0(out)
+ out_p1 = self.relu(out)
+
+ out = self.conv1p1s2(out_p1)
+ out = self.bn1(out)
+ out = self.relu(out)
+ out_b1p2 = self.block1(out)
+
+ out = self.conv2p2s2(out_b1p2)
+ out = self.bn2(out)
+ out = self.relu(out)
+ out_b2p4 = self.block2(out)
+
+ out = self.conv3p4s2(out_b2p4)
+ out = self.bn3(out)
+ out = self.relu(out)
+ out_b3p8 = self.block3(out)
+
+ # pixel_dist=16
+ out = self.conv4p8s2(out_b3p8)
+ out = self.bn4(out)
+ out = self.relu(out)
+ out = self.block4(out)
+
+ feature_maps.append(out)
+
+ # pixel_dist=8
+ out = self.convtr4p16s2(out)
+ out = self.bntr4(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_b3p8)
+ out = self.block5(out)
+
+ feature_maps.append(out)
+
+ # pixel_dist=4
+ out = self.convtr5p8s2(out)
+ out = self.bntr5(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_b2p4)
+ out = self.block6(out)
+
+ feature_maps.append(out)
+
+ # pixel_dist=2
+ out = self.convtr6p4s2(out)
+ out = self.bntr6(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_b1p2)
+ out = self.block7(out)
+
+ feature_maps.append(out)
+
+ # pixel_dist=1
+ out = self.convtr7p2s2(out)
+ out = self.bntr7(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_p1)
+ out = self.block8(out)
+
+ feature_maps.append(out)
+
+ if not self.out_fpn:
+ return out
+ else:
+ return out, feature_maps
+
+
+class Res16UNet14(Res16UNetBase):
+ BLOCK = BasicBlock
+ LAYERS = (1, 1, 1, 1, 1, 1, 1, 1)
+
+
+class Res16UNet18(Res16UNetBase):
+ BLOCK = BasicBlock
+ LAYERS = (2, 2, 2, 2, 2, 2, 2, 2)
+
+
+class Res16UNet34(Res16UNetBase):
+ BLOCK = BasicBlock
+ LAYERS = (2, 3, 4, 6, 2, 2, 2, 2)
+
+
+class Res16UNet50(Res16UNetBase):
+ BLOCK = Bottleneck
+ LAYERS = (2, 3, 4, 6, 2, 2, 2, 2)
+
+
+class Res16UNet101(Res16UNetBase):
+ BLOCK = Bottleneck
+ LAYERS = (2, 3, 4, 23, 2, 2, 2, 2)
+
+
+class Res16UNet14A(Res16UNet14):
+ PLANES = (32, 64, 128, 256, 128, 128, 96, 96)
+
+
+class Res16UNet14A2(Res16UNet14A):
+ LAYERS = (1, 1, 1, 1, 2, 2, 2, 2)
+
+
+class Res16UNet14B(Res16UNet14):
+ PLANES = (32, 64, 128, 256, 128, 128, 128, 128)
+
+
+class Res16UNet14B2(Res16UNet14B):
+ LAYERS = (1, 1, 1, 1, 2, 2, 2, 2)
+
+
+class Res16UNet14B3(Res16UNet14B):
+ LAYERS = (2, 2, 2, 2, 1, 1, 1, 1)
+
+
+class Res16UNet14C(Res16UNet14):
+ PLANES = (32, 64, 128, 256, 192, 192, 128, 128)
+
+
+class Res16UNet14D(Res16UNet14):
+ PLANES = (32, 64, 128, 256, 384, 384, 384, 384)
+
+
+class Res16UNet18A(Res16UNet18):
+ PLANES = (32, 64, 128, 256, 128, 128, 96, 96)
+
+
+class Res16UNet18B(Res16UNet18):
+ PLANES = (32, 64, 128, 256, 128, 128, 128, 128)
+
+
+class Res16UNet18D(Res16UNet18):
+ PLANES = (32, 64, 128, 256, 384, 384, 384, 384)
+
+
+class Res16UNet34A(Res16UNet34):
+ PLANES = (32, 64, 128, 256, 256, 128, 64, 64)
+
+
+class Res16UNet34B(Res16UNet34):
+ PLANES = (32, 64, 128, 256, 256, 128, 64, 32)
+
+
+class Res16UNet34C(Res16UNet34):
+ PLANES = (32, 64, 128, 256, 256, 128, 96, 96)
+
+
+class Custom30M(Res16UNet34):
+ PLANES = (32, 64, 128, 256, 128, 64, 64, 32)
+
+
+class Res16UNet34D(Res16UNet34):
+ PLANES = (32, 64, 128, 256, 256, 128, 96, 128)
+
+
+class STRes16UNetBase(Res16UNetBase):
+
+ CONV_TYPE = ConvType.SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS
+
+ def __init__(self, in_channels, out_channels, config, D=4, **kwargs):
+ super().__init__(in_channels, out_channels, config, D, **kwargs)
+
+
+class STRes16UNet14(STRes16UNetBase, Res16UNet14):
+ pass
+
+
+class STRes16UNet14A(STRes16UNetBase, Res16UNet14A):
+ pass
+
+
+class STRes16UNet18(STRes16UNetBase, Res16UNet18):
+ pass
+
+
+class STRes16UNet34(STRes16UNetBase, Res16UNet34):
+ pass
+
+
+class STRes16UNet50(STRes16UNetBase, Res16UNet50):
+ pass
+
+
+class STRes16UNet101(STRes16UNetBase, Res16UNet101):
+ pass
+
+
+class STRes16UNet18A(STRes16UNet18):
+ PLANES = (32, 64, 128, 256, 128, 128, 96, 96)
+
+
+class STResTesseract16UNetBase(STRes16UNetBase):
+ pass
+ # CONV_TYPE = ConvType.HYPERCUBE
+
+
+class STResTesseract16UNet18A(STRes16UNet18A, STResTesseract16UNetBase):
+ pass
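A minimal instantiation sketch for these backbones: the only config fields read by `network_initialization` above are `bn_momentum` and `conv1_kernel_size`, so a bare namespace suffices (the real Mask3D Hydra config carries many more fields; values below are illustrative):

```python
from types import SimpleNamespace

import torch
import MinkowskiEngine as ME

from mask3d.models.res16unet import Res16UNet34C

config = SimpleNamespace(bn_momentum=0.1, conv1_kernel_size=5)
net = Res16UNet34C(in_channels=3, out_channels=20, config=config, D=3)

coords = torch.IntTensor([[0, 0, 0, 0], [0, 4, 0, 0], [0, 0, 8, 4]])
x = ME.SparseTensor(features=torch.rand(3, 3), coordinates=coords)
print(net(x).F.shape)  # (num_voxels, 20)
```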
diff --git a/models/Mask3D/build/lib/mask3d/models/resnet.py b/models/Mask3D/build/lib/mask3d/models/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..f6ad622893d191fce0cf9db6edafbc83f684d218
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/resnet.py
@@ -0,0 +1,243 @@
+import torch.nn as nn
+import MinkowskiEngine as ME
+
+from mask3d.models.model import Model
+from mask3d.models.modules.common import ConvType, NormType, conv, get_norm, sum_pool
+from mask3d.models.modules.resnet_block import BasicBlock, Bottleneck
+
+
+class ResNetBase(Model):
+ BLOCK = None
+ LAYERS = ()
+ INIT_DIM = 64
+ PLANES = (64, 128, 256, 512)
+ OUT_PIXEL_DIST = 32
+ HAS_LAST_BLOCK = False
+ CONV_TYPE = ConvType.HYPERCUBE
+
+ def __init__(self, in_channels, out_channels, config, D=3, **kwargs):
+ assert self.BLOCK is not None
+ assert self.OUT_PIXEL_DIST > 0
+
+ super().__init__(in_channels, out_channels, config, D, **kwargs)
+
+ self.network_initialization(in_channels, out_channels, config, D)
+ self.weight_initialization()
+
+ def network_initialization(self, in_channels, out_channels, config, D):
+ def space_n_time_m(n, m):
+ return n if D == 3 else [n, n, n, m]
+
+ if D == 4:
+ self.OUT_PIXEL_DIST = space_n_time_m(self.OUT_PIXEL_DIST, 1)
+
+ dilations = config.dilations
+ bn_momentum = config.bn_momentum
+ self.inplanes = self.INIT_DIM
+ self.conv1 = conv(
+ in_channels,
+ self.inplanes,
+ kernel_size=space_n_time_m(config.conv1_kernel_size, 1),
+ stride=1,
+ D=D,
+ )
+
+ self.bn1 = get_norm(
+ NormType.BATCH_NORM,
+ self.inplanes,
+ D=self.D,
+ bn_momentum=bn_momentum,
+ )
+ self.relu = ME.MinkowskiReLU(inplace=True)
+ self.pool = sum_pool(
+ kernel_size=space_n_time_m(2, 1), stride=space_n_time_m(2, 1), D=D
+ )
+
+ self.layer1 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[0],
+ self.LAYERS[0],
+ stride=space_n_time_m(2, 1),
+ dilation=space_n_time_m(dilations[0], 1),
+ )
+ self.layer2 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[1],
+ self.LAYERS[1],
+ stride=space_n_time_m(2, 1),
+ dilation=space_n_time_m(dilations[1], 1),
+ )
+ self.layer3 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[2],
+ self.LAYERS[2],
+ stride=space_n_time_m(2, 1),
+ dilation=space_n_time_m(dilations[2], 1),
+ )
+ self.layer4 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[3],
+ self.LAYERS[3],
+ stride=space_n_time_m(2, 1),
+ dilation=space_n_time_m(dilations[3], 1),
+ )
+
+ self.final = conv(
+ self.PLANES[3] * self.BLOCK.expansion,
+ out_channels,
+ kernel_size=1,
+ bias=True,
+ D=D,
+ )
+
+ def weight_initialization(self):
+ for m in self.modules():
+ if isinstance(m, ME.MinkowskiBatchNorm):
+ nn.init.constant_(m.bn.weight, 1)
+ nn.init.constant_(m.bn.bias, 0)
+
+ def _make_layer(
+ self,
+ block,
+ planes,
+ blocks,
+ stride=1,
+ dilation=1,
+ norm_type=NormType.BATCH_NORM,
+ bn_momentum=0.1,
+ ):
+ downsample = None
+ if stride != 1 or self.inplanes != planes * block.expansion:
+ downsample = nn.Sequential(
+ conv(
+ self.inplanes,
+ planes * block.expansion,
+ kernel_size=1,
+ stride=stride,
+ bias=False,
+ D=self.D,
+ ),
+ get_norm(
+ norm_type,
+ planes * block.expansion,
+ D=self.D,
+ bn_momentum=bn_momentum,
+ ),
+ )
+ layers = []
+ layers.append(
+ block(
+ self.inplanes,
+ planes,
+ stride=stride,
+ dilation=dilation,
+ downsample=downsample,
+ conv_type=self.CONV_TYPE,
+ D=self.D,
+ )
+ )
+ self.inplanes = planes * block.expansion
+ for i in range(1, blocks):
+ layers.append(
+ block(
+ self.inplanes,
+ planes,
+ stride=1,
+ dilation=dilation,
+ conv_type=self.CONV_TYPE,
+ D=self.D,
+ )
+ )
+
+ return nn.Sequential(*layers)
+
+ def forward(self, x):
+ x = self.conv1(x)
+ x = self.bn1(x)
+ x = self.relu(x)
+ x = self.pool(x)
+
+ x = self.layer1(x)
+ x = self.layer2(x)
+ x = self.layer3(x)
+ x = self.layer4(x)
+
+ x = self.final(x)
+ return x
+
+
+class ResNet14(ResNetBase):
+ BLOCK = BasicBlock
+ LAYERS = (1, 1, 1, 1)
+
+
+class ResNet18(ResNetBase):
+ BLOCK = BasicBlock
+ LAYERS = (2, 2, 2, 2)
+
+
+class ResNet34(ResNetBase):
+ BLOCK = BasicBlock
+ LAYERS = (3, 4, 6, 3)
+
+
+class ResNet50(ResNetBase):
+ BLOCK = Bottleneck
+ LAYERS = (3, 4, 6, 3)
+
+
+class ResNet101(ResNetBase):
+ BLOCK = Bottleneck
+ LAYERS = (3, 4, 23, 3)
+
+
+class STResNetBase(ResNetBase):
+
+ CONV_TYPE = ConvType.SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS
+
+ def __init__(self, in_channels, out_channels, config, D=4, **kwargs):
+ super().__init__(in_channels, out_channels, config, D, **kwargs)
+
+
+class STResNet14(STResNetBase, ResNet14):
+ pass
+
+
+class STResNet18(STResNetBase, ResNet18):
+ pass
+
+
+class STResNet34(STResNetBase, ResNet34):
+ pass
+
+
+class STResNet50(STResNetBase, ResNet50):
+ pass
+
+
+class STResNet101(STResNetBase, ResNet101):
+ pass
+
+
+class STResTesseractNetBase(STResNetBase):
+ CONV_TYPE = ConvType.HYPERCUBE
+
+
+class STResTesseractNet14(STResTesseractNetBase, STResNet14):
+ pass
+
+
+class STResTesseractNet18(STResTesseractNetBase, STResNet18):
+ pass
+
+
+class STResTesseractNet34(STResTesseractNetBase, STResNet34):
+ pass
+
+
+class STResTesseractNet50(STResTesseractNetBase, STResNet50):
+ pass
+
+
+class STResTesseractNet101(STResTesseractNetBase, STResNet101):
+ pass
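Unlike `Res16UNetBase`, `ResNetBase.network_initialization` also reads `config.dilations`, so a classifier-style net needs one extra field (a sketch under the same assumptions as above):

```python
from types import SimpleNamespace

from mask3d.models.resnet import ResNet18

config = SimpleNamespace(
    bn_momentum=0.1,
    conv1_kernel_size=5,
    dilations=(1, 1, 1, 1),   # one entry per residual stage
)
net = ResNet18(in_channels=3, out_channels=20, config=config, D=3)
```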
diff --git a/models/Mask3D/build/lib/mask3d/models/resunet.py b/models/Mask3D/build/lib/mask3d/models/resunet.py
new file mode 100644
index 0000000000000000000000000000000000000000..98a3adc56f09d534256960c080594e5df3a41c7c
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/resunet.py
@@ -0,0 +1,617 @@
+import torch.nn as nn
+import MinkowskiEngine as ME
+import MinkowskiEngine.MinkowskiOps as me
+from MinkowskiEngine import MinkowskiReLU
+
+from mask3d.models.resnet import ResNetBase, get_norm
+from mask3d.models.modules.common import ConvType, NormType, conv, conv_tr
+from mask3d.models.modules.resnet_block import BasicBlock, Bottleneck, BasicBlockINBN
+
+
+class MinkUNetBase(ResNetBase):
+ BLOCK = None
+ PLANES = (64, 128, 256, 512, 256, 128, 128)
+ DILATIONS = (1, 1, 1, 1, 1, 1)
+ LAYERS = (2, 2, 2, 2, 2, 2)
+ INIT_DIM = 64
+ OUT_PIXEL_DIST = 1
+ NORM_TYPE = NormType.BATCH_NORM
+ NON_BLOCK_CONV_TYPE = ConvType.SPATIAL_HYPERCUBE
+ CONV_TYPE = ConvType.SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS
+
+ # To use the model, must call initialize_coords before forward pass.
+ # Once data is processed, call clear to reset the model before calling initialize_coords
+ def __init__(self, in_channels, out_channels, config, D=3, **kwargs):
+ super().__init__(in_channels, out_channels, config, D)
+
+ def network_initialization(self, in_channels, out_channels, config, D):
+ # Setup net_metadata
+ dilations = self.DILATIONS
+ bn_momentum = config.bn_momentum
+
+ def space_n_time_m(n, m):
+ return n if D == 3 else [n, n, n, m]
+
+ if D == 4:
+ self.OUT_PIXEL_DIST = space_n_time_m(self.OUT_PIXEL_DIST, 1)
+
+        # Output of the first conv concatenated to conv6
+ self.inplanes = self.INIT_DIM
+ self.conv1p1s1 = conv(
+ in_channels,
+ self.inplanes,
+ kernel_size=space_n_time_m(config.conv1_kernel_size, 1),
+ stride=1,
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+
+ self.bn1 = get_norm(
+ self.NORM_TYPE, self.PLANES[0], D, bn_momentum=bn_momentum
+ )
+ self.block1 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[0],
+ self.LAYERS[0],
+ dilation=dilations[0],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.conv2p1s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn2 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block2 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[1],
+ self.LAYERS[1],
+ dilation=dilations[1],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.conv3p2s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn3 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block3 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[2],
+ self.LAYERS[2],
+ dilation=dilations[2],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.conv4p4s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn4 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block4 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[3],
+ self.LAYERS[3],
+ dilation=dilations[3],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.convtr4p8s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[4],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr4 = get_norm(
+ self.NORM_TYPE, self.PLANES[4], D, bn_momentum=bn_momentum
+ )
+
+ self.inplanes = self.PLANES[4] + self.PLANES[2] * self.BLOCK.expansion
+ self.block5 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[4],
+ self.LAYERS[4],
+ dilation=dilations[4],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.convtr5p4s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[5],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr5 = get_norm(
+ self.NORM_TYPE, self.PLANES[5], D, bn_momentum=bn_momentum
+ )
+
+ self.inplanes = self.PLANES[5] + self.PLANES[1] * self.BLOCK.expansion
+ self.block6 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[5],
+ self.LAYERS[5],
+ dilation=dilations[5],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.convtr6p2s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[6],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr6 = get_norm(
+ self.NORM_TYPE, self.PLANES[6], D, bn_momentum=bn_momentum
+ )
+ self.relu = MinkowskiReLU(inplace=True)
+
+ self.final = nn.Sequential(
+ conv(
+ self.PLANES[6] + self.PLANES[0] * self.BLOCK.expansion,
+ 512,
+ kernel_size=1,
+ stride=1,
+ dilation=1,
+ bias=False,
+ D=D,
+ ),
+ ME.MinkowskiBatchNorm(512),
+ ME.MinkowskiReLU(),
+ conv(
+ 512,
+ out_channels,
+ kernel_size=1,
+ stride=1,
+ dilation=1,
+ bias=True,
+ D=D,
+ ),
+ )
+
+ def forward(self, x):
+ out = self.conv1p1s1(x)
+ out = self.bn1(out)
+ out = self.relu(out)
+
+ out_b1p1 = self.block1(out)
+
+ out = self.conv2p1s2(out_b1p1)
+ out = self.bn2(out)
+ out = self.relu(out)
+
+ out_b2p2 = self.block2(out)
+
+ out = self.conv3p2s2(out_b2p2)
+ out = self.bn3(out)
+ out = self.relu(out)
+
+ out_b3p4 = self.block3(out)
+
+ out = self.conv4p4s2(out_b3p4)
+ out = self.bn4(out)
+ out = self.relu(out)
+
+ # pixel_dist=8
+ out = self.block4(out)
+
+ out = self.convtr4p8s2(out)
+ out = self.bntr4(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_b3p4)
+ out = self.block5(out)
+
+ out = self.convtr5p4s2(out)
+ out = self.bntr5(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_b2p2)
+ out = self.block6(out)
+
+ out = self.convtr6p2s2(out)
+ out = self.bntr6(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_b1p1)
+ return self.final(out)
+
+
+class ResUNet14(MinkUNetBase):
+ BLOCK = BasicBlock
+ LAYERS = (1, 1, 1, 1, 1, 1)
+
+
+class ResUNet18(MinkUNetBase):
+ BLOCK = BasicBlock
+ LAYERS = (2, 2, 2, 2, 2, 2)
+
+
+class ResUNet18INBN(ResUNet18):
+ NORM_TYPE = NormType.INSTANCE_BATCH_NORM
+ BLOCK = BasicBlockINBN
+
+
+class ResUNet34(MinkUNetBase):
+ BLOCK = BasicBlock
+ LAYERS = (3, 4, 6, 3, 2, 2)
+
+
+class ResUNet50(MinkUNetBase):
+ BLOCK = Bottleneck
+ LAYERS = (3, 4, 6, 3, 2, 2)
+
+
+class ResUNet101(MinkUNetBase):
+ BLOCK = Bottleneck
+ LAYERS = (3, 4, 23, 3, 2, 2)
+
+
+class ResUNet14D(ResUNet14):
+ PLANES = (64, 128, 256, 512, 512, 512, 512)
+
+
+class ResUNet18D(ResUNet18):
+ PLANES = (64, 128, 256, 512, 512, 512, 512)
+
+
+class ResUNet34D(ResUNet34):
+ PLANES = (64, 128, 256, 512, 512, 512, 512)
+
+
+class ResUNet34E(ResUNet34):
+ INIT_DIM = 32
+ PLANES = (32, 64, 128, 256, 128, 64, 64)
+
+
+class ResUNet34F(ResUNet34):
+ INIT_DIM = 32
+ PLANES = (32, 64, 128, 256, 128, 64, 32)
+
+
+class MinkUNetHyper(MinkUNetBase):
+ BLOCK = None
+ PLANES = (64, 128, 256, 512, 256, 128, 128)
+ DILATIONS = (1, 1, 1, 1, 1, 1)
+ LAYERS = (2, 2, 2, 2, 2, 2)
+ INIT_DIM = 64
+ OUT_PIXEL_DIST = 1
+ NORM_TYPE = NormType.BATCH_NORM
+ NON_BLOCK_CONV_TYPE = ConvType.SPATIAL_HYPERCUBE
+ CONV_TYPE = ConvType.SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS
+
+ # To use the model, must call initialize_coords before forward pass.
+ # Once data is processed, call clear to reset the model before calling initialize_coords
+ def __init__(self, in_channels, out_channels, config, D=3, **kwargs):
+ super(MinkUNetBase, self).__init__(
+ in_channels, out_channels, config, D
+ )
+
+ def network_initialization(self, in_channels, out_channels, config, D):
+ # Setup net_metadata
+ dilations = self.DILATIONS
+ bn_momentum = config.bn_momentum
+
+ def space_n_time_m(n, m):
+ return n if D == 3 else [n, n, n, m]
+
+ if D == 4:
+ self.OUT_PIXEL_DIST = space_n_time_m(self.OUT_PIXEL_DIST, 1)
+
+        # Output of the first conv concatenated to conv6
+ self.inplanes = self.INIT_DIM
+ self.conv1p1s1 = conv(
+ in_channels,
+ self.inplanes,
+ kernel_size=space_n_time_m(config.conv1_kernel_size, 1),
+ stride=1,
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+
+ self.bn1 = get_norm(
+ self.NORM_TYPE, self.PLANES[0], D, bn_momentum=bn_momentum
+ )
+ self.block1 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[0],
+ self.LAYERS[0],
+ dilation=dilations[0],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.conv2p1s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn2 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block2 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[1],
+ self.LAYERS[1],
+ dilation=dilations[1],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.conv3p2s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn3 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block3 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[2],
+ self.LAYERS[2],
+ dilation=dilations[2],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.conv4p4s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn4 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block4 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[3],
+ self.LAYERS[3],
+ dilation=dilations[3],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.pool_tr4 = ME.MinkowskiPoolingTranspose(
+ kernel_size=8, stride=8, dimension=D
+ )
+ _ = self.inplanes
+ self.convtr4p8s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[4],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr4 = get_norm(
+ self.NORM_TYPE, self.PLANES[4], D, bn_momentum=bn_momentum
+ )
+
+ self.inplanes = self.PLANES[4] + self.PLANES[2] * self.BLOCK.expansion
+ self.block5 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[4],
+ self.LAYERS[4],
+ dilation=dilations[4],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.pool_tr5 = ME.MinkowskiPoolingTranspose(
+ kernel_size=4, stride=4, dimension=D
+ )
+ out_pool5 = self.inplanes
+ self.convtr5p4s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[5],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr5 = get_norm(
+ self.NORM_TYPE, self.PLANES[5], D, bn_momentum=bn_momentum
+ )
+
+ self.inplanes = self.PLANES[5] + self.PLANES[1] * self.BLOCK.expansion
+ self.block6 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[5],
+ self.LAYERS[5],
+ dilation=dilations[5],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.pool_tr6 = ME.MinkowskiPoolingTranspose(
+ kernel_size=2, stride=2, dimension=D
+ )
+ out_pool6 = self.inplanes
+ self.convtr6p2s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[6],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr6 = get_norm(
+ self.NORM_TYPE, self.PLANES[6], D, bn_momentum=bn_momentum
+ )
+
+ self.relu = MinkowskiReLU(inplace=True)
+
+ self.final = nn.Sequential(
+ conv(
+ out_pool5
+ + out_pool6
+ + self.PLANES[6]
+ + self.PLANES[0] * self.BLOCK.expansion,
+ 512,
+ kernel_size=1,
+ bias=False,
+ D=D,
+ ),
+ ME.MinkowskiBatchNorm(512),
+ ME.MinkowskiReLU(),
+ conv(512, out_channels, kernel_size=1, bias=True, D=D),
+ )
+
+ def forward(self, x):
+ out = self.conv1p1s1(x)
+ out = self.bn1(out)
+ out = self.relu(out)
+
+ out_b1p1 = self.block1(out)
+
+ out = self.conv2p1s2(out_b1p1)
+ out = self.bn2(out)
+ out = self.relu(out)
+
+ out_b2p2 = self.block2(out)
+
+ out = self.conv3p2s2(out_b2p2)
+ out = self.bn3(out)
+ out = self.relu(out)
+
+ out_b3p4 = self.block3(out)
+
+ out = self.conv4p4s2(out_b3p4)
+ out = self.bn4(out)
+ out = self.relu(out)
+
+ # pixel_dist=8
+ out = self.block4(out)
+
+ out = self.convtr4p8s2(out)
+ out = self.bntr4(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_b3p4)
+ out = self.block5(out)
+ out_5 = self.pool_tr5(out)
+
+ out = self.convtr5p4s2(out)
+ out = self.bntr5(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_b2p2)
+ out = self.block6(out)
+ out_6 = self.pool_tr6(out)
+
+ out = self.convtr6p2s2(out)
+ out = self.bntr6(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_b1p1, out_6, out_5)
+ return self.final(out)
+
+
+class MinkUNetHyper14INBN(MinkUNetHyper):
+ NORM_TYPE = NormType.INSTANCE_BATCH_NORM
+ BLOCK = BasicBlockINBN
+
+
+class STMinkUNetBase(MinkUNetBase):
+
+ CONV_TYPE = ConvType.SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS
+
+ def __init__(self, in_channels, out_channels, config, D=4, **kwargs):
+ super().__init__(in_channels, out_channels, config, D, **kwargs)
+
+
+class STResUNet14(STMinkUNetBase, ResUNet14):
+ pass
+
+
+class STResUNet18(STMinkUNetBase, ResUNet18):
+ pass
+
+
+class STResUNet34(STMinkUNetBase, ResUNet34):
+ pass
+
+
+class STResUNet50(STMinkUNetBase, ResUNet50):
+ pass
+
+
+class STResUNet101(STMinkUNetBase, ResUNet101):
+ pass
+
+
+class STResTesseractUNetBase(STMinkUNetBase):
+ CONV_TYPE = ConvType.HYPERCUBE
+
+
+class STResTesseractUNet14(STResTesseractUNetBase, ResUNet14):
+ pass
+
+
+class STResTesseractUNet18(STResTesseractUNetBase, ResUNet18):
+ pass
+
+
+class STResTesseractUNet34(STResTesseractUNetBase, ResUNet34):
+ pass
+
+
+class STResTesseractUNet50(STResTesseractUNetBase, ResUNet50):
+ pass
+
+
+class STResTesseractUNet101(STResTesseractUNetBase, ResUNet101):
+ pass
diff --git a/models/Mask3D/build/lib/mask3d/models/wrapper.py b/models/Mask3D/build/lib/mask3d/models/wrapper.py
new file mode 100644
index 0000000000000000000000000000000000000000..a6bf1678d2106049b8e6a2ac2f3a9aff37dcfc9c
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/models/wrapper.py
@@ -0,0 +1,32 @@
+import random
+
+from torch.nn import Module
+from MinkowskiEngine import SparseTensor
+
+
+class Wrapper(Module):
+ """
+ Wrapper for the segmentation networks.
+ """
+
+ OUT_PIXEL_DIST = -1
+
+ def __init__(self, NetClass, in_nchannel, out_nchannel, config):
+ super().__init__()
+ self.initialize_filter(NetClass, in_nchannel, out_nchannel, config)
+
+ def initialize_filter(self, NetClass, in_nchannel, out_nchannel, config):
+ raise NotImplementedError("Must initialize a model and a filter")
+
+ def forward(self, x, coords, colors=None):
+ soutput = self.model(x)
+
+ # During training, make the network invariant to the filter
+ if not self.training or random.random() < 0.5:
+ # Filter requires the model to finish the forward pass
+ wrapper_coords = self.filter.initialize_coords(
+ self.model, coords, colors
+ )
+ finput = SparseTensor(soutput.F, wrapper_coords)
+ soutput = self.filter(finput)
+ return soutput
diff --git a/models/Mask3D/build/lib/mask3d/predict.py b/models/Mask3D/build/lib/mask3d/predict.py
new file mode 100644
index 0000000000000000000000000000000000000000..4c085fd01897c13540da8eac9f941dcf0847ca6f
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/predict.py
@@ -0,0 +1,187 @@
+import hydra
+from omegaconf import DictConfig, OmegaConf
+from models.mask3d import Mask3D
+import os
+import torch
+
+import MinkowskiEngine as ME
+import open3d as o3d
+import numpy as np
+import albumentations as A
+
+from utils.utils import (
+ flatten_dict,
+ load_baseline_model,
+ load_checkpoint_with_missing_or_exsessive_keys,
+ load_backbone_checkpoint_with_missing_or_exsessive_keys,
+)
+
+from datasets.scannet200.scannet200_constants import (
+ SCANNET_COLOR_MAP_200,
+ SCANNET_COLOR_MAP_20,
+ VALID_CLASS_IDS_200,
+ VALID_CLASS_IDS_20,
+ CLASS_LABELS_200,
+ CLASS_LABELS_20,
+)
+
+root_dir = '/home/weders/scratch/scratch/scannetter/arkit/raw/Validation'
+
+class InstanceSegmentation(torch.nn.Module):
+ def __init__(self, cfg):
+ super().__init__()
+ self.model = hydra.utils.instantiate(cfg.model)
+
+ def forward(self, x, raw_coordinates=None):
+ return self.model(x, raw_coordinates=raw_coordinates)
+
+@hydra.main(
+ config_path="conf", config_name="config_base_instance_segmentation.yaml"
+)
+def main(cfg: DictConfig):
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ os.chdir(hydra.utils.get_original_cwd())
+ model = InstanceSegmentation(cfg)
+
+ if cfg.general.backbone_checkpoint is not None:
+ cfg, model = load_backbone_checkpoint_with_missing_or_exsessive_keys(
+ cfg, model
+ )
+ if cfg.general.checkpoint is not None:
+ cfg, model = load_checkpoint_with_missing_or_exsessive_keys(cfg, model)
+
+ model = model.to(device)
+ # model.eval()
+
+ color_mean = (0.47793125906962, 0.4303257521323044, 0.3749598901421883)
+ color_std = (0.2834475483823543, 0.27566157565723015, 0.27018971370874995)
+ normalize_color = A.Normalize(mean=color_mean, std=color_std)
+
+ # iterate over data
+ for sc in os.listdir(root_dir):
+ if not os.path.exists(os.path.join(root_dir, sc, 'mesh_tsdf.ply')):
+ continue
+
+ # save outputs
+ output_dir = os.path.join(root_dir, sc, 'pred_mask3d_ours')
+ if not os.path.exists(output_dir):
+ os.makedirs(output_dir)
+
+ if sc != '42445991':
+ continue
+
+ # if os.path.exists(os.path.join(output_dir, 'mask3d_predictions.txt')):
+ # print('Skipping', sc)
+ # continue
+
+ print('Processing', sc)
+
+ mesh = o3d.io.read_triangle_mesh(os.path.join(root_dir, sc, 'mesh_tsdf.ply'))
+ mesh.compute_vertex_normals()
+
+ points = np.asarray(mesh.vertices)
+ colors = np.asarray(mesh.vertex_colors)
+
+ colors = colors * 255.
+ pseudo_image = colors.astype(np.uint8)[np.newaxis, :, :]
+ colors = np.squeeze(normalize_color(image=pseudo_image)["image"])
+
+ # voxelize data
+ coords = np.floor(points / 0.02)
+
+    # sparse_quantize also returns the quantized coordinates/features directly;
+    # only the index maps are needed here, so those return values are discarded.
+ _, _, unique_map, inverse_map = ME.utils.sparse_quantize(coordinates=coords, features=colors, return_index=True, return_inverse=True)
+
+ sample_coordinates = coords[unique_map]
+ coordinates = [torch.from_numpy(sample_coordinates).int()]
+ sample_features = colors[unique_map]
+ features = [torch.from_numpy(sample_features).float()]
+
+ coordinates, _ = ME.utils.sparse_collate(coords=coordinates, feats=features)
+ features = torch.cat(features, dim=0)
+ data = ME.SparseTensor(
+ coordinates=coordinates,
+ features=features,
+ device=device,
+ )
+
+ # run model
+ with torch.no_grad():
+ outputs = model(data, raw_coordinates=features)
+
+ del data
+ torch.cuda.empty_cache()
+
+ # parse predictions
+ logits = outputs["pred_logits"]
+ masks = outputs["pred_masks"]
+
+
+ # reformat predictions
+ logits = logits[0].detach().cpu()
+ masks = masks[0].detach().cpu()
+
+ labels = []
+ confidences = []
+ masks_binary = []
+
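+        # keep a query only if its combined confidence (class probability times
+        # mean in-mask sigmoid score) exceeds 0.5; logit index 200 is the
+        # "no object" class in Mask3D's ScanNet200 head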
+ for i in range(len(logits)):
+ p_labels = torch.softmax(logits[i], dim=-1)
+ p_masks = torch.sigmoid(masks[:, i])
+ l = torch.argmax(p_labels, dim=-1)
+ c_label = torch.max(p_labels)
+ m = p_masks > 0.5
+ c_m = p_masks[m].sum() / (m.sum() + 1e-8)
+ c = c_label * c_m
+ if l < 200 and c > 0.5:
+ labels.append(l.item())
+ confidences.append(c.item())
+ masks_binary.append(m[inverse_map]) # mapping the mask back to the original point cloud
+
+
+ # save labelled mesh
+ mesh_labelled = o3d.geometry.TriangleMesh()
+ mesh_labelled.vertices = mesh.vertices
+ mesh_labelled.triangles = mesh.triangles
+
+ labels_mapped = np.zeros((len(mesh.vertices), 1))
+ colors_mapped = np.zeros((len(mesh.vertices), 3))
+
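+        # paint masks in ascending confidence order so that more confident
+        # instances overwrite less confident ones in the label and color maps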
+ confidences, labels, masks_binary = zip(*sorted(zip(confidences, labels, masks_binary), reverse=False))
+ for i, (l, c, m) in enumerate(zip(labels, confidences, masks_binary)):
+ labels_mapped[m == 1] = l
+ if l == 0:
+                l_ = -1 + 2  # label offset is 2 for ScanNet200; label 0 must be mapped to -1 first (see trainer.py in Mask3D)
+ else:
+ l_ = l + 2
+ # print(VALID_CLASS_IDS_200[l_], SCANNET_COLOR_MAP_200[VALID_CLASS_IDS_200[l_]], l_, CLASS_LABELS_200[l_])
+ colors_mapped[m == 1] = SCANNET_COLOR_MAP_200[VALID_CLASS_IDS_200[l_]]
+
+ # colors_mapped[mask_mapped == 1] = SCANNET_COLOR_MAP_200[VALID_CLASS_IDS_200[l]]
+
+ mesh_labelled.vertex_colors = o3d.utility.Vector3dVector(colors_mapped.astype(np.float32) / 255.)
+ o3d.io.write_triangle_mesh(f'{output_dir}/mesh_tsdf_labelled.ply', mesh_labelled)
+
+ mask_path = os.path.join(output_dir, 'pred_mask')
+ if not os.path.exists(mask_path):
+ os.makedirs(mask_path)
+
+        # write one line per mask: mask file path, predicted class id, confidence
+ with open(os.path.join(output_dir, 'mask3d_predictions.txt'), 'w') as f:
+ for i, (l, c, m) in enumerate(zip(labels, confidences, masks_binary)):
+ mask_file = f'pred_mask/{str(i).zfill(3)}.txt'
+ f.write(f'{mask_file} {VALID_CLASS_IDS_200[l]} {c}\n')
+ np.savetxt(os.path.join(output_dir, mask_file), m.numpy(), fmt='%d')
+
+
+if __name__ == "__main__":
+ main()
\ No newline at end of file
diff --git a/models/Mask3D/build/lib/mask3d/preprocess_arkitscenes.py b/models/Mask3D/build/lib/mask3d/preprocess_arkitscenes.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/build/lib/mask3d/trainer/__init__.py b/models/Mask3D/build/lib/mask3d/trainer/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/build/lib/mask3d/trainer/trainer.py b/models/Mask3D/build/lib/mask3d/trainer/trainer.py
new file mode 100644
index 0000000000000000000000000000000000000000..b794e38aa5b2cef7eb106f95ced43466768b3dba
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/trainer/trainer.py
@@ -0,0 +1,1302 @@
+import gc
+from contextlib import nullcontext
+from pathlib import Path
+import statistics
+import shutil
+import os
+import math
+import pyviz3d.visualizer as vis
+from torch_scatter import scatter_mean
+import matplotlib
+from benchmark.evaluate_semantic_instance import evaluate
+from collections import defaultdict
+from sklearn.cluster import DBSCAN
+from utils.votenet_utils.eval_det import eval_det
+from datasets.scannet200.scannet200_splits import (
+ HEAD_CATS_SCANNET_200,
+ TAIL_CATS_SCANNET_200,
+ COMMON_CATS_SCANNET_200,
+ VALID_CLASS_IDS_200_VALIDATION,
+)
+
+import hydra
+import MinkowskiEngine as ME
+import numpy as np
+import pytorch_lightning as pl
+import torch
+from models.metrics import IoU
+import random
+import colorsys
+from typing import List, Tuple
+import functools
+
+
+@functools.lru_cache(20)
+def get_evenly_distributed_colors(
+ count: int,
+) -> List[Tuple[np.uint8, np.uint8, np.uint8]]:
+ # lru cache caches color tuples
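+    # sample evenly spaced hues on the HSV wheel at full saturation and value,
+    # then shuffle so consecutive instance ids get visually distinct colors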
+ HSV_tuples = [(x / count, 1.0, 1.0) for x in range(count)]
+ random.shuffle(HSV_tuples)
+ return list(
+ map(
+ lambda x: (np.array(colorsys.hsv_to_rgb(*x)) * 255).astype(
+ np.uint8
+ ),
+ HSV_tuples,
+ )
+ )
+
+
+class RegularCheckpointing(pl.Callback):
+ def on_train_epoch_end(
+ self, trainer: "pl.Trainer", pl_module: "pl.LightningModule"
+ ):
+ general = pl_module.config.general
+ trainer.save_checkpoint(f"{general.save_dir}/last-epoch.ckpt")
+ print("Checkpoint created")
+
+
+class InstanceSegmentation(pl.LightningModule):
+ def __init__(self, config):
+ super().__init__()
+
+ self.decoder_id = config.general.decoder_id
+
+ if config.model.train_on_segments:
+ self.mask_type = "segment_mask"
+ else:
+ self.mask_type = "masks"
+
+ self.eval_on_segments = config.general.eval_on_segments
+
+ self.config = config
+ self.save_hyperparameters()
+ # model
+ self.model = hydra.utils.instantiate(config.model)
+ self.optional_freeze = nullcontext
+ if config.general.freeze_backbone:
+ self.optional_freeze = torch.no_grad
+ # loss
+ self.ignore_label = config.data.ignore_label
+
+ matcher = hydra.utils.instantiate(config.matcher)
+ weight_dict = {
+ "loss_ce": matcher.cost_class,
+ "loss_mask": matcher.cost_mask,
+ "loss_dice": matcher.cost_dice,
+ }
+
+ aux_weight_dict = {}
+ for i in range(self.model.num_levels * self.model.num_decoders):
+ if i not in self.config.general.ignore_mask_idx:
+ aux_weight_dict.update(
+ {k + f"_{i}": v for k, v in weight_dict.items()}
+ )
+ else:
+ aux_weight_dict.update(
+ {k + f"_{i}": 0.0 for k, v in weight_dict.items()}
+ )
+ weight_dict.update(aux_weight_dict)
+
+ self.preds = dict()
+ self.bbox_preds = dict()
+ self.bbox_gt = dict()
+
+ self.criterion = hydra.utils.instantiate(
+ config.loss, matcher=matcher, weight_dict=weight_dict
+ )
+
+ # metrics
+ self.confusion = hydra.utils.instantiate(config.metrics)
+ self.iou = IoU()
+ # misc
+ self.labels_info = dict()
+
+ def forward(
+ self, x, point2segment=None, raw_coordinates=None, is_eval=False
+ ):
+ with self.optional_freeze():
+ x = self.model(
+ x,
+ point2segment,
+ raw_coordinates=raw_coordinates,
+ is_eval=is_eval,
+ )
+ return x
+
+ def training_step(self, batch, batch_idx):
+ data, target, file_names = batch
+
+ if data.features.shape[0] > self.config.general.max_batch_size:
+ print("data exceeds threshold")
+ raise RuntimeError("BATCH TOO BIG")
+
+ if len(target) == 0:
+ print("no targets")
+ return None
+
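+        # the last three feature channels carry the raw (unvoxelized) xyz
+        # coordinates; split them off before building the sparse tensor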
+ raw_coordinates = None
+ if self.config.data.add_raw_coordinates:
+ raw_coordinates = data.features[:, -3:]
+ data.features = data.features[:, :-3]
+
+ data = ME.SparseTensor(
+ coordinates=data.coordinates,
+ features=data.features,
+ device=self.device,
+ )
+
+ try:
+ output = self.forward(
+ data,
+ point2segment=[
+ target[i]["point2segment"] for i in range(len(target))
+ ],
+ raw_coordinates=raw_coordinates,
+ )
+ except RuntimeError as run_err:
+ print(run_err)
+ if (
+ "only a single point gives nans in cross-attention"
+ == run_err.args[0]
+ ):
+ return None
+ else:
+ raise run_err
+
+ try:
+ losses = self.criterion(output, target, mask_type=self.mask_type)
+ except ValueError as val_err:
+ print(f"ValueError: {val_err}")
+ print(f"data shape: {data.shape}")
+ print(f"data feat shape: {data.features.shape}")
+ print(f"data feat nans: {data.features.isnan().sum()}")
+ print(f"output: {output}")
+ print(f"target: {target}")
+ print(f"filenames: {file_names}")
+ raise val_err
+
+ for k in list(losses.keys()):
+ if k in self.criterion.weight_dict:
+ losses[k] *= self.criterion.weight_dict[k]
+ else:
+ # remove this loss if not specified in `weight_dict`
+ losses.pop(k)
+
+ logs = {
+ f"train_{k}": v.detach().cpu().item() for k, v in losses.items()
+ }
+
+        logs["train_mean_loss_ce"] = statistics.mean(
+            [v for k, v in logs.items() if "loss_ce" in k]
+        )
+
+        logs["train_mean_loss_mask"] = statistics.mean(
+            [v for k, v in logs.items() if "loss_mask" in k]
+        )
+
+        logs["train_mean_loss_dice"] = statistics.mean(
+            [v for k, v in logs.items() if "loss_dice" in k]
+        )
+
+ self.log_dict(logs)
+ return sum(losses.values())
+
+ def validation_step(self, batch, batch_idx):
+ return self.eval_step(batch, batch_idx)
+
+ def export(self, pred_masks, scores, pred_classes, file_names, decoder_id):
+        root_path = "eval_output"
+ base_path = f"{root_path}/instance_evaluation_{self.config.general.experiment_name}_{self.current_epoch}/decoder_{decoder_id}"
+ pred_mask_path = f"{base_path}/pred_mask"
+
+ Path(pred_mask_path).mkdir(parents=True, exist_ok=True)
+
+ file_name = file_names
+ with open(f"{base_path}/{file_name}.txt", "w") as fout:
+ real_id = -1
+ for instance_id in range(len(pred_classes)):
+ real_id += 1
+ pred_class = pred_classes[instance_id]
+ score = scores[instance_id]
+ mask = pred_masks[:, instance_id].astype("uint8")
+
+ if score > self.config.general.export_threshold:
+                    # export only masks above the threshold to keep the output small; this should not noticeably affect results
+ np.savetxt(
+ f"{pred_mask_path}/{file_name}_{real_id}.txt",
+ mask,
+ fmt="%d",
+ )
+ fout.write(
+ f"pred_mask/{file_name}_{real_id}.txt {pred_class} {score}\n"
+ )
+
+ def training_epoch_end(self, outputs):
+ train_loss = sum([out["loss"].cpu().item() for out in outputs]) / len(
+ outputs
+ )
+ results = {"train_loss_mean": train_loss}
+ self.log_dict(results)
+
+ def validation_epoch_end(self, outputs):
+ self.test_epoch_end(outputs)
+
+ def save_visualizations(
+ self,
+ target_full,
+ full_res_coords,
+ sorted_masks,
+ sort_classes,
+ file_name,
+ original_colors,
+ original_normals,
+ sort_scores_values,
+ point_size=20,
+ sorted_heatmaps=None,
+ query_pos=None,
+ backbone_features=None,
+ ):
+
+ full_res_coords -= full_res_coords.mean(axis=0)
+
+ gt_pcd_pos = []
+ gt_pcd_normals = []
+ gt_pcd_color = []
+ gt_inst_pcd_color = []
+ gt_boxes = []
+
+ if "labels" in target_full:
+ instances_colors = torch.from_numpy(
+ np.vstack(
+ get_evenly_distributed_colors(
+ target_full["labels"].shape[0]
+ )
+ )
+ )
+ for instance_counter, (label, mask) in enumerate(
+ zip(target_full["labels"], target_full["masks"])
+ ):
+ if label == 255:
+ continue
+
+ mask_tmp = mask.detach().cpu().numpy()
+ mask_coords = full_res_coords[mask_tmp.astype(bool), :]
+
+ if len(mask_coords) == 0:
+ continue
+
+ gt_pcd_pos.append(mask_coords)
+ mask_coords_min = full_res_coords[
+ mask_tmp.astype(bool), :
+ ].min(axis=0)
+ mask_coords_max = full_res_coords[
+ mask_tmp.astype(bool), :
+ ].max(axis=0)
+ size = mask_coords_max - mask_coords_min
+ mask_coords_middle = mask_coords_min + size / 2
+
+ gt_boxes.append(
+ {
+ "position": mask_coords_middle,
+ "size": size,
+ "color": self.validation_dataset.map2color([label])[0],
+ }
+ )
+
+ gt_pcd_color.append(
+ self.validation_dataset.map2color([label]).repeat(
+ gt_pcd_pos[-1].shape[0], 1
+ )
+ )
+ gt_inst_pcd_color.append(
+ instances_colors[instance_counter % len(instances_colors)]
+ .unsqueeze(0)
+ .repeat(gt_pcd_pos[-1].shape[0], 1)
+ )
+
+ gt_pcd_normals.append(
+ original_normals[mask_tmp.astype(bool), :]
+ )
+
+ gt_pcd_pos = np.concatenate(gt_pcd_pos)
+ gt_pcd_normals = np.concatenate(gt_pcd_normals)
+ gt_pcd_color = np.concatenate(gt_pcd_color)
+ gt_inst_pcd_color = np.concatenate(gt_inst_pcd_color)
+
+ v = vis.Visualizer()
+
+ v.add_points(
+ "RGB Input",
+ full_res_coords,
+ colors=original_colors,
+ normals=original_normals,
+ visible=True,
+ point_size=point_size,
+ )
+
+ if backbone_features is not None:
+ v.add_points(
+ "PCA",
+ full_res_coords,
+ colors=backbone_features,
+ normals=original_normals,
+ visible=False,
+ point_size=point_size,
+ )
+
+ if "labels" in target_full:
+ v.add_points(
+ "Semantics (GT)",
+ gt_pcd_pos,
+ colors=gt_pcd_color,
+ normals=gt_pcd_normals,
+ alpha=0.8,
+ visible=False,
+ point_size=point_size,
+ )
+ v.add_points(
+ "Instances (GT)",
+ gt_pcd_pos,
+ colors=gt_inst_pcd_color,
+ normals=gt_pcd_normals,
+ alpha=0.8,
+ visible=False,
+ point_size=point_size,
+ )
+
+ pred_coords = []
+ pred_normals = []
+ pred_sem_color = []
+ pred_inst_color = []
+
+ for did in range(len(sorted_masks)):
+ instances_colors = torch.from_numpy(
+ np.vstack(
+ get_evenly_distributed_colors(
+ max(1, sorted_masks[did].shape[1])
+ )
+ )
+ )
+
+ for i in reversed(range(sorted_masks[did].shape[1])):
+ coords = full_res_coords[
+ sorted_masks[did][:, i].astype(bool), :
+ ]
+
+ mask_coords = full_res_coords[
+ sorted_masks[did][:, i].astype(bool), :
+ ]
+ mask_normals = original_normals[
+ sorted_masks[did][:, i].astype(bool), :
+ ]
+
+ label = sort_classes[did][i]
+
+ if len(mask_coords) == 0:
+ continue
+
+ pred_coords.append(mask_coords)
+ pred_normals.append(mask_normals)
+
+ pred_sem_color.append(
+ self.validation_dataset.map2color([label]).repeat(
+ mask_coords.shape[0], 1
+ )
+ )
+
+ pred_inst_color.append(
+ instances_colors[i % len(instances_colors)]
+ .unsqueeze(0)
+ .repeat(mask_coords.shape[0], 1)
+ )
+
+ if len(pred_coords) > 0:
+ pred_coords = np.concatenate(pred_coords)
+ pred_normals = np.concatenate(pred_normals)
+ pred_sem_color = np.concatenate(pred_sem_color)
+ pred_inst_color = np.concatenate(pred_inst_color)
+
+ v.add_points(
+ "Semantics (Mask3D)",
+ pred_coords,
+ colors=pred_sem_color,
+ normals=pred_normals,
+ visible=False,
+ alpha=0.8,
+ point_size=point_size,
+ )
+ v.add_points(
+ "Instances (Mask3D)",
+ pred_coords,
+ colors=pred_inst_color,
+ normals=pred_normals,
+ visible=False,
+ alpha=0.8,
+ point_size=point_size,
+ )
+
+ v.save(
+ f"{self.config['general']['save_dir']}/visualizations/{file_name}"
+ )
+
+ def eval_step(self, batch, batch_idx):
+ data, target, file_names = batch
+ inverse_maps = data.inverse_maps
+ target_full = data.target_full
+ original_colors = data.original_colors
+ data_idx = data.idx
+ original_normals = data.original_normals
+ original_coordinates = data.original_coordinates
+
+ # if len(target) == 0 or len(target_full) == 0:
+ # print("no targets")
+ # return None
+
+ if len(data.coordinates) == 0:
+ return 0.0
+
+ raw_coordinates = None
+ if self.config.data.add_raw_coordinates:
+ raw_coordinates = data.features[:, -3:]
+ data.features = data.features[:, :-3]
+
+ if raw_coordinates.shape[0] == 0:
+ return 0.0
+
+ data = ME.SparseTensor(
+ coordinates=data.coordinates,
+ features=data.features,
+ device=self.device,
+ )
+
+ try:
+ output = self.forward(
+ data,
+ point2segment=[
+ target[i]["point2segment"] for i in range(len(target))
+ ],
+ raw_coordinates=raw_coordinates,
+ is_eval=True,
+ )
+ except RuntimeError as run_err:
+ print(run_err)
+ if (
+ "only a single point gives nans in cross-attention"
+ == run_err.args[0]
+ ):
+ return None
+ else:
+ raise run_err
+
+ if self.config.data.test_mode != "test":
+ if self.config.trainer.deterministic:
+ torch.use_deterministic_algorithms(False)
+
+ try:
+ losses = self.criterion(
+ output, target, mask_type=self.mask_type
+ )
+ except ValueError as val_err:
+ print(f"ValueError: {val_err}")
+ print(f"data shape: {data.shape}")
+ print(f"data feat shape: {data.features.shape}")
+ print(f"data feat nans: {data.features.isnan().sum()}")
+ print(f"output: {output}")
+ print(f"target: {target}")
+ print(f"filenames: {file_names}")
+ raise val_err
+
+ for k in list(losses.keys()):
+ if k in self.criterion.weight_dict:
+ losses[k] *= self.criterion.weight_dict[k]
+ else:
+ # remove this loss if not specified in `weight_dict`
+ losses.pop(k)
+ if self.config.trainer.deterministic:
+ torch.use_deterministic_algorithms(True)
+
+ if self.config.general.save_visualizations:
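+            # project the high-dimensional backbone features onto their first
+            # three principal components and rescale to [0, 255] for RGB display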
+ backbone_features = (
+ output["backbone_features"].F.detach().cpu().numpy()
+ )
+ from sklearn import decomposition
+
+ pca = decomposition.PCA(n_components=3)
+ pca.fit(backbone_features)
+ pca_features = pca.transform(backbone_features)
+ rescaled_pca = (
+ 255
+ * (pca_features - pca_features.min())
+ / (pca_features.max() - pca_features.min())
+ )
+
+ self.eval_instance_step(
+ output,
+ target,
+ target_full,
+ inverse_maps,
+ file_names,
+ original_coordinates,
+ original_colors,
+ original_normals,
+ raw_coordinates,
+ data_idx,
+ backbone_features=rescaled_pca
+ if self.config.general.save_visualizations
+ else None,
+ )
+
+ if self.config.data.test_mode != "test":
+ return {
+ f"val_{k}": v.detach().cpu().item() for k, v in losses.items()
+ }
+ else:
+ return 0.0
+
+ def test_step(self, batch, batch_idx):
+ return self.eval_step(batch, batch_idx)
+
+ def get_full_res_mask(
+ self, mask, inverse_map, point2segment_full, is_heatmap=False
+ ):
+ mask = mask.detach().cpu()[inverse_map] # full res
+
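+        # on segment-level evaluation, average point scores over each segment,
+        # threshold at 0.5, then broadcast the segment decision back to points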
+        if self.eval_on_segments and not is_heatmap:
+ mask = scatter_mean(
+ mask, point2segment_full, dim=0
+ ) # full res segments
+ mask = (mask > 0.5).float()
+ mask = mask.detach().cpu()[
+ point2segment_full.cpu()
+ ] # full res points
+
+ return mask
+
+ def get_mask_and_scores(
+ self, mask_cls, mask_pred, num_queries=100, num_classes=18, device=None
+ ):
+ if device is None:
+ device = self.device
+ labels = (
+ torch.arange(num_classes, device=device)
+ .unsqueeze(0)
+ .repeat(num_queries, 1)
+ .flatten(0, 1)
+ )
+
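+        # flatten the (num_queries, num_classes) score matrix so top-k is taken
+        # jointly over query/class pairs; indices are decoded back below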
+ if self.config.general.topk_per_image != -1:
+ scores_per_query, topk_indices = mask_cls.flatten(0, 1).topk(
+ self.config.general.topk_per_image, sorted=True
+ )
+ else:
+ scores_per_query, topk_indices = mask_cls.flatten(0, 1).topk(
+ num_queries, sorted=True
+ )
+
+ labels_per_query = labels[topk_indices]
+ topk_indices = topk_indices // num_classes
+ mask_pred = mask_pred[:, topk_indices]
+
+ result_pred_mask = (mask_pred > 0).float()
+ heatmap = mask_pred.float().sigmoid()
+
+ mask_scores_per_image = (heatmap * result_pred_mask).sum(0) / (
+ result_pred_mask.sum(0) + 1e-6
+ )
+ score = scores_per_query * mask_scores_per_image
+ classes = labels_per_query
+
+ return score, result_pred_mask, classes, heatmap
+
+ def eval_instance_step(
+ self,
+ output,
+ target_low_res,
+ target_full_res,
+ inverse_maps,
+ file_names,
+ full_res_coords,
+ original_colors,
+ original_normals,
+ raw_coords,
+ idx,
+ first_full_res=False,
+ backbone_features=None,
+ ):
+ label_offset = self.validation_dataset.label_offset
+ prediction = output["aux_outputs"]
+ prediction.append(
+ {
+ "pred_logits": output["pred_logits"],
+ "pred_masks": output["pred_masks"],
+ }
+ )
+
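+        # class softmax on the selected decoder's logits; the trailing
+        # "no object" logit is dropped via [..., :-1]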
+ prediction[self.decoder_id][
+ "pred_logits"
+ ] = torch.functional.F.softmax(
+ prediction[self.decoder_id]["pred_logits"], dim=-1
+ )[
+ ..., :-1
+ ]
+
+ all_pred_classes = list()
+ all_pred_masks = list()
+ all_pred_scores = list()
+ all_heatmaps = list()
+ all_query_pos = list()
+
+ offset_coords_idx = 0
+ for bid in range(len(prediction[self.decoder_id]["pred_masks"])):
+ if not first_full_res:
+ if self.model.train_on_segments:
+ masks = (
+ prediction[self.decoder_id]["pred_masks"][bid]
+ .detach()
+ .cpu()[target_low_res[bid]["point2segment"].cpu()]
+ )
+ else:
+ masks = (
+ prediction[self.decoder_id]["pred_masks"][bid]
+ .detach()
+ .cpu()
+ )
+
+ if self.config.general.use_dbscan:
+ new_preds = {
+ "pred_masks": list(),
+ "pred_logits": list(),
+ }
+
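+                    # DBSCAN splits each query's mask into spatially separated
+                    # clusters; each non-noise cluster becomes its own instance
+                    # proposal, reusing the query's class logits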
+ curr_coords_idx = masks.shape[0]
+ curr_coords = raw_coords[
+ offset_coords_idx : curr_coords_idx + offset_coords_idx
+ ]
+ offset_coords_idx += curr_coords_idx
+
+ for curr_query in range(masks.shape[1]):
+ curr_masks = masks[:, curr_query] > 0
+
+ if curr_coords[curr_masks].shape[0] > 0:
+ clusters = (
+ DBSCAN(
+ eps=self.config.general.dbscan_eps,
+ min_samples=self.config.general.dbscan_min_points,
+ n_jobs=-1,
+ )
+ .fit(curr_coords[curr_masks])
+ .labels_
+ )
+
+ new_mask = torch.zeros(curr_masks.shape, dtype=int)
+ new_mask[curr_masks] = (
+ torch.from_numpy(clusters) + 1
+ )
+
+ for cluster_id in np.unique(clusters):
+ original_pred_masks = masks[:, curr_query]
+ if cluster_id != -1:
+ new_preds["pred_masks"].append(
+ original_pred_masks
+ * (new_mask == cluster_id + 1)
+ )
+ new_preds["pred_logits"].append(
+ prediction[self.decoder_id][
+ "pred_logits"
+ ][bid, curr_query]
+ )
+
+ scores, masks, classes, heatmap = self.get_mask_and_scores(
+ torch.stack(new_preds["pred_logits"]).cpu(),
+ torch.stack(new_preds["pred_masks"]).T,
+ len(new_preds["pred_logits"]),
+ self.model.num_classes - 1,
+ )
+ else:
+ scores, masks, classes, heatmap = self.get_mask_and_scores(
+ prediction[self.decoder_id]["pred_logits"][bid]
+ .detach()
+ .cpu(),
+ masks,
+ prediction[self.decoder_id]["pred_logits"][bid].shape[
+ 0
+ ],
+ self.model.num_classes - 1,
+ )
+
+ masks = self.get_full_res_mask(
+ masks,
+ inverse_maps[bid],
+ target_full_res[bid]["point2segment"],
+ )
+
+ heatmap = self.get_full_res_mask(
+ heatmap,
+ inverse_maps[bid],
+ target_full_res[bid]["point2segment"],
+ is_heatmap=True,
+ )
+
+ if backbone_features is not None:
+ backbone_features = self.get_full_res_mask(
+ torch.from_numpy(backbone_features),
+ inverse_maps[bid],
+ target_full_res[bid]["point2segment"],
+ is_heatmap=True,
+ )
+ backbone_features = backbone_features.numpy()
+ else:
+ assert False, "not tested"
+ masks = self.get_full_res_mask(
+ prediction[self.decoder_id]["pred_masks"][bid].cpu(),
+ inverse_maps[bid],
+ target_full_res[bid]["point2segment"],
+ )
+
+ scores, masks, classes, heatmap = self.get_mask_and_scores(
+ prediction[self.decoder_id]["pred_logits"][bid].cpu(),
+ masks,
+ prediction[self.decoder_id]["pred_logits"][bid].shape[0],
+ self.model.num_classes - 1,
+ device="cpu",
+ )
+
+ masks = masks.numpy()
+ heatmap = heatmap.numpy()
+
+ sort_scores = scores.sort(descending=True)
+ sort_scores_index = sort_scores.indices.cpu().numpy()
+ sort_scores_values = sort_scores.values.cpu().numpy()
+ sort_classes = classes[sort_scores_index]
+
+ sorted_masks = masks[:, sort_scores_index]
+ sorted_heatmap = heatmap[:, sort_scores_index]
+
+ if self.config.general.filter_out_instances:
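+                # simple mask-level NMS: compute normalized pairwise mask
+                # overlaps and, among overlapping masks, keep only the
+                # best-ranked instance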
+ keep_instances = set()
+ pairwise_overlap = sorted_masks.T @ sorted_masks
+ normalization = pairwise_overlap.max(axis=0)
+ norm_overlaps = pairwise_overlap / normalization
+
+ for instance_id in range(norm_overlaps.shape[0]):
+ # filter out unlikely masks and nearly empty masks
+ # if not(sort_scores_values[instance_id] < 0.3 or sorted_masks[:, instance_id].sum() < 500):
+ if not (
+ sort_scores_values[instance_id]
+ < self.config.general.scores_threshold
+ ):
+ # check if mask != empty
+ if not sorted_masks[:, instance_id].sum() == 0.0:
+ overlap_ids = set(
+ np.nonzero(
+ norm_overlaps[instance_id, :]
+ > self.config.general.iou_threshold
+ )[0]
+ )
+
+ if len(overlap_ids) == 0:
+ keep_instances.add(instance_id)
+ else:
+ if instance_id == min(overlap_ids):
+ keep_instances.add(instance_id)
+
+ keep_instances = sorted(list(keep_instances))
+ all_pred_classes.append(sort_classes[keep_instances])
+ all_pred_masks.append(sorted_masks[:, keep_instances])
+ all_pred_scores.append(sort_scores_values[keep_instances])
+ all_heatmaps.append(sorted_heatmap[:, keep_instances])
+ else:
+ all_pred_classes.append(sort_classes)
+ all_pred_masks.append(sorted_masks)
+ all_pred_scores.append(sort_scores_values)
+ all_heatmaps.append(sorted_heatmap)
+
+ if self.validation_dataset.dataset_name == "scannet200":
+ all_pred_classes[bid][all_pred_classes[bid] == 0] = -1
+ if self.config.data.test_mode != "test":
+ target_full_res[bid]["labels"][
+ target_full_res[bid]["labels"] == 0
+ ] = -1
+
+ for bid in range(len(prediction[self.decoder_id]["pred_masks"])):
+ all_pred_classes[
+ bid
+ ] = self.validation_dataset._remap_model_output(
+ all_pred_classes[bid].cpu() + label_offset
+ )
+
+ if (
+ self.config.data.test_mode != "test"
+ and len(target_full_res) != 0
+ ):
+ target_full_res[bid][
+ "labels"
+ ] = self.validation_dataset._remap_model_output(
+ target_full_res[bid]["labels"].cpu() + label_offset
+ )
+
+ # PREDICTION BOX
+ bbox_data = []
+ for query_id in range(
+ all_pred_masks[bid].shape[1]
+ ): # self.model.num_queries
+ obj_coords = full_res_coords[bid][
+ all_pred_masks[bid][:, query_id].astype(bool), :
+ ]
+ if obj_coords.shape[0] > 0:
+ obj_center = obj_coords.mean(axis=0)
+ obj_axis_length = obj_coords.max(
+ axis=0
+ ) - obj_coords.min(axis=0)
+
+ bbox = np.concatenate((obj_center, obj_axis_length))
+
+ bbox_data.append(
+ (
+ all_pred_classes[bid][query_id].item(),
+ bbox,
+ all_pred_scores[bid][query_id],
+ )
+ )
+ self.bbox_preds[file_names[bid]] = bbox_data
+
+ # GT BOX
+ bbox_data = []
+ for obj_id in range(target_full_res[bid]["masks"].shape[0]):
+ if target_full_res[bid]["labels"][obj_id].item() == 255:
+ continue
+
+ obj_coords = full_res_coords[bid][
+ target_full_res[bid]["masks"][obj_id, :]
+ .cpu()
+ .detach()
+ .numpy()
+ .astype(bool),
+ :,
+ ]
+ if obj_coords.shape[0] > 0:
+ obj_center = obj_coords.mean(axis=0)
+ obj_axis_length = obj_coords.max(
+ axis=0
+ ) - obj_coords.min(axis=0)
+
+ bbox = np.concatenate((obj_center, obj_axis_length))
+ bbox_data.append(
+ (
+ target_full_res[bid]["labels"][obj_id].item(),
+ bbox,
+ )
+ )
+
+ self.bbox_gt[file_names[bid]] = bbox_data
+
+ if self.config.general.eval_inner_core == -1:
+ self.preds[file_names[bid]] = {
+ "pred_masks": all_pred_masks[bid],
+ "pred_scores": all_pred_scores[bid],
+ "pred_classes": all_pred_classes[bid],
+ }
+ else:
+ # prev val_dataset
+ self.preds[file_names[bid]] = {
+ "pred_masks": all_pred_masks[bid][
+ self.test_dataset.data[idx[bid]]["cond_inner"]
+ ],
+ "pred_scores": all_pred_scores[bid],
+ "pred_classes": all_pred_classes[bid],
+ }
+
+ if self.config.general.save_visualizations:
+ if "cond_inner" in self.test_dataset.data[idx[bid]]:
+ target_full_res[bid]["masks"] = target_full_res[bid][
+ "masks"
+ ][:, self.test_dataset.data[idx[bid]]["cond_inner"]]
+ self.save_visualizations(
+ target_full_res[bid],
+ full_res_coords[bid][
+ self.test_dataset.data[idx[bid]]["cond_inner"]
+ ],
+ [self.preds[file_names[bid]]["pred_masks"]],
+ [self.preds[file_names[bid]]["pred_classes"]],
+ file_names[bid],
+ original_colors[bid][
+ self.test_dataset.data[idx[bid]]["cond_inner"]
+ ],
+ original_normals[bid][
+ self.test_dataset.data[idx[bid]]["cond_inner"]
+ ],
+ [self.preds[file_names[bid]]["pred_scores"]],
+ sorted_heatmaps=[
+ all_heatmaps[bid][
+ self.test_dataset.data[idx[bid]]["cond_inner"]
+ ]
+ ],
+ query_pos=all_query_pos[bid][
+ self.test_dataset.data[idx[bid]]["cond_inner"]
+ ]
+ if len(all_query_pos) > 0
+ else None,
+ backbone_features=backbone_features[
+ self.test_dataset.data[idx[bid]]["cond_inner"]
+ ],
+ point_size=self.config.general.visualization_point_size,
+ )
+ else:
+ self.save_visualizations(
+ target_full_res[bid],
+ full_res_coords[bid],
+ [self.preds[file_names[bid]]["pred_masks"]],
+ [self.preds[file_names[bid]]["pred_classes"]],
+ file_names[bid],
+ original_colors[bid],
+ original_normals[bid],
+ [self.preds[file_names[bid]]["pred_scores"]],
+ sorted_heatmaps=[all_heatmaps[bid]],
+ query_pos=all_query_pos[bid]
+ if len(all_query_pos) > 0
+ else None,
+ backbone_features=backbone_features,
+ point_size=self.config.general.visualization_point_size,
+ )
+
+ if self.config.general.export:
+ if self.validation_dataset.dataset_name == "stpls3d":
+ scan_id, _, _, crop_id = file_names[bid].split("_")
+ crop_id = int(crop_id.replace(".txt", ""))
+ file_name = (
+ f"{scan_id}_points_GTv3_0{crop_id}_inst_nostuff"
+ )
+
+ self.export(
+ self.preds[file_names[bid]]["pred_masks"],
+ self.preds[file_names[bid]]["pred_scores"],
+ self.preds[file_names[bid]]["pred_classes"],
+ file_name,
+ self.decoder_id,
+ )
+ else:
+ self.export(
+ self.preds[file_names[bid]]["pred_masks"],
+ self.preds[file_names[bid]]["pred_scores"],
+ self.preds[file_names[bid]]["pred_classes"],
+ file_names[bid],
+ self.decoder_id,
+ )
+
+ def eval_instance_epoch_end(self):
+        log_prefix = "val"
+ ap_results = {}
+
+ head_results, tail_results, common_results = [], [], []
+
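+        # axis-aligned 3D bounding-box AP at IoU 0.50 and 0.25, computed from
+        # the per-scene boxes collected in eval_instance_step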
+ box_ap_50 = eval_det(
+ self.bbox_preds, self.bbox_gt, ovthresh=0.5, use_07_metric=False
+ )
+ box_ap_25 = eval_det(
+ self.bbox_preds, self.bbox_gt, ovthresh=0.25, use_07_metric=False
+ )
+ mean_box_ap_25 = sum([v for k, v in box_ap_25[-1].items()]) / len(
+ box_ap_25[-1].keys()
+ )
+ mean_box_ap_50 = sum([v for k, v in box_ap_50[-1].items()]) / len(
+ box_ap_50[-1].keys()
+ )
+
+ ap_results[f"{log_prefix}_mean_box_ap_25"] = mean_box_ap_25
+ ap_results[f"{log_prefix}_mean_box_ap_50"] = mean_box_ap_50
+
+ for class_id in box_ap_50[-1].keys():
+ class_name = self.train_dataset.label_info[class_id]["name"]
+ ap_results[f"{log_prefix}_{class_name}_val_box_ap_50"] = box_ap_50[
+ -1
+ ][class_id]
+
+ for class_id in box_ap_25[-1].keys():
+ class_name = self.train_dataset.label_info[class_id]["name"]
+ ap_results[f"{log_prefix}_{class_name}_val_box_ap_25"] = box_ap_25[
+ -1
+ ][class_id]
+
+        root_path = "eval_output"
+ base_path = f"{root_path}/instance_evaluation_{self.config.general.experiment_name}_{self.current_epoch}"
+
+ if self.validation_dataset.dataset_name in [
+ "scannet",
+ "stpls3d",
+ "scannet200",
+ ]:
+ gt_data_path = f"{self.validation_dataset.data_dir[0]}/instance_gt/{self.validation_dataset.mode}"
+ else:
+ gt_data_path = f"{self.validation_dataset.data_dir[0]}/instance_gt/Area_{self.config.general.area}"
+
+ pred_path = f"{base_path}/tmp_output.txt"
+
+ if not os.path.exists(base_path):
+ os.makedirs(base_path)
+
+ try:
+ if self.validation_dataset.dataset_name == "s3dis":
+ new_preds = {}
+ for key in self.preds.keys():
+ new_preds[
+ key.replace(f"Area_{self.config.general.area}_", "")
+ ] = {
+ "pred_classes": self.preds[key]["pred_classes"] + 1,
+ "pred_masks": self.preds[key]["pred_masks"],
+ "pred_scores": self.preds[key]["pred_scores"],
+ }
+ mprec, mrec = evaluate(
+ new_preds, gt_data_path, pred_path, dataset="s3dis"
+ )
+ ap_results[f"{log_prefix}_mean_precision"] = mprec
+ ap_results[f"{log_prefix}_mean_recall"] = mrec
+ elif self.validation_dataset.dataset_name == "stpls3d":
+ new_preds = {}
+ for key in self.preds.keys():
+ new_preds[key.replace(".txt", "")] = {
+ "pred_classes": self.preds[key]["pred_classes"],
+ "pred_masks": self.preds[key]["pred_masks"],
+ "pred_scores": self.preds[key]["pred_scores"],
+ }
+
+ evaluate(new_preds, gt_data_path, pred_path, dataset="stpls3d")
+ else:
+ evaluate(
+ self.preds,
+ gt_data_path,
+ pred_path,
+ dataset=self.validation_dataset.dataset_name,
+ )
+ with open(pred_path, "r") as fin:
+ for line_id, line in enumerate(fin):
+ if line_id == 0:
+ # ignore header
+ continue
+ class_name, _, ap, ap_50, ap_25 = line.strip().split(",")
+
+ if self.validation_dataset.dataset_name == "scannet200":
+ if class_name in VALID_CLASS_IDS_200_VALIDATION:
+ ap_results[
+ f"{log_prefix}_{class_name}_val_ap"
+ ] = float(ap)
+ ap_results[
+ f"{log_prefix}_{class_name}_val_ap_50"
+ ] = float(ap_50)
+ ap_results[
+ f"{log_prefix}_{class_name}_val_ap_25"
+ ] = float(ap_25)
+
+ if class_name in HEAD_CATS_SCANNET_200:
+ head_results.append(
+ np.array(
+ (float(ap), float(ap_50), float(ap_25))
+ )
+ )
+ elif class_name in COMMON_CATS_SCANNET_200:
+ common_results.append(
+ np.array(
+ (float(ap), float(ap_50), float(ap_25))
+ )
+ )
+ elif class_name in TAIL_CATS_SCANNET_200:
+ tail_results.append(
+ np.array(
+ (float(ap), float(ap_50), float(ap_25))
+ )
+ )
+ else:
+                                assert False, "class not known!"
+ else:
+ ap_results[
+ f"{log_prefix}_{class_name}_val_ap"
+ ] = float(ap)
+ ap_results[
+ f"{log_prefix}_{class_name}_val_ap_50"
+ ] = float(ap_50)
+ ap_results[
+ f"{log_prefix}_{class_name}_val_ap_25"
+ ] = float(ap_25)
+
+ if self.validation_dataset.dataset_name == "scannet200":
+ head_results = np.stack(head_results)
+ common_results = np.stack(common_results)
+ tail_results = np.stack(tail_results)
+
+ mean_tail_results = np.nanmean(tail_results, axis=0)
+ mean_common_results = np.nanmean(common_results, axis=0)
+ mean_head_results = np.nanmean(head_results, axis=0)
+
+                ap_results[
+                    f"{log_prefix}_mean_tail_ap"
+                ] = mean_tail_results[0]
+                ap_results[
+                    f"{log_prefix}_mean_common_ap"
+                ] = mean_common_results[0]
+                ap_results[
+                    f"{log_prefix}_mean_head_ap"
+                ] = mean_head_results[0]
+
+ ap_results[
+ f"{log_prefix}_mean_tail_ap_50"
+ ] = mean_tail_results[1]
+ ap_results[
+ f"{log_prefix}_mean_common_ap_50"
+ ] = mean_common_results[1]
+ ap_results[
+ f"{log_prefix}_mean_head_ap_50"
+ ] = mean_head_results[1]
+
+ ap_results[
+ f"{log_prefix}_mean_tail_ap_25"
+ ] = mean_tail_results[2]
+ ap_results[
+ f"{log_prefix}_mean_common_ap_25"
+ ] = mean_common_results[2]
+ ap_results[
+ f"{log_prefix}_mean_head_ap_25"
+ ] = mean_head_results[2]
+
+ overall_ap_results = np.nanmean(
+ np.vstack((head_results, common_results, tail_results)),
+ axis=0,
+ )
+
+ ap_results[f"{log_prefix}_mean_ap"] = overall_ap_results[0]
+ ap_results[f"{log_prefix}_mean_ap_50"] = overall_ap_results[1]
+ ap_results[f"{log_prefix}_mean_ap_25"] = overall_ap_results[2]
+
+ ap_results = {
+ key: 0.0 if math.isnan(score) else score
+ for key, score in ap_results.items()
+ }
+ else:
+ mean_ap = statistics.mean(
+ [
+ item
+ for key, item in ap_results.items()
+ if key.endswith("val_ap")
+ ]
+ )
+ mean_ap_50 = statistics.mean(
+ [
+ item
+ for key, item in ap_results.items()
+ if key.endswith("val_ap_50")
+ ]
+ )
+ mean_ap_25 = statistics.mean(
+ [
+ item
+ for key, item in ap_results.items()
+ if key.endswith("val_ap_25")
+ ]
+ )
+
+ ap_results[f"{log_prefix}_mean_ap"] = mean_ap
+ ap_results[f"{log_prefix}_mean_ap_50"] = mean_ap_50
+ ap_results[f"{log_prefix}_mean_ap_25"] = mean_ap_25
+
+ ap_results = {
+ key: 0.0 if math.isnan(score) else score
+ for key, score in ap_results.items()
+ }
+ except (IndexError, OSError) as e:
+ print("NO SCORES!!!")
+ ap_results[f"{log_prefix}_mean_ap"] = 0.0
+ ap_results[f"{log_prefix}_mean_ap_50"] = 0.0
+ ap_results[f"{log_prefix}_mean_ap_25"] = 0.0
+
+ self.log_dict(ap_results)
+
+ if not self.config.general.export:
+ shutil.rmtree(base_path)
+
+ del self.preds
+ del self.bbox_preds
+ del self.bbox_gt
+
+ gc.collect()
+
+ self.preds = dict()
+ self.bbox_preds = dict()
+ self.bbox_gt = dict()
+
+ def test_epoch_end(self, outputs):
+ if self.config.general.export:
+ return
+
+ self.eval_instance_epoch_end()
+
+ dd = defaultdict(list)
+ for output in outputs:
+            for key, val in output.items():
+ dd[key].append(val)
+
+ dd = {k: statistics.mean(v) for k, v in dd.items()}
+
+        dd["val_mean_loss_ce"] = statistics.mean(
+            [v for k, v in dd.items() if "loss_ce" in k]
+        )
+        dd["val_mean_loss_mask"] = statistics.mean(
+            [v for k, v in dd.items() if "loss_mask" in k]
+        )
+        dd["val_mean_loss_dice"] = statistics.mean(
+            [v for k, v in dd.items() if "loss_dice" in k]
+        )
+
+ self.log_dict(dd)
+
+ def configure_optimizers(self):
+ optimizer = hydra.utils.instantiate(
+ self.config.optimizer, params=self.parameters()
+ )
+ if "steps_per_epoch" in self.config.scheduler.scheduler.keys():
+ self.config.scheduler.scheduler.steps_per_epoch = len(
+ self.train_dataloader()
+ )
+ lr_scheduler = hydra.utils.instantiate(
+ self.config.scheduler.scheduler, optimizer=optimizer
+ )
+ scheduler_config = {"scheduler": lr_scheduler}
+ scheduler_config.update(self.config.scheduler.pytorch_lightning_params)
+ return [optimizer], [scheduler_config]
+
+ def prepare_data(self):
+ self.train_dataset = hydra.utils.instantiate(
+ self.config.data.train_dataset
+ )
+ self.validation_dataset = hydra.utils.instantiate(
+ self.config.data.validation_dataset
+ )
+ self.test_dataset = hydra.utils.instantiate(
+ self.config.data.test_dataset
+ )
+ self.labels_info = self.train_dataset.label_info
+
+ def train_dataloader(self):
+ c_fn = hydra.utils.instantiate(self.config.data.train_collation)
+ return hydra.utils.instantiate(
+ self.config.data.train_dataloader,
+ self.train_dataset,
+ collate_fn=c_fn,
+ )
+
+ def val_dataloader(self):
+ c_fn = hydra.utils.instantiate(self.config.data.validation_collation)
+ return hydra.utils.instantiate(
+ self.config.data.validation_dataloader,
+ self.validation_dataset,
+ collate_fn=c_fn,
+ )
+
+ def test_dataloader(self):
+ c_fn = hydra.utils.instantiate(self.config.data.test_collation)
+ return hydra.utils.instantiate(
+ self.config.data.test_dataloader,
+ self.test_dataset,
+ collate_fn=c_fn,
+ )
diff --git a/models/Mask3D/build/lib/mask3d/utils/__init__.py b/models/Mask3D/build/lib/mask3d/utils/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/build/lib/mask3d/utils/gradflow_check.py b/models/Mask3D/build/lib/mask3d/utils/gradflow_check.py
new file mode 100644
index 0000000000000000000000000000000000000000..2fedc91592d66d4e5bdef7531daafccc5b5f2e81
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/utils/gradflow_check.py
@@ -0,0 +1,62 @@
+""" https://github.com/alwynmathew/gradflow-check """
+import matplotlib.pyplot as plt
+import numpy as np
+from matplotlib.lines import Line2D
+
+
+def plot_grad_flow(named_parameters):
+ ave_grads = []
+ layers = []
+ for n, p in named_parameters:
+ if (p.requires_grad) and ("bias" not in n):
+            if p.grad is not None:
+ layers.append(n)
+ ave_grads.append(p.grad.abs().mean())
+ else:
+ print(f"{n} - doesn't have gradient computed")
+
+ plt.plot(ave_grads, alpha=0.3, color="b")
+ plt.hlines(0, 0, len(ave_grads) + 1, linewidth=1, color="k")
+ plt.xticks(range(0, len(ave_grads), 1), layers, rotation="vertical")
+ plt.xlim(xmin=0, xmax=len(ave_grads))
+ plt.xlabel("Layers")
+ plt.ylabel("average gradient")
+ plt.title("Gradient flow")
+ plt.grid(True)
+
+
+def plot_grad_flow_v2(named_parameters):
+ """Plots the gradients flowing through different layers in the net during training.
+ Can be used for checking for possible gradient vanishing / exploding problems.
+
+ Usage: Plug this function in Trainer class after loss.backwards() as
+ "plot_grad_flow(self.model.named_parameters())" to visualize the gradient flow"""
+ ave_grads = []
+ max_grads = []
+ layers = []
+ for n, p in named_parameters:
+ if (p.requires_grad) and ("bias" not in n):
+ layers.append(n)
+            if p.grad is not None:
+ ave_grads.append(p.grad.abs().mean())
+ max_grads.append(p.grad.abs().max())
+ else:
+ print(f"{n} - doesn't have gradient computed")
+ plt.bar(np.arange(len(max_grads)), max_grads, alpha=0.1, lw=1, color="c")
+ plt.bar(np.arange(len(max_grads)), ave_grads, alpha=0.1, lw=1, color="b")
+ plt.hlines(0, 0, len(ave_grads) + 1, lw=2, color="k")
+ plt.xticks(range(0, len(ave_grads), 1), layers, rotation="vertical")
+ plt.xlim(left=0, right=len(ave_grads))
+ plt.ylim(bottom=-0.001, top=0.02) # zoom in on the lower gradient regions
+ plt.xlabel("Layers")
+ plt.ylabel("average gradient")
+ plt.title("Gradient flow")
+ plt.grid(True)
+ plt.legend(
+ [
+ Line2D([0], [0], color="c", lw=4),
+ Line2D([0], [0], color="b", lw=4),
+ Line2D([0], [0], color="k", lw=4),
+ ],
+ ["max-gradient", "mean-gradient", "zero-gradient"],
+ )
diff --git a/models/Mask3D/build/lib/mask3d/utils/kfold.py b/models/Mask3D/build/lib/mask3d/utils/kfold.py
new file mode 100644
index 0000000000000000000000000000000000000000..5bfeba130c890eec35530adeb23f1362041f7cdc
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/utils/kfold.py
@@ -0,0 +1,89 @@
+""" Author: https://github.com/yk-szk/stratified_group_kfold """
+import random
+import numpy as np
+
+
+class StratifiedGroupKFold:
+ """
+    Stratified Group K-fold with sklearn.model_selection.KFold compatibility.
+
+    Split the dataset into k folds with a balanced label distribution (stratified) and non-overlapping groups.
+
+ Args:
+ n_splits (int): # of splits
+ shuffle (bool): Shuffle
+ seed (int): Seed value for random number generator
+ """
+
+ def __init__(self, n_splits, shuffle=True, random_state=None):
+ self.n_splits = n_splits
+ self.shuffle = shuffle
+ self.seed = random_state
+
+ def split(self, X, labels, groups):
+ assert len(X) == len(labels) == len(groups), "Invalid input length"
+ assert (
+ len(set(groups)) >= self.n_splits
+ ), "The number of groups needs to be larger than n_splits"
+
+ def encode(v):
+ s = set(v)
+ d = {l: i for i, l in enumerate(s)}
+ return [d[e] for e in v]
+
+ labels, groups = encode(labels), encode(groups)
+ num_labels, num_groups = max(labels) + 1, max(groups) + 1
+ label_counts_per_group = np.zeros((num_groups, num_labels), dtype=int)
+ global_label_dist = np.bincount(labels)
+ for label, g in zip(labels, groups):
+ label_counts_per_group[g][label] += 1
+
+ label_counts_per_fold = np.zeros(
+ (self.n_splits, num_labels), dtype=int
+ )
+ groups_per_fold = [set() for _ in range(self.n_splits)]
+
+ def eval_label_counts_per_fold(y_counts, fold):
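+            # tentatively add this group's label counts to the fold, measure
+            # how uneven the label distribution across folds would become,
+            # then undo the change; lower is better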
+ fold += y_counts
+ std_per_label = (
+ np.std(label_counts_per_fold, axis=0) / global_label_dist
+ )
+ fold -= y_counts
+ return np.mean(std_per_label)
+
+ groups_and_label_counts = list(enumerate(label_counts_per_group))
+ if self.shuffle:
+ rng = random.Random(self.seed)
+ mean_std = np.mean(np.std(label_counts_per_group, axis=1))
+ groups_and_label_counts.sort(
+ key=lambda g_counts: -np.std(g_counts[1])
+ + rng.gauss(0, mean_std)
+ ) # add rng.gauss to increase the randomness
+ else:
+ groups_and_label_counts.sort(
+ key=lambda g_counts: -np.std(g_counts[1])
+ )
+
+ for g, label_counts in groups_and_label_counts:
+ evals = [
+ eval_label_counts_per_fold(
+ label_counts, label_counts_per_fold[i]
+ )
+ for i in range(self.n_splits)
+ ]
+ best_fold = np.argmin(evals)
+ label_counts_per_fold[best_fold] += label_counts
+ groups_per_fold[best_fold].add(g)
+
+ all_groups = set(groups)
+ for test_groups in groups_per_fold:
+ train_groups = all_groups - test_groups
+
+ train_indices = [
+ i for i, g in enumerate(groups) if g in train_groups
+ ]
+ test_indices = [
+ i for i, g in enumerate(groups) if g in test_groups
+ ]
+
+ yield train_indices, test_indices
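+
+# Minimal usage sketch (hypothetical data), mirroring the sklearn KFold API:
+#   skf = StratifiedGroupKFold(n_splits=3, shuffle=True, random_state=0)
+#   for train_idx, test_idx in skf.split(X, labels, groups):
+#       X_train, X_test = X[train_idx], X[test_idx]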
diff --git a/models/Mask3D/build/lib/mask3d/utils/pc_visualizations.py b/models/Mask3D/build/lib/mask3d/utils/pc_visualizations.py
new file mode 100644
index 0000000000000000000000000000000000000000..26937b9f293f9cc2b87cc67d3c8742c80f770d60
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/utils/pc_visualizations.py
@@ -0,0 +1,202 @@
+from io import BytesIO
+from imageio import imread
+
+import open3d as o3d
+from PIL import Image
+import numpy as np
+import plotly.graph_objects as go
+from plotly.subplots import make_subplots
+from pandas import DataFrame
+import matplotlib
+import seaborn as sns
+import pyviz3d.visualizer as viz
+
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+
+
+def point_cloud_plotly(
+ coordinates,
+ label_color,
+ label_text,
+ prediction_color,
+ prediction_text,
+ normals,
+):
+ def draw_point_cloud(coords, colors=None, label_text=None):
+ marker = dict(size=1, opacity=0.8)
+ if colors is not None:
+ marker.update({"color": colors})
+ if (colors is None) and (label_text is not None):
+ marker.update({"color": label_text})
+ fig = go.Scatter3d(
+ x=coords[:, 0],
+ y=coords[:, 1],
+ z=coords[:, 2],
+ text=label_text,
+ mode="markers",
+ marker=marker,
+ )
+ return fig
+
+ fig = make_subplots(
+ rows=1,
+ cols=2,
+ specs=[[{"type": "scatter3d"}, {"type": "scatter3d"}]],
+ )
+ fig.add_trace(
+ draw_point_cloud(coordinates, prediction_color, prediction_text),
+ row=1,
+ col=1,
+ )
+    # add the ground-truth labels next to the prediction
+ fig.add_trace(
+ draw_point_cloud(coordinates, label_color, label_text), row=1, col=2
+ )
+ fig.show()
+ # data = fig.to_image(width=1080, height=720, format="png")
+ # image = Image.open(BytesIO(data))
+ # return image
+
+
+def point_cloud_pyviz3d(
+ name,
+ coordinates,
+ path,
+ color=None,
+ normals=None,
+ label_color=None,
+ prediction_color=None,
+ point_size=25,
+ voxel_size=0.01,
+):
+
+    # scale voxel indices back to metric coordinates for visualization
+ coordinates = coordinates * voxel_size
+ # First, we set up a visualizer
+ visualizer = viz.Visualizer()
+ if label_color is not None:
+ visualizer.add_points(
+ name=f"{name}_label",
+ positions=coordinates,
+ colors=label_color,
+ point_size=point_size,
+ visible=False,
+ )
+
+ if prediction_color is not None:
+ visualizer.add_points(
+ name=f"{name}_prediction",
+ positions=coordinates,
+ colors=prediction_color,
+ point_size=point_size,
+ visible=False,
+ )
+
+ visualizer.add_points(
+ name=name,
+ positions=coordinates,
+ colors=color,
+ normals=normals,
+ point_size=point_size,
+ visible=False,
+ )
+    # once everything has been added to the visualizer, save it to disk
+ visualizer.save(path, verbose=False)
+
+
+def point_cloud_open3d(coordinates):
+ points = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(coordinates))
+ o3d.visualization.draw_geometries([points])
+
+
+def _remap_model_output(output, labels):
+ output = np.array(output)
+ output_remapped = output.copy()
+ for i, k in enumerate(labels.keys()):
+ output_remapped[output == i] = k
+ return output_remapped
+
+
+def save_visualization(
+ coordinates,
+ name="none",
+ color=None,
+ normals=None,
+ target=None,
+ prediction=None,
+ target_info=None,
+ path="./saved",
+ backend="pyviz3d",
+ voxel_size=0.05,
+ color_mean=[0.47793125906962, 0.4303257521323044, 0.3749598901421883],
+ color_std=[0.2834475483823543, 0.27566157565723015, 0.27018971370874995],
+):
+ target = _remap_model_output(target, target_info)
+ prediction = _remap_model_output(prediction, target_info)
+ coordinates = coordinates[:, :3] - coordinates[:, :3].mean(axis=0)
+ coordinates = coordinates * voxel_size
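+    # undo the normalization applied at load time and rescale colors to [0, 255]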
+ if color is not None:
+ color = (color * color_std + color_mean) * 255
+
+    target_color = np.zeros((len(target), 3))
+    # dtype=object avoids numpy truncating names longer than "empty" (<U5)
+    target_text = np.full((len(target)), "empty", dtype=object)
+    prediction_color = np.zeros((len(prediction), 3))
+    prediction_text = np.full((len(prediction)), "empty", dtype=object)
+ if target_info is not None:
+ for k, v in target_info.items():
+ target_color[target == k] = v["color"]
+ target_text[target == k] = v["name"]
+ prediction_color[prediction == k] = v["color"]
+ prediction_text[prediction == k] = v["name"]
+ if backend == "pyviz3d":
+ point_cloud_pyviz3d(
+ name=name,
+ coordinates=coordinates,
+ path=path,
+ color=color,
+ normals=normals,
+ label_color=target_color,
+ prediction_color=prediction_color,
+ voxel_size=1,
+ )
+ elif backend == "plotly":
+        point_cloud_plotly(
+ coordinates=coordinates,
+ normals=normals,
+ label_color=target_color,
+ label_text=target_text,
+ prediction_color=prediction_color,
+ prediction_text=prediction_text,
+ )
+ elif backend == "open3d":
+ point_cloud_open3d(coordinates)
+ else:
+ print("No such backend")
+
+
+def draw_confsion_matrix(confusion_matrix, label_db):
+ index = [i for i in range(confusion_matrix.shape[0])]
+ index = _remap_model_output(index, label_db)
+    column_names = np.full((len(index)), "empty", dtype=object)
+ for k, v in label_db.items():
+ column_names[index == k] = v["name"]
+ df_cm = DataFrame(
+ confusion_matrix, index=column_names, columns=column_names
+ )
+ # pretty_plot_confusion_matrix(df_cm, fz=9)
+ sns.heatmap(
+ df_cm,
+ annot=True,
+ fmt="d",
+ linewidths=0.25,
+ annot_kws={"size": 5},
+ vmax=10000,
+ )
+ buf = BytesIO()
+ plt.savefig(buf, format="jpg")
+ plt.close()
+ buf.seek(0)
+ image = imread(buf, format="jpg")
+ buf.close()
+ return image
diff --git a/models/Mask3D/build/lib/mask3d/utils/point_cloud_utils.py b/models/Mask3D/build/lib/mask3d/utils/point_cloud_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..7d2b5ec875da78d299c23afa70531cb0df04e278
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/utils/point_cloud_utils.py
@@ -0,0 +1,83 @@
+from pathlib import Path
+from typing import List, Optional, Tuple
+
+import numpy as np
+import open3d
+from plyfile import PlyData, PlyElement
+
+
+def load_ply(filepath):
+ with open(filepath, "rb") as f:
+ plydata = PlyData.read(f)
+ data = plydata.elements[0].data
+ coords = np.array([data["x"], data["y"], data["z"]], dtype=np.float32).T
+ feats = None
+ labels = None
+ if ({"red", "green", "blue"} - set(data.dtype.names)) == set():
+ feats = np.array(
+ [data["red"], data["green"], data["blue"]], dtype=np.uint8
+ ).T
+ if "label" in data.dtype.names:
+ labels = np.array(data["label"], dtype=np.uint32)
+ return coords, feats, labels
+
+
+def load_ply_with_normals(filepath):
+ mesh = open3d.io.read_triangle_mesh(str(filepath))
+ if not mesh.has_vertex_normals():
+ mesh.compute_vertex_normals()
+ vertices = np.asarray(mesh.vertices)
+ normals = np.asarray(mesh.vertex_normals)
+
+ coords, feats, labels = load_ply(filepath)
+ assert np.allclose(coords, vertices), "different coordinates"
+ feats = np.hstack((feats, normals))
+
+ return coords, feats, labels
+
+
+def load_obj_with_normals(filepath):
+ mesh = open3d.io.read_triangle_mesh(str(filepath))
+ if not mesh.has_vertex_normals():
+ mesh.compute_vertex_normals()
+ coords = np.asarray(mesh.vertices)
+ normals = np.asarray(mesh.vertex_normals)
+ colors = np.asarray(mesh.vertex_colors)
+ feats = np.hstack((colors, normals))
+
+ return coords, feats
+
+
+def write_point_cloud_in_ply(
+ filepath: Path,
+ coords: np.ndarray,
+ feats: Optional[np.ndarray] = None,
+ labels: Optional[np.ndarray] = None,
+    dtypes: Optional[List[Tuple[str, str]]] = [
+        ("x", "<f4"),
+        ("y", "<f4"),
+        ("z", "<f4"),
+        ("red", "u1"),
+        ("green", "u1"),
+        ("blue", "u1"),
+        ("label", "<u2"),
+    ],
+):
+    ...  # function body truncated in the source patch
diff --git a/models/Mask3D/build/lib/mask3d/utils/utils.py b/models/Mask3D/build/lib/mask3d/utils/utils.py
new file mode 100644
--- /dev/null
+++ b/models/Mask3D/build/lib/mask3d/utils/utils.py
+import sys
+
+if sys.version_info[:2] >= (3, 8):
+ from collections.abc import MutableMapping
+else:
+ from collections import MutableMapping
+
+import torch
+from loguru import logger
+
+
+def flatten_dict(d, parent_key="", sep="_"):
+ """
+ https://stackoverflow.com/questions/6027558/flatten-nested-dictionaries-compressing-keys
+ """
+ items = []
+ for k, v in d.items():
+ new_key = parent_key + sep + k if parent_key else k
+ if isinstance(v, MutableMapping):
+ items.extend(flatten_dict(v, new_key, sep=sep).items())
+ else:
+ items.append((new_key, v))
+ return dict(items)
+
+
+def load_baseline_model(cfg, model):
+ # if it is Minkoski weights
+ cfg.model.in_channels = 3
+ cfg.model.config.conv1_kernel_size = 5
+ cfg.data.add_normals = False
+ cfg.data.train_dataset.color_mean_std = [(0.5, 0.5, 0.5), (1, 1, 1)]
+ cfg.data.validation_dataset.color_mean_std = [(0.5, 0.5, 0.5), (1, 1, 1)]
+ cfg.data.test_dataset.color_mean_std = [(0.5, 0.5, 0.5), (1, 1, 1)]
+ cfg.data.voxel_size = 0.02
+ model = model(cfg)
+ state_dict = torch.load(cfg.general.checkpoint)["state_dict"]
+ model.model.load_state_dict(state_dict)
+ return cfg, model
+
+
+def load_backbone_checkpoint_with_missing_or_exsessive_keys(cfg, model):
+ state_dict = torch.load(cfg.general.backbone_checkpoint)["state_dict"]
+ correct_dict = dict(model.state_dict())
+
+    # parameters missing from the checkpoint will be initialized randomly
+ for key in state_dict.keys():
+ if correct_dict.pop(f"model.backbone.{key}", None) is None:
+ logger.warning(
+ f"Key not found, it will be initialized randomly: {key}"
+ )
+
+    # parameters whose shape differs will be initialized randomly
+ state_dict = torch.load(cfg.general.backbone_checkpoint)["state_dict"]
+ correct_dict = dict(model.state_dict())
+ for key in correct_dict.keys():
+ if key.replace("model.backbone.", "") not in state_dict:
+ logger.warning(f"{key} not in loaded checkpoint")
+ state_dict.update(
+ {key.replace("model.backbone.", ""): correct_dict[key]}
+ )
+ elif (
+ state_dict[key.replace("model.backbone.", "")].shape
+ != correct_dict[key].shape
+ ):
+ logger.warning(
+ f"incorrect shape {key}:{state_dict[key.replace('model.backbone.', '')].shape} vs {correct_dict[key].shape}"
+ )
+ state_dict.update({key: correct_dict[key]})
+
+ # if we have more keys just discard them
+ correct_dict = dict(model.state_dict())
+ new_state_dict = dict()
+ for key in state_dict.keys():
+ if f"model.backbone.{key}" in correct_dict.keys():
+ new_state_dict.update({f"model.backbone.{key}": state_dict[key]})
+ elif key in correct_dict.keys():
+ new_state_dict.update({key: correct_dict[key]})
+ else:
+ logger.warning(f"excessive key: {key}")
+ model.load_state_dict(new_state_dict)
+ return cfg, model
+
+
+def load_checkpoint_with_missing_or_exsessive_keys(cfg, model):
+ state_dict = torch.load(cfg.general.checkpoint)["state_dict"]
+ correct_dict = dict(model.state_dict())
+
+    # parameters missing from the checkpoint will be initialized randomly
+ for key in state_dict.keys():
+ if correct_dict.pop(key, None) is None:
+ logger.warning(
+ f"Key not found, it will be initialized randomly: {key}"
+ )
+
+    # parameters whose shape differs will be initialized randomly
+ state_dict = torch.load(cfg.general.checkpoint)["state_dict"]
+ correct_dict = dict(model.state_dict())
+ for key in correct_dict.keys():
+ if key not in state_dict:
+ logger.warning(f"{key} not in loaded checkpoint")
+ state_dict.update({key: correct_dict[key]})
+ elif state_dict[key].shape != correct_dict[key].shape:
+ logger.warning(
+ f"incorrect shape {key}:{state_dict[key].shape} vs {correct_dict[key].shape}"
+ )
+ state_dict.update({key: correct_dict[key]})
+
+ # if we have more keys just discard them
+ correct_dict = dict(model.state_dict())
+ new_state_dict = dict()
+ for key in state_dict.keys():
+ if key in correct_dict.keys():
+ new_state_dict.update({key: state_dict[key]})
+ else:
+ logger.warning(f"excessive key: {key}")
+ model.load_state_dict(new_state_dict)
+ return cfg, model
+
+
+def freeze_until(net, param_name: str = None):
+ """
+ Freeze net until param_name
+ https://opendatascience.slack.com/archives/CGK4KQBHD/p1588373239292300?thread_ts=1588105223.275700&cid=CGK4KQBHD
+ Args:
+ net:
+ param_name:
+ Returns:
+ """
+ found_name = False
+ for name, params in net.named_parameters():
+ if name == param_name:
+ found_name = True
+ params.requires_grad = found_name
diff --git a/models/Mask3D/mask3d.egg-info/PKG-INFO b/models/Mask3D/mask3d.egg-info/PKG-INFO
new file mode 100644
index 0000000000000000000000000000000000000000..8bc09c1d9f7d373a6ae88b90feddb4097f838333
--- /dev/null
+++ b/models/Mask3D/mask3d.egg-info/PKG-INFO
@@ -0,0 +1,11 @@
+Metadata-Version: 2.1
+Name: mask3d
+Version: 0.1
+Summary: UNKNOWN
+Home-page: UNKNOWN
+License: UNKNOWN
+Platform: UNKNOWN
+License-File: LICENSE
+
+UNKNOWN
+
diff --git a/models/Mask3D/mask3d.egg-info/SOURCES.txt b/models/Mask3D/mask3d.egg-info/SOURCES.txt
new file mode 100644
index 0000000000000000000000000000000000000000..d8664a91f3fa541efb77e4f4bb3dd0dde5aadf2d
--- /dev/null
+++ b/models/Mask3D/mask3d.egg-info/SOURCES.txt
@@ -0,0 +1,110 @@
+LICENSE
+MANIFEST.in
+README.md
+setup.py
+mask3d/__init__.py
+mask3d/main_instance_segmentation.py
+mask3d/predict.py
+mask3d/preprocess_arkitscenes.py
+mask3d.egg-info/PKG-INFO
+mask3d.egg-info/SOURCES.txt
+mask3d.egg-info/dependency_links.txt
+mask3d.egg-info/top_level.txt
+mask3d/benchmark/__init__.py
+mask3d/benchmark/evaluate_semantic_instance.py
+mask3d/benchmark/util.py
+mask3d/benchmark/util_3d.py
+mask3d/conf/__init__.py
+mask3d/conf/config_base_instance_segmentation.yaml
+mask3d/conf/augmentation/albumentations_aug.yaml
+mask3d/conf/augmentation/volumentations_aug.yaml
+mask3d/conf/callbacks/callbacks_instance_segmentation.yaml
+mask3d/conf/data/indoor.yaml
+mask3d/conf/data/outdoor.yaml
+mask3d/conf/data/collation_functions/voxelize_collate.yaml
+mask3d/conf/data/collation_functions/voxelize_collate_merge.yaml
+mask3d/conf/data/data_loaders/simple_loader.yaml
+mask3d/conf/data/data_loaders/simple_loader_save_memory.yaml
+mask3d/conf/data/datasets/matterport.yaml
+mask3d/conf/data/datasets/matterport_scannet.yaml
+mask3d/conf/data/datasets/rio.yaml
+mask3d/conf/data/datasets/s3dis.yaml
+mask3d/conf/data/datasets/scannet.yaml
+mask3d/conf/data/datasets/scannet200.yaml
+mask3d/conf/data/datasets/semantic_kitti.yaml
+mask3d/conf/data/datasets/stpls3d.yaml
+mask3d/conf/logging/base.yaml
+mask3d/conf/logging/full.yaml
+mask3d/conf/logging/minimal.yaml
+mask3d/conf/logging/offline.yaml
+mask3d/conf/loss/cross_entropy.yaml
+mask3d/conf/loss/set_criterion.yaml
+mask3d/conf/loss/set_criterion_custom_weights_1.yaml
+mask3d/conf/matcher/hungarian_matcher.yaml
+mask3d/conf/metrics/miou.yaml
+mask3d/conf/model/mask3d.yaml
+mask3d/conf/optimizer/adamw.yaml
+mask3d/conf/optimizer/adamw_lower.yaml
+mask3d/conf/scheduler/exponentiallr.yaml
+mask3d/conf/scheduler/lambdalr.yaml
+mask3d/conf/scheduler/onecyclelr.yaml
+mask3d/conf/trainer/trainer.yaml
+mask3d/conf/trainer/trainer600.yaml
+mask3d/datasets/__init__.py
+mask3d/datasets/outdoor_semseg.py
+mask3d/datasets/random_cuboid.py
+mask3d/datasets/semseg.py
+mask3d/datasets/utils.py
+mask3d/datasets/preprocessing/__init__.py
+mask3d/datasets/preprocessing/arkitscenes_preprocessing.py
+mask3d/datasets/preprocessing/base_preprocessing.py
+mask3d/datasets/preprocessing/s3dis_preprocessing.py
+mask3d/datasets/preprocessing/scannet_preprocessing.py
+mask3d/datasets/preprocessing/semantic_kitti_preprocessing.py
+mask3d/datasets/preprocessing/stpls3d_preprocessing.py
+mask3d/datasets/scannet200/__init__.py
+mask3d/datasets/scannet200/scannet200_constants.py
+mask3d/datasets/scannet200/scannet200_splits.py
+mask3d/models/__init__.py
+mask3d/models/criterion.py
+mask3d/models/mask3d.py
+mask3d/models/matcher.py
+mask3d/models/misc.py
+mask3d/models/model.py
+mask3d/models/position_embedding.py
+mask3d/models/res16unet.py
+mask3d/models/resnet.py
+mask3d/models/resunet.py
+mask3d/models/wrapper.py
+mask3d/models/metrics/__init__.py
+mask3d/models/metrics/confusionmatrix.py
+mask3d/models/metrics/metrics.py
+mask3d/models/modules/3detr_helpers.py
+mask3d/models/modules/__init__.py
+mask3d/models/modules/common.py
+mask3d/models/modules/helpers_3detr.py
+mask3d/models/modules/resnet_block.py
+mask3d/models/modules/senet_block.py
+mask3d/trainer/__init__.py
+mask3d/trainer/trainer.py
+mask3d/utils/__init__.py
+mask3d/utils/gradflow_check.py
+mask3d/utils/kfold.py
+mask3d/utils/pc_visualizations.py
+mask3d/utils/point_cloud_utils.py
+mask3d/utils/utils.py
+mask3d/utils/pointops2/__init__.py
+mask3d/utils/pointops2/setup.py
+mask3d/utils/pointops2/functions/__init__.py
+mask3d/utils/pointops2/functions/pointops.py
+mask3d/utils/pointops2/functions/pointops2.py
+mask3d/utils/pointops2/functions/pointops_ablation.py
+mask3d/utils/pointops2/functions/test_attention_op_step1.py
+mask3d/utils/pointops2/functions/test_attention_op_step1_v2.py
+mask3d/utils/pointops2/functions/test_attention_op_step2.py
+mask3d/utils/pointops2/functions/test_relative_pos_encoding_op_step1.py
+mask3d/utils/pointops2/functions/test_relative_pos_encoding_op_step1_v2.py
+mask3d/utils/pointops2/functions/test_relative_pos_encoding_op_step1_v3.py
+mask3d/utils/pointops2/functions/test_relative_pos_encoding_op_step2.py
+mask3d/utils/pointops2/functions/test_relative_pos_encoding_op_step2_v2.py
+mask3d/utils/pointops2/src/__init__.py
\ No newline at end of file
diff --git a/models/Mask3D/mask3d.egg-info/dependency_links.txt b/models/Mask3D/mask3d.egg-info/dependency_links.txt
new file mode 100644
index 0000000000000000000000000000000000000000..8b137891791fe96927ad78e64b0aad7bded08bdc
--- /dev/null
+++ b/models/Mask3D/mask3d.egg-info/dependency_links.txt
@@ -0,0 +1 @@
+
diff --git a/models/Mask3D/mask3d.egg-info/top_level.txt b/models/Mask3D/mask3d.egg-info/top_level.txt
new file mode 100644
index 0000000000000000000000000000000000000000..347620dbc6cab3f22ef5e880a7f4ff468f301c49
--- /dev/null
+++ b/models/Mask3D/mask3d.egg-info/top_level.txt
@@ -0,0 +1 @@
+mask3d
diff --git a/models/Mask3D/mask3d/__init__.py b/models/Mask3D/mask3d/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e2b6e21d418d6ee195db3d2b8682476c8fb448cd
--- /dev/null
+++ b/models/Mask3D/mask3d/__init__.py
@@ -0,0 +1,276 @@
+import hydra
+import torch
+from torch_scatter import scatter_mean
+
+from mask3d.models.mask3d import Mask3D
+from mask3d.utils.utils import (
+ load_checkpoint_with_missing_or_exsessive_keys,
+ load_backbone_checkpoint_with_missing_or_exsessive_keys,
+)
+
+class InstanceSegmentation(torch.nn.Module):
+ def __init__(self, cfg):
+ super().__init__()
+ self.model = hydra.utils.instantiate(cfg.model)
+
+
+ def forward(self, x, raw_coordinates=None, point2segment=None):
+ return self.model(x, raw_coordinates=raw_coordinates, point2segment=point2segment)
+
+
+from omegaconf import OmegaConf, DictConfig
+from hydra.core.global_hydra import GlobalHydra
+from hydra.experimental import initialize, compose
+
+# imports for input loading
+import albumentations as A
+import MinkowskiEngine as ME
+import numpy as np
+import open3d as o3d
+
+# imports for output
+from mask3d.datasets.scannet200.scannet200_constants import (VALID_CLASS_IDS_20, VALID_CLASS_IDS_200, SCANNET_COLOR_MAP_20, SCANNET_COLOR_MAP_200)
+
+def get_model(checkpoint_path=None, dataset_name="scannet200"):
+
+
+ # Initialize the directory with config files
+ with initialize(config_path="conf"):
+ # Compose a configuration
+ cfg = compose(config_name="config_base_instance_segmentation.yaml")
+
+ cfg.general.checkpoint = checkpoint_path
+
+ # would be nice to avoid this hardcoding below
+ # dataset_name = checkpoint_path.split('/')[-1].split('_')[0]
+ if dataset_name == 'scannet200':
+ cfg.general.num_targets = 201
+ cfg.general.train_mode = False
+ cfg.general.eval_on_segments = True
+ cfg.general.topk_per_image = 300
+ cfg.general.use_dbscan = True
+ cfg.general.dbscan_eps = 0.95
+ cfg.general.export_threshold = 0.001
+
+ # # data
+ cfg.data.num_labels = 200
+ cfg.data.test_mode = "test"
+
+ # # model
+ cfg.model.num_queries = 150
+
+ if dataset_name == 'scannet':
+ cfg.general.num_targets = 19
+ cfg.general.train_mode = False
+ cfg.general.eval_on_segments = True
+ cfg.general.topk_per_image = 300
+ cfg.general.use_dbscan = True
+ cfg.general.dbscan_eps = 0.95
+ cfg.general.export_threshold = 0.001
+
+ # # data
+ cfg.data.num_labels = 20
+ cfg.data.test_mode = "test"
+
+ # # model
+ cfg.model.num_queries = 150
+
+ #TODO: this has to be fixed and discussed with Jonas
+ # cfg.model.scene_min = -3.
+ # cfg.model.scene_max = 3.
+
+ # # Initialize the Hydra context
+ # hydra.core.global_hydra.GlobalHydra.instance().clear()
+ # hydra.initialize(config_path="conf")
+
+ # Load the configuration
+ # cfg = hydra.compose(config_name="config_base_instance_segmentation.yaml")
+ model = InstanceSegmentation(cfg)
+
+ if cfg.general.backbone_checkpoint is not None:
+ cfg, model = load_backbone_checkpoint_with_missing_or_exsessive_keys(
+ cfg, model
+ )
+ if cfg.general.checkpoint is not None:
+ cfg, model = load_checkpoint_with_missing_or_exsessive_keys(cfg, model)
+
+ return model
+
+
+def load_mesh(pcl_file):
+
+ # load point cloud
+ input_mesh_path = pcl_file
+ mesh = o3d.io.read_triangle_mesh(input_mesh_path)
+ return mesh
+
+def load_ply(path_2_mesh):
+ pcd = o3d.io.read_point_cloud(path_2_mesh)
+ return pcd
+
+def prepare_data(pointcloud_file, device):
+ # normalization for point cloud features
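+ # (channel-wise means/stds of RGB values in [0, 1]; presumably the ScanNet training statistics)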
+ color_mean = (0.47793125906962, 0.4303257521323044, 0.3749598901421883)
+ color_std = (0.2834475483823543, 0.27566157565723015, 0.27018971370874995)
+ normalize_color = A.Normalize(mean=color_mean, std=color_std)
+
+ datatype = None
+
+ if pointcloud_file.split('.')[-1] == 'ply':
+ try:
+ mesh = load_mesh(pointcloud_file)
+ points = np.asarray(mesh.vertices)
+ colors = np.asarray(mesh.vertex_colors)
+ colors = colors * 255.
+ datatype = "mesh"
+ except Exception:
+ pcd = load_ply(pointcloud_file)
+ points = np.asarray(pcd.points)
+ colors = np.asarray(pcd.colors)
+ datatype = "point cloud"
+
+ if datatype is None:
+ print("DATA TYPE IS NOT SUPPORTED!")
+ exit()
+ segments = None
+ elif pointcloud_file.split('.')[-1] == 'npy':
+ points = np.load(pointcloud_file)
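+ # processed .npy layout: cols 0-2 xyz, 3-5 rgb, 6-8 normals, 9 segment id, 10-11 labels (semantic, instance)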
+ points, colors, normals, segments, labels = (
+ points[:, :3],
+ points[:, 3:6],
+ points[:, 6:9],
+ points[:, 9],
+ points[:, 10:12],
+ )
+ datatype = "mesh"
+
+ else:
+ print("FORMAT NOT SUPPORTED")
+ exit()
+ if datatype == "mesh":
+ pseudo_image = colors.astype(np.uint8)[np.newaxis, :, :]
+ colors = np.squeeze(normalize_color(image=pseudo_image)["image"])
+
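+ # quantize points onto a 2 cm voxel grid before building the sparse tensor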
+ coords = np.floor(points / 0.02)
+ _, _, unique_map, inverse_map = ME.utils.sparse_quantize(
+ coordinates=coords,
+ features=colors,
+ return_index=True,
+ return_inverse=True,
+ )
+
+ sample_coordinates = coords[unique_map]
+ coordinates = [torch.from_numpy(sample_coordinates).int()]
+ sample_features = colors[unique_map]
+ features = [torch.from_numpy(sample_features).float()]
+
+ if segments is not None:
+ point2segment_full = segments
+ point2segment = segments[unique_map]
+ point2segment = [torch.from_numpy(point2segment).long()]
+ point2segment_full = [torch.from_numpy(point2segment_full).long()]
+
+ # Concatenate all lists
+ input_dict = {"coords": coordinates, "feats": features}
+ if len(point2segment) > 0:
+ input_dict["labels"] = point2segment
+ coordinates, _, point2segment = ME.utils.sparse_collate(**input_dict)
+ point2segment = point2segment.cuda()
+ else:
+ coordinates, _ = ME.utils.sparse_collate(**input_dict)
+ point2segment = None
+ point2segment_full = None
+ else:
+ point2segment = None
+ point2segment_full = None
+ coordinates, _ = ME.utils.sparse_collate(coords=coordinates, feats=features)
+
+ features = torch.cat(features, dim=0)
+ data = ME.SparseTensor(
+ coordinates=coordinates,
+ features=features,
+ device=device,
+ )
+ return data, points, colors, features, unique_map, inverse_map, point2segment, point2segment_full
+
+
+def map_output_to_pointcloud(outputs,
+ inverse_map,
+ point2segment,
+ point2segment_full):
+
+ # parse predictions
+ logits = outputs["pred_logits"]
+ logits = torch.nn.functional.softmax(logits, dim=-1)[..., :-1]
+ masks = outputs["pred_masks"]
+ # reformat predictions
+ logits = logits[0]
+ masks = masks[0] if point2segment is None else masks[0][point2segment]
+
+ num_queries = len(logits)
+ scores_per_query, topk_indices = logits.flatten(0, 1).topk(
+ num_queries, sorted=True
+ )
+
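+ # recover query indices from the flattened (query, class) indices; 200 = ScanNet200 class count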
+ topk_indices = topk_indices // 200
+ masks = masks[:, topk_indices]
+
+ result_pred_mask = (masks > 0).float()
+ heatmap = masks.float().sigmoid()
+
+ mask_scores_per_image = (heatmap * result_pred_mask).sum(0) / (
+ result_pred_mask.sum(0) + 1e-6
+ )
+ score = scores_per_query * mask_scores_per_image
+ result_pred_mask = get_full_res_mask(result_pred_mask, inverse_map, point2segment_full[0]) if point2segment_full is not None else result_pred_mask[inverse_map]
+ return (result_pred_mask, score)
+
+def get_full_res_mask(mask, inverse_map, point2segment_full):
+ mask = mask.detach().cpu()[inverse_map] # full res
+ mask = scatter_mean(mask, point2segment_full, dim=0) # full res segments
+ mask = (mask > 0.5).float()
+ mask = mask.detach().cpu()[point2segment_full.cpu()] # full res points
+ return mask
+
+def save_colorized_mesh(mesh, labels_mapped, output_file, colormap='scannet'):
+
+ # colorize mesh
+ colors = np.zeros((len(mesh.vertices), 3))
+ for li in np.unique(labels_mapped):
+ if colormap == 'scannet':
+ raise ValueError('Not implemented yet')
+ elif colormap == 'scannet200':
+ v_li = VALID_CLASS_IDS_200[int(li)]
+ colors[(labels_mapped == li)[:, 0], :] = SCANNET_COLOR_MAP_200[v_li]
+ else:
+ raise ValueError('Unknown colormap - not supported')
+
+ colors = colors / 255.
+ mesh.vertex_colors = o3d.utility.Vector3dVector(colors)
+ o3d.io.write_triangle_mesh(output_file, mesh)
+
+if __name__ == '__main__':
+
+ model = get_model('checkpoints/scannet200/scannet200_benchmark.ckpt')
+ model.eval()
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ model.to(device)
+
+ # load input data
+ pointcloud_file = 'data/pcl.ply'
+ mesh = load_mesh(pointcloud_file)
+
+ # prepare data (prepare_data expects the file path, not the loaded mesh)
+ data, points, colors, features, unique_map, inverse_map, point2segment, point2segment_full = prepare_data(pointcloud_file, device)
+
+ # run model
+ with torch.no_grad():
+ outputs = model(data, raw_coordinates=features)
+
+ # map output to point cloud: per-point binary masks (num_points x num_queries)
+ # and per-query confidence scores
+ masks, scores = map_output_to_pointcloud(outputs, inverse_map, point2segment, point2segment_full)
+
+ # save colorized mesh; save_colorized_mesh expects one label index per vertex,
+ # which additionally requires the predicted class of each query
+ # save_colorized_mesh(mesh, labels, 'data/pcl_labelled.ply', colormap='scannet200')
+
\ No newline at end of file
diff --git a/models/Mask3D/mask3d/benchmark/__init__.py b/models/Mask3D/mask3d/benchmark/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/mask3d/benchmark/evaluate_semantic_instance.py b/models/Mask3D/mask3d/benchmark/evaluate_semantic_instance.py
new file mode 100644
index 0000000000000000000000000000000000000000..242cb87a09b5c69a0d967217a2cd97706197a63d
--- /dev/null
+++ b/models/Mask3D/mask3d/benchmark/evaluate_semantic_instance.py
@@ -0,0 +1,1141 @@
+# Evaluates semantic instance task
+# Adapted from the CityScapes evaluation: https://github.com/mcordts/cityscapesScripts/tree/master/cityscapesscripts/evaluation
+# Input:
+# - path to .txt prediction files
+# - path to .txt ground truth files
+# - output file to write results to
+# Each .txt prediction file looks like:
+# [(pred0) rel. path to pred. mask over verts as .txt] [(pred0) label id] [(pred0) confidence]
+# [(pred1) rel. path to pred. mask over verts as .txt] [(pred1) label id] [(pred1) confidence]
+# [(pred2) rel. path to pred. mask over verts as .txt] [(pred2) label id] [(pred2) confidence]
+# ...
+#
+# NOTE: The prediction files must live in the root of the given prediction path.
+# Predicted mask .txt files must live in a subfolder.
+# Additionally, filenames must not contain spaces.
+# The relative paths to predicted masks must contain one integer per line,
+# where each line corresponds to vertices in the *_vh_clean_2.ply (in that order).
+# Non-zero integers indicate part of the predicted instance.
+# The label ids specify the class of the corresponding mask.
+# Confidence is a float confidence score of the mask.
+#
+# Note that only the valid classes are used for evaluation,
+# i.e., any ground truth label not in the valid label set
+# is ignored in the evaluation.
+#
+# example usage: evaluate_semantic_instance.py --scan_path [path to scan data] --output_file [output file]
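+#
+# For illustration (hypothetical scene id), a prediction file could contain:
+#   pred_mask/scene0707_00_000.txt 36 0.981
+#   pred_mask/scene0707_00_001.txt 5 0.904
+# where 36 (bathtub) and 5 (chair) are nyu40 label ids.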
+
+# python imports
+import math
+import os, sys, argparse
+import inspect
+from copy import deepcopy
+from uuid import uuid4
+
+import torch
+
+try:
+ import numpy as np
+except ImportError:
+ print("Failed to import numpy package.")
+ sys.exit(-1)
+
+from scipy import stats
+
+# currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
+# parentdir = os.path.dirname(currentdir)
+# sys.path.insert(0,parentdir)
+import benchmark.util as util
+import benchmark.util_3d as util_3d
+
+# parser = argparse.ArgumentParser()
+# parser.add_argument('--gt_path', default='', help='path to directory of gt .txt files')
+# parser.add_argument('--output_file', default='', help='output file [default: ./semantic_instance_evaluation.txt]')
+# opt = parser.parse_args()
+
+# if opt.output_file == '':
+# opt.output_file = os.path.join(os.getcwd(), 'semantic_instance_evaluation.txt')
+
+
+# ---------- Label info ---------- #
+CLASS_LABELS = [
+ "cabinet",
+ "bed",
+ "chair",
+ "sofa",
+ "table",
+ "door",
+ "window",
+ "bookshelf",
+ "picture",
+ "counter",
+ "desk",
+ "curtain",
+ "refrigerator",
+ "shower curtain",
+ "toilet",
+ "sink",
+ "bathtub",
+ "otherfurniture",
+]
+VALID_CLASS_IDS = np.array(
+ [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 24, 28, 33, 34, 36, 39]
+)
+ID_TO_LABEL = {}
+LABEL_TO_ID = {}
+for i in range(len(VALID_CLASS_IDS)):
+ LABEL_TO_ID[CLASS_LABELS[i]] = VALID_CLASS_IDS[i]
+ ID_TO_LABEL[VALID_CLASS_IDS[i]] = CLASS_LABELS[i]
+# ---------- Evaluation params ---------- #
+# overlaps for evaluation
+opt = {}
+opt["overlaps"] = np.append(np.arange(0.5, 0.95, 0.05), 0.25)
+# minimum region size for evaluation [verts]
+opt["min_region_sizes"] = np.array([100]) # 100 for s3dis, scannet
+# distance thresholds [m]
+opt["distance_threshes"] = np.array([float("inf")])
+# distance confidences
+opt["distance_confs"] = np.array([-float("inf")])
+
+
+def evaluate_matches(matches):
+ overlaps = opt["overlaps"]
+ min_region_sizes = [opt["min_region_sizes"][0]]
+ dist_threshes = [opt["distance_threshes"][0]]
+ dist_confs = [opt["distance_confs"][0]]
+
+ # results: class x overlap
+ ap = np.zeros(
+ (len(dist_threshes), len(CLASS_LABELS), len(overlaps)), float
+ )
+ for di, (min_region_size, distance_thresh, distance_conf) in enumerate(
+ zip(min_region_sizes, dist_threshes, dist_confs)
+ ):
+ for oi, overlap_th in enumerate(overlaps):
+ pred_visited = {}
+ for m in matches:
+ for p in matches[m]["pred"]:
+ for label_name in CLASS_LABELS:
+ for p in matches[m]["pred"][label_name]:
+ if "uuid" in p:
+ pred_visited[p["uuid"]] = False
+ for li, label_name in enumerate(CLASS_LABELS):
+ y_true = np.empty(0)
+ y_score = np.empty(0)
+ hard_false_negatives = 0
+ has_gt = False
+ has_pred = False
+ for m in matches:
+ pred_instances = matches[m]["pred"][label_name]
+ gt_instances = matches[m]["gt"][label_name]
+ # filter groups in ground truth
+ gt_instances = [
+ gt
+ for gt in gt_instances
+ if gt["instance_id"] >= 1000
+ and gt["vert_count"] >= min_region_size
+ and gt["med_dist"] <= distance_thresh
+ and gt["dist_conf"] >= distance_conf
+ ]
+ if gt_instances:
+ has_gt = True
+ if pred_instances:
+ has_pred = True
+
+ cur_true = np.ones(len(gt_instances))
+ cur_score = np.ones(len(gt_instances)) * (-float("inf"))
+ cur_match = np.zeros(len(gt_instances), dtype=bool)
+ # collect matches
+ for (gti, gt) in enumerate(gt_instances):
+ found_match = False
+ num_pred = len(gt["matched_pred"])
+ for pred in gt["matched_pred"]:
+ # greedy assignments
+ if pred_visited[pred["uuid"]]:
+ continue
+ overlap = float(pred["intersection"]) / (
+ gt["vert_count"]
+ + pred["vert_count"]
+ - pred["intersection"]
+ )
+ if overlap > overlap_th:
+ confidence = pred["confidence"]
+ # if already have a prediction for this gt,
+ # the prediction with the lower score is automatically a false positive
+ if cur_match[gti]:
+ max_score = max(cur_score[gti], confidence)
+ min_score = min(cur_score[gti], confidence)
+ cur_score[gti] = max_score
+ # append false positive
+ cur_true = np.append(cur_true, 0)
+ cur_score = np.append(cur_score, min_score)
+ cur_match = np.append(cur_match, True)
+ # otherwise set score
+ else:
+ found_match = True
+ cur_match[gti] = True
+ cur_score[gti] = confidence
+ pred_visited[pred["uuid"]] = True
+ if not found_match:
+ hard_false_negatives += 1
+ # remove non-matched ground truth instances
+ cur_true = cur_true[cur_match]
+ cur_score = cur_score[cur_match]
+
+ # collect non-matched predictions as false positive
+ for pred in pred_instances:
+ found_gt = False
+ for gt in pred["matched_gt"]:
+ overlap = float(gt["intersection"]) / (
+ gt["vert_count"]
+ + pred["vert_count"]
+ - gt["intersection"]
+ )
+ if overlap > overlap_th:
+ found_gt = True
+ break
+ if not found_gt:
+ num_ignore = pred["void_intersection"]
+ for gt in pred["matched_gt"]:
+ # group?
+ if gt["instance_id"] < 1000:
+ num_ignore += gt["intersection"]
+ # small ground truth instances
+ if (
+ gt["vert_count"] < min_region_size
+ or gt["med_dist"] > distance_thresh
+ or gt["dist_conf"] < distance_conf
+ ):
+ num_ignore += gt["intersection"]
+ proportion_ignore = (
+ float(num_ignore) / pred["vert_count"]
+ )
+ # if not ignored append false positive
+ if proportion_ignore <= overlap_th:
+ cur_true = np.append(cur_true, 0)
+ confidence = pred["confidence"]
+ cur_score = np.append(cur_score, confidence)
+
+ # append to overall results
+ y_true = np.append(y_true, cur_true)
+ y_score = np.append(y_score, cur_score)
+
+ # compute average precision
+ if has_gt and has_pred:
+ # compute precision recall curve first
+
+ # sorting and cumsum
+ score_arg_sort = np.argsort(y_score)
+ y_score_sorted = y_score[score_arg_sort]
+ y_true_sorted = y_true[score_arg_sort]
+ y_true_sorted_cumsum = np.cumsum(y_true_sorted)
+
+ # unique thresholds
+ (thresholds, unique_indices) = np.unique(
+ y_score_sorted, return_index=True
+ )
+ num_prec_recall = len(unique_indices) + 1
+
+ # prepare precision recall
+ num_examples = len(y_score_sorted)
+ # https://github.com/ScanNet/ScanNet/pull/26
+ # all predictions are non-matched but also all of them are ignored and not counted as FP
+ # y_true_sorted_cumsum is empty
+ # num_true_examples = y_true_sorted_cumsum[-1]
+ num_true_examples = (
+ y_true_sorted_cumsum[-1]
+ if len(y_true_sorted_cumsum) > 0
+ else 0
+ )
+ precision = np.zeros(num_prec_recall)
+ recall = np.zeros(num_prec_recall)
+
+ # deal with the first point
+ y_true_sorted_cumsum = np.append(y_true_sorted_cumsum, 0)
+ # deal with remaining
+ for idx_res, idx_scores in enumerate(unique_indices):
+ cumsum = y_true_sorted_cumsum[idx_scores - 1]
+ tp = num_true_examples - cumsum
+ fp = num_examples - idx_scores - tp
+ fn = cumsum + hard_false_negatives
+ p = float(tp) / (tp + fp)
+ r = float(tp) / (tp + fn)
+ precision[idx_res] = p
+ recall[idx_res] = r
+
+ # first point in curve is artificial
+ precision[-1] = 1.0
+ recall[-1] = 0.0
+
+ # compute average of precision-recall curve
+ recall_for_conv = np.copy(recall)
+ recall_for_conv = np.append(
+ recall_for_conv[0], recall_for_conv
+ )
+ recall_for_conv = np.append(recall_for_conv, 0.0)
+
+ stepWidths = np.convolve(
+ recall_for_conv, [-0.5, 0, 0.5], "valid"
+ )
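+ # centered differences of recall: stepWidths[i] = (recall[i-1] - recall[i+1]) / 2 (endpoints padded above)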
+ # integrate is now simply a dot product
+ ap_current = np.dot(precision, stepWidths)
+
+ elif has_gt:
+ ap_current = 0.0
+ else:
+ ap_current = float("nan")
+ ap[di, li, oi] = ap_current
+ return ap
+
+
+def compute_averages(aps):
+ d_inf = 0
+ o50 = np.where(np.isclose(opt["overlaps"], 0.5))
+ o25 = np.where(np.isclose(opt["overlaps"], 0.25))
+ oAllBut25 = np.where(np.logical_not(np.isclose(opt["overlaps"], 0.25)))
+ avg_dict = {}
+ # avg_dict['all_ap'] = np.nanmean(aps[ d_inf,:,: ])
+ avg_dict["all_ap"] = np.nanmean(aps[d_inf, :, oAllBut25])
+ avg_dict["all_ap_50%"] = np.nanmean(aps[d_inf, :, o50])
+ avg_dict["all_ap_25%"] = np.nanmean(aps[d_inf, :, o25])
+ avg_dict["classes"] = {}
+ for (li, label_name) in enumerate(CLASS_LABELS):
+ avg_dict["classes"][label_name] = {}
+ # avg_dict["classes"][label_name]["ap"] = np.average(aps[ d_inf,li, :])
+ avg_dict["classes"][label_name]["ap"] = np.average(
+ aps[d_inf, li, oAllBut25]
+ )
+ avg_dict["classes"][label_name]["ap50%"] = np.average(
+ aps[d_inf, li, o50]
+ )
+ avg_dict["classes"][label_name]["ap25%"] = np.average(
+ aps[d_inf, li, o25]
+ )
+ return avg_dict
+
+
+def make_pred_info(pred: dict):
+ # pred = {'pred_scores': (100,), 'pred_classes': (100,), 'pred_masks': (N, 100)}
+ pred_info = {}
+ assert (
+ pred["pred_classes"].shape[0]
+ == pred["pred_scores"].shape[0]
+ == pred["pred_masks"].shape[1]
+ )
+ for i in range(len(pred["pred_classes"])):
+ info = {}
+ info["label_id"] = pred["pred_classes"][i]
+ info["conf"] = pred["pred_scores"][i]
+ info["mask"] = pred["pred_masks"][:, i]
+ pred_info[uuid4()] = info # we later need to identify these objects
+ return pred_info
+
+
+def assign_instances_for_scan(pred: dict, gt_file: str):
+ pred_info = make_pred_info(pred)
+ try:
+ gt_ids = util_3d.load_ids(gt_file)
+ except Exception as e:
+ util.print_error("unable to load " + gt_file + ": " + str(e))
+
+ # get gt instances
+ gt_instances = util_3d.get_instances(
+ gt_ids, VALID_CLASS_IDS, CLASS_LABELS, ID_TO_LABEL
+ )
+ # associate
+ gt2pred = deepcopy(gt_instances)
+ for label in gt2pred:
+ for gt in gt2pred[label]:
+ gt["matched_pred"] = []
+ pred2gt = {}
+ for label in CLASS_LABELS:
+ pred2gt[label] = []
+ num_pred_instances = 0
+ # mask of void labels in the groundtruth
+ bool_void = np.logical_not(np.in1d(gt_ids // 1000, VALID_CLASS_IDS))
+ # go thru all prediction masks
+ for uuid in pred_info:
+ label_id = int(pred_info[uuid]["label_id"])
+ conf = pred_info[uuid]["conf"]
+ if label_id not in ID_TO_LABEL:
+ continue
+ label_name = ID_TO_LABEL[label_id]
+ # read the mask
+ pred_mask = pred_info[uuid]["mask"]
+ assert len(pred_mask) == len(gt_ids)
+ # convert to binary
+ pred_mask = np.not_equal(pred_mask, 0)
+ num = np.count_nonzero(pred_mask)
+ if num < opt["min_region_sizes"][0]:
+ continue # skip if empty
+
+ pred_instance = {}
+ pred_instance["uuid"] = uuid
+ pred_instance["pred_id"] = num_pred_instances
+ pred_instance["label_id"] = label_id
+ pred_instance["vert_count"] = num
+ pred_instance["confidence"] = conf
+ pred_instance["void_intersection"] = np.count_nonzero(
+ np.logical_and(bool_void, pred_mask)
+ )
+
+ # matched gt instances
+ matched_gt = []
+ # go thru all gt instances with matching label
+ for (gt_num, gt_inst) in enumerate(gt2pred[label_name]):
+ intersection = np.count_nonzero(
+ np.logical_and(gt_ids == gt_inst["instance_id"], pred_mask)
+ )
+ if intersection > 0:
+ gt_copy = gt_inst.copy()
+ pred_copy = pred_instance.copy()
+ gt_copy["intersection"] = intersection
+ pred_copy["intersection"] = intersection
+ matched_gt.append(gt_copy)
+ gt2pred[label_name][gt_num]["matched_pred"].append(pred_copy)
+ pred_instance["matched_gt"] = matched_gt
+ num_pred_instances += 1
+ pred2gt[label_name].append(pred_instance)
+
+ return gt2pred, pred2gt
+
+
+def print_results(avgs):
+ sep = ""
+ col1 = ":"
+ lineLen = 64
+
+ print("")
+ print("#" * lineLen)
+ line = ""
+ line += "{:<15}".format("what") + sep + col1
+ line += "{:>15}".format("AP") + sep
+ line += "{:>15}".format("AP_50%") + sep
+ line += "{:>15}".format("AP_25%") + sep
+ print(line)
+ print("#" * lineLen)
+
+ for (li, label_name) in enumerate(CLASS_LABELS):
+ ap_avg = avgs["classes"][label_name]["ap"]
+ ap_50o = avgs["classes"][label_name]["ap50%"]
+ ap_25o = avgs["classes"][label_name]["ap25%"]
+ line = "{:<15}".format(label_name) + sep + col1
+ line += sep + "{:>15.3f}".format(ap_avg) + sep
+ line += sep + "{:>15.3f}".format(ap_50o) + sep
+ line += sep + "{:>15.3f}".format(ap_25o) + sep
+ print(line)
+
+ all_ap_avg = avgs["all_ap"]
+ all_ap_50o = avgs["all_ap_50%"]
+ all_ap_25o = avgs["all_ap_25%"]
+
+ print("-" * lineLen)
+ line = "{:<15}".format("average") + sep + col1
+ line += "{:>15.3f}".format(all_ap_avg) + sep
+ line += "{:>15.3f}".format(all_ap_50o) + sep
+ line += "{:>15.3f}".format(all_ap_25o) + sep
+ print(line)
+ print("")
+
+
+def write_result_file(avgs, filename):
+ _SPLITTER = ","
+ with open(filename, "w") as f:
+ f.write(
+ _SPLITTER.join(["class", "class id", "ap", "ap50", "ap25"]) + "\n"
+ )
+ for i in range(len(VALID_CLASS_IDS)):
+ class_name = CLASS_LABELS[i]
+ class_id = VALID_CLASS_IDS[i]
+ ap = avgs["classes"][class_name]["ap"]
+ ap50 = avgs["classes"][class_name]["ap50%"]
+ ap25 = avgs["classes"][class_name]["ap25%"]
+ f.write(
+ _SPLITTER.join(
+ [str(x) for x in [class_name, class_id, ap, ap50, ap25]]
+ )
+ + "\n"
+ )
+
+
+def evaluate(
+ preds: dict, gt_path: str, output_file: str, dataset: str = "scannet"
+):
+ global CLASS_LABELS
+ global VALID_CLASS_IDS
+ global ID_TO_LABEL
+ global LABEL_TO_ID
+ global opt
+
+ if dataset == "stpls3d":
+ # global CLASS_LABELS
+ # global VALID_CLASS_IDS
+ # global ID_TO_LABEL
+ # global LABEL_TO_ID
+
+ opt["min_region_sizes"] = np.array([10])
+
+ CLASS_LABELS = [
+ "Build",
+ "LowVeg",
+ "MediumVeg",
+ "HighVeg",
+ "Vehicle",
+ "Truck",
+ "Aircraft",
+ "MilitaryVeh",
+ "Bike",
+ "Motorcycle",
+ "LightPole",
+ "StreetSign",
+ "Clutter",
+ "Fence",
+ ]
+ VALID_CLASS_IDS = np.array(
+ [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
+ )
+
+ ID_TO_LABEL = {}
+ LABEL_TO_ID = {}
+ for i in range(len(VALID_CLASS_IDS)):
+ LABEL_TO_ID[CLASS_LABELS[i]] = VALID_CLASS_IDS[i]
+ ID_TO_LABEL[VALID_CLASS_IDS[i]] = CLASS_LABELS[i]
+
+ if dataset == "s3dis":
+ # global CLASS_LABELS
+ # global VALID_CLASS_IDS
+ # global ID_TO_LABEL
+ # global LABEL_TO_ID
+
+ CLASS_LABELS = [
+ "ceiling",
+ "floor",
+ "wall",
+ "beam",
+ "column",
+ "window",
+ "door",
+ "table",
+ "chair",
+ "sofa",
+ "bookcase",
+ "board",
+ "clutter",
+ ]
+ VALID_CLASS_IDS = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])
+ ID_TO_LABEL = {}
+ LABEL_TO_ID = {}
+ for i in range(len(VALID_CLASS_IDS)):
+ LABEL_TO_ID[CLASS_LABELS[i]] = VALID_CLASS_IDS[i]
+ ID_TO_LABEL[VALID_CLASS_IDS[i]] = CLASS_LABELS[i]
+
+ if dataset == "scannet200":
+ CLASS_LABELS = (
+ "chair",
+ "table",
+ "door",
+ "couch",
+ "cabinet",
+ "shelf",
+ "desk",
+ "office chair",
+ "bed",
+ "pillow",
+ "sink",
+ "picture",
+ "window",
+ "toilet",
+ "bookshelf",
+ "monitor",
+ "curtain",
+ "book",
+ "armchair",
+ "coffee table",
+ "box",
+ "refrigerator",
+ "lamp",
+ "kitchen cabinet",
+ "towel",
+ "clothes",
+ "tv",
+ "nightstand",
+ "counter",
+ "dresser",
+ "stool",
+ "cushion",
+ "plant",
+ "ceiling",
+ "bathtub",
+ "end table",
+ "dining table",
+ "keyboard",
+ "bag",
+ "backpack",
+ "toilet paper",
+ "printer",
+ "tv stand",
+ "whiteboard",
+ "blanket",
+ "shower curtain",
+ "trash can",
+ "closet",
+ "stairs",
+ "microwave",
+ "stove",
+ "shoe",
+ "computer tower",
+ "bottle",
+ "bin",
+ "ottoman",
+ "bench",
+ "board",
+ "washing machine",
+ "mirror",
+ "copier",
+ "basket",
+ "sofa chair",
+ "file cabinet",
+ "fan",
+ "laptop",
+ "shower",
+ "paper",
+ "person",
+ "paper towel dispenser",
+ "oven",
+ "blinds",
+ "rack",
+ "plate",
+ "blackboard",
+ "piano",
+ "suitcase",
+ "rail",
+ "radiator",
+ "recycling bin",
+ "container",
+ "wardrobe",
+ "soap dispenser",
+ "telephone",
+ "bucket",
+ "clock",
+ "stand",
+ "light",
+ "laundry basket",
+ "pipe",
+ "clothes dryer",
+ "guitar",
+ "toilet paper holder",
+ "seat",
+ "speaker",
+ "column",
+ "bicycle",
+ "ladder",
+ "bathroom stall",
+ "shower wall",
+ "cup",
+ "jacket",
+ "storage bin",
+ "coffee maker",
+ "dishwasher",
+ "paper towel roll",
+ "machine",
+ "mat",
+ "windowsill",
+ "bar",
+ "toaster",
+ "bulletin board",
+ "ironing board",
+ "fireplace",
+ "soap dish",
+ "kitchen counter",
+ "doorframe",
+ "toilet paper dispenser",
+ "mini fridge",
+ "fire extinguisher",
+ "ball",
+ "hat",
+ "shower curtain rod",
+ "water cooler",
+ "paper cutter",
+ "tray",
+ "shower door",
+ "pillar",
+ "ledge",
+ "toaster oven",
+ "mouse",
+ "toilet seat cover dispenser",
+ "furniture",
+ "cart",
+ "storage container",
+ "scale",
+ "tissue box",
+ "light switch",
+ "crate",
+ "power outlet",
+ "decoration",
+ "sign",
+ "projector",
+ "closet door",
+ "vacuum cleaner",
+ "candle",
+ "plunger",
+ "stuffed animal",
+ "headphones",
+ "dish rack",
+ "broom",
+ "guitar case",
+ "range hood",
+ "dustpan",
+ "hair dryer",
+ "water bottle",
+ "handicap bar",
+ "purse",
+ "vent",
+ "shower floor",
+ "water pitcher",
+ "mailbox",
+ "bowl",
+ "paper bag",
+ "alarm clock",
+ "music stand",
+ "projector screen",
+ "divider",
+ "laundry detergent",
+ "bathroom counter",
+ "object",
+ "bathroom vanity",
+ "closet wall",
+ "laundry hamper",
+ "bathroom stall door",
+ "ceiling light",
+ "trash bin",
+ "dumbbell",
+ "stair rail",
+ "tube",
+ "bathroom cabinet",
+ "cd case",
+ "closet rod",
+ "coffee kettle",
+ "structure",
+ "shower head",
+ "keyboard piano",
+ "case of water bottles",
+ "coat rack",
+ "storage organizer",
+ "folded chair",
+ "fire alarm",
+ "power strip",
+ "calendar",
+ "poster",
+ "potted plant",
+ "luggage",
+ "mattress",
+ )
+
+ VALID_CLASS_IDS = np.array(
+ (
+ 2,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 21,
+ 22,
+ 23,
+ 24,
+ 26,
+ 27,
+ 28,
+ 29,
+ 31,
+ 32,
+ 33,
+ 34,
+ 35,
+ 36,
+ 38,
+ 39,
+ 40,
+ 41,
+ 42,
+ 44,
+ 45,
+ 46,
+ 47,
+ 48,
+ 49,
+ 50,
+ 51,
+ 52,
+ 54,
+ 55,
+ 56,
+ 57,
+ 58,
+ 59,
+ 62,
+ 63,
+ 64,
+ 65,
+ 66,
+ 67,
+ 68,
+ 69,
+ 70,
+ 71,
+ 72,
+ 73,
+ 74,
+ 75,
+ 76,
+ 77,
+ 78,
+ 79,
+ 80,
+ 82,
+ 84,
+ 86,
+ 87,
+ 88,
+ 89,
+ 90,
+ 93,
+ 95,
+ 96,
+ 97,
+ 98,
+ 99,
+ 100,
+ 101,
+ 102,
+ 103,
+ 104,
+ 105,
+ 106,
+ 107,
+ 110,
+ 112,
+ 115,
+ 116,
+ 118,
+ 120,
+ 121,
+ 122,
+ 125,
+ 128,
+ 130,
+ 131,
+ 132,
+ 134,
+ 136,
+ 138,
+ 139,
+ 140,
+ 141,
+ 145,
+ 148,
+ 154,
+ 155,
+ 156,
+ 157,
+ 159,
+ 161,
+ 163,
+ 165,
+ 166,
+ 168,
+ 169,
+ 170,
+ 177,
+ 180,
+ 185,
+ 188,
+ 191,
+ 193,
+ 195,
+ 202,
+ 208,
+ 213,
+ 214,
+ 221,
+ 229,
+ 230,
+ 232,
+ 233,
+ 242,
+ 250,
+ 261,
+ 264,
+ 276,
+ 283,
+ 286,
+ 300,
+ 304,
+ 312,
+ 323,
+ 325,
+ 331,
+ 342,
+ 356,
+ 370,
+ 392,
+ 395,
+ 399,
+ 408,
+ 417,
+ 488,
+ 540,
+ 562,
+ 570,
+ 572,
+ 581,
+ 609,
+ 748,
+ 776,
+ 1156,
+ 1163,
+ 1164,
+ 1165,
+ 1166,
+ 1167,
+ 1168,
+ 1169,
+ 1170,
+ 1171,
+ 1172,
+ 1173,
+ 1174,
+ 1175,
+ 1176,
+ 1178,
+ 1179,
+ 1180,
+ 1181,
+ 1182,
+ 1183,
+ 1184,
+ 1185,
+ 1186,
+ 1187,
+ 1188,
+ 1189,
+ 1190,
+ 1191,
+ )
+ )
+
+ ID_TO_LABEL = {}
+ LABEL_TO_ID = {}
+ for i in range(len(VALID_CLASS_IDS)):
+ LABEL_TO_ID[CLASS_LABELS[i]] = VALID_CLASS_IDS[i]
+ ID_TO_LABEL[VALID_CLASS_IDS[i]] = CLASS_LABELS[i]
+
+ total_true = 0
+ total_seen = 0
+ NUM_CLASSES = len(VALID_CLASS_IDS)
+
+ true_positive_classes = np.zeros(NUM_CLASSES)
+ positive_classes = np.zeros(NUM_CLASSES)
+ gt_classes = np.zeros(NUM_CLASSES)
+
+ # precision & recall
+ total_gt_ins = np.zeros(NUM_CLASSES)
+ at = 0.5
+ tpsins = [[] for _ in range(NUM_CLASSES)]
+ fpsins = [[] for _ in range(NUM_CLASSES)]
+ # mucov and mwcov
+ all_mean_cov = [[] for _ in range(NUM_CLASSES)]
+ all_mean_weighted_cov = [[] for _ in range(NUM_CLASSES)]
+
+ print("evaluating", len(preds), "scans...")
+ matches = {}
+ for i, (k, v) in enumerate(preds.items()):
+ gt_file = os.path.join(gt_path, k + ".txt")
+ if not os.path.isfile(gt_file):
+ util.print_error(
+ "Scan {} does not match any gt file".format(k), user_fault=True
+ )
+
+ if dataset == "s3dis":
+ gt_ids = util_3d.load_ids(gt_file)
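+ # gt ids encode label * 1000 + instance index; split into 0-based semantic and instance parts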
+ gt_sem = (gt_ids // 1000) - 1
+ gt_ins = gt_ids - (gt_ids // 1000) * 1000
+
+ # pred_sem = v['pred_classes'] - 1
+ pred_sem = np.zeros(v["pred_masks"].shape[0], dtype=np.int64)
+ # TODO CONTINUE HERE!!!!!!!!!!!!!
+ pred_ins = np.zeros(v["pred_masks"].shape[0], dtype=np.int64)
+
+ for inst_id in reversed(range(v["pred_masks"].shape[1])):
+ point_ids = np.argwhere(v["pred_masks"][:, inst_id] == 1.0)[
+ :, 0
+ ]
+ pred_ins[point_ids] = inst_id + 1
+ pred_sem[point_ids] = v["pred_classes"][inst_id] - 1
+
+ # semantic acc
+ total_true += np.sum(pred_sem == gt_sem)
+ total_seen += pred_sem.shape[0]
+
+ # TODO: PARALLELIZE THIS
+ # pn semantic mIoU
+ """
+ for j in range(gt_sem.shape[0]):
+ gt_l = int(gt_sem[j])
+ pred_l = int(pred_sem[j])
+ gt_classes[gt_l] += 1
+ positive_classes[pred_l] += 1
+ true_positive_classes[gt_l] += int(gt_l == pred_l)
+ """
+
+ uniq, counts = np.unique(pred_sem, return_counts=True)
+ positive_classes[uniq] += counts
+
+ uniq, counts = np.unique(gt_sem, return_counts=True)
+ gt_classes[uniq] += counts
+
+ uniq, counts = np.unique(
+ gt_sem[pred_sem == gt_sem], return_counts=True
+ )
+ true_positive_classes[uniq] += counts
+
+ # instance
+ un = np.unique(pred_ins)
+ pts_in_pred = [[] for _ in range(NUM_CLASSES)]
+ for ig, g in enumerate(un): # each object in prediction
+ if g == -1:
+ continue
+ tmp = pred_ins == g
+ sem_seg_i = int(stats.mode(pred_sem[tmp])[0])
+ pts_in_pred[sem_seg_i] += [tmp]
+
+ un = np.unique(gt_ins)
+ pts_in_gt = [[] for _ in range(NUM_CLASSES)]
+ for ig, g in enumerate(un):
+ tmp = gt_ins == g
+ sem_seg_i = int(stats.mode(gt_sem[tmp])[0])
+ pts_in_gt[sem_seg_i] += [tmp]
+
+ # instance mucov & mwcov
+ for i_sem in range(NUM_CLASSES):
+ sum_cov = 0
+ mean_cov = 0
+ mean_weighted_cov = 0
+ num_gt_point = 0
+ for ig, ins_gt in enumerate(pts_in_gt[i_sem]):
+ ovmax = 0.0
+ num_ins_gt_point = np.sum(ins_gt)
+ num_gt_point += num_ins_gt_point
+ for ip, ins_pred in enumerate(pts_in_pred[i_sem]):
+ union = ins_pred | ins_gt
+ intersect = ins_pred & ins_gt
+ iou = float(np.sum(intersect)) / np.sum(union)
+
+ if iou > ovmax:
+ ovmax = iou
+ ipmax = ip
+
+ sum_cov += ovmax
+ mean_weighted_cov += ovmax * num_ins_gt_point
+
+ if len(pts_in_gt[i_sem]) != 0:
+ mean_cov = sum_cov / len(pts_in_gt[i_sem])
+ all_mean_cov[i_sem].append(mean_cov)
+
+ mean_weighted_cov /= num_gt_point
+ all_mean_weighted_cov[i_sem].append(mean_weighted_cov)
+
+ if dataset == "s3dis":
+ # instance precision & recall
+ for i_sem in range(NUM_CLASSES):
+ tp = [0.0] * len(pts_in_pred[i_sem])
+ fp = [0.0] * len(pts_in_pred[i_sem])
+ gtflag = np.zeros(len(pts_in_gt[i_sem]))
+ total_gt_ins[i_sem] += len(pts_in_gt[i_sem])
+
+ for ip, ins_pred in enumerate(pts_in_pred[i_sem]):
+ ovmax = -1.0
+
+ for ig, ins_gt in enumerate(pts_in_gt[i_sem]):
+ union = ins_pred | ins_gt
+ intersect = ins_pred & ins_gt
+ iou = float(np.sum(intersect)) / np.sum(union)
+
+ if iou > ovmax:
+ ovmax = iou
+ igmax = ig
+
+ if ovmax >= at:
+ tp[ip] = 1 # true
+ else:
+ fp[ip] = 1 # false positive
+
+ tpsins[i_sem] += tp
+ fpsins[i_sem] += fp
+
+ matches_key = os.path.abspath(gt_file)
+ # assign gt to predictions
+ gt2pred, pred2gt = assign_instances_for_scan(v, gt_file)
+ matches[matches_key] = {}
+ matches[matches_key]["gt"] = gt2pred
+ matches[matches_key]["pred"] = pred2gt
+ sys.stdout.write("\rscans processed: {}".format(i + 1))
+ sys.stdout.flush()
+ print("")
+ ap_scores = evaluate_matches(matches)
+ avgs = compute_averages(ap_scores)
+
+ # print
+ print_results(avgs)
+ write_result_file(avgs, output_file)
+
+ if dataset == "s3dis":
+ MUCov = np.zeros(NUM_CLASSES)
+ MWCov = np.zeros(NUM_CLASSES)
+ for i_sem in range(NUM_CLASSES):
+ MUCov[i_sem] = np.mean(all_mean_cov[i_sem])
+ MWCov[i_sem] = np.mean(all_mean_weighted_cov[i_sem])
+
+ precision = np.zeros(NUM_CLASSES)
+ recall = np.zeros(NUM_CLASSES)
+ for i_sem in range(NUM_CLASSES):
+ tp = np.asarray(tpsins[i_sem]).astype(np.float64)
+ fp = np.asarray(fpsins[i_sem]).astype(np.float64)
+ tp = np.sum(tp)
+ fp = np.sum(fp)
+ rec = tp / total_gt_ins[i_sem]
+ prec = tp / (tp + fp)
+
+ precision[i_sem] = prec
+ recall[i_sem] = rec
+
+ """
+ LOG_FOUT = open(os.path.join('results_a5.txt'), 'w')
+
+ def log_string(out_str):
+ LOG_FOUT.write(out_str + '\n')
+ LOG_FOUT.flush()
+ print(out_str)
+ """
+
+ return np.mean(precision), np.mean(recall)
+
+
+# TODO: remove this
+# import pandas as pd
+# def main():
+# print("!!! CLI is only for debugging purposes. use `evaluate()` instead.")
+# evaluate(pd.read_pickle("/globalwork/schult/saved_predictions.pkl"), opt.gt_path, opt.output_file)
+
+# if __name__ == '__main__':
+# main()
diff --git a/models/Mask3D/mask3d/benchmark/util.py b/models/Mask3D/mask3d/benchmark/util.py
new file mode 100644
index 0000000000000000000000000000000000000000..9a4224cd4f785c8a5a7cde490cf0f9999e61dbe7
--- /dev/null
+++ b/models/Mask3D/mask3d/benchmark/util.py
@@ -0,0 +1,128 @@
+import os, sys
+import csv
+
+try:
+ import numpy as np
+except ImportError:
+ print("Failed to import numpy package.")
+ sys.exit(-1)
+try:
+ import imageio
+except ImportError:
+ print("Please install the module 'imageio' for image processing, e.g.")
+ print("pip install imageio")
+ sys.exit(-1)
+
+# print an error message and quit
+def print_error(message, user_fault=False):
+ sys.stderr.write("ERROR: " + str(message) + "\n")
+ if user_fault:
+ sys.exit(2)
+ sys.exit(-1)
+
+
+# if string s represents an int
+def represents_int(s):
+ try:
+ int(s)
+ return True
+ except ValueError:
+ return False
+
+
+def read_label_mapping(
+ filename, label_from="raw_category", label_to="nyu40id"
+):
+ assert os.path.isfile(filename)
+ mapping = dict()
+ with open(filename) as csvfile:
+ reader = csv.DictReader(csvfile, delimiter="\t")
+ for row in reader:
+ mapping[row[label_from]] = int(row[label_to])
+ # if ints convert
+ if represents_int(list(mapping.keys())[0]):
+ mapping = {int(k): v for k, v in mapping.items()}
+ return mapping
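+# e.g., with ScanNet's scannetv2-labels.combined.tsv this maps raw category
+# names to nyu40 ids, such as "chair" -> 5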
+
+
+# input: scene_types.txt or scene_types_all.txt
+def read_scene_types_mapping(filename, remove_spaces=True):
+ assert os.path.isfile(filename)
+ mapping = dict()
+ lines = open(filename).read().splitlines()
+ lines = [line.split("\t") for line in lines]
+ if remove_spaces:
+ mapping = {x[1].strip(): int(x[0]) for x in lines}
+ else:
+ mapping = {x[1]: int(x[0]) for x in lines}
+ return mapping
+
+
+# color by label
+def visualize_label_image(filename, image):
+ height = image.shape[0]
+ width = image.shape[1]
+ vis_image = np.zeros([height, width, 3], dtype=np.uint8)
+ color_palette = create_color_palette()
+ for idx, color in enumerate(color_palette):
+ vis_image[image == idx] = color
+ imageio.imwrite(filename, vis_image)
+
+
+# color by different instances (mod length of color palette)
+def visualize_instance_image(filename, image):
+ height = image.shape[0]
+ width = image.shape[1]
+ vis_image = np.zeros([height, width, 3], dtype=np.uint8)
+ color_palette = create_color_palette()
+ instances = np.unique(image)
+ for idx, inst in enumerate(instances):
+ vis_image[image == inst] = color_palette[inst % len(color_palette)]
+ imageio.imwrite(filename, vis_image)
+
+
+# color palette for nyu40 labels
+def create_color_palette():
+ return [
+ (0, 0, 0),
+ (174, 199, 232), # wall
+ (152, 223, 138), # floor
+ (31, 119, 180), # cabinet
+ (255, 187, 120), # bed
+ (188, 189, 34), # chair
+ (140, 86, 75), # sofa
+ (255, 152, 150), # table
+ (214, 39, 40), # door
+ (197, 176, 213), # window
+ (148, 103, 189), # bookshelf
+ (196, 156, 148), # picture
+ (23, 190, 207), # counter
+ (178, 76, 76),
+ (247, 182, 210), # desk
+ (66, 188, 102),
+ (219, 219, 141), # curtain
+ (140, 57, 197),
+ (202, 185, 52),
+ (51, 176, 203),
+ (200, 54, 131),
+ (92, 193, 61),
+ (78, 71, 183),
+ (172, 114, 82),
+ (255, 127, 14), # refrigerator
+ (91, 163, 138),
+ (153, 98, 156),
+ (140, 153, 101),
+ (158, 218, 229), # shower curtain
+ (100, 125, 154),
+ (178, 127, 135),
+ (120, 185, 128),
+ (146, 111, 194),
+ (44, 160, 44), # toilet
+ (112, 128, 144), # sink
+ (96, 207, 209),
+ (227, 119, 194), # bathtub
+ (213, 92, 176),
+ (94, 106, 211),
+ (82, 84, 163), # otherfurn
+ (100, 85, 144),
+ ]
diff --git a/models/Mask3D/mask3d/benchmark/util_3d.py b/models/Mask3D/mask3d/benchmark/util_3d.py
new file mode 100644
index 0000000000000000000000000000000000000000..572064f3ca251563466ca6bfbe2c70dacdad205f
--- /dev/null
+++ b/models/Mask3D/mask3d/benchmark/util_3d.py
@@ -0,0 +1,177 @@
+import os, sys
+import json
+
+try:
+ import numpy as np
+except ImportError:
+ print("Failed to import numpy package.")
+ sys.exit(-1)
+
+try:
+ from plyfile import PlyData, PlyElement
+except ImportError:
+ print("Please install the module 'plyfile' for PLY i/o, e.g.")
+ print("pip install plyfile")
+ sys.exit(-1)
+
+import benchmark.util as util
+
+
+# matrix: 4x4 np array
+# points Nx3 np array
+def transform_points(matrix, points):
+ assert len(points.shape) == 2 and points.shape[1] == 3
+ num_points = points.shape[0]
+ p = np.concatenate([points, np.ones((num_points, 1))], axis=1)
+ p = np.matmul(matrix, np.transpose(p))
+ p = np.transpose(p)
+ p[:, :3] /= p[:, 3, None]
+ return p[:, :3]
+
+
+def export_ids(filename, ids):
+ with open(filename, "w") as f:
+ for id in ids:
+ f.write("%d\n" % id)
+
+
+def load_ids(filename):
+ ids = open(filename).read().splitlines()
+ ids = np.array(ids, dtype=np.int64)
+ return ids
+
+
+def read_mesh_vertices(filename):
+ assert os.path.isfile(filename)
+ with open(filename, "rb") as f:
+ plydata = PlyData.read(f)
+ num_verts = plydata["vertex"].count
+ vertices = np.zeros(shape=[num_verts, 3], dtype=np.float32)
+ vertices[:, 0] = plydata["vertex"].data["x"]
+ vertices[:, 1] = plydata["vertex"].data["y"]
+ vertices[:, 2] = plydata["vertex"].data["z"]
+ return vertices
+
+
+# export 3d instance labels for instance evaluation
+def export_instance_ids_for_eval(filename, label_ids, instance_ids):
+ assert label_ids.shape[0] == instance_ids.shape[0]
+ output_mask_path_relative = "pred_mask"
+ name = os.path.splitext(os.path.basename(filename))[0]
+ output_mask_path = os.path.join(
+ os.path.dirname(filename), output_mask_path_relative
+ )
+ if not os.path.isdir(output_mask_path):
+ os.mkdir(output_mask_path)
+ insts = np.unique(instance_ids)
+ zero_mask = np.zeros(shape=(instance_ids.shape[0]), dtype=np.int32)
+ with open(filename, "w") as f:
+ for idx, inst_id in enumerate(insts):
+ if inst_id == 0: # 0 -> no instance for this vertex
+ continue
+ output_mask_file = os.path.join(
+ output_mask_path_relative, name + "_" + str(idx) + ".txt"
+ )
+ loc = np.where(instance_ids == inst_id)
+ label_id = label_ids[loc[0][0]]
+ f.write("%s %d %f\n" % (output_mask_file, label_id, 1.0))
+ # write mask
+ mask = np.copy(zero_mask)
+ mask[loc[0]] = 1
+ export_ids(output_mask_file, mask)
+
+
+# ------------ Instance Utils ------------ #
+
+
+class Instance(object):
+ instance_id = 0
+ label_id = 0
+ vert_count = 0
+ med_dist = -1
+ dist_conf = 0.0
+
+ def __init__(self, mesh_vert_instances, instance_id):
+ if instance_id == -1:
+ return
+ self.instance_id = int(instance_id)
+ self.label_id = int(self.get_label_id(instance_id))
+ self.vert_count = int(
+ self.get_instance_verts(mesh_vert_instances, instance_id)
+ )
+
+ def get_label_id(self, instance_id):
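+ # instance ids are encoded as label_id * 1000 + instance index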
+ return int(instance_id // 1000)
+
+ def get_instance_verts(self, mesh_vert_instances, instance_id):
+ return (mesh_vert_instances == instance_id).sum()
+
+ def to_json(self):
+ return json.dumps(
+ self, default=lambda o: o.__dict__, sort_keys=True, indent=4
+ )
+
+ def to_dict(self):
+ inst_dict = {}
+ inst_dict["instance_id"] = self.instance_id
+ inst_dict["label_id"] = self.label_id
+ inst_dict["vert_count"] = self.vert_count
+ inst_dict["med_dist"] = self.med_dist
+ inst_dict["dist_conf"] = self.dist_conf
+ return inst_dict
+
+ def from_json(self, data):
+ self.instance_id = int(data["instance_id"])
+ self.label_id = int(data["label_id"])
+ self.vert_count = int(data["vert_count"])
+ if "med_dist" in data:
+ self.med_dist = float(data["med_dist"])
+ self.dist_conf = float(data["dist_conf"])
+
+ def __str__(self):
+ return "(" + str(self.instance_id) + ")"
+
+
+def read_instance_prediction_file(filename, pred_path):
+ lines = open(filename).read().splitlines()
+ instance_info = {}
+ abs_pred_path = os.path.abspath(pred_path)
+ for line in lines:
+ parts = line.split(" ")
+ if len(parts) != 3:
+ util.print_error(
+ "invalid instance prediction file. Expected (per line): [rel path prediction] [label id prediction] [confidence prediction]"
+ )
+ if os.path.isabs(parts[0]):
+ util.print_error(
+ "invalid instance prediction file. First entry in line must be a relative path"
+ )
+ mask_file = os.path.join(os.path.dirname(filename), parts[0])
+ mask_file = os.path.abspath(mask_file)
+ # check that mask_file lives inside prediction path
+ if os.path.commonprefix([mask_file, abs_pred_path]) != abs_pred_path:
+ util.print_error(
+ "predicted mask {} in prediction text file {} points outside of prediction path.".format(
+ mask_file, filename
+ )
+ )
+
+ info = {}
+ info["label_id"] = int(float(parts[1]))
+ info["conf"] = float(parts[2])
+ instance_info[mask_file] = info
+ return instance_info
+
+
+def get_instances(ids, class_ids, class_labels, id2label):
+ instances = {}
+ for label in class_labels:
+ instances[label] = []
+ instance_ids = np.unique(ids)
+ for id in instance_ids:
+ if id == 0:
+ continue
+ inst = Instance(ids, id)
+ if inst.label_id in class_ids:
+ instances[id2label[inst.label_id]].append(inst.to_dict())
+ return instances
diff --git a/models/Mask3D/mask3d/conf/__init__.py b/models/Mask3D/mask3d/conf/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/mask3d/conf/augmentation/albumentations_aug.yaml b/models/Mask3D/mask3d/conf/augmentation/albumentations_aug.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..006663b4be251bf0f41ac2f66f855ae3d59a2878
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/augmentation/albumentations_aug.yaml
@@ -0,0 +1,30 @@
+__version__: 0.4.5
+transform:
+ __class_fullname__: albumentations.core.composition.Compose
+ additional_targets: {}
+ bbox_params: null
+ keypoint_params: null
+ p: 1.0
+ transforms:
+ - __class_fullname__: albumentations.augmentations.transforms.RandomBrightnessContrast
+ always_apply: true
+ brightness_by_max: true
+ brightness_limit:
+ - -0.2
+ - 0.2
+ contrast_limit:
+ - -0.2
+ - 0.2
+ p: 0.5
+ - __class_fullname__: albumentations.augmentations.transforms.RGBShift
+ always_apply: true
+ b_shift_limit:
+ - -20
+ - 20
+ g_shift_limit:
+ - -20
+ - 20
+ p: 0.5
+ r_shift_limit:
+ - -20
+ - 20
diff --git a/models/Mask3D/mask3d/conf/augmentation/volumentations_aug.yaml b/models/Mask3D/mask3d/conf/augmentation/volumentations_aug.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..3b86407a2e735ad8dbba79f83746ceb79722aedf
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/augmentation/volumentations_aug.yaml
@@ -0,0 +1,53 @@
+# pi = 3.14159265358979
+# pi/2 = 1.57079632679489
+# pi/3 = 1.04719755119659
+# pi/6 = 0.52359877559829
+# pi/12 = 0.26179938779914
+# pi/24 = 0.13089969389957
+#
+__version__: 0.1.6
+transform:
+ __class_fullname__: volumentations.core.composition.Compose
+ additional_targets: {}
+ p: 1.0
+ transforms:
+ - __class_fullname__: volumentations.augmentations.transforms.Scale3d
+ always_apply: true
+ p: 0.5
+ scale_limit:
+ - - -0.1
+ - 0.1
+ - - -0.1
+ - 0.1
+ - - -0.1
+ - 0.1
+ - __class_fullname__: volumentations.augmentations.transforms.RotateAroundAxis3d
+ always_apply: true
+ axis:
+ - 0
+ - 0
+ - 1
+ p: 0.5
+ rotation_limit:
+ - -3.141592653589793
+ - 3.141592653589793
+ - __class_fullname__: volumentations.augmentations.transforms.RotateAroundAxis3d
+ always_apply: true
+ axis:
+ - 0
+ - 1
+ - 0
+ p: 0.5
+ rotation_limit:
+ - -0.13089969389957
+ - 0.13089969389957
+ - __class_fullname__: volumentations.augmentations.transforms.RotateAroundAxis3d
+ always_apply: true
+ axis:
+ - 1
+ - 0
+ - 0
+ p: 0.5
+ rotation_limit:
+ - -0.13089969389957
+ - 0.13089969389957
diff --git a/models/Mask3D/mask3d/conf/callbacks/callbacks_instance_segmentation.yaml b/models/Mask3D/mask3d/conf/callbacks/callbacks_instance_segmentation.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..7f0958eed35ea4317ddc3f2378dd66336472c0fa
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/callbacks/callbacks_instance_segmentation.yaml
@@ -0,0 +1,11 @@
+# @package _group_
+- _target_: pytorch_lightning.callbacks.ModelCheckpoint
+ monitor: val_mean_ap_50
+ save_last: true
+ save_top_k: 1
+ mode: max
+ dirpath: ${general.save_dir}
+ filename: "{epoch}-{val_mean_ap_50:.3f}"
+ every_n_epochs: 1
+
+- _target_: pytorch_lightning.callbacks.LearningRateMonitor
diff --git a/models/Mask3D/mask3d/conf/config_base_instance_segmentation.yaml b/models/Mask3D/mask3d/conf/config_base_instance_segmentation.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..61aeae0519bd308a58293d07ee902beb6a64ed5d
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/config_base_instance_segmentation.yaml
@@ -0,0 +1,75 @@
+general:
+ train_mode: true
+ task: "instance_segmentation"
+ seed: null
+ checkpoint: null
+ backbone_checkpoint: null
+ freeze_backbone: false # train only last layer
+ linear_probing_backbone: false
+ train_on_segments: false
+ eval_on_segments: false
+ filter_out_instances: false
+ save_visualizations: false
+ visualization_point_size: 20
+ decoder_id: -1
+ export: false
+ use_dbscan: false
+ ignore_class_threshold: 100
+ project_name: scannet
+ workspace: jonasschult
+ experiment_name: DEBUG_ABLATION
+ num_targets: 19
+ add_instance: true
+ dbscan_eps: 0.95
+ dbscan_min_points: 1
+
+
+ export_threshold: 0.0001
+
+ reps_per_epoch: 1
+
+ on_crops: false
+
+ scores_threshold: 0.0
+ iou_threshold: 1.0
+
+ area: 5
+
+ eval_inner_core: -1 # disabled
+
+ topk_per_image: 100
+
+ ignore_mask_idx: []
+
+ max_batch_size: 99999999
+
+ save_dir: saved/${general.experiment_name}
+ # time/commit/md5(config)_uuid
+ # time/experiment_id/version_uuid
+ # experiment_id: 1 # commit[:8], or unique from logger
+ # version: 1 # md5[:8] of config
+
+ gpus: 1
+
+defaults:
+ - data: indoor
+ - data/data_loaders: simple_loader
+ - data/datasets: scannet
+ - data/collation_functions: voxelize_collate
+ - logging: full
+ - model: mask3d
+ - metrics: miou
+ - optimizer: adamw
+ - scheduler: onecyclelr
+ - trainer: trainer600
+ - callbacks: callbacks_instance_segmentation
+ - matcher: hungarian_matcher
+ - loss: set_criterion
+
+hydra:
+ run:
+ dir: saved/hydra_logs/${now:%Y-%m-%d}/${now:%H-%M-%S}
+ sweep:
+ dir: saved/hydra_logs/${now:%Y-%m-%d}/${now:%H-%M-%S}
+ # dir: ${general.save_dir}
+ subdir: ${hydra.job.num}_${hydra.job.id}
diff --git a/models/Mask3D/mask3d/conf/data/collation_functions/voxelize_collate.yaml b/models/Mask3D/mask3d/conf/data/collation_functions/voxelize_collate.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..026552efb024e4e6fd90bf6bda9df283da2bf4c1
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/data/collation_functions/voxelize_collate.yaml
@@ -0,0 +1,42 @@
+# @package data
+
+train_collation:
+ _target_: mask3d.datasets.utils.VoxelizeCollate
+ ignore_label: ${data.ignore_label}
+ voxel_size: ${data.voxel_size}
+ mode: ${data.train_mode}
+ small_crops: false
+ very_small_crops: false
+ batch_instance: false
+ probing: ${general.linear_probing_backbone}
+ task: ${general.task}
+ ignore_class_threshold: ${general.ignore_class_threshold}
+ filter_out_classes: ${data.train_dataset.filter_out_classes}
+ label_offset: ${data.train_dataset.label_offset}
+ num_queries: ${model.num_queries}
+
+validation_collation:
+ _target_: mask3d.datasets.utils.VoxelizeCollate
+ ignore_label: ${data.ignore_label}
+ voxel_size: ${data.voxel_size}
+ mode: ${data.validation_mode}
+ batch_instance: false
+ probing: ${general.linear_probing_backbone}
+ task: ${general.task}
+ ignore_class_threshold: ${general.ignore_class_threshold}
+ filter_out_classes: ${data.validation_dataset.filter_out_classes}
+ label_offset: ${data.validation_dataset.label_offset}
+ num_queries: ${model.num_queries}
+
+test_collation:
+ _target_: mask3d.datasets.utils.VoxelizeCollate
+ ignore_label: ${data.ignore_label}
+ voxel_size: ${data.voxel_size}
+ mode: ${data.test_mode}
+ batch_instance: false
+ probing: ${general.linear_probing_backbone}
+ task: ${general.task}
+ ignore_class_threshold: ${general.ignore_class_threshold}
+ filter_out_classes: ${data.test_dataset.filter_out_classes}
+ label_offset: ${data.test_dataset.label_offset}
+ num_queries: ${model.num_queries}
\ No newline at end of file
diff --git a/models/Mask3D/mask3d/conf/data/collation_functions/voxelize_collate_merge.yaml b/models/Mask3D/mask3d/conf/data/collation_functions/voxelize_collate_merge.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..d5d3471d143ddfe999d8f3031e41ba6efce2e879
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/data/collation_functions/voxelize_collate_merge.yaml
@@ -0,0 +1,36 @@
+# @package data
+
+train_collation:
+ _target_: mask3d.datasets.utils.VoxelizeCollateMerge
+ ignore_label: ${data.ignore_label}
+ voxel_size: ${data.voxel_size}
+ mode: ${data.train_mode}
+ small_crops: false
+ very_small_crops: false
+ scenes: 2
+ batch_instance: false
+ make_one_pc_noise: false
+ place_nearby: false
+ place_far: false
+ proba: 1
+ probing: ${general.linear_probing_backbone}
+ include_ignore: ${general.include_ignore}
+ task: ${general.task}
+
+validation_collation:
+ _target_: mask3d.datasets.utils.VoxelizeCollate
+ ignore_label: ${data.ignore_label}
+ voxel_size: ${data.voxel_size}
+ mode: ${data.validation_mode}
+ probing: ${general.linear_probing_backbone}
+ include_ignore: ${general.include_ignore}
+ task: ${general.task}
+
+test_collation:
+ _target_: mask3d.datasets.utils.VoxelizeCollate
+ ignore_label: ${data.ignore_label}
+ voxel_size: ${data.voxel_size}
+ mode: ${data.test_mode}
+ probing: ${general.linear_probing_backbone}
+ include_ignore: ${general.include_ignore}
+ task: ${general.task}
diff --git a/models/Mask3D/mask3d/conf/data/data_loaders/simple_loader.yaml b/models/Mask3D/mask3d/conf/data/data_loaders/simple_loader.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..39996e14d769c2ba9341da582a1f7bf970fc7925
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/data/data_loaders/simple_loader.yaml
@@ -0,0 +1,22 @@
+# @package data
+
+train_dataloader:
+ _target_: torch.utils.data.DataLoader
+ shuffle: true
+ pin_memory: ${data.pin_memory}
+ num_workers: ${data.num_workers}
+ batch_size: ${data.batch_size}
+
+validation_dataloader:
+ _target_: torch.utils.data.DataLoader
+ shuffle: false
+ pin_memory: ${data.pin_memory}
+ num_workers: ${data.num_workers}
+ batch_size: ${data.test_batch_size}
+
+test_dataloader:
+ _target_: torch.utils.data.DataLoader
+ shuffle: false
+ pin_memory: ${data.pin_memory}
+ num_workers: ${data.num_workers}
+ batch_size: ${data.test_batch_size}
diff --git a/models/Mask3D/mask3d/conf/data/data_loaders/simple_loader_save_memory.yaml b/models/Mask3D/mask3d/conf/data/data_loaders/simple_loader_save_memory.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..b1b1b45d13167dc07357a13feb5a513dd71c9a2e
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/data/data_loaders/simple_loader_save_memory.yaml
@@ -0,0 +1,22 @@
+# @package data
+
+train_dataloader:
+ _target_: torch.utils.data.DataLoader
+ shuffle: true
+ pin_memory: ${data.pin_memory}
+ num_workers: ${data.num_workers}
+ batch_size: ${data.batch_size}
+
+validation_dataloader:
+ _target_: torch.utils.data.DataLoader
+ shuffle: false
+ pin_memory: ${data.pin_memory}
+ num_workers: 1
+ batch_size: ${data.test_batch_size}
+
+test_dataloader:
+ _target_: torch.utils.data.DataLoader
+ shuffle: false
+ pin_memory: ${data.pin_memory}
+ num_workers: 1
+ batch_size: ${data.test_batch_size}
diff --git a/models/Mask3D/mask3d/conf/data/datasets/matterport.yaml b/models/Mask3D/mask3d/conf/data/datasets/matterport.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..6728ab9eb26bc78f435237d9d7d61800b900735d
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/data/datasets/matterport.yaml
@@ -0,0 +1,48 @@
+# @package data
+train_dataset:
+ _target_: mix3d.datasets.semseg.SemanticSegmentationDataset
+ data_dir: data/processed/matterport
+ image_augmentations_path: mix3d/conf/augmentation/albumentations_aug.yaml
+ volume_augmentations_path: mix3d/conf/augmentation/volumentations_aug.yaml
+ label_db_filepath: data/processed/matterport/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.train_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+
+validation_dataset:
+ _target_: mix3d.datasets.semseg.SemanticSegmentationDataset
+ data_dir: data/processed/scannet
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/matterport/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.validation_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+
+test_dataset:
+ _target_: mix3d.datasets.semseg.SemanticSegmentationDataset
+ data_dir: data/processed/matterport
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/matterport/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.test_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
diff --git a/models/Mask3D/mask3d/conf/data/datasets/matterport_scannet.yaml b/models/Mask3D/mask3d/conf/data/datasets/matterport_scannet.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..df259ceaadfa68a90c2b8a60d7b74a958b30c79d
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/data/datasets/matterport_scannet.yaml
@@ -0,0 +1,50 @@
+# @package data
+train_dataset:
+ _target_: mix3d.datasets.semseg.SemanticSegmentationDataset
+ data_dir:
+ - data/processed/scannet
+ - data/processed/matterport
+ image_augmentations_path: mix3d/conf/augmentation/albumentations_aug.yaml
+ volume_augmentations_path: mix3d/conf/augmentation/volumentations_aug.yaml
+ label_db_filepath: data/processed/scannet/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.train_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+
+validation_dataset:
+ _target_: mix3d.datasets.semseg.SemanticSegmentationDataset
+ data_dir: data/processed/scannet
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/scannet/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.validation_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+
+test_dataset:
+ _target_: mix3d.datasets.semseg.SemanticSegmentationDataset
+ data_dir: data/processed/scannet
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/scannet/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.test_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
diff --git a/models/Mask3D/mask3d/conf/data/datasets/rio.yaml b/models/Mask3D/mask3d/conf/data/datasets/rio.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..1adfea36fea05b14a7fa95382677aee6144d1b4b
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/data/datasets/rio.yaml
@@ -0,0 +1,48 @@
+# @package data
+train_dataset:
+ _target_: mix3d.datasets.semseg.SemanticSegmentationDataset
+ data_dir: data/processed/rio
+ image_augmentations_path: mix3d/conf/augmentation/albumentations_aug.yaml
+ volume_augmentations_path: mix3d/conf/augmentation/volumentations_aug.yaml
+ label_db_filepath: data/processed/scannet/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.train_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+
+validation_dataset:
+ _target_: mix3d.datasets.semseg.SemanticSegmentationDataset
+ data_dir: data/processed/rio
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/scannet/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.validation_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+
+test_dataset:
+ _target_: mix3d.datasets.semseg.SemanticSegmentationDataset
+ data_dir: data/processed/rio
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/scannet/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.test_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
diff --git a/models/Mask3D/mask3d/conf/data/datasets/s3dis.yaml b/models/Mask3D/mask3d/conf/data/datasets/s3dis.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..2e1385416655514397d82737e1edc2d1a5997657
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/data/datasets/s3dis.yaml
@@ -0,0 +1,87 @@
+# @package data
+train_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "s3dis"
+ data_dir: data/processed/s3dis
+ image_augmentations_path: conf/augmentation/albumentations_aug.yaml
+ volume_augmentations_path: conf/augmentation/volumentations_aug.yaml
+ label_db_filepath: data/processed/s3dis/label_database.yaml
+ color_mean_std: data/processed/s3dis/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.train_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cache_data: ${data.cache_data}
+ # different augs experiments
+ instance_oversampling: 0.0
+ place_around_existing: False
+ point_per_cut: 0
+ max_cut_region: 0
+ flip_in_center: false
+ noise_rate: 0
+ resample_points: 0
+ cropping: ${data.cropping}
+ cropping_args: ${data.cropping_args}
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ cropping_v1: ${data.cropping_v1}
+ area: ${general.area}
+ filter_out_classes: []
+ label_offset: 0
+
+validation_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "s3dis"
+ data_dir: data/processed/s3dis
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/s3dis/label_database.yaml
+ color_mean_std: data/processed/s3dis/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.validation_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cache_data: ${data.cache_data}
+ cropping: false
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ cropping_v1: ${data.cropping_v1}
+ area: ${general.area}
+ filter_out_classes: []
+ label_offset: 0
+
+test_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "s3dis"
+ data_dir: data/processed/s3dis
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/s3dis/label_database.yaml
+ color_mean_std: data/processed/s3dis/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.test_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cache_data: ${data.cache_data}
+ cropping: false
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ cropping_v1: ${data.cropping_v1}
+ area: ${general.area}
+ filter_out_classes: []
+ label_offset: 0
diff --git a/models/Mask3D/mask3d/conf/data/datasets/scannet.yaml b/models/Mask3D/mask3d/conf/data/datasets/scannet.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..50f1c6c5998d8f3c6dae35ef508225dff4b0271f
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/data/datasets/scannet.yaml
@@ -0,0 +1,79 @@
+# @package data
+train_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "scannet"
+ data_dir: data/processed/scannet
+ image_augmentations_path: conf/augmentation/albumentations_aug.yaml
+ volume_augmentations_path: conf/augmentation/volumentations_aug.yaml
+ label_db_filepath: data/processed/scannet/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.train_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ # different augs experiments
+ instance_oversampling: 0.0
+ place_around_existing: false
+ point_per_cut: 0
+ max_cut_region: 0
+ flip_in_center: false
+ noise_rate: 0
+ resample_points: 0
+ add_unlabeled_pc: false
+ cropping: ${data.cropping}
+ cropping_args: ${data.cropping_args}
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ filter_out_classes: [0, 1]
+ label_offset: 2
+
+validation_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "scannet"
+ data_dir: data/processed/scannet
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/scannet/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.validation_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cropping: false
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ filter_out_classes: [0, 1]
+ label_offset: 2
+
+test_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "scannet"
+ data_dir: data/processed/scannet
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/scannet/label_database.yaml
+ color_mean_std: data/processed/scannet/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.test_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cropping: false
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ filter_out_classes: [0, 1]
+ label_offset: 2
diff --git a/models/Mask3D/mask3d/conf/data/datasets/scannet200.yaml b/models/Mask3D/mask3d/conf/data/datasets/scannet200.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..730a6ab9f1965004ec9828d1e8b2429005bef6f2
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/data/datasets/scannet200.yaml
@@ -0,0 +1,79 @@
+# @package data
+train_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "scannet200"
+ data_dir: /home/weders/scratch/scratch/scannetter/arkit/raw/
+ image_augmentations_path: conf/augmentation/albumentations_aug.yaml
+ volume_augmentations_path: conf/augmentation/volumentations_aug.yaml
+ # label_db_filepath: data/processed/scannet200/label_database.yaml
+ # color_mean_std: data/processed/scannet200/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.train_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ # different augs experiments
+ instance_oversampling: 0.0
+ place_around_existing: false
+ point_per_cut: 0
+ max_cut_region: 0
+ flip_in_center: false
+ noise_rate: 0
+ resample_points: 0
+ add_unlabeled_pc: false
+ cropping: ${data.cropping}
+ cropping_args: ${data.cropping_args}
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ filter_out_classes: [0, 2]
+ label_offset: 2
+
+validation_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "scannet200"
+ data_dir: /home/weders/scratch/scratch/scannetter/arkit/raw/
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ # label_db_filepath: data/processed/scannet200/label_database.yaml
+ # color_mean_std: data/processed/scannet200/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.validation_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cropping: false
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ filter_out_classes: [0, 2]
+ label_offset: 2
+
+test_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "scannet200"
+ data_dir: /home/weders/scratch/scratch/scannetter/arkit/raw/
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ # label_db_filepath: data/processed/scannet200/label_database.yaml
+ # color_mean_std: data/processed/scannet200/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.test_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cropping: false
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ filter_out_classes: [0, 2]
+ label_offset: 2
diff --git a/models/Mask3D/mask3d/conf/data/datasets/semantic_kitti.yaml b/models/Mask3D/mask3d/conf/data/datasets/semantic_kitti.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..9540ad610bd4a68d64369519d20e13009df9feda
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/data/datasets/semantic_kitti.yaml
@@ -0,0 +1,42 @@
+# @package data
+train_dataset:
+ _target_: mix3d.datasets.outdoor_semseg.LidarDataset
+ data_dir: data/processed/semantic_kitti
+ label_db_filepath: data/processed/semantic_kitti/label_database.yaml
+ mode: ${data.train_mode}
+ add_reflection: ${data.add_reflection}
+ add_distance: ${data.add_distance}
+ add_instance: ${data.add_instance}
+ num_labels: ${data.num_labels}
+ sweep: ${data.sweep}
+ data_percent: 1.0
+ ignore_label: ${data.ignore_label}
+ volume_augmentations_path: mix3d/conf/augmentation/volumentations_aug.yaml
+
+validation_dataset:
+ _target_: mix3d.datasets.outdoor_semseg.LidarDataset
+ data_dir: data/processed/semantic_kitti
+ label_db_filepath: data/processed/semantic_kitti/label_database.yaml
+ mode: ${data.validation_mode}
+ add_reflection: ${data.add_reflection}
+ add_distance: ${data.add_distance}
+ add_instance: ${data.add_instance}
+ num_labels: ${data.num_labels}
+ sweep: ${data.sweep}
+ data_percent: 1.0
+ ignore_label: ${data.ignore_label}
+ volume_augmentations_path: null
+
+test_dataset:
+ _target_: mix3d.datasets.outdoor_semseg.LidarDataset
+ data_dir: data/processed/semantic_kitti
+ label_db_filepath: data/processed/semantic_kitti/label_database.yaml
+ mode: ${data.test_mode}
+ add_reflection: ${data.add_reflection}
+ add_distance: ${data.add_distance}
+ add_instance: ${data.add_instance}
+ num_labels: ${data.num_labels}
+ sweep: ${data.sweep}
+ data_percent: 1.0
+ ignore_label: ${data.ignore_label}
+ volume_augmentations_path: null
diff --git a/models/Mask3D/mask3d/conf/data/datasets/stpls3d.yaml b/models/Mask3D/mask3d/conf/data/datasets/stpls3d.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..913667d4123a7edead9d948358ae25cf9f7b4bb1
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/data/datasets/stpls3d.yaml
@@ -0,0 +1,95 @@
+# @package data
+train_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "stpls3d"
+ data_dir: data/processed/stpls3d
+ image_augmentations_path: conf/augmentation/albumentations_aug.yaml
+ volume_augmentations_path: conf/augmentation/volumentations_aug.yaml
+ label_db_filepath: data/processed/stpls3d/label_database.yaml
+ color_mean_std: data/processed/stpls3d/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.train_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cache_data: ${data.cache_data}
+ # different augs experiments
+ instance_oversampling: 0.0
+ place_around_existing: False
+ point_per_cut: 0
+ max_cut_region: 0
+ flip_in_center: false
+ noise_rate: 0
+ resample_points: 0
+ cropping: ${data.cropping}
+ cropping_args: ${data.cropping_args}
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ cropping_v1: ${data.cropping_v1}
+ area: ${general.area}
+ reps_per_epoch: ${general.reps_per_epoch}
+ eval_inner_core: ${general.eval_inner_core}
+ filter_out_classes: [0]
+ label_offset: 1
+ is_elastic_distortion: true
+ color_drop: 0.0
+
+validation_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "stpls3d"
+ data_dir: data/processed/stpls3d
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/stpls3d/label_database.yaml
+ color_mean_std: data/processed/stpls3d/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.validation_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cache_data: ${data.cache_data}
+ cropping: false
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ cropping_v1: ${data.cropping_v1}
+ area: ${general.area}
+ on_crops: ${general.on_crops}
+ eval_inner_core: ${general.eval_inner_core}
+ filter_out_classes: [0]
+ label_offset: 1
+
+test_dataset:
+ _target_: mask3d.datasets.semseg.SemanticSegmentationDataset
+ dataset_name: "stpls3d"
+ data_dir: data/processed/stpls3d
+ image_augmentations_path: null
+ volume_augmentations_path: null
+ label_db_filepath: data/processed/stpls3d/label_database.yaml
+ color_mean_std: data/processed/stpls3d/color_mean_std.yaml
+ data_percent: 1.0
+ mode: ${data.test_mode}
+ ignore_label: ${data.ignore_label}
+ num_labels: ${data.num_labels}
+ add_raw_coordinates: ${data.add_raw_coordinates}
+ add_colors: ${data.add_colors}
+ add_normals: ${data.add_normals}
+ add_instance: ${data.add_instance}
+ cache_data: ${data.cache_data}
+ cropping: false
+ is_tta: false
+ crop_min_size: ${data.crop_min_size}
+ crop_length: ${data.crop_length}
+ cropping_v1: ${data.cropping_v1}
+ area: ${general.area}
+ on_crops: ${general.on_crops}
+ eval_inner_core: ${general.eval_inner_core}
+ filter_out_classes: [0]
+ label_offset: 1
diff --git a/models/Mask3D/mask3d/conf/data/indoor.yaml b/models/Mask3D/mask3d/conf/data/indoor.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..868c37ccfe901f14396b68a38eac47b42cb3e812
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/data/indoor.yaml
@@ -0,0 +1,43 @@
+# @package _group_
+
+# these parameters are inherited by datasets, data_loaders and collators
+# but they might be overwritten
+
+# splits
+train_mode: train
+validation_mode: validation
+test_mode: validation # options: validation, test
+
+# dataset
+ignore_label: 255
+add_raw_coordinates: true # 3dim
+add_colors: true # 3dim
+add_normals: false # 3dim
+in_channels: 3 # in_channels = 3 * (add_normals + add_colors + add_raw_coordinates)
+num_labels: 20
+# num_labels: 41
+add_instance: ${general.add_instance}
+task: ${general.task}
+
+# data loader
+pin_memory: false
+num_workers: 4
+batch_size: 5
+test_batch_size: 1
+cache_data: false
+
+# collation
+voxel_size: 0.02
+
+reps_per_epoch: ${general.reps_per_epoch}
+
+cropping: false
+cropping_args:
+ min_points: 30000
+ aspect: 0.8
+ min_crop: 0.5
+ max_crop: 1.0
+
+crop_min_size: 20000
+crop_length: 6.0
+cropping_v1: true
\ No newline at end of file
diff --git a/models/Mask3D/mask3d/conf/data/outdoor.yaml b/models/Mask3D/mask3d/conf/data/outdoor.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..a77474f62d1cfb53f130160f641c65cb81a62956
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/data/outdoor.yaml
@@ -0,0 +1,26 @@
+# @package _group_
+
+# these parameters are inherited by datasets, data_loaders and collators
+# but they might be overwritten
+
+# splits
+train_mode: train
+validation_mode: validation
+test_mode: validation
+
+# dataset
+ignore_label: 255
+add_distance: true # 1dim
+add_reflection: true # 1dim
+in_channels: 2 # in_channels = add_distance + add_reflection
+num_labels: 19
+add_instance: false
+
+# data loader
+pin_memory: true
+num_workers: 4
+batch_size: 18
+sweep: 1
+
+# collation
+voxel_size: 0.15
diff --git a/models/Mask3D/mask3d/conf/logging/base.yaml b/models/Mask3D/mask3d/conf/logging/base.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..3d700a101ddf3d1e2c1a3cdea08190afff762a5b
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/logging/base.yaml
@@ -0,0 +1,10 @@
+# @package _group_
+- _target_: pytorch_lightning.loggers.NeptuneLogger
+ project_name: ${general.workspace}/${general.project_name}
+ experiment_name: ${general.experiment_name}
+ offline_mode: false
+
+- _target_: pytorch_lightning.loggers.CSVLogger
+ save_dir: ${general.save_dir}
+ name: ${general.experiment_id}
+ version: ${general.version}
diff --git a/models/Mask3D/mask3d/conf/logging/full.yaml b/models/Mask3D/mask3d/conf/logging/full.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..b434e94dc1f0889cf0829b5f89b8509717a3546c
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/logging/full.yaml
@@ -0,0 +1,8 @@
+# @package _group_
+- _target_: pytorch_lightning.loggers.WandbLogger
+ project: ${general.project_name}
+ name: ${general.experiment_name}
+ save_dir: ${general.save_dir}
+ entity: "schult"
+ resume: "allow"
+ id: ${general.experiment_name}
diff --git a/models/Mask3D/mask3d/conf/logging/minimal.yaml b/models/Mask3D/mask3d/conf/logging/minimal.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..b1c46e26fefedcec50d4fdc9fc77c187d60cf7b9
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/logging/minimal.yaml
@@ -0,0 +1,5 @@
+# @package _group_
+- _target_: pytorch_lightning.loggers.CSVLogger
+ save_dir: ${general.save_dir}
+ name: ${general.experiment_id}
+ version: ${general.version}
diff --git a/models/Mask3D/mask3d/conf/logging/offline.yaml b/models/Mask3D/mask3d/conf/logging/offline.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..914ad19142ca22c3778be709208323908460ebac
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/logging/offline.yaml
@@ -0,0 +1,10 @@
+# @package _group_
+- _target_: pytorch_lightning.loggers.TensorBoardLogger
+ name: ${general.experiment_id}
+ version: ${general.version}
+ save_dir: ${general.save_dir}
+
+- _target_: pytorch_lightning.loggers.CSVLogger
+ name: ${general.experiment_id}
+ version: ${general.version}
+ save_dir: ${general.save_dir}
\ No newline at end of file
diff --git a/models/Mask3D/mask3d/conf/loss/cross_entropy.yaml b/models/Mask3D/mask3d/conf/loss/cross_entropy.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..c000f40ad2ab40605c244e38243a6e0cc7933768
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/loss/cross_entropy.yaml
@@ -0,0 +1,3 @@
+# @package _group_
+_target_: torch.nn.CrossEntropyLoss
+ignore_index: ${data.ignore_label}
diff --git a/models/Mask3D/mask3d/conf/loss/set_criterion.yaml b/models/Mask3D/mask3d/conf/loss/set_criterion.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..3c04ba49ce1823c2d6e923a03ae0514490d463e9
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/loss/set_criterion.yaml
@@ -0,0 +1,11 @@
+# @package _group_
+_target_: mask3d.models.criterion.SetCriterion
+num_classes: ${general.num_targets}
+eos_coef: 0.1
+losses:
+ - "labels"
+ - "masks"
+num_points: ${matcher.num_points}
+oversample_ratio: 3.0
+importance_sample_ratio: 0.75
+class_weights: -1
diff --git a/models/Mask3D/mask3d/conf/loss/set_criterion_custom_weights_1.yaml b/models/Mask3D/mask3d/conf/loss/set_criterion_custom_weights_1.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..1d2c308e081c1ffa61beb13308b27e6ff753f0f4
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/loss/set_criterion_custom_weights_1.yaml
@@ -0,0 +1,11 @@
+# @package _group_
+_target_: mask3d.models.criterion.SetCriterion
+num_classes: ${general.num_targets}
+eos_coef: 0.1
+losses:
+ - "labels"
+ - "masks"
+num_points: ${matcher.num_points}
+oversample_ratio: 3.0
+importance_sample_ratio: 0.75
+class_weights: [1.0,1.5,10.0,1.0,1.0,1.0,1.0,1.0,10.0,10.0,1.0,10.0,1.0,1.0]
diff --git a/models/Mask3D/mask3d/conf/matcher/hungarian_matcher.yaml b/models/Mask3D/mask3d/conf/matcher/hungarian_matcher.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..47750b20906b6b40a131b702ba360e36ee4c8380
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/matcher/hungarian_matcher.yaml
@@ -0,0 +1,6 @@
+# @package _group_
+_target_: mask3d.models.matcher.HungarianMatcher
+cost_class: 2.
+cost_mask: 5.
+cost_dice: 2.
+num_points: -1
diff --git a/models/Mask3D/mask3d/conf/metrics/miou.yaml b/models/Mask3D/mask3d/conf/metrics/miou.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..68d1b61181d9615d7d6d7638261d119a4fc47074
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/metrics/miou.yaml
@@ -0,0 +1,4 @@
+# @package _group_
+_target_: mask3d.models.metrics.ConfusionMatrix
+num_classes: ${data.num_labels}
+ignore_label: ${data.ignore_label}
diff --git a/models/Mask3D/mask3d/conf/model/mask3d.yaml b/models/Mask3D/mask3d/conf/model/mask3d.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..95718d8710477650561e0ddd845688f50c868032
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/model/mask3d.yaml
@@ -0,0 +1,47 @@
+# @package _group_
+_target_: mask3d.models.Mask3D
+
+# transformer parameters
+hidden_dim: 128
+dim_feedforward: 1024
+num_queries: 100
+num_heads: 8
+num_decoders: 3
+dropout: 0.0
+pre_norm: false
+use_level_embed: false
+normalize_pos_enc: true
+positional_encoding_type: "fourier"
+gauss_scale: 1.0
+hlevels: [0,1,2,3]
+
+# queries
+non_parametric_queries: true
+random_query_both: false
+random_normal: false
+random_queries: false
+use_np_features: false
+
+# sampling
+sample_sizes: [200, 800, 3200, 12800, 51200]
+max_sample_size: false # false means point sampling with sample_sizes is activated
+
+shared_decoder: true
+num_classes: ${general.num_targets}
+train_on_segments: ${general.train_on_segments}
+scatter_type: "mean"
+
+voxel_size: ${data.voxel_size}
+
+config:
+ backbone:
+ _target_: mask3d.models.Res16UNet34C
+ config:
+ dialations: [ 1, 1, 1, 1 ]
+ conv1_kernel_size: 5
+ bn_momentum: 0.02
+ # depends on normals, color, raw_coordinates
+ # varies from 3 to 9
+ in_channels: ${data.in_channels}
+ out_channels: ${data.num_labels}
+ out_fpn: true
diff --git a/models/Mask3D/mask3d/conf/optimizer/adamw.yaml b/models/Mask3D/mask3d/conf/optimizer/adamw.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..4b4020d1ddd1444c94ea5bfbe1281c485fca587e
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/optimizer/adamw.yaml
@@ -0,0 +1,3 @@
+# @package _group_
+_target_: torch.optim.AdamW
+lr: 0.0001
\ No newline at end of file
diff --git a/models/Mask3D/mask3d/conf/optimizer/adamw_lower.yaml b/models/Mask3D/mask3d/conf/optimizer/adamw_lower.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..7e42f091a0d5dd03b66ab1dcec8b81d78a692af9
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/optimizer/adamw_lower.yaml
@@ -0,0 +1,3 @@
+# @package _group_
+_target_: torch.optim.AdamW
+lr: 0.005
diff --git a/models/Mask3D/mask3d/conf/scheduler/exponentiallr.yaml b/models/Mask3D/mask3d/conf/scheduler/exponentiallr.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..dc5224083670b286d75fda46304560dbcca3aecb
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/scheduler/exponentiallr.yaml
@@ -0,0 +1,11 @@
+# @package _group_
+
+scheduler:
+ _target_: torch.optim.lr_scheduler.ExponentialLR
+ gamma: 0.99999
+ last_epoch: -1 # ${trainer.max_epochs}
+ # need to set to number because of tensorboard logger
+ # steps_per_epoch: -1
+
+pytorch_lightning_params:
+ interval: step
diff --git a/models/Mask3D/mask3d/conf/scheduler/lambdalr.yaml b/models/Mask3D/mask3d/conf/scheduler/lambdalr.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..b63f6f4333e98931ce22f1a38829de0ef51a3719
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/scheduler/lambdalr.yaml
@@ -0,0 +1,8 @@
+# @package _group_
+
+scheduler:
+ _target_: torch.optim.lr_scheduler.StepLR
+ step_size: 99999
+
+pytorch_lightning_params:
+ interval: epoch
diff --git a/models/Mask3D/mask3d/conf/scheduler/onecyclelr.yaml b/models/Mask3D/mask3d/conf/scheduler/onecyclelr.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..c788877193d7366c21088cf9fefb77e4f62ef4d9
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/scheduler/onecyclelr.yaml
@@ -0,0 +1,11 @@
+# @package _group_
+
+scheduler:
+ _target_: torch.optim.lr_scheduler.OneCycleLR
+ max_lr: ${optimizer.lr}
+ epochs: ${trainer.max_epochs}
+ # need to set to number because of tensorboard logger
+ steps_per_epoch: -1
+
+pytorch_lightning_params:
+ interval: step
diff --git a/models/Mask3D/mask3d/conf/trainer/trainer.yaml b/models/Mask3D/mask3d/conf/trainer/trainer.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..f436300f9ca6bbbe96ca6c1b4c7e8eeffe35fabd
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/trainer/trainer.yaml
@@ -0,0 +1,7 @@
+# @package _group_
+deterministic: false
+max_epochs: 1000
+min_epochs: 1
+resume_from_checkpoint: null
+check_val_every_n_epoch: 50
+num_sanity_val_steps: -1
diff --git a/models/Mask3D/mask3d/conf/trainer/trainer600.yaml b/models/Mask3D/mask3d/conf/trainer/trainer600.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..dc9f00295aafe3431d1c0e7ca50dbc29559ea134
--- /dev/null
+++ b/models/Mask3D/mask3d/conf/trainer/trainer600.yaml
@@ -0,0 +1,7 @@
+# @package _group_
+deterministic: false
+max_epochs: 601
+min_epochs: 1
+resume_from_checkpoint: null
+check_val_every_n_epoch: 50
+num_sanity_val_steps: 2
diff --git a/models/Mask3D/mask3d/datasets/__init__.py b/models/Mask3D/mask3d/datasets/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/mask3d/datasets/outdoor_semseg.py b/models/Mask3D/mask3d/datasets/outdoor_semseg.py
new file mode 100644
index 0000000000000000000000000000000000000000..4592a6eda45c1a7626530eb19c42c267496749df
--- /dev/null
+++ b/models/Mask3D/mask3d/datasets/outdoor_semseg.py
@@ -0,0 +1,206 @@
+import logging
+from pathlib import Path
+from typing import List, Optional, Union, Tuple
+from random import random
+
+import numpy as np
+import volumentations as V
+import yaml
+from torch.utils.data import Dataset
+
+logger = logging.getLogger(__name__)
+
+
+class LidarDataset(Dataset):
+ def __init__(
+ self,
+ data_dir: Optional[
+ Union[str, Tuple[str]]
+ ] = "data/processed/semantic_kitti",
+ label_db_filepath: Optional[
+ str
+ ] = "./data/processed/semantic_kitti/label_database.yaml",
+ mode: Optional[str] = "train",
+ add_reflection: Optional[bool] = True,
+ add_distance: Optional[bool] = False,
+ add_instance: Optional[bool] = True,
+ num_labels: Optional[int] = -1,
+ data_percent: Optional[float] = 1.0,
+ ignore_label: Optional[Union[int, List[int]]] = 255,
+ volume_augmentations_path: Optional[str] = None,
+ sweep: Optional[int] = 1,
+ ):
+ self.mode = mode
+ self.data_dir = data_dir
+        if isinstance(data_dir, str):
+ self.data_dir = [self.data_dir]
+ self.ignore_label = ignore_label
+ self.add_instance = add_instance
+ self.add_distance = add_distance
+ self.add_reflection = add_reflection
+
+ # loading database files
+ self._data = []
+ for database_path in self.data_dir:
+ database_path = Path(database_path)
+ if not (database_path / f"{mode}_database.yaml").exists():
+ print(f"generate {database_path}/{mode}_database.yaml first")
+ exit()
+ self._data.extend(
+ self._load_yaml(database_path / f"{mode}_database.yaml")
+ )
+
+ labels = self._load_yaml(Path(label_db_filepath))
+ self._labels = self._select_correct_labels(labels, num_labels)
+
+ # augmentations
+ self.volume_augmentations = V.NoOp()
+ if volume_augmentations_path is not None:
+ self.volume_augmentations = V.load(
+ volume_augmentations_path, data_format="yaml"
+ )
+
+        # group consecutive frames by scene, then chunk each scene into sweeps of length `sweep`
+ data = [[]]
+ last_scene = self._data[0]["scene"]
+ for x in self._data:
+ if x["scene"] == last_scene:
+ data[-1].append(x)
+ else:
+ last_scene = x["scene"]
+ data.append([x])
+ for i in range(len(data)):
+ data[i] = list(self.chunks(data[i], sweep))
+ self._data = [val for sublist in data for val in sublist]
+
+ if data_percent < 1.0:
+ self._data = self._data[: int(len(self._data) * data_percent)]
+
+ @staticmethod
+ def chunks(lst, n):
+ """Yield successive n-sized chunks from lst."""
+ for i in range(0, len(lst), n):
+ yield lst[i : i + n]
+
+ def __len__(self):
+ return len(self.data)
+
+ def __getitem__(self, idx: int):
+ points = []
+ for sweep in self.data[idx]:
+ points.append(np.load(sweep["filepath"]))
+ # rotate
+ points[-1][:, :3] = (
+ points[-1][:, :3] @ np.array(sweep["pose"])[:3, :3]
+ )
+ # translate
+ points[-1][:, :3] += np.array(sweep["pose"])[:3, 3]
+ points = np.vstack(points)
+
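+        # column layout: xyz (0:3), sensor features such as reflection (3:-2), semantic and instance labels (-2:)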
+ coordinates, features, labels = (
+ points[:, :3],
+ points[:, 3:-2],
+ points[:, -2:],
+ )
+
+ if not self.add_reflection:
+            features = np.ones((len(coordinates), 1))
+
+ if self.add_distance:
+ center_coordinate = coordinates.mean(0)
+ features = np.hstack(
+ (
+ features,
+ np.linalg.norm(coordinates - center_coordinate, axis=1)[
+ :, np.newaxis
+ ],
+ )
+ )
+
+ # volume and image augmentations for train
+ if "train" in self.mode:
+ coordinates -= coordinates.mean(0)
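+            # with probability 0.5, translate the centered cloud by a random offset of up to half its extent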
+            if random() < 0.5:
+ coordinates += (
+ np.random.uniform(coordinates.min(0), coordinates.max(0))
+ / 2
+ )
+ aug = self.volume_augmentations(
+ points=coordinates,
+ features=features,
+ labels=labels,
+ )
+ coordinates, features, labels = (
+ aug["points"],
+ aug["features"],
+ aug["labels"],
+ )
+
+        # prepare labels and remap them to a contiguous range starting at 0 (20 or 40 classes)
+ labels = labels.astype(np.int32)
+ if labels.size > 0:
+ labels[:, 0] = self._remap_from_zero(labels[:, 0])
+ if not self.add_instance:
+            # keep only the first column (semantic label), dropping the instance id
+ labels = labels[:, 0].flatten()
+
+ return coordinates, features, labels
+
+ @property
+ def data(self):
+        """database file containing information about the preprocessed dataset"""
+ return self._data
+
+ @property
+ def label_info(self):
+        """database file containing information about the labels used by the dataset"""
+ return self._labels
+
+ @staticmethod
+ def _load_yaml(filepath):
+ with open(filepath) as f:
+ file = yaml.safe_load(f)
+ return file
+
+ def _select_correct_labels(self, labels, num_labels):
+ number_of_validation_labels = 0
+ number_of_all_labels = 0
+ for (
+ k,
+ v,
+ ) in labels.items():
+ number_of_all_labels += 1
+ if v["validation"]:
+ number_of_validation_labels += 1
+
+ if num_labels == number_of_all_labels:
+ return labels
+ elif num_labels == number_of_validation_labels:
+ valid_labels = dict()
+ for (
+ k,
+ v,
+ ) in labels.items():
+ if v["validation"]:
+ valid_labels.update({k: v})
+ return valid_labels
+ else:
+            msg = f"""requested number of labels is not available, select from:
+            {number_of_validation_labels}, {number_of_all_labels}"""
+ raise ValueError(msg)
+
+ def _remap_from_zero(self, labels):
+ labels[
+ ~np.isin(labels, list(self.label_info.keys()))
+ ] = self.ignore_label
+ # remap to the range from 0
+ for i, k in enumerate(self.label_info.keys()):
+ labels[labels == k] = i
+ return labels
+
+ def _remap_model_output(self, output):
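+        """Inverse of _remap_from_zero: map contiguous indices back to the original label ids."""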
+ output = np.array(output)
+ output_remapped = output.copy()
+ for i, k in enumerate(self.label_info.keys()):
+ output_remapped[output == i] = k
+ return output_remapped
diff --git a/models/Mask3D/mask3d/datasets/preprocessing/__init__.py b/models/Mask3D/mask3d/datasets/preprocessing/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/mask3d/datasets/preprocessing/arkitscenes_preprocessing.py b/models/Mask3D/mask3d/datasets/preprocessing/arkitscenes_preprocessing.py
new file mode 100644
index 0000000000000000000000000000000000000000..2f222dc27e73eedab1e1d82b14c1573ce632af7c
--- /dev/null
+++ b/models/Mask3D/mask3d/datasets/preprocessing/arkitscenes_preprocessing.py
@@ -0,0 +1,116 @@
+import re
+from pathlib import Path
+import numpy as np
+import pandas as pd
+from fire import Fire
+from natsort import natsorted
+from loguru import logger
+import os
+
+from datasets.preprocessing.base_preprocessing import BasePreprocessing
+from utils.point_cloud_utils import load_ply_with_normals
+
+from datasets.scannet200.scannet200_constants import (
+ VALID_CLASS_IDS_200,
+ SCANNET_COLOR_MAP_200,
+ CLASS_LABELS_200,
+)
+
+
+class ARKitScenesPreprocessing(BasePreprocessing):
+ def __init__(
+ self,
+ data_dir: str = "/home/weders/scratch/scratch/scannetter/arkit/raw",
+ save_dir: str = "/home/weders/scratch/scratch/scannetter/arkit/raw",
+ modes: tuple = ('Validation', ),
+ n_jobs: int = 1,
+ git_repo: str = "./data/raw/scannet/ScanNet",
+ mesh_file: str="mesh_tsdf.ply",
+ scannet200: bool = False,
+ ):
+ super().__init__(data_dir, save_dir, modes, n_jobs)
+
+ self.scannet200 = scannet200
+ git_repo = Path(git_repo)
+ for mode in self.modes:
+ scenes = os.listdir(os.path.join(data_dir, mode))
+ scans_folder = "scans_test" if mode == "test" else "scans"
+ filepaths = []
+ for scene in scenes:
+ if os.path.exists(os.path.join(data_dir, mode, scene, mesh_file)):
+ filepaths.append(
+ self.data_dir
+ / mode
+ / scene
+ / mesh_file)
+ self.files[mode] = natsorted(filepaths)
+
+ def process_file(self, filepath, mode):
+ """process_file.
+
+        Note that PLY files were used to obtain the segmentation labels.
+
+ Args:
+ filepath: path to the main ply file
+ mode: train, test or validation
+
+ Returns:
+ filebase: info about file
+ """
+ scene = int(filepath.parent.name)
+ print(scene)
+ filebase = {
+ "filepath": filepath,
+ "scene": scene,
+ "sub_scene": scene,
+ "raw_filepath": str(filepath),
+ "file_len": -1,
+ }
+ # reading both files and checking that they are fitting
+ coords, features, _ = load_ply_with_normals(filepath)
+ file_len = len(coords)
+ filebase["file_len"] = file_len
+ points = np.hstack((coords, features))
+
+ print(features.shape)
+
+        points = np.concatenate((points, np.zeros((file_len, 4))), axis=1)  # append dummy segment and label columns
+
+ processed_filepath = (
+ self.save_dir / mode / f"data_mask3d.npy"
+ )
+ if not processed_filepath.parent.exists():
+ processed_filepath.parent.mkdir(parents=True, exist_ok=True)
+ np.save(processed_filepath, points.astype(np.float32))
+ filebase["filepath"] = str(processed_filepath)
+
+ return filebase
+
+ @logger.catch
+ def fix_bugs_in_labels(self):
+ if not self.scannet200:
+ logger.add(self.save_dir / "fixed_bugs_in_labels.log")
+ found_wrong_labels = {
+ tuple([270, 0]): 50,
+ tuple([270, 2]): 50,
+ tuple([384, 0]): 149,
+ }
+ for scene, wrong_label in found_wrong_labels.items():
+ scene, sub_scene = scene
+ bug_file = (
+ self.save_dir / "train" / f"{scene:04}_{sub_scene:02}.npy"
+ )
+ points = np.load(bug_file)
+ bug_mask = points[:, -1] != wrong_label
+ points = points[bug_mask]
+ np.save(bug_file, points)
+ logger.info(f"Fixed {bug_file}")
+
+ def _parse_scene_subscene(self, name):
+ scene_match = re.match(r"scene(\d{4})_(\d{2})", name)
+ print(scene_match)
+ return int(scene_match.group(1)), int(scene_match.group(2))
+
+
+if __name__ == "__main__":
+ Fire(ARKitScenesPreprocessing)
\ No newline at end of file
diff --git a/models/Mask3D/mask3d/datasets/preprocessing/base_preprocessing.py b/models/Mask3D/mask3d/datasets/preprocessing/base_preprocessing.py
new file mode 100644
index 0000000000000000000000000000000000000000..a17fd4f89aca0d16d27b1bd10c9f40b3e40a6e61
--- /dev/null
+++ b/models/Mask3D/mask3d/datasets/preprocessing/base_preprocessing.py
@@ -0,0 +1,204 @@
+import os
+import sys
+import re
+import yaml
+import json
+import multiprocessing
+from pathlib import Path
+from hashlib import md5
+
+import numpy as np
+from fire import Fire
+from tqdm import tqdm
+from joblib import Parallel, delayed
+from loguru import logger
+
+
+class BasePreprocessing:
+ def __init__(
+ self,
+ data_dir: str = "./data/raw/",
+ save_dir: str = "./data/processed/",
+ modes: tuple = ("train", "validation", "test"),
+ n_jobs: int = -1,
+ ):
+ self.data_dir = Path(data_dir)
+ self.save_dir = Path(save_dir)
+ self.n_jobs = n_jobs
+ self.modes = modes
+
+ if not self.data_dir.exists():
+ logger.error("data folder doesn't exist")
+            raise FileNotFoundError(f"data folder {self.data_dir} does not exist")
+        if not self.save_dir.exists():
+ self.save_dir.mkdir(parents=True, exist_ok=True)
+
+ self.files = {}
+ for data_type in self.modes:
+ self.files.update({data_type: []})
+
+ @logger.catch
+ def preprocess(self):
+ self.n_jobs = (
+ multiprocessing.cpu_count() if self.n_jobs == -1 else self.n_jobs
+ )
+ for mode in self.modes:
+ database = []
+ logger.info(f"Tasks for {mode}: {len(self.files[mode])}")
+ parallel_results = Parallel(n_jobs=self.n_jobs, verbose=10)(
+ delayed(self.process_file)(file, mode)
+ for file in self.files[mode]
+ )
+ for filebase in parallel_results:
+ database.append(filebase)
+ self.save_database(database, mode)
+ # self.fix_bugs_in_labels()
+ # self.joint_database()
+ # self.compute_color_mean_std(
+ # train_database_path=(self.save_dir / "train_database.yaml")
+ # )
+
+ def preprocess_sequential(self):
+ for mode in self.modes:
+ database = []
+ for filepath in tqdm(self.files[mode], unit="file"):
+ filebase = self.process_file(filepath, mode)
+ database.append(filebase)
+ self.save_database(database, mode)
+ self.fix_bugs_in_labels()
+ self.joint_database()
+ self.compute_color_mean_std(
+ train_database_path=(self.save_dir / "train_database.yaml")
+ )
+
+ def process_file(self, filepath, mode):
+ """process_file.
+
+ Args:
+ filepath: path to the main file
+ mode: typically train, test or validation
+
+ Returns:
+ filebase: info about file
+ """
+ raise NotImplementedError
+
+ def make_instance_database_sequential(
+ self,
+ train_database_path: str = "./data/processed/train_database.yaml",
+ mode="instance",
+ ):
+ train_database = self._load_yaml(train_database_path)
+ instance_database = []
+ for sample in tqdm(train_database):
+ instance_database.append(self.extract_instance_from_file(sample))
+ self.save_database(instance_database, mode=mode)
+
+ @logger.catch
+ def make_instance_database(
+ self,
+ train_database_path: str = "./data/processed/train_database.yaml",
+ mode="instance",
+ ):
+ self.n_jobs = (
+ multiprocessing.cpu_count() if self.n_jobs == -1 else self.n_jobs
+ )
+ train_database = self._load_yaml(train_database_path)
+ instance_database = []
+ logger.info(f"Files in database: {len(train_database)}")
+ parallel_results = Parallel(n_jobs=self.n_jobs, verbose=10)(
+ delayed(self.extract_instance_from_file)(sample)
+ for sample in train_database
+ )
+ for filebase in parallel_results:
+ instance_database.append(filebase)
+ self.save_database(instance_database, mode=mode)
+
+ def extract_instance_from_file(self, sample_from_database):
+ points = np.load(sample_from_database["filepath"])
+ labels = points[:, -2:]
+ file_instances = []
+ for instance_id in np.unique(labels[:, 1]):
+ occupied_indices = np.isin(labels[:, 1], instance_id)
+ instance_points = points[occupied_indices].copy()
+ instance_classes = (
+ np.unique(instance_points[:, 9]).astype(int).tolist()
+ )
+
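+            # hash the source filepath plus instance id to get a deterministic, unique filename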
+ hash_string = str(sample_from_database["filepath"]) + str(
+ instance_id
+ )
+ hash_string = md5(hash_string.encode("utf-8")).hexdigest()
+ instance_filepath = (
+ self.save_dir / "instances" / f"{hash_string}.npy"
+ )
+ instance = {
+ "classes": instance_classes,
+ "instance_filepath": str(instance_filepath),
+ "instance_size": len(instance_points),
+ "original_file": str(sample_from_database["filepath"]),
+ }
+ if not instance_filepath.parent.exists():
+ instance_filepath.parent.mkdir(parents=True, exist_ok=True)
+ np.save(instance_filepath, instance_points.astype(np.float32))
+ file_instances.append(instance)
+ return file_instances
+
+ def fix_bugs_in_labels(self):
+ pass
+
+ def compute_color_mean_std(
+ self,
+ train_database_path: str = "./data/processed/train_database.yaml",
+ ):
+ pass
+
+ def save_database(self, database, mode):
+ for element in database:
+ self._dict_to_yaml(element)
+ self._save_yaml(self.save_dir / (mode + "_database.yaml"), database)
+
+    def joint_database(self, train_modes=("train", "validation")):
+ joint_db = []
+ for mode in train_modes:
+ joint_db.extend(
+ self._load_yaml(self.save_dir / (mode + "_database.yaml"))
+ )
+ self._save_yaml(
+ self.save_dir / "train_validation_database.yaml", joint_db
+ )
+
+ @classmethod
+ def _read_json(cls, path):
+ with open(path) as f:
+ file = json.load(f)
+ return file
+
+ @classmethod
+ def _save_yaml(cls, path, file):
+ with open(path, "w") as f:
+ yaml.safe_dump(
+ file, f, default_style=None, default_flow_style=False
+ )
+
+ @classmethod
+ def _dict_to_yaml(cls, dictionary):
+ if not isinstance(dictionary, dict):
+ return
+ for k, v in dictionary.items():
+ if isinstance(v, dict):
+ cls._dict_to_yaml(v)
+ if isinstance(v, np.ndarray):
+ dictionary[k] = v.tolist()
+ if isinstance(v, Path):
+ dictionary[k] = str(v)
+
+ @classmethod
+ def _load_yaml(cls, filepath):
+ with open(filepath) as f:
+ file = yaml.safe_load(f)
+ return file
+
+
+if __name__ == "__main__":
+ Fire(BasePreprocessing)
diff --git a/models/Mask3D/mask3d/datasets/preprocessing/s3dis_preprocessing.py b/models/Mask3D/mask3d/datasets/preprocessing/s3dis_preprocessing.py
new file mode 100644
index 0000000000000000000000000000000000000000..3e7ff4967ca9dc22248c6863b41f7b652687ae98
--- /dev/null
+++ b/models/Mask3D/mask3d/datasets/preprocessing/s3dis_preprocessing.py
@@ -0,0 +1,282 @@
+import os
+import re
+
+import numpy as np
+from fire import Fire
+from loguru import logger
+from natsort import natsorted
+
+from datasets.preprocessing.base_preprocessing import BasePreprocessing
+
+
+class S3DISPreprocessing(BasePreprocessing):
+ def __init__(
+ self,
+ data_dir: str = "./data/raw/s3dis",
+ save_dir: str = "./data/processed/s3dis",
+ modes: tuple = (
+ "Area_1",
+ "Area_2",
+ "Area_3",
+ "Area_4",
+ "Area_5",
+ "Area_6",
+ ),
+ n_jobs: int = -1,
+ ):
+ super().__init__(data_dir, save_dir, modes, n_jobs)
+
+ self.class_map = {
+ "ceiling": 0,
+ "floor": 1,
+ "wall": 2,
+ "beam": 3,
+ "column": 4,
+ "window": 5,
+ "door": 6,
+ "table": 7,
+ "chair": 8,
+ "sofa": 9,
+ "bookcase": 10,
+ "board": 11,
+ "clutter": 12,
+ "stairs": 12, # stairs are also mapped to clutter
+ }
+
+ self.color_map = [
+ [0, 255, 0], # ceiling
+ [0, 0, 255], # floor
+ [0, 255, 255], # wall
+ [255, 255, 0], # beam
+ [255, 0, 255], # column
+ [100, 100, 255], # window
+ [200, 200, 100], # door
+ [170, 120, 200], # table
+ [255, 0, 0], # chair
+ [200, 100, 100], # sofa
+ [10, 200, 100], # bookcase
+ [200, 200, 200], # board
+            [50, 50, 50], # clutter
+        ]
+
+ self.create_label_database()
+
+ for mode in self.modes:
+ filepaths = []
+ for scene_path in [
+ f.path for f in os.scandir(self.data_dir / mode) if f.is_dir()
+ ]:
+ filepaths.append(scene_path)
+ self.files[mode] = natsorted(filepaths)
+
+ def create_label_database(self):
+ label_database = dict()
+ for class_name, class_id in self.class_map.items():
+ label_database[class_id] = {
+ "color": self.color_map[class_id],
+ "name": class_name,
+ "validation": True,
+ }
+
+ self._save_yaml(self.save_dir / "label_database.yaml", label_database)
+ return label_database
+
+ def _buf_count_newlines_gen(self, fname):
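+        """Count newlines in a file by streaming 64 KiB binary chunks (fast line count)."""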
+ def _make_gen(reader):
+ while True:
+ b = reader(2**16)
+ if not b:
+ break
+ yield b
+
+ with open(fname, "rb") as f:
+ count = sum(buf.count(b"\n") for buf in _make_gen(f.raw.read))
+ return count
+
+ def process_file(self, filepath, mode):
+ """process_file.
+
+        Note that the per-instance annotation text files were used to obtain the segmentation labels.
+
+ Args:
+ filepath: path to the main ply file
+ mode: train, test or validation
+
+ Returns:
+ filebase: info about file
+ """
+ filebase = {
+ "filepath": filepath,
+ "scene": filepath.split("/")[-1],
+ "area": mode,
+ "raw_filepath": str(filepath),
+ "file_len": -1,
+ }
+
+ scene_name = filepath.split("/")[-1]
+ instance_counter = 0
+ scene_points = []
+ for instance in [
+ f
+ for f in os.scandir(
+ self.data_dir / mode / scene_name / "Annotations"
+ )
+ if f.name.endswith(".txt")
+ ]:
+ instance_class = self.class_map[instance.name.split("_")[0]]
+ instance_points = np.loadtxt(instance.path)
+
+ instance_normals = np.ones((instance_points.shape[0], 3))
+ instance_class = np.array(instance_class).repeat(
+ instance_points.shape[0]
+ )[..., None]
+ instance_id = np.array(instance_counter).repeat(
+ instance_points.shape[0]
+ )[..., None]
+
+ instance_points = np.hstack(
+ (
+ instance_points,
+ instance_normals,
+ instance_class,
+ instance_id,
+ )
+ )
+
+ scene_points.append(instance_points)
+ instance_counter += 1
+
+ points = np.vstack(scene_points)
+
+ pcd_size = self._buf_count_newlines_gen(f"{filepath}/{scene_name}.txt")
+ if points.shape[0] != pcd_size:
+ print(f"FILE SIZE DOES NOT MATCH FOR {filepath}/{scene_name}.txt")
+ print(f"({points.shape[0]} vs. {pcd_size})")
+
+ filebase["raw_segmentation_filepath"] = ""
+
+ # add segment id as additional feature (DUMMY)
+ points = np.hstack((points, np.ones(points.shape[0])[..., None]))
+        points[:, [9, 10, -1]] = points[
+            :, [-1, 9, 10]
+        ]  # move the dummy segment id in front of the class and instance labels
+
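+        # pack semantic label and instance id into a single integer: (label + 1) * 1000 + instance + 1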
+ gt_data = (points[:, -2] + 1) * 1000 + points[:, -1] + 1
+
+ file_len = len(points)
+ filebase["file_len"] = file_len
+
+ processed_filepath = self.save_dir / mode / f"{scene_name}.npy"
+ if not processed_filepath.parent.exists():
+ processed_filepath.parent.mkdir(parents=True, exist_ok=True)
+ np.save(processed_filepath, points.astype(np.float32))
+ filebase["filepath"] = str(processed_filepath)
+
+ processed_gt_filepath = (
+ self.save_dir / "instance_gt" / mode / f"{scene_name}.txt"
+ )
+ if not processed_gt_filepath.parent.exists():
+ processed_gt_filepath.parent.mkdir(parents=True, exist_ok=True)
+ np.savetxt(processed_gt_filepath, gt_data.astype(np.int32), fmt="%d")
+ filebase["instance_gt_filepath"] = str(processed_gt_filepath)
+
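+        # "color_std" stores the per-scene mean of squares; the std is recovered later as sqrt(E[x^2] - mean^2)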
+ filebase["color_mean"] = [
+ float((points[:, 3] / 255).mean()),
+ float((points[:, 4] / 255).mean()),
+ float((points[:, 5] / 255).mean()),
+ ]
+ filebase["color_std"] = [
+ float(((points[:, 3] / 255) ** 2).mean()),
+ float(((points[:, 4] / 255) ** 2).mean()),
+ float(((points[:, 5] / 255) ** 2).mean()),
+ ]
+ return filebase
+
+ def compute_color_mean_std(self, train_database_path: str = ""):
+ area_database_paths = [
+ f
+ for f in os.scandir(self.save_dir)
+ if f.name.startswith("Area_") and f.name.endswith(".yaml")
+ ]
+
+ for database_path in area_database_paths:
+ database = self._load_yaml(database_path.path)
+ color_mean, color_std = [], []
+ for sample in database:
+ color_std.append(sample["color_std"])
+ color_mean.append(sample["color_mean"])
+
+ color_mean = np.array(color_mean).mean(axis=0)
+ color_std = np.sqrt(
+ np.array(color_std).mean(axis=0) - color_mean**2
+ )
+ feats_mean_std = {
+ "mean": [float(each) for each in color_mean],
+ "std": [float(each) for each in color_std],
+ }
+ self._save_yaml(
+ self.save_dir / f"{database_path.name}_color_mean_std.yaml",
+ feats_mean_std,
+ )
+
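+        # leave-one-area-out: normalization statistics for each area aggregate all remaining areas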
+ for database_path in area_database_paths:
+ all_mean, all_std = [], []
+ for let_out_path in area_database_paths:
+ if database_path == let_out_path:
+ continue
+
+ database = self._load_yaml(let_out_path.path)
+ for sample in database:
+ all_std.append(sample["color_std"])
+ all_mean.append(sample["color_mean"])
+
+ all_color_mean = np.array(all_mean).mean(axis=0)
+ all_color_std = np.sqrt(
+ np.array(all_std).mean(axis=0) - all_color_mean**2
+ )
+ feats_mean_std = {
+ "mean": [float(each) for each in all_color_mean],
+ "std": [float(each) for each in all_color_std],
+ }
+ file_path = database_path.name.replace("_database.yaml", "")
+ self._save_yaml(
+ self.save_dir / f"{file_path}_color_mean_std.yaml",
+ feats_mean_std,
+ )
+
+ @logger.catch
+ def fix_bugs_in_labels(self):
+ pass
+
+ def joint_database(
+ self,
+ train_modes=(
+ "Area_1",
+ "Area_2",
+ "Area_3",
+ "Area_4",
+ "Area_5",
+ "Area_6",
+ ),
+ ):
+ for mode in train_modes:
+ joint_db = []
+ for let_out in train_modes:
+ if mode == let_out:
+ continue
+ joint_db.extend(
+ self._load_yaml(
+ self.save_dir / (let_out + "_database.yaml")
+ )
+ )
+ self._save_yaml(
+ self.save_dir / f"train_{mode}_database.yaml", joint_db
+ )
+
+ def _parse_scene_subscene(self, name):
+ scene_match = re.match(r"scene(\d{4})_(\d{2})", name)
+ return int(scene_match.group(1)), int(scene_match.group(2))
+
+
+if __name__ == "__main__":
+ Fire(S3DISPreprocessing)
diff --git a/models/Mask3D/mask3d/datasets/preprocessing/scannet_preprocessing.py b/models/Mask3D/mask3d/datasets/preprocessing/scannet_preprocessing.py
new file mode 100644
index 0000000000000000000000000000000000000000..5a981864612e04930b04c9c0df8aaa6e2d9249a3
--- /dev/null
+++ b/models/Mask3D/mask3d/datasets/preprocessing/scannet_preprocessing.py
@@ -0,0 +1,296 @@
+import re
+from pathlib import Path
+import numpy as np
+import pandas as pd
+from fire import Fire
+from natsort import natsorted
+from loguru import logger
+
+from datasets.preprocessing.base_preprocessing import BasePreprocessing
+from utils.point_cloud_utils import load_ply_with_normals
+
+from datasets.scannet200.scannet200_constants import (
+ VALID_CLASS_IDS_200,
+ SCANNET_COLOR_MAP_200,
+ CLASS_LABELS_200,
+)
+
+
+class ScannetPreprocessing(BasePreprocessing):
+ def __init__(
+ self,
+ data_dir: str = "./data/raw/scannet/scannet",
+ save_dir: str = "./data/processed/scannet",
+ modes: tuple = ("train", "validation", "test"),
+ n_jobs: int = -1,
+ git_repo: str = "./data/raw/scannet/ScanNet",
+ scannet200: bool = False,
+ ):
+ super().__init__(data_dir, save_dir, modes, n_jobs)
+
+ self.scannet200 = scannet200
+
+ if self.scannet200:
+ self.labels_pd = pd.read_csv(
+ self.data_dir / "scannetv2-labels.combined.tsv",
+ sep="\t",
+ header=0,
+ )
+
+ git_repo = Path(git_repo)
+ self.create_label_database(git_repo)
+ for mode in self.modes:
+ trainval_split_dir = git_repo / "Tasks" / "Benchmark"
+ scannet_special_mode = "val" if mode == "validation" else mode
+ with open(
+ trainval_split_dir / (f"scannetv2_{scannet_special_mode}.txt")
+ ) as f:
+ # -1 because the last one is always empty
+ split_file = f.read().split("\n")[:-1]
+
+ scans_folder = "scans_test" if mode == "test" else "scans"
+ filepaths = []
+ for scene in split_file:
+ filepaths.append(
+ self.data_dir
+ / scans_folder
+ / scene
+ / (scene + "_vh_clean_2.ply")
+ )
+ self.files[mode] = natsorted(filepaths)
+
+ def create_label_database(self, git_repo):
+ if self.scannet200:
+ label_database = {}
+ for row_id, class_id in enumerate(VALID_CLASS_IDS_200):
+ label_database[class_id] = {
+ "color": SCANNET_COLOR_MAP_200[class_id],
+ "name": CLASS_LABELS_200[row_id],
+ "validation": True,
+ }
+ self._save_yaml(
+ self.save_dir / "label_database.yaml", label_database
+ )
+ return label_database
+ else:
+ if (self.save_dir / "label_database.yaml").exists():
+ return self._load_yaml(self.save_dir / "label_database.yaml")
+ df = pd.read_csv(
+ self.data_dir / "scannetv2-labels.combined.tsv", sep="\t"
+ )
+ df = (
+ df[~df[["nyu40class", "nyu40id"]].duplicated()][
+ ["nyu40class", "nyu40id"]
+ ]
+ .set_index("nyu40id")
+ .sort_index()[["nyu40class"]]
+ .rename(columns={"nyu40class": "name"})
+ .replace(" ", "_", regex=True)
+ )
+            # pandas removed DataFrame.append in 2.x; concat is equivalent here
+            df = pd.concat([pd.DataFrame([{"name": "empty"}]), df])
+ df["validation"] = False
+
+ with open(
+ git_repo
+ / "Tasks"
+ / "Benchmark"
+ / "classes_SemVoxLabel-nyu40id.txt"
+ ) as f:
+ for_validation = f.read().split("\n")
+ for category in for_validation:
+ index = int(re.split(" +", category)[0])
+ df.loc[index, "validation"] = True
+
+ # doing this hack because otherwise I will have to install imageio
+ with open(git_repo / "BenchmarkScripts" / "util.py") as f:
+ util = f.read()
+ color_list = eval("[" + util.split("return [\n")[1])
+
+ df["color"] = color_list
+
+ label_database = df.to_dict("index")
+ self._save_yaml(
+ self.save_dir / "label_database.yaml", label_database
+ )
+ return label_database
+
+ def process_file(self, filepath, mode):
+ """process_file.
+
+        Note that the segmentation labels are read from the corresponding .labels.ply files.
+
+ Args:
+ filepath: path to the main ply file
+ mode: train, test or validation
+
+ Returns:
+ filebase: info about file
+ """
+ scene, sub_scene = self._parse_scene_subscene(filepath.name)
+ filebase = {
+ "filepath": filepath,
+ "scene": scene,
+ "sub_scene": sub_scene,
+ "raw_filepath": str(filepath),
+ "file_len": -1,
+ }
+ # reading both files and checking that they are fitting
+ coords, features, _ = load_ply_with_normals(filepath)
+ file_len = len(coords)
+ filebase["file_len"] = file_len
+ points = np.hstack((coords, features))
+
+ if mode in ["train", "validation"]:
+ # getting scene information
+ description_filepath = Path(
+ filepath
+ ).parent / filepath.name.replace("_vh_clean_2.ply", ".txt")
+ with open(description_filepath) as f:
+ scene_type = f.read().split("\n")[:-1]
+ scene_type = scene_type[-1].split(" = ")[1]
+ filebase["scene_type"] = scene_type
+ filebase["raw_description_filepath"] = description_filepath
+
+ # getting instance info
+ instance_info_filepath = next(
+ Path(filepath).parent.glob("*.aggregation.json")
+ )
+ segment_indexes_filepath = next(
+ Path(filepath).parent.glob("*[0-9].segs.json")
+ )
+ instance_db = self._read_json(instance_info_filepath)
+ segments = self._read_json(segment_indexes_filepath)
+ segments = np.array(segments["segIndices"])
+ filebase["raw_instance_filepath"] = instance_info_filepath
+ filebase["raw_segmentation_filepath"] = segment_indexes_filepath
+
+ # add segment id as additional feature
+ segment_ids = np.unique(segments, return_inverse=True)[1]
+ points = np.hstack((points, segment_ids[..., None]))
+
+ # reading labels file
+ label_filepath = filepath.parent / filepath.name.replace(
+ ".ply", ".labels.ply"
+ )
+ filebase["raw_label_filepath"] = label_filepath
+ label_coords, label_colors, labels = load_ply_with_normals(
+ label_filepath
+ )
+ if not np.allclose(coords, label_coords):
+                raise ValueError("files do not have the same coordinates")
+
+ # adding instance label
+ labels = labels[:, np.newaxis]
+ empty_instance_label = np.full(labels.shape, -1)
+ labels = np.hstack((labels, empty_instance_label))
+ for instance in instance_db["segGroups"]:
+ segments_occupied = np.array(instance["segments"])
+ occupied_indices = np.isin(segments, segments_occupied)
+ labels[occupied_indices, 1] = instance["id"]
+
+ if self.scannet200:
+ label200 = instance["label"]
+ # Map the category name to id
+ label_ids = self.labels_pd[
+ self.labels_pd["raw_category"] == label200
+ ]["id"]
+ label_id = (
+ int(label_ids.iloc[0]) if len(label_ids) > 0 else 0
+ )
+ labels[occupied_indices, 0] = label_id
+ points = np.hstack((points, labels))
+
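+            # Pack both labels into one integer per point for the ScanNet
+            # evaluation script: gt = semantic_id * 1000 + instance_id + 1.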
+ # gt_data = (points[:, -2] + 1) * 1000 + points[:, -1] + 1
+ gt_data = points[:, -2] * 1000 + points[:, -1] + 1
+ else:
+ segments_test = "../../data/raw/scannet_test_segments"
+ segment_indexes_filepath = filepath.name.replace(
+ ".ply", ".0.010000.segs.json"
+ )
+ segments = self._read_json(
+ f"{segments_test}/{segment_indexes_filepath}"
+ )
+ segments = np.array(segments["segIndices"])
+ # add segment id as additional feature
+ segment_ids = np.unique(segments, return_inverse=True)[1]
+ points = np.hstack((points, segment_ids[..., None]))
+
+ processed_filepath = (
+ self.save_dir / mode / f"{scene:04}_{sub_scene:02}.npy"
+ )
+ if not processed_filepath.parent.exists():
+ processed_filepath.parent.mkdir(parents=True, exist_ok=True)
+ np.save(processed_filepath, points.astype(np.float32))
+ filebase["filepath"] = str(processed_filepath)
+
+ if mode == "test":
+ return filebase
+
+ processed_gt_filepath = (
+ self.save_dir
+ / "instance_gt"
+ / mode
+ / f"scene{scene:04}_{sub_scene:02}.txt"
+ )
+ if not processed_gt_filepath.parent.exists():
+ processed_gt_filepath.parent.mkdir(parents=True, exist_ok=True)
+ np.savetxt(processed_gt_filepath, gt_data.astype(np.int32), fmt="%d")
+ filebase["instance_gt_filepath"] = str(processed_gt_filepath)
+
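+        # Store the per-channel first moment (mean) and second moment E[x^2];
+        # compute_color_mean_std() later pools these into a dataset-wide
+        # mean and std.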
+ filebase["color_mean"] = [
+ float((features[:, 0] / 255).mean()),
+ float((features[:, 1] / 255).mean()),
+ float((features[:, 2] / 255).mean()),
+ ]
+ filebase["color_std"] = [
+ float(((features[:, 0] / 255) ** 2).mean()),
+ float(((features[:, 1] / 255) ** 2).mean()),
+ float(((features[:, 2] / 255) ** 2).mean()),
+ ]
+ return filebase
+
+ def compute_color_mean_std(
+ self,
+ train_database_path: str = "./data/processed/scannet/train_database.yaml",
+ ):
+ train_database = self._load_yaml(train_database_path)
+ color_mean, color_std = [], []
+ for sample in train_database:
+ color_std.append(sample["color_std"])
+ color_mean.append(sample["color_mean"])
+
+ color_mean = np.array(color_mean).mean(axis=0)
+ color_std = np.sqrt(np.array(color_std).mean(axis=0) - color_mean**2)
+ feats_mean_std = {
+ "mean": [float(each) for each in color_mean],
+ "std": [float(each) for each in color_std],
+ }
+ self._save_yaml(self.save_dir / "color_mean_std.yaml", feats_mean_std)
+
+ @logger.catch
+ def fix_bugs_in_labels(self):
+ if not self.scannet200:
+ logger.add(self.save_dir / "fixed_bugs_in_labels.log")
+ found_wrong_labels = {
+ tuple([270, 0]): 50,
+ tuple([270, 2]): 50,
+ tuple([384, 0]): 149,
+ }
+ for scene, wrong_label in found_wrong_labels.items():
+ scene, sub_scene = scene
+ bug_file = (
+ self.save_dir / "train" / f"{scene:04}_{sub_scene:02}.npy"
+ )
+ points = np.load(bug_file)
+ bug_mask = points[:, -1] != wrong_label
+ points = points[bug_mask]
+ np.save(bug_file, points)
+ logger.info(f"Fixed {bug_file}")
+
+ def _parse_scene_subscene(self, name):
+ scene_match = re.match(r"scene(\d{4})_(\d{2})", name)
+ return int(scene_match.group(1)), int(scene_match.group(2))
+
+
+if __name__ == "__main__":
+ Fire(ScannetPreprocessing)
diff --git a/models/Mask3D/mask3d/datasets/preprocessing/semantic_kitti_preprocessing.py b/models/Mask3D/mask3d/datasets/preprocessing/semantic_kitti_preprocessing.py
new file mode 100644
index 0000000000000000000000000000000000000000..d483e535435cca026588c3177cfe368fad99596b
--- /dev/null
+++ b/models/Mask3D/mask3d/datasets/preprocessing/semantic_kitti_preprocessing.py
@@ -0,0 +1,181 @@
+import re
+from pathlib import Path
+from hashlib import md5
+from natsort import natsorted
+
+import numpy as np
+from fire import Fire
+
+from datasets.preprocessing.base_preprocessing import BasePreprocessing
+
+
+class SemanticKittiPreprocessing(BasePreprocessing):
+ def __init__(
+ self,
+ data_dir: str = "./data/raw/semantic_kitti",
+ save_dir: str = "./data/processed/semantic_kitti",
+ modes: tuple = ("train", "validation", "test"),
+ n_jobs: int = -1,
+ git_repo: str = "./data/raw/semantic-kitti-api",
+ ):
+ super().__init__(data_dir, save_dir, modes, n_jobs)
+
+ git_repo = Path(git_repo)
+ self.create_label_database(git_repo / "config" / "semantic-kitti.yaml")
+ self.config = self._load_yaml(
+ git_repo / "config" / "semantic-kitti.yaml"
+ )
+ self.pose = dict()
+
+ for mode in self.modes:
+ scene_mode = "valid" if mode == "validation" else mode
+ self.pose[mode] = dict()
+ for scene in sorted(self.config["split"][scene_mode]):
+ filepaths = list(
+ self.data_dir.glob(f"*/{scene:02}/velodyne/*bin")
+ )
+ filepaths = [str(file) for file in filepaths]
+ self.files[mode].extend(natsorted(filepaths))
+ calibration = parse_calibration(
+ Path(filepaths[0]).parent.parent / "calib.txt"
+ )
+ self.pose[mode].update(
+ {
+ scene: parse_poses(
+ Path(filepaths[0]).parent.parent / "poses.txt",
+ calibration,
+ ),
+ }
+ )
+
+ def create_label_database(self, config_file):
+ if (self.save_dir / "label_database.yaml").exists():
+ return self._load_yaml(self.save_dir / "label_database.yaml")
+ config = self._load_yaml(config_file)
+ label_database = {}
+ for key, old_key in config["learning_map_inv"].items():
+ label_database.update(
+ {
+ key: {
+ "name": config["labels"][old_key],
+ # bgr -> rgb
+ "color": config["color_map"][old_key][::-1],
+ "validation": not config["learning_ignore"][key],
+ }
+ }
+ )
+
+ self._save_yaml(self.save_dir / "label_database.yaml", label_database)
+ return label_database
+
+ def process_file(self, filepath, mode):
+ """process_file.
+
+ Args:
+            filepath: path to the velodyne .bin scan
+            mode: train, validation or test
+
+ Returns:
+ filebase: info about file
+ """
+ scene, sub_scene = re.search(r"(\d{2}).*(\d{6})", filepath).group(1, 2)
+ filebase = {
+ "filepath": filepath,
+ "scene": int(scene),
+ "sub_scene": int(sub_scene),
+ "file_len": -1,
+ "pose": self.pose[mode][int(scene)][int(sub_scene)].tolist(),
+ }
+
+ points = np.fromfile(filepath, dtype=np.float32).reshape(-1, 4)
+ file_len = len(points)
+ filebase["file_len"] = file_len
+
+ if mode in ["train", "validation"]:
+ # getting label info
+ label_filepath = filepath.replace("velodyne", "labels").replace(
+ "bin", "label"
+ )
+ filebase["label_filepath"] = label_filepath
+ label = np.fromfile(label_filepath, dtype=np.uint32).astype(
+ np.int32
+ )
+            if points.shape[0] != label.shape[0]:
+ raise ValueError("Files do not have same length")
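+            # SemanticKITTI packs each 32-bit label as: lower 16 bits semantic
+            # class id, upper 16 bits instance id.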
+ semantic_label = label & 0xFFFF
+ instance_label = label >> 16
+
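+            # remap raw SemanticKITTI ids to the contiguous training ids
+            # defined by the config's learning_map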
+ semantic_label_copy = semantic_label.copy()
+            for raw_id in np.unique(semantic_label):
+                semantic_label[semantic_label_copy == raw_id] = self.config[
+                    "learning_map"
+                ][raw_id]
+
+ label = np.hstack(
+ (semantic_label[:, np.newaxis], instance_label[:, np.newaxis])
+ )
+ points = np.hstack((points, label))
+
+ processed_filepath = self.save_dir / mode / f"{scene}_{sub_scene}.npy"
+ if not processed_filepath.parent.exists():
+ processed_filepath.parent.mkdir(parents=True, exist_ok=True)
+ np.save(processed_filepath, points.astype(np.float32))
+ filebase["filepath"] = str(processed_filepath)
+
+ return filebase
+
+
+def parse_calibration(filename):
+ """read calibration file with given filename
+ Returns
+ -------
+ dict
+ Calibration matrices as 4x4 numpy arrays.
+ """
+ calib = {}
+
+ with open(filename) as calib_file:
+ for line in calib_file:
+ key, content = line.strip().split(":")
+ values = [float(v) for v in content.strip().split()]
+
+ pose = np.zeros((4, 4))
+ pose[0, 0:4] = values[0:4]
+ pose[1, 0:4] = values[4:8]
+ pose[2, 0:4] = values[8:12]
+ pose[3, 3] = 1.0
+
+ calib[key] = pose
+ return calib
+
+
+def parse_poses(filename, calibration):
+ """read poses file with per-scan poses from given filename
+ Returns
+ -------
+ list
+ list of poses as 4x4 numpy arrays.
+ """
+
+ poses = []
+
+ Tr = calibration["Tr"]
+ Tr_inv = np.linalg.inv(Tr)
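+    # Conjugate each pose with the calibration matrix Tr so the resulting
+    # pose acts in the Velodyne (sensor) frame: Tr^-1 @ pose @ Tr.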
+
+ with open(filename) as file:
+ for line in file:
+ values = [float(v) for v in line.strip().split()]
+
+ pose = np.zeros((4, 4))
+ pose[0, 0:4] = values[0:4]
+ pose[1, 0:4] = values[4:8]
+ pose[2, 0:4] = values[8:12]
+ pose[3, 3] = 1.0
+
+ poses.append(np.matmul(Tr_inv, np.matmul(pose, Tr)))
+
+ return poses
+
+
+if __name__ == "__main__":
+ Fire(SemanticKittiPreprocessing)
diff --git a/models/Mask3D/mask3d/datasets/preprocessing/stpls3d_preprocessing.py b/models/Mask3D/mask3d/datasets/preprocessing/stpls3d_preprocessing.py
new file mode 100644
index 0000000000000000000000000000000000000000..63ed5bff5d52e656f4bad2f853e5973b433871bd
--- /dev/null
+++ b/models/Mask3D/mask3d/datasets/preprocessing/stpls3d_preprocessing.py
@@ -0,0 +1,291 @@
+import re
+import os
+import numpy as np
+from fire import Fire
+from natsort import natsorted
+from loguru import logger
+import pandas as pd
+
+from datasets.preprocessing.base_preprocessing import BasePreprocessing
+
+
+class STPLS3DPreprocessing(BasePreprocessing):
+ def __init__(
+ self,
+ data_dir: str = "../../data/raw/stpls3d",
+ save_dir: str = "../../data/processed/stpls3d",
+ modes: tuple = ("train", "validation", "test"),
+ n_jobs: int = -1,
+ ):
+ super().__init__(data_dir, save_dir, modes, n_jobs)
+
+ # https://github.com/meidachen/STPLS3D/blob/main/HAIS/STPLS3DInstanceSegmentationChallenge_Codalab_Evaluate.py#L31
+ CLASS_LABELS = [
+ "Build",
+ "LowVeg",
+ "MediumVeg",
+ "HighVeg",
+ "Vehicle",
+ "Truck",
+ "Aircraft",
+ "MilitaryVeh",
+ "Bike",
+ "Motorcycle",
+ "LightPole",
+ "StreetSign",
+ "Clutter",
+ "Fence",
+ ]
+ VALID_CLASS_IDS = np.array(
+ [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
+ )
+
+ self.class_map = {
+ "Ground": 0,
+ "Build": 1,
+ "LowVeg": 2,
+ "MediumVeg": 3,
+ "HighVeg": 4,
+ "Vehicle": 5,
+ "Truck": 6,
+ "Aircraft": 7,
+ "MilitaryVeh": 8,
+ "Bike": 9,
+ "Motorcycle": 10,
+ "LightPole": 11,
+ "StreetSign": 12,
+ "Clutter": 13,
+ "Fence": 14,
+ }
+
+        self.color_map = [
+            [0, 255, 0],  # Ground
+            [0, 0, 255],  # Build
+            [0, 255, 255],  # LowVeg
+            [255, 255, 0],  # MediumVeg
+            [255, 0, 255],  # HighVeg
+            [100, 100, 255],  # Vehicle
+            [200, 200, 100],  # Truck
+            [170, 120, 200],  # Aircraft
+            [255, 0, 0],  # MilitaryVeh
+            [200, 100, 100],  # Bike
+            [10, 200, 100],  # Motorcycle
+            [200, 200, 200],  # LightPole
+            [50, 50, 50],  # StreetSign
+            [60, 130, 60],  # Clutter
+            [130, 30, 60],  # Fence
+        ]
+
+ self.create_label_database()
+
+ for mode in self.modes:
+ filepaths = []
+ for scene_path in [
+ f.path for f in os.scandir(self.data_dir / mode)
+ ]:
+ filepaths.append(scene_path)
+ self.files[mode] = natsorted(filepaths)
+
+ def create_label_database(self):
+ label_database = dict()
+ for class_name, class_id in self.class_map.items():
+ label_database[class_id] = {
+ "color": self.color_map[class_id],
+ "name": class_name,
+ "validation": True,
+ }
+
+ self._save_yaml(self.save_dir / "label_database.yaml", label_database)
+ return label_database
+
+ def process_file(self, filepath, mode):
+ """process_file.
+
+        Args:
+            filepath: path to the scene .txt file with comma-separated points
+ mode: train, test or validation
+
+ Returns:
+ filebase: info about file
+ """
+ filebase = {
+ "filepath": filepath,
+ "scene": filepath.split("/")[-1],
+ "raw_filepath": str(filepath),
+ "file_len": -1,
+ }
+
+ points = pd.read_csv(filepath, header=None).values
+
+ filebase["raw_segmentation_filepath"] = ""
+
+ # add segment id as additional feature (DUMMY)
+ if mode in ["train", "validation"]:
+ points = np.hstack(
+ (
+ points,
+ np.ones(points.shape[0])[..., None], # normal 1
+ np.ones(points.shape[0])[..., None], # normal 2
+ np.ones(points.shape[0])[..., None], # normal 3
+                    np.ones(points.shape[0])[..., None],  # segments
+                )
+            )
+ else:
+ # we need to add dummies for semantics and instances
+ points = np.hstack(
+ (
+ points,
+ np.ones(points.shape[0])[..., None], # semantic class
+ np.ones(points.shape[0])[..., None], # instance id
+ np.ones(points.shape[0])[..., None], # normal 1
+ np.ones(points.shape[0])[..., None], # normal 2
+ np.ones(points.shape[0])[..., None], # normal 3
+                    np.ones(points.shape[0])[..., None],  # segments
+                )
+            )
+
+ points = points[
+ :, [0, 1, 2, 3, 4, 5, 8, 9, 10, 11, 6, 7]
+ ] # move segments after RGB
+
+ # move point clouds to be in positive range (important for split pointcloud function)
+ points[:, :3] = points[:, :3] - points[:, :3].min(0)
+
+ points = points.astype(np.float32)
+
+ if mode == "test":
+ points = points[:, :-2]
+ else:
+ points[
+ points[:, -1] == -100.0, -1
+ ] = -1 # -1 indicates "no instance"
+
+ file_len = len(points)
+ filebase["file_len"] = file_len
+
+ processed_filepath = (
+ self.save_dir
+ / mode
+ / f"{filebase['scene'].replace('.txt', '')}.npy"
+ )
+ if not processed_filepath.parent.exists():
+ processed_filepath.parent.mkdir(parents=True, exist_ok=True)
+ np.save(processed_filepath, points.astype(np.float32))
+ filebase["filepath"] = str(processed_filepath)
+
+ if mode in ["validation", "test"]:
+ blocks = self.splitPointCloud(points)
+
+ filebase["instance_gt_filepath"] = []
+ filebase["filepath_crop"] = []
+ for block_id, block in enumerate(blocks):
+ if len(block) > 10000:
+ if mode == "validation":
+ new_instance_ids = np.unique(
+ block[:, -1], return_inverse=True
+ )[1]
+
+ assert new_instance_ids.shape[0] == block.shape[0]
+ # == 0 means -1 == no instance
+ # new_instance_ids[new_instance_ids == 0]
+ assert (
+ new_instance_ids.max() < 1000
+ ), "we cannot encode when there are more than 999 instances in a block"
+
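+                        # same packing as ScanNet: semantic_id * 1000 + instance_id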
+ gt_data = (block[:, -2]) * 1000 + new_instance_ids
+
+ processed_gt_filepath = (
+ self.save_dir
+ / "instance_gt"
+ / mode
+ / f"{filebase['scene'].replace('.txt', '')}_{block_id}.txt"
+ )
+ if not processed_gt_filepath.parent.exists():
+ processed_gt_filepath.parent.mkdir(
+ parents=True, exist_ok=True
+ )
+ np.savetxt(
+ processed_gt_filepath,
+ gt_data.astype(np.int32),
+ fmt="%d",
+ )
+ filebase["instance_gt_filepath"].append(
+ str(processed_gt_filepath)
+ )
+
+ processed_filepath = (
+ self.save_dir
+ / mode
+ / f"{filebase['scene'].replace('.txt', '')}_{block_id}.npy"
+ )
+ if not processed_filepath.parent.exists():
+ processed_filepath.parent.mkdir(
+ parents=True, exist_ok=True
+ )
+ np.save(processed_filepath, block.astype(np.float32))
+ filebase["filepath_crop"].append(str(processed_filepath))
+ else:
+ print("block was smaller than 1000 points")
+ assert False
+
+ filebase["color_mean"] = [
+ float((points[:, 3] / 255).mean()),
+ float((points[:, 4] / 255).mean()),
+ float((points[:, 5] / 255).mean()),
+ ]
+ filebase["color_std"] = [
+ float(((points[:, 3] / 255) ** 2).mean()),
+ float(((points[:, 4] / 255) ** 2).mean()),
+ float(((points[:, 5] / 255) ** 2).mean()),
+ ]
+ return filebase
+
+ def compute_color_mean_std(
+ self,
+ train_database_path: str = "./data/processed/stpls3d/train_database.yaml",
+ ):
+ train_database = self._load_yaml(train_database_path)
+ color_mean, color_std = [], []
+ for sample in train_database:
+ color_std.append(sample["color_std"])
+ color_mean.append(sample["color_mean"])
+
+ color_mean = np.array(color_mean).mean(axis=0)
+ color_std = np.sqrt(np.array(color_std).mean(axis=0) - color_mean**2)
+ feats_mean_std = {
+ "mean": [float(each) for each in color_mean],
+ "std": [float(each) for each in color_std],
+ }
+ self._save_yaml(self.save_dir / "color_mean_std.yaml", feats_mean_std)
+
+ def splitPointCloud(self, cloud, size=50.0, stride=50):
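+        # Tile the scene into size x size blocks on the XY plane with the
+        # given stride; with stride == size the blocks do not overlap.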
+ limitMax = np.amax(cloud[:, 0:3], axis=0)
+ width = int(np.ceil((limitMax[0] - size) / stride)) + 1
+ depth = int(np.ceil((limitMax[1] - size) / stride)) + 1
+ cells = [
+ (x * stride, y * stride)
+ for x in range(width)
+ for y in range(depth)
+ ]
+ blocks = []
+ for (x, y) in cells:
+ xcond = (cloud[:, 0] <= x + size) & (cloud[:, 0] >= x)
+ ycond = (cloud[:, 1] <= y + size) & (cloud[:, 1] >= y)
+ cond = xcond & ycond
+ block = cloud[cond, :]
+ blocks.append(block)
+ return blocks
+
+ @logger.catch
+ def fix_bugs_in_labels(self):
+ pass
+
+ def _parse_scene_subscene(self, name):
+ scene_match = re.match(r"scene(\d{4})_(\d{2})", name)
+ return int(scene_match.group(1)), int(scene_match.group(2))
+
+
+if __name__ == "__main__":
+ Fire(STPLS3DPreprocessing)
diff --git a/models/Mask3D/mask3d/datasets/random_cuboid.py b/models/Mask3D/mask3d/datasets/random_cuboid.py
new file mode 100644
index 0000000000000000000000000000000000000000..334b87ecadbd9cbee2979d462532fb4a479b280f
--- /dev/null
+++ b/models/Mask3D/mask3d/datasets/random_cuboid.py
@@ -0,0 +1,96 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+import numpy as np
+import torch
+
+
+def check_aspect(crop_range, aspect_min):
+ xy_aspect = np.min(crop_range[:2]) / np.max(crop_range[:2])
+ xz_aspect = np.min(crop_range[[0, 2]]) / np.max(crop_range[[0, 2]])
+ yz_aspect = np.min(crop_range[1:]) / np.max(crop_range[1:])
+ return (
+ (xy_aspect >= aspect_min)
+ or (xz_aspect >= aspect_min)
+ or (yz_aspect >= aspect_min)
+ )
+
+
+class RandomCuboid(object):
+ """
+ RandomCuboid augmentation from DepthContrast [https://arxiv.org/abs/2101.02691]
+ We slightly modify this operation to account for object detection.
+ This augmentation randomly crops a cuboid from the input and
+ ensures that the cropped cuboid contains at least one bounding box
+ """
+
+ def __init__(
+ self,
+ min_points,
+ # aspect=0.8,
+ crop_length=6.0,
+ version1=True,
+ ):
+ # self.aspect = aspect
+ self.crop_length = crop_length
+ self.min_points = min_points
+ self.version1 = version1
+
+ def __call__(self, point_cloud):
+ if point_cloud.shape[0] < self.min_points:
+ print("too small pcd")
+ return np.ones(point_cloud.shape[0], dtype=np.bool)
+
+ range_xyz = np.max(point_cloud[:, :2], axis=0) - np.min(
+ point_cloud[:, :2], axis=0
+ )
+
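+        # Try up to 100 random crop centers; if none of the crops contains at
+        # least min_points, fall back to keeping the whole cloud.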
+ for _ in range(100):
+ # crop_range = self.min_crop + np.random.rand(3) * (
+ # self.max_crop - self.min_crop
+ # )
+ # crop_range[-1] = 999.
+ # if not check_aspect(crop_range, self.aspect):
+ # continue
+
+ sample_center = point_cloud[:, :2].min(axis=0) + range_xyz / 2
+
+ if self.version1:
+ offset_x = np.random.uniform(
+ -range_xyz[0] / 4, range_xyz[0] / 4
+ )
+ offset_y = np.random.uniform(
+ -range_xyz[1] / 4, range_xyz[1] / 4
+ )
+ else:
+ offset_x = np.random.uniform(
+ -(range_xyz[0] / 2) + self.crop_length / 4,
+ +(range_xyz[0] / 2) - self.crop_length / 4,
+ )
+ offset_y = np.random.uniform(
+ -(range_xyz[1] / 2) + self.crop_length / 4,
+ +(range_xyz[1] / 2) - self.crop_length / 4,
+ )
+
+ sample_center[0] = sample_center[0] + offset_x
+ sample_center[1] = sample_center[1] + offset_y
+
+ min_xy = sample_center - self.crop_length / 2
+ max_xy = sample_center + self.crop_length / 2
+
+ upper_idx = (
+ np.sum((point_cloud[:, :2] <= max_xy).astype(np.int32), 1) == 2
+ )
+ lower_idx = (
+ np.sum((point_cloud[:, :2] >= min_xy).astype(np.int32), 1) == 2
+ )
+
+ new_pointidx = (upper_idx) & (lower_idx)
+
+ if np.sum(new_pointidx) < self.min_points:
+ print("TOO SMALL")
+ continue
+
+ return new_pointidx
+
+        # fallback: no crop with enough points found in 100 attempts
+        print("FALLBACK")
+        return np.ones(point_cloud.shape[0], dtype=bool)
diff --git a/models/Mask3D/mask3d/datasets/scannet200/__init__.py b/models/Mask3D/mask3d/datasets/scannet200/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/mask3d/datasets/scannet200/scannet200_constants.py b/models/Mask3D/mask3d/datasets/scannet200/scannet200_constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..1d921407068335b82ad10af912d7e9d715dbd6ca
--- /dev/null
+++ b/models/Mask3D/mask3d/datasets/scannet200/scannet200_constants.py
@@ -0,0 +1,704 @@
+### ScanNet Benchmark constants ###
+VALID_CLASS_IDS_20 = (
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 14,
+ 16,
+ 24,
+ 28,
+ 33,
+ 34,
+ 36,
+ 39,
+)
+
+CLASS_LABELS_20 = (
+ "wall",
+ "floor",
+ "cabinet",
+ "bed",
+ "chair",
+ "sofa",
+ "table",
+ "door",
+ "window",
+ "bookshelf",
+ "picture",
+ "counter",
+ "desk",
+ "curtain",
+ "refrigerator",
+ "shower curtain",
+ "toilet",
+ "sink",
+ "bathtub",
+ "otherfurniture",
+)
+
+SCANNET_COLOR_MAP_20 = {
+ 0: (0.0, 0.0, 0.0),
+ 1: (174.0, 199.0, 232.0),
+ 2: (152.0, 223.0, 138.0),
+ 3: (31.0, 119.0, 180.0),
+ 4: (255.0, 187.0, 120.0),
+ 5: (188.0, 189.0, 34.0),
+ 6: (140.0, 86.0, 75.0),
+ 7: (255.0, 152.0, 150.0),
+ 8: (214.0, 39.0, 40.0),
+ 9: (197.0, 176.0, 213.0),
+ 10: (148.0, 103.0, 189.0),
+ 11: (196.0, 156.0, 148.0),
+ 12: (23.0, 190.0, 207.0),
+ 14: (247.0, 182.0, 210.0),
+ 15: (66.0, 188.0, 102.0),
+ 16: (219.0, 219.0, 141.0),
+ 17: (140.0, 57.0, 197.0),
+ 18: (202.0, 185.0, 52.0),
+ 19: (51.0, 176.0, 203.0),
+ 20: (200.0, 54.0, 131.0),
+ 21: (92.0, 193.0, 61.0),
+ 22: (78.0, 71.0, 183.0),
+ 23: (172.0, 114.0, 82.0),
+ 24: (255.0, 127.0, 14.0),
+ 25: (91.0, 163.0, 138.0),
+ 26: (153.0, 98.0, 156.0),
+ 27: (140.0, 153.0, 101.0),
+ 28: (158.0, 218.0, 229.0),
+ 29: (100.0, 125.0, 154.0),
+ 30: (178.0, 127.0, 135.0),
+ 32: (146.0, 111.0, 194.0),
+ 33: (44.0, 160.0, 44.0),
+ 34: (112.0, 128.0, 144.0),
+ 35: (96.0, 207.0, 209.0),
+ 36: (227.0, 119.0, 194.0),
+ 37: (213.0, 92.0, 176.0),
+ 38: (94.0, 106.0, 211.0),
+ 39: (82.0, 84.0, 163.0),
+ 40: (100.0, 85.0, 144.0),
+}
+
+### ScanNet200 Benchmark constants ###
+VALID_CLASS_IDS_200 = (
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 21,
+ 22,
+ 23,
+ 24,
+ 26,
+ 27,
+ 28,
+ 29,
+ 31,
+ 32,
+ 33,
+ 34,
+ 35,
+ 36,
+ 38,
+ 39,
+ 40,
+ 41,
+ 42,
+ 44,
+ 45,
+ 46,
+ 47,
+ 48,
+ 49,
+ 50,
+ 51,
+ 52,
+ 54,
+ 55,
+ 56,
+ 57,
+ 58,
+ 59,
+ 62,
+ 63,
+ 64,
+ 65,
+ 66,
+ 67,
+ 68,
+ 69,
+ 70,
+ 71,
+ 72,
+ 73,
+ 74,
+ 75,
+ 76,
+ 77,
+ 78,
+ 79,
+ 80,
+ 82,
+ 84,
+ 86,
+ 87,
+ 88,
+ 89,
+ 90,
+ 93,
+ 95,
+ 96,
+ 97,
+ 98,
+ 99,
+ 100,
+ 101,
+ 102,
+ 103,
+ 104,
+ 105,
+ 106,
+ 107,
+ 110,
+ 112,
+ 115,
+ 116,
+ 118,
+ 120,
+ 121,
+ 122,
+ 125,
+ 128,
+ 130,
+ 131,
+ 132,
+ 134,
+ 136,
+ 138,
+ 139,
+ 140,
+ 141,
+ 145,
+ 148,
+ 154,
+ 155,
+ 156,
+ 157,
+ 159,
+ 161,
+ 163,
+ 165,
+ 166,
+ 168,
+ 169,
+ 170,
+ 177,
+ 180,
+ 185,
+ 188,
+ 191,
+ 193,
+ 195,
+ 202,
+ 208,
+ 213,
+ 214,
+ 221,
+ 229,
+ 230,
+ 232,
+ 233,
+ 242,
+ 250,
+ 261,
+ 264,
+ 276,
+ 283,
+ 286,
+ 300,
+ 304,
+ 312,
+ 323,
+ 325,
+ 331,
+ 342,
+ 356,
+ 370,
+ 392,
+ 395,
+ 399,
+ 408,
+ 417,
+ 488,
+ 540,
+ 562,
+ 570,
+ 572,
+ 581,
+ 609,
+ 748,
+ 776,
+ 1156,
+ 1163,
+ 1164,
+ 1165,
+ 1166,
+ 1167,
+ 1168,
+ 1169,
+ 1170,
+ 1171,
+ 1172,
+ 1173,
+ 1174,
+ 1175,
+ 1176,
+ 1178,
+ 1179,
+ 1180,
+ 1181,
+ 1182,
+ 1183,
+ 1184,
+ 1185,
+ 1186,
+ 1187,
+ 1188,
+ 1189,
+ 1190,
+ 1191,
+)
+
+CLASS_LABELS_200 = (
+ "wall",
+ "chair",
+ "floor",
+ "table",
+ "door",
+ "couch",
+ "cabinet",
+ "shelf",
+ "desk",
+ "office chair",
+ "bed",
+ "pillow",
+ "sink",
+ "picture",
+ "window",
+ "toilet",
+ "bookshelf",
+ "monitor",
+ "curtain",
+ "book",
+ "armchair",
+ "coffee table",
+ "box",
+ "refrigerator",
+ "lamp",
+ "kitchen cabinet",
+ "towel",
+ "clothes",
+ "tv",
+ "nightstand",
+ "counter",
+ "dresser",
+ "stool",
+ "cushion",
+ "plant",
+ "ceiling",
+ "bathtub",
+ "end table",
+ "dining table",
+ "keyboard",
+ "bag",
+ "backpack",
+ "toilet paper",
+ "printer",
+ "tv stand",
+ "whiteboard",
+ "blanket",
+ "shower curtain",
+ "trash can",
+ "closet",
+ "stairs",
+ "microwave",
+ "stove",
+ "shoe",
+ "computer tower",
+ "bottle",
+ "bin",
+ "ottoman",
+ "bench",
+ "board",
+ "washing machine",
+ "mirror",
+ "copier",
+ "basket",
+ "sofa chair",
+ "file cabinet",
+ "fan",
+ "laptop",
+ "shower",
+ "paper",
+ "person",
+ "paper towel dispenser",
+ "oven",
+ "blinds",
+ "rack",
+ "plate",
+ "blackboard",
+ "piano",
+ "suitcase",
+ "rail",
+ "radiator",
+ "recycling bin",
+ "container",
+ "wardrobe",
+ "soap dispenser",
+ "telephone",
+ "bucket",
+ "clock",
+ "stand",
+ "light",
+ "laundry basket",
+ "pipe",
+ "clothes dryer",
+ "guitar",
+ "toilet paper holder",
+ "seat",
+ "speaker",
+ "column",
+ "bicycle",
+ "ladder",
+ "bathroom stall",
+ "shower wall",
+ "cup",
+ "jacket",
+ "storage bin",
+ "coffee maker",
+ "dishwasher",
+ "paper towel roll",
+ "machine",
+ "mat",
+ "windowsill",
+ "bar",
+ "toaster",
+ "bulletin board",
+ "ironing board",
+ "fireplace",
+ "soap dish",
+ "kitchen counter",
+ "doorframe",
+ "toilet paper dispenser",
+ "mini fridge",
+ "fire extinguisher",
+ "ball",
+ "hat",
+ "shower curtain rod",
+ "water cooler",
+ "paper cutter",
+ "tray",
+ "shower door",
+ "pillar",
+ "ledge",
+ "toaster oven",
+ "mouse",
+ "toilet seat cover dispenser",
+ "furniture",
+ "cart",
+ "storage container",
+ "scale",
+ "tissue box",
+ "light switch",
+ "crate",
+ "power outlet",
+ "decoration",
+ "sign",
+ "projector",
+ "closet door",
+ "vacuum cleaner",
+ "candle",
+ "plunger",
+ "stuffed animal",
+ "headphones",
+ "dish rack",
+ "broom",
+ "guitar case",
+ "range hood",
+ "dustpan",
+ "hair dryer",
+ "water bottle",
+ "handicap bar",
+ "purse",
+ "vent",
+ "shower floor",
+ "water pitcher",
+ "mailbox",
+ "bowl",
+ "paper bag",
+ "alarm clock",
+ "music stand",
+ "projector screen",
+ "divider",
+ "laundry detergent",
+ "bathroom counter",
+ "object",
+ "bathroom vanity",
+ "closet wall",
+ "laundry hamper",
+ "bathroom stall door",
+ "ceiling light",
+ "trash bin",
+ "dumbbell",
+ "stair rail",
+ "tube",
+ "bathroom cabinet",
+ "cd case",
+ "closet rod",
+ "coffee kettle",
+ "structure",
+ "shower head",
+ "keyboard piano",
+ "case of water bottles",
+ "coat rack",
+ "storage organizer",
+ "folded chair",
+ "fire alarm",
+ "power strip",
+ "calendar",
+ "poster",
+ "potted plant",
+ "luggage",
+ "mattress",
+)
+
+SCANNET_COLOR_MAP_200 = {
+ 0: (0.0, 0.0, 0.0),
+ 1: (174.0, 199.0, 232.0),
+ 2: (188.0, 189.0, 34.0),
+ 3: (152.0, 223.0, 138.0),
+ 4: (255.0, 152.0, 150.0),
+ 5: (214.0, 39.0, 40.0),
+ 6: (91.0, 135.0, 229.0),
+ 7: (31.0, 119.0, 180.0),
+ 8: (229.0, 91.0, 104.0),
+ 9: (247.0, 182.0, 210.0),
+ 10: (91.0, 229.0, 110.0),
+ 11: (255.0, 187.0, 120.0),
+ 13: (141.0, 91.0, 229.0),
+ 14: (112.0, 128.0, 144.0),
+ 15: (196.0, 156.0, 148.0),
+ 16: (197.0, 176.0, 213.0),
+ 17: (44.0, 160.0, 44.0),
+ 18: (148.0, 103.0, 189.0),
+ 19: (229.0, 91.0, 223.0),
+ 21: (219.0, 219.0, 141.0),
+ 22: (192.0, 229.0, 91.0),
+ 23: (88.0, 218.0, 137.0),
+ 24: (58.0, 98.0, 137.0),
+ 26: (177.0, 82.0, 239.0),
+ 27: (255.0, 127.0, 14.0),
+ 28: (237.0, 204.0, 37.0),
+ 29: (41.0, 206.0, 32.0),
+ 31: (62.0, 143.0, 148.0),
+ 32: (34.0, 14.0, 130.0),
+ 33: (143.0, 45.0, 115.0),
+ 34: (137.0, 63.0, 14.0),
+ 35: (23.0, 190.0, 207.0),
+ 36: (16.0, 212.0, 139.0),
+ 38: (90.0, 119.0, 201.0),
+ 39: (125.0, 30.0, 141.0),
+ 40: (150.0, 53.0, 56.0),
+ 41: (186.0, 197.0, 62.0),
+ 42: (227.0, 119.0, 194.0),
+ 44: (38.0, 100.0, 128.0),
+ 45: (120.0, 31.0, 243.0),
+ 46: (154.0, 59.0, 103.0),
+ 47: (169.0, 137.0, 78.0),
+ 48: (143.0, 245.0, 111.0),
+ 49: (37.0, 230.0, 205.0),
+ 50: (14.0, 16.0, 155.0),
+ 51: (196.0, 51.0, 182.0),
+ 52: (237.0, 80.0, 38.0),
+ 54: (138.0, 175.0, 62.0),
+ 55: (158.0, 218.0, 229.0),
+ 56: (38.0, 96.0, 167.0),
+ 57: (190.0, 77.0, 246.0),
+ 58: (208.0, 49.0, 84.0),
+ 59: (208.0, 193.0, 72.0),
+ 62: (55.0, 220.0, 57.0),
+ 63: (10.0, 125.0, 140.0),
+ 64: (76.0, 38.0, 202.0),
+ 65: (191.0, 28.0, 135.0),
+ 66: (211.0, 120.0, 42.0),
+ 67: (118.0, 174.0, 76.0),
+ 68: (17.0, 242.0, 171.0),
+ 69: (20.0, 65.0, 247.0),
+ 70: (208.0, 61.0, 222.0),
+ 71: (162.0, 62.0, 60.0),
+ 72: (210.0, 235.0, 62.0),
+ 73: (45.0, 152.0, 72.0),
+ 74: (35.0, 107.0, 149.0),
+ 75: (160.0, 89.0, 237.0),
+ 76: (227.0, 56.0, 125.0),
+ 77: (169.0, 143.0, 81.0),
+ 78: (42.0, 143.0, 20.0),
+ 79: (25.0, 160.0, 151.0),
+ 80: (82.0, 75.0, 227.0),
+ 82: (253.0, 59.0, 222.0),
+ 84: (240.0, 130.0, 89.0),
+ 86: (123.0, 172.0, 47.0),
+ 87: (71.0, 194.0, 133.0),
+ 88: (24.0, 94.0, 205.0),
+ 89: (134.0, 16.0, 179.0),
+ 90: (159.0, 32.0, 52.0),
+ 93: (213.0, 208.0, 88.0),
+ 95: (64.0, 158.0, 70.0),
+ 96: (18.0, 163.0, 194.0),
+ 97: (65.0, 29.0, 153.0),
+ 98: (177.0, 10.0, 109.0),
+ 99: (152.0, 83.0, 7.0),
+ 100: (83.0, 175.0, 30.0),
+ 101: (18.0, 199.0, 153.0),
+ 102: (61.0, 81.0, 208.0),
+ 103: (213.0, 85.0, 216.0),
+ 104: (170.0, 53.0, 42.0),
+ 105: (161.0, 192.0, 38.0),
+ 106: (23.0, 241.0, 91.0),
+ 107: (12.0, 103.0, 170.0),
+ 110: (151.0, 41.0, 245.0),
+ 112: (133.0, 51.0, 80.0),
+ 115: (184.0, 162.0, 91.0),
+ 116: (50.0, 138.0, 38.0),
+ 118: (31.0, 237.0, 236.0),
+ 120: (39.0, 19.0, 208.0),
+ 121: (223.0, 27.0, 180.0),
+ 122: (254.0, 141.0, 85.0),
+ 125: (97.0, 144.0, 39.0),
+ 128: (106.0, 231.0, 176.0),
+ 130: (12.0, 61.0, 162.0),
+ 131: (124.0, 66.0, 140.0),
+ 132: (137.0, 66.0, 73.0),
+ 134: (250.0, 253.0, 26.0),
+ 136: (55.0, 191.0, 73.0),
+ 138: (60.0, 126.0, 146.0),
+ 139: (153.0, 108.0, 234.0),
+ 140: (184.0, 58.0, 125.0),
+ 141: (135.0, 84.0, 14.0),
+ 145: (139.0, 248.0, 91.0),
+ 148: (53.0, 200.0, 172.0),
+ 154: (63.0, 69.0, 134.0),
+ 155: (190.0, 75.0, 186.0),
+ 156: (127.0, 63.0, 52.0),
+ 157: (141.0, 182.0, 25.0),
+ 159: (56.0, 144.0, 89.0),
+ 161: (64.0, 160.0, 250.0),
+ 163: (182.0, 86.0, 245.0),
+ 165: (139.0, 18.0, 53.0),
+ 166: (134.0, 120.0, 54.0),
+ 168: (49.0, 165.0, 42.0),
+ 169: (51.0, 128.0, 133.0),
+ 170: (44.0, 21.0, 163.0),
+ 177: (232.0, 93.0, 193.0),
+ 180: (176.0, 102.0, 54.0),
+ 185: (116.0, 217.0, 17.0),
+ 188: (54.0, 209.0, 150.0),
+ 191: (60.0, 99.0, 204.0),
+ 193: (129.0, 43.0, 144.0),
+ 195: (252.0, 100.0, 106.0),
+ 202: (187.0, 196.0, 73.0),
+ 208: (13.0, 158.0, 40.0),
+ 213: (52.0, 122.0, 152.0),
+ 214: (128.0, 76.0, 202.0),
+ 221: (187.0, 50.0, 115.0),
+ 229: (180.0, 141.0, 71.0),
+ 230: (77.0, 208.0, 35.0),
+ 232: (72.0, 183.0, 168.0),
+ 233: (97.0, 99.0, 203.0),
+ 242: (172.0, 22.0, 158.0),
+ 250: (155.0, 64.0, 40.0),
+ 261: (118.0, 159.0, 30.0),
+ 264: (69.0, 252.0, 148.0),
+ 276: (45.0, 103.0, 173.0),
+ 283: (111.0, 38.0, 149.0),
+ 286: (184.0, 9.0, 49.0),
+ 300: (188.0, 174.0, 67.0),
+ 304: (53.0, 206.0, 53.0),
+ 312: (97.0, 235.0, 252.0),
+ 323: (66.0, 32.0, 182.0),
+ 325: (236.0, 114.0, 195.0),
+ 331: (241.0, 154.0, 83.0),
+ 342: (133.0, 240.0, 52.0),
+ 356: (16.0, 205.0, 144.0),
+ 370: (75.0, 101.0, 198.0),
+ 392: (237.0, 95.0, 251.0),
+ 395: (191.0, 52.0, 49.0),
+ 399: (227.0, 254.0, 54.0),
+ 408: (49.0, 206.0, 87.0),
+ 417: (48.0, 113.0, 150.0),
+ 488: (125.0, 73.0, 182.0),
+ 540: (229.0, 32.0, 114.0),
+ 562: (158.0, 119.0, 28.0),
+ 570: (60.0, 205.0, 27.0),
+ 572: (18.0, 215.0, 201.0),
+ 581: (79.0, 76.0, 153.0),
+ 609: (134.0, 13.0, 116.0),
+ 748: (192.0, 97.0, 63.0),
+ 776: (108.0, 163.0, 18.0),
+ 1156: (95.0, 220.0, 156.0),
+ 1163: (98.0, 141.0, 208.0),
+ 1164: (144.0, 19.0, 193.0),
+ 1165: (166.0, 36.0, 57.0),
+ 1166: (212.0, 202.0, 34.0),
+ 1167: (23.0, 206.0, 34.0),
+ 1168: (91.0, 211.0, 236.0),
+ 1169: (79.0, 55.0, 137.0),
+ 1170: (182.0, 19.0, 117.0),
+ 1171: (134.0, 76.0, 14.0),
+ 1172: (87.0, 185.0, 28.0),
+ 1173: (82.0, 224.0, 187.0),
+ 1174: (92.0, 110.0, 214.0),
+ 1175: (168.0, 80.0, 171.0),
+ 1176: (197.0, 63.0, 51.0),
+ 1178: (175.0, 199.0, 77.0),
+ 1179: (62.0, 180.0, 98.0),
+ 1180: (8.0, 91.0, 150.0),
+ 1181: (77.0, 15.0, 130.0),
+ 1182: (154.0, 65.0, 96.0),
+ 1183: (197.0, 152.0, 11.0),
+ 1184: (59.0, 155.0, 45.0),
+ 1185: (12.0, 147.0, 145.0),
+ 1186: (54.0, 35.0, 219.0),
+ 1187: (210.0, 73.0, 181.0),
+ 1188: (221.0, 124.0, 77.0),
+ 1189: (149.0, 214.0, 66.0),
+ 1190: (72.0, 185.0, 134.0),
+ 1191: (42.0, 94.0, 198.0),
+}
+
+### For instance segmentation the non-object categories ###
+VALID_PANOPTIC_IDS = (1, 3)
+
+CLASS_LABELS_PANOPTIC = ("wall", "floor")
diff --git a/models/Mask3D/mask3d/datasets/scannet200/scannet200_splits.py b/models/Mask3D/mask3d/datasets/scannet200/scannet200_splits.py
new file mode 100644
index 0000000000000000000000000000000000000000..3a5585f70319d1eb061669bd82bbf3d64d0bca7b
--- /dev/null
+++ b/models/Mask3D/mask3d/datasets/scannet200/scannet200_splits.py
@@ -0,0 +1,625 @@
+### This file contains the HEAD - COMMON - TAIL split categories for ScanNet 200
+
+HEAD_CATS_SCANNET_200 = [
+ "tv stand",
+ "curtain",
+ "blinds",
+ "shower curtain",
+ "bookshelf",
+ "tv",
+ "kitchen cabinet",
+ "pillow",
+ "lamp",
+ "dresser",
+ "monitor",
+ "object",
+ "ceiling",
+ "board",
+ "stove",
+ "closet wall",
+ "couch",
+ "office chair",
+ "kitchen counter",
+ "shower",
+ "closet",
+ "doorframe",
+ "sofa chair",
+ "mailbox",
+ "nightstand",
+ "washing machine",
+ "picture",
+ "book",
+ "sink",
+ "recycling bin",
+ "table",
+ "backpack",
+ "shower wall",
+ "toilet",
+ "copier",
+ "counter",
+ "stool",
+ "refrigerator",
+ "window",
+ "file cabinet",
+ "chair",
+ "wall",
+ "plant",
+ "coffee table",
+ "stairs",
+ "armchair",
+ "cabinet",
+ "bathroom vanity",
+ "bathroom stall",
+ "mirror",
+ "blackboard",
+ "trash can",
+ "stair rail",
+ "box",
+ "towel",
+ "door",
+ "clothes",
+ "whiteboard",
+ "bed",
+ "floor",
+ "bathtub",
+ "desk",
+ "wardrobe",
+ "clothes dryer",
+ "radiator",
+ "shelf",
+]
+COMMON_CATS_SCANNET_200 = [
+ "cushion",
+ "end table",
+ "dining table",
+ "keyboard",
+ "bag",
+ "toilet paper",
+ "printer",
+ "blanket",
+ "microwave",
+ "shoe",
+ "computer tower",
+ "bottle",
+ "bin",
+ "ottoman",
+ "bench",
+ "basket",
+ "fan",
+ "laptop",
+ "person",
+ "paper towel dispenser",
+ "oven",
+ "rack",
+ "piano",
+ "suitcase",
+ "rail",
+ "container",
+ "telephone",
+ "stand",
+ "light",
+ "laundry basket",
+ "pipe",
+ "seat",
+ "column",
+ "bicycle",
+ "ladder",
+ "jacket",
+ "storage bin",
+ "coffee maker",
+ "dishwasher",
+ "machine",
+ "mat",
+ "windowsill",
+ "bulletin board",
+ "fireplace",
+ "mini fridge",
+ "water cooler",
+ "shower door",
+ "pillar",
+ "ledge",
+ "furniture",
+ "cart",
+ "decoration",
+ "closet door",
+ "vacuum cleaner",
+ "dish rack",
+ "range hood",
+ "projector screen",
+ "divider",
+ "bathroom counter",
+ "laundry hamper",
+ "bathroom stall door",
+ "ceiling light",
+ "trash bin",
+ "bathroom cabinet",
+ "structure",
+ "storage organizer",
+ "potted plant",
+ "mattress",
+]
+TAIL_CATS_SCANNET_200 = [
+ "paper",
+ "plate",
+ "soap dispenser",
+ "bucket",
+ "clock",
+ "guitar",
+ "toilet paper holder",
+ "speaker",
+ "cup",
+ "paper towel roll",
+ "bar",
+ "toaster",
+ "ironing board",
+ "soap dish",
+ "toilet paper dispenser",
+ "fire extinguisher",
+ "ball",
+ "hat",
+ "shower curtain rod",
+ "paper cutter",
+ "tray",
+ "toaster oven",
+ "mouse",
+ "toilet seat cover dispenser",
+ "storage container",
+ "scale",
+ "tissue box",
+ "light switch",
+ "crate",
+ "power outlet",
+ "sign",
+ "projector",
+ "candle",
+ "plunger",
+ "stuffed animal",
+ "headphones",
+ "broom",
+ "guitar case",
+ "dustpan",
+ "hair dryer",
+ "water bottle",
+ "handicap bar",
+ "purse",
+ "vent",
+ "shower floor",
+ "water pitcher",
+ "bowl",
+ "paper bag",
+ "alarm clock",
+ "music stand",
+ "laundry detergent",
+ "dumbbell",
+ "tube",
+ "cd case",
+ "closet rod",
+ "coffee kettle",
+ "shower head",
+ "keyboard piano",
+ "case of water bottles",
+ "coat rack",
+ "folded chair",
+ "fire alarm",
+ "power strip",
+ "calendar",
+ "poster",
+ "luggage",
+]
+
+
+### Given the different sizes of the official train and val sets, not all ScanNet200 categories are present in the validation set.
+### Here we list the categories (labels and IDs) present in both the train and validation sets, followed by the categories present in train but not in val.
+### We don't evaluate on categories unseen in validation in this benchmark.
+
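+### Note: the naming below follows upstream and is swapped: the
+### VALID_CLASS_IDS_200_* tuples hold category names, while the
+### CLASS_LABELS_200_* tuples hold numeric ids. Kept as-is for compatibility.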
+VALID_CLASS_IDS_200_VALIDATION = (
+ "wall",
+ "chair",
+ "floor",
+ "table",
+ "door",
+ "couch",
+ "cabinet",
+ "shelf",
+ "desk",
+ "office chair",
+ "bed",
+ "pillow",
+ "sink",
+ "picture",
+ "window",
+ "toilet",
+ "bookshelf",
+ "monitor",
+ "curtain",
+ "book",
+ "armchair",
+ "coffee table",
+ "box",
+ "refrigerator",
+ "lamp",
+ "kitchen cabinet",
+ "towel",
+ "clothes",
+ "tv",
+ "nightstand",
+ "counter",
+ "dresser",
+ "stool",
+ "cushion",
+ "plant",
+ "ceiling",
+ "bathtub",
+ "end table",
+ "dining table",
+ "keyboard",
+ "bag",
+ "backpack",
+ "toilet paper",
+ "printer",
+ "tv stand",
+ "whiteboard",
+ "blanket",
+ "shower curtain",
+ "trash can",
+ "closet",
+ "stairs",
+ "microwave",
+ "stove",
+ "shoe",
+ "computer tower",
+ "bottle",
+ "bin",
+ "ottoman",
+ "bench",
+ "board",
+ "washing machine",
+ "mirror",
+ "copier",
+ "basket",
+ "sofa chair",
+ "file cabinet",
+ "fan",
+ "laptop",
+ "shower",
+ "paper",
+ "person",
+ "paper towel dispenser",
+ "oven",
+ "blinds",
+ "rack",
+ "plate",
+ "blackboard",
+ "piano",
+ "suitcase",
+ "rail",
+ "radiator",
+ "recycling bin",
+ "container",
+ "wardrobe",
+ "soap dispenser",
+ "telephone",
+ "bucket",
+ "clock",
+ "stand",
+ "light",
+ "laundry basket",
+ "pipe",
+ "clothes dryer",
+ "guitar",
+ "toilet paper holder",
+ "seat",
+ "speaker",
+ "column",
+ "ladder",
+ "bathroom stall",
+ "shower wall",
+ "cup",
+ "jacket",
+ "storage bin",
+ "coffee maker",
+ "dishwasher",
+ "paper towel roll",
+ "machine",
+ "mat",
+ "windowsill",
+ "bar",
+ "toaster",
+ "bulletin board",
+ "ironing board",
+ "fireplace",
+ "soap dish",
+ "kitchen counter",
+ "doorframe",
+ "toilet paper dispenser",
+ "mini fridge",
+ "fire extinguisher",
+ "ball",
+ "hat",
+ "shower curtain rod",
+ "water cooler",
+ "paper cutter",
+ "tray",
+ "shower door",
+ "pillar",
+ "ledge",
+ "toaster oven",
+ "mouse",
+ "toilet seat cover dispenser",
+ "furniture",
+ "cart",
+ "scale",
+ "tissue box",
+ "light switch",
+ "crate",
+ "power outlet",
+ "decoration",
+ "sign",
+ "projector",
+ "closet door",
+ "vacuum cleaner",
+ "plunger",
+ "stuffed animal",
+ "headphones",
+ "dish rack",
+ "broom",
+ "range hood",
+ "dustpan",
+ "hair dryer",
+ "water bottle",
+ "handicap bar",
+ "vent",
+ "shower floor",
+ "water pitcher",
+ "mailbox",
+ "bowl",
+ "paper bag",
+ "projector screen",
+ "divider",
+ "laundry detergent",
+ "bathroom counter",
+ "object",
+ "bathroom vanity",
+ "closet wall",
+ "laundry hamper",
+ "bathroom stall door",
+ "ceiling light",
+ "trash bin",
+ "dumbbell",
+ "stair rail",
+ "tube",
+ "bathroom cabinet",
+ "closet rod",
+ "coffee kettle",
+ "shower head",
+ "keyboard piano",
+ "case of water bottles",
+ "coat rack",
+ "folded chair",
+ "fire alarm",
+ "power strip",
+ "calendar",
+ "poster",
+ "potted plant",
+ "mattress",
+)
+
+CLASS_LABELS_200_VALIDATION = (
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 21,
+ 22,
+ 23,
+ 24,
+ 26,
+ 27,
+ 28,
+ 29,
+ 31,
+ 32,
+ 33,
+ 34,
+ 35,
+ 36,
+ 38,
+ 39,
+ 40,
+ 41,
+ 42,
+ 44,
+ 45,
+ 46,
+ 47,
+ 48,
+ 49,
+ 50,
+ 51,
+ 52,
+ 54,
+ 55,
+ 56,
+ 57,
+ 58,
+ 59,
+ 62,
+ 63,
+ 64,
+ 65,
+ 66,
+ 67,
+ 68,
+ 69,
+ 70,
+ 71,
+ 72,
+ 73,
+ 74,
+ 75,
+ 76,
+ 77,
+ 78,
+ 79,
+ 80,
+ 82,
+ 84,
+ 86,
+ 87,
+ 88,
+ 89,
+ 90,
+ 93,
+ 95,
+ 96,
+ 97,
+ 98,
+ 99,
+ 100,
+ 101,
+ 102,
+ 103,
+ 104,
+ 105,
+ 106,
+ 107,
+ 110,
+ 112,
+ 115,
+ 116,
+ 118,
+ 120,
+ 122,
+ 125,
+ 128,
+ 130,
+ 131,
+ 132,
+ 134,
+ 136,
+ 138,
+ 139,
+ 140,
+ 141,
+ 145,
+ 148,
+ 154,
+ 155,
+ 156,
+ 157,
+ 159,
+ 161,
+ 163,
+ 165,
+ 166,
+ 168,
+ 169,
+ 170,
+ 177,
+ 180,
+ 185,
+ 188,
+ 191,
+ 193,
+ 195,
+ 202,
+ 208,
+ 213,
+ 214,
+ 229,
+ 230,
+ 232,
+ 233,
+ 242,
+ 250,
+ 261,
+ 264,
+ 276,
+ 283,
+ 300,
+ 304,
+ 312,
+ 323,
+ 325,
+ 342,
+ 356,
+ 370,
+ 392,
+ 395,
+ 408,
+ 417,
+ 488,
+ 540,
+ 562,
+ 570,
+ 609,
+ 748,
+ 776,
+ 1156,
+ 1163,
+ 1164,
+ 1165,
+ 1166,
+ 1167,
+ 1168,
+ 1169,
+ 1170,
+ 1171,
+ 1172,
+ 1173,
+ 1175,
+ 1176,
+ 1179,
+ 1180,
+ 1181,
+ 1182,
+ 1184,
+ 1185,
+ 1186,
+ 1187,
+ 1188,
+ 1189,
+ 1191,
+)
+
+VALID_CLASS_IDS_200_TRAIN_ONLY = (
+ "bicycle",
+ "storage container",
+ "candle",
+ "guitar case",
+ "purse",
+ "alarm clock",
+ "music stand",
+ "cd case",
+ "structure",
+ "storage organizer",
+ "luggage",
+)
+
+CLASS_LABELS_200_TRAIN_ONLY = (
+ 121,
+ 221,
+ 286,
+ 331,
+ 399,
+ 572,
+ 581,
+ 1174,
+ 1178,
+ 1183,
+ 1190,
+)
diff --git a/models/Mask3D/mask3d/datasets/semseg.py b/models/Mask3D/mask3d/datasets/semseg.py
new file mode 100644
index 0000000000000000000000000000000000000000..a848b1a20e4690971bf16790fcea00ade84441c0
--- /dev/null
+++ b/models/Mask3D/mask3d/datasets/semseg.py
@@ -0,0 +1,993 @@
+import logging
+from itertools import product
+from pathlib import Path
+from random import random, sample, uniform
+from typing import List, Optional, Tuple, Union
+from random import choice
+from copy import deepcopy
+from random import randrange
+
+
+import numpy
+import torch
+from datasets.random_cuboid import RandomCuboid
+
+import albumentations as A
+import numpy as np
+import scipy
+import volumentations as V
+import yaml
+
+# from yaml import CLoader as Loader
+from torch.utils.data import Dataset
+from datasets.scannet200.scannet200_constants import (
+ SCANNET_COLOR_MAP_200,
+ SCANNET_COLOR_MAP_20,
+)
+
+logger = logging.getLogger(__name__)
+
+
+class SemanticSegmentationDataset(Dataset):
+ """Docstring for SemanticSegmentationDataset."""
+
+ def __init__(
+ self,
+ dataset_name="scannet",
+ data_dir: Optional[Union[str, Tuple[str]]] = "data/processed/scannet",
+ label_db_filepath: Optional[
+ str
+ ] = "configs/scannet_preprocessing/label_database.yaml",
+ # mean std values from scannet
+ color_mean_std: Optional[Union[str, Tuple[Tuple[float]]]] = (
+ (0.47793125906962, 0.4303257521323044, 0.3749598901421883),
+ (0.2834475483823543, 0.27566157565723015, 0.27018971370874995),
+ ),
+ mode: Optional[str] = "train",
+ add_colors: Optional[bool] = True,
+ add_normals: Optional[bool] = True,
+ add_raw_coordinates: Optional[bool] = False,
+ add_instance: Optional[bool] = False,
+ num_labels: Optional[int] = -1,
+ data_percent: Optional[float] = 1.0,
+ ignore_label: Optional[Union[int, Tuple[int]]] = 255,
+ volume_augmentations_path: Optional[str] = None,
+ image_augmentations_path: Optional[str] = None,
+ instance_oversampling=0,
+ place_around_existing=False,
+ max_cut_region=0,
+ point_per_cut=100,
+ flip_in_center=False,
+ noise_rate=0.0,
+ resample_points=0.0,
+ cache_data=False,
+ add_unlabeled_pc=False,
+ task="instance_segmentation",
+ cropping=False,
+ cropping_args=None,
+ is_tta=False,
+ crop_min_size=20000,
+ crop_length=6.0,
+ cropping_v1=True,
+ reps_per_epoch=1,
+ area=-1,
+ on_crops=False,
+ eval_inner_core=-1,
+ filter_out_classes=[],
+ label_offset=0,
+ add_clip=False,
+ is_elastic_distortion=True,
+ color_drop=0.0,
+ ):
+ assert task in [
+ "instance_segmentation",
+ "semantic_segmentation",
+ ], "unknown task"
+
+ self.add_clip = add_clip
+ self.dataset_name = dataset_name
+ self.is_elastic_distortion = is_elastic_distortion
+ self.color_drop = color_drop
+
+ if self.dataset_name == "scannet":
+ self.color_map = SCANNET_COLOR_MAP_20
+ self.color_map[255] = (255, 255, 255)
+ elif self.dataset_name == "stpls3d":
+            self.color_map = {
+                0: [0, 255, 0],  # Ground
+                1: [0, 0, 255],  # Build
+                2: [0, 255, 255],  # LowVeg
+                3: [255, 255, 0],  # MediumVeg
+                4: [255, 0, 255],  # HighVeg
+                5: [100, 100, 255],  # Vehicle
+                6: [200, 200, 100],  # Truck
+                7: [170, 120, 200],  # Aircraft
+                8: [255, 0, 0],  # MilitaryVeh
+                9: [200, 100, 100],  # Bike
+                10: [10, 200, 100],  # Motorcycle
+                11: [200, 200, 200],  # LightPole
+                12: [50, 50, 50],  # StreetSign
+                13: [60, 130, 60],  # Clutter
+                14: [130, 30, 60],  # Fence
+            }
+ elif self.dataset_name == "scannet200":
+ self.color_map = SCANNET_COLOR_MAP_200
+ elif self.dataset_name == "s3dis":
+ self.color_map = {
+ 0: [0, 255, 0], # ceiling
+ 1: [0, 0, 255], # floor
+ 2: [0, 255, 255], # wall
+ 3: [255, 255, 0], # beam
+ 4: [255, 0, 255], # column
+ 5: [100, 100, 255], # window
+ 6: [200, 200, 100], # door
+ 7: [170, 120, 200], # table
+ 8: [255, 0, 0], # chair
+ 9: [200, 100, 100], # sofa
+ 10: [10, 200, 100], # bookcase
+ 11: [200, 200, 200], # board
+ 12: [50, 50, 50], # clutter
+ }
+ else:
+ assert False, "dataset not known"
+
+ self.task = task
+
+ self.filter_out_classes = filter_out_classes
+ self.label_offset = label_offset
+
+ self.area = area
+ self.eval_inner_core = eval_inner_core
+
+ self.reps_per_epoch = reps_per_epoch
+
+ self.cropping = cropping
+ self.cropping_args = cropping_args
+ self.is_tta = is_tta
+ self.on_crops = on_crops
+
+ self.crop_min_size = crop_min_size
+ self.crop_length = crop_length
+
+ self.version1 = cropping_v1
+
+ self.random_cuboid = RandomCuboid(
+ self.crop_min_size,
+ crop_length=self.crop_length,
+ version1=self.version1,
+ )
+
+ self.mode = mode
+ self.data_dir = data_dir
+ self.add_unlabeled_pc = add_unlabeled_pc
+ if add_unlabeled_pc:
+ self.other_database = self._load_yaml(
+ Path(data_dir).parent / "matterport" / "train_database.yaml"
+ )
+        if isinstance(data_dir, str):
+ self.data_dir = [self.data_dir]
+ self.ignore_label = ignore_label
+ self.add_colors = add_colors
+ self.add_normals = add_normals
+ self.add_instance = add_instance
+ self.add_raw_coordinates = add_raw_coordinates
+ self.instance_oversampling = instance_oversampling
+ self.place_around_existing = place_around_existing
+ self.max_cut_region = max_cut_region
+ self.point_per_cut = point_per_cut
+ self.flip_in_center = flip_in_center
+ self.noise_rate = noise_rate
+ self.resample_points = resample_points
+
+ # loading database files
+ self._data = []
+ for database_path in self.data_dir:
+ database_path = Path(database_path)
+            mode = "Validation"  # always load the validation database, regardless of self.mode
+ if self.dataset_name != "s3dis":
+ if not (database_path / f"{mode}_database.yaml").exists():
+ print(
+ f"generate {database_path}/{mode}_database.yaml first"
+ )
+ exit()
+ self._data.extend(
+ self._load_yaml(database_path / f"{mode}_database.yaml")
+ )
+ else:
+ # mode_s3dis = f"Area_{self.area}"
+ mode_s3dis = "Validation"
+ if self.mode == "train":
+ mode_s3dis = "train_" + mode_s3dis
+ if not (
+ database_path / f"{mode_s3dis}_database.yaml"
+ ).exists():
+ print(
+ f"generate {database_path}/{mode_s3dis}_database.yaml first"
+ )
+ exit()
+ self._data.extend(
+ self._load_yaml(
+ database_path / f"{mode_s3dis}_database.yaml"
+ )
+ )
+ if data_percent < 1.0:
+ self._data = sample(
+ self._data, int(len(self._data) * data_percent)
+ )
+ # labels = self._load_yaml(Path(label_db_filepath))
+
+ # if working only on classes for validation - discard others
+ # self._labels = self._select_correct_labels(labels, num_labels)
+
+ if instance_oversampling > 0:
+ self.instance_data = self._load_yaml(
+ Path(label_db_filepath).parent / "instance_database.yaml"
+ )
+
+ # normalize color channels
+ if self.dataset_name == "s3dis":
+ color_mean_std = color_mean_std.replace(
+ "color_mean_std.yaml", f"Area_{self.area}_color_mean_std.yaml"
+ )
+
+ if Path(str(color_mean_std)).exists():
+ color_mean_std = self._load_yaml(color_mean_std)
+ color_mean, color_std = (
+ tuple(color_mean_std["mean"]),
+ tuple(color_mean_std["std"]),
+ )
+ elif len(color_mean_std[0]) == 3 and len(color_mean_std[1]) == 3:
+ color_mean, color_std = color_mean_std[0], color_mean_std[1]
+ else:
+ logger.error(
+ "pass mean and std as tuple of tuples, or as an .yaml file"
+ )
+
+ # augmentations
+ self.volume_augmentations = V.NoOp()
+ if (volume_augmentations_path is not None) and (
+ volume_augmentations_path != "none"
+ ):
+ self.volume_augmentations = V.load(
+ Path(volume_augmentations_path), data_format="yaml"
+ )
+ self.image_augmentations = A.NoOp()
+ if (image_augmentations_path is not None) and (
+ image_augmentations_path != "none"
+ ):
+ self.image_augmentations = A.load(
+ Path(image_augmentations_path), data_format="yaml"
+ )
+ # mandatory color augmentation
+ if add_colors:
+ self.normalize_color = A.Normalize(mean=color_mean, std=color_std)
+
+ self.cache_data = cache_data
+ # new_data = []
+ if self.cache_data:
+ new_data = []
+ for i in range(len(self._data)):
+ self._data[i]["data"] = np.load(
+ self.data[i]["filepath"].replace("../../", "")
+ )
+ if self.on_crops:
+ if self.eval_inner_core == -1:
+ for block_id, block in enumerate(
+ self.splitPointCloud(self._data[i]["data"])
+ ):
+ if len(block) > 10000:
+ new_data.append(
+ {
+ "instance_gt_filepath": self._data[i][
+ "instance_gt_filepath"
+ ][block_id]
+ if len(
+ self._data[i][
+ "instance_gt_filepath"
+ ]
+ )
+ > 0
+ else list(),
+ "scene": f"{self._data[i]['scene'].replace('.txt', '')}_{block_id}.txt",
+ "raw_filepath": f"{self.data[i]['filepath'].replace('.npy', '')}_{block_id}",
+ "data": block,
+ }
+ )
+ else:
+ assert False
+ else:
+ conds_inner, blocks_outer = self.splitPointCloud(
+ self._data[i]["data"],
+ size=self.crop_length,
+ inner_core=self.eval_inner_core,
+ )
+
+ for block_id in range(len(conds_inner)):
+ cond_inner = conds_inner[block_id]
+ block_outer = blocks_outer[block_id]
+
+ if cond_inner.sum() > 10000:
+ new_data.append(
+ {
+ "instance_gt_filepath": self._data[i][
+ "instance_gt_filepath"
+ ][block_id]
+ if len(
+ self._data[i][
+ "instance_gt_filepath"
+ ]
+ )
+ > 0
+ else list(),
+ "scene": f"{self._data[i]['scene'].replace('.txt', '')}_{block_id}.txt",
+ "raw_filepath": f"{self.data[i]['filepath'].replace('.npy', '')}_{block_id}",
+ "data": block_outer,
+ "cond_inner": cond_inner,
+ }
+ )
+ else:
+ assert False
+
+ if self.on_crops:
+ self._data = new_data
+ # new_data.append(np.load(self.data[i]["filepath"].replace("../../", "")))
+ # self._data = new_data
+
+ def splitPointCloud(self, cloud, size=50.0, stride=50, inner_core=-1):
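+        # With inner_core == -1 the cloud is tiled into plain size x size
+        # blocks; otherwise each tile is returned as a larger outer block plus
+        # a boolean mask marking its inner core, so downstream code can use
+        # the outer context while restricting evaluation to the core.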
+ if inner_core == -1:
+ limitMax = np.amax(cloud[:, 0:3], axis=0)
+ width = int(np.ceil((limitMax[0] - size) / stride)) + 1
+ depth = int(np.ceil((limitMax[1] - size) / stride)) + 1
+ cells = [
+ (x * stride, y * stride)
+ for x in range(width)
+ for y in range(depth)
+ ]
+ blocks = []
+ for (x, y) in cells:
+ xcond = (cloud[:, 0] <= x + size) & (cloud[:, 0] >= x)
+ ycond = (cloud[:, 1] <= y + size) & (cloud[:, 1] >= y)
+ cond = xcond & ycond
+ block = cloud[cond, :]
+ blocks.append(block)
+ return blocks
+ else:
+ limitMax = np.amax(cloud[:, 0:3], axis=0)
+ width = int(np.ceil((limitMax[0] - inner_core) / stride)) + 1
+ depth = int(np.ceil((limitMax[1] - inner_core) / stride)) + 1
+ cells = [
+ (x * stride, y * stride)
+ for x in range(width)
+ for y in range(depth)
+ ]
+ blocks_outer = []
+ conds_inner = []
+ for (x, y) in cells:
+ xcond_outer = (
+ cloud[:, 0] <= x + inner_core / 2.0 + size / 2
+ ) & (cloud[:, 0] >= x + inner_core / 2.0 - size / 2)
+ ycond_outer = (
+ cloud[:, 1] <= y + inner_core / 2.0 + size / 2
+ ) & (cloud[:, 1] >= y + inner_core / 2.0 - size / 2)
+
+ cond_outer = xcond_outer & ycond_outer
+ block_outer = cloud[cond_outer, :]
+
+ xcond_inner = (block_outer[:, 0] <= x + inner_core) & (
+ block_outer[:, 0] >= x
+ )
+ ycond_inner = (block_outer[:, 1] <= y + inner_core) & (
+ block_outer[:, 1] >= y
+ )
+
+ cond_inner = xcond_inner & ycond_inner
+
+ conds_inner.append(cond_inner)
+ blocks_outer.append(block_outer)
+ return conds_inner, blocks_outer
+
+ def map2color(self, labels):
+ output_colors = list()
+
+ for label in labels:
+ output_colors.append(self.color_map[label])
+
+ return torch.tensor(output_colors)
+
+ def __len__(self):
+ if self.is_tta:
+ return 5 * len(self.data)
+ else:
+ return self.reps_per_epoch * len(self.data)
+
+ def __getitem__(self, idx: int):
+ idx = idx % len(self.data)
+ if self.is_tta:
+ idx = idx % len(self.data)
+
+ if self.cache_data:
+ points = self.data[idx]["data"]
+ else:
+ assert not self.on_crops, "you need caching if on crops"
+ points = np.load(self.data[idx]["filepath"].replace("../../", ""))
+
+ if "train" in self.mode and self.dataset_name in ["s3dis", "stpls3d"]:
+ inds = self.random_cuboid(points)
+ points = points[inds]
+
+ coordinates, color, normals, segments, labels = (
+ points[:, :3],
+ points[:, 3:6],
+ points[:, 6:9],
+ points[:, 9],
+ points[:, 10:12],
+ )
+
+ raw_coordinates = coordinates.copy()
+ raw_color = color
+ raw_normals = normals
+
+ if not self.add_colors:
+ color = np.ones((len(color), 3))
+
+ # volume and image augmentations for train
+ if "train" in self.mode or self.is_tta:
+ if self.cropping:
+ new_idx = self.random_cuboid(
+ coordinates,
+ labels[:, 1],
+ self._remap_from_zero(labels[:, 0].copy()),
+ )
+
+ coordinates = coordinates[new_idx]
+ color = color[new_idx]
+ labels = labels[new_idx]
+ segments = segments[new_idx]
+ raw_color = raw_color[new_idx]
+ raw_normals = raw_normals[new_idx]
+ normals = normals[new_idx]
+ points = points[new_idx]
+
+ coordinates -= coordinates.mean(0)
+
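+            # random global translation: shift by a random offset within half
+            # the (centered) bounding box so scenes do not always sit at the origin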
+ try:
+ coordinates += (
+ np.random.uniform(coordinates.min(0), coordinates.max(0))
+ / 2
+ )
+ except OverflowError as err:
+ print(coordinates)
+ print(coordinates.shape)
+ raise err
+
+ if self.instance_oversampling > 0.0:
+ (
+ coordinates,
+ color,
+ normals,
+ labels,
+ ) = self.augment_individual_instance(
+ coordinates,
+ color,
+ normals,
+ labels,
+ self.instance_oversampling,
+ )
+
+ if self.flip_in_center:
+ coordinates = flip_in_center(coordinates)
+
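+            # randomly mirror the scene along the x and/or y axis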
+ for i in (0, 1):
+ if random() < 0.5:
+ coord_max = np.max(points[:, i])
+ coordinates[:, i] = coord_max - coordinates[:, i]
+
+ if random() < 0.95:
+ if self.is_elastic_distortion:
+ for granularity, magnitude in ((0.2, 0.4), (0.8, 1.6)):
+ coordinates = elastic_distortion(
+ coordinates, granularity, magnitude
+ )
+ aug = self.volume_augmentations(
+ points=coordinates,
+ normals=normals,
+ features=color,
+ labels=labels,
+ )
+ coordinates, color, normals, labels = (
+ aug["points"],
+ aug["features"],
+ aug["normals"],
+ aug["labels"],
+ )
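+                # treat the N x 3 color array as a 1 x N x 3 pseudo-image so
+                # the 2D image augmentations can be applied to per-point colors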
+ pseudo_image = color.astype(np.uint8)[np.newaxis, :, :]
+ color = np.squeeze(
+ self.image_augmentations(image=pseudo_image)["image"]
+ )
+
+ if self.point_per_cut != 0:
+ number_of_cuts = int(len(coordinates) / self.point_per_cut)
+ for _ in range(number_of_cuts):
+ size_of_cut = np.random.uniform(0.05, self.max_cut_region)
+ # not wall, floor or empty
+ point = choice(coordinates)
+ x_min = point[0] - size_of_cut
+ x_max = x_min + size_of_cut
+ y_min = point[1] - size_of_cut
+ y_max = y_min + size_of_cut
+ z_min = point[2] - size_of_cut
+ z_max = z_min + size_of_cut
+ indexes = crop(
+ coordinates, x_min, y_min, z_min, x_max, y_max, z_max
+ )
+ coordinates, normals, color, labels = (
+ coordinates[~indexes],
+ normals[~indexes],
+ color[~indexes],
+ labels[~indexes],
+ )
+
+ # if self.noise_rate > 0:
+ # coordinates, color, normals, labels = random_points(
+ # coordinates,
+ # color,
+ # normals,
+ # labels,
+ # self.noise_rate,
+ # self.ignore_label,
+ # )
+
+ if (self.resample_points > 0) or (self.noise_rate > 0):
+ coordinates, color, normals, labels = random_around_points(
+ coordinates,
+ color,
+ normals,
+ labels,
+ self.resample_points,
+ self.noise_rate,
+ self.ignore_label,
+ )
+
+ if self.add_unlabeled_pc:
+ if random() < 0.8:
+ new_points = np.load(
+ self.other_database[
+ np.random.randint(0, len(self.other_database) - 1)
+ ]["filepath"]
+ )
+ (
+ unlabeled_coords,
+ unlabeled_color,
+ unlabeled_normals,
+ unlabeled_labels,
+ ) = (
+ new_points[:, :3],
+ new_points[:, 3:6],
+ new_points[:, 6:9],
+ new_points[:, 9:],
+ )
+ unlabeled_coords -= unlabeled_coords.mean(0)
+ unlabeled_coords += (
+ np.random.uniform(
+ unlabeled_coords.min(0), unlabeled_coords.max(0)
+ )
+ / 2
+ )
+
+ aug = self.volume_augmentations(
+ points=unlabeled_coords,
+ normals=unlabeled_normals,
+ features=unlabeled_color,
+ labels=unlabeled_labels,
+ )
+ (
+ unlabeled_coords,
+ unlabeled_color,
+ unlabeled_normals,
+ unlabeled_labels,
+ ) = (
+ aug["points"],
+ aug["features"],
+ aug["normals"],
+ aug["labels"],
+ )
+ pseudo_image = unlabeled_color.astype(np.uint8)[
+ np.newaxis, :, :
+ ]
+ unlabeled_color = np.squeeze(
+ self.image_augmentations(image=pseudo_image)["image"]
+ )
+
+ coordinates = np.concatenate(
+ (coordinates, unlabeled_coords)
+ )
+ color = np.concatenate((color, unlabeled_color))
+ normals = np.concatenate((normals, unlabeled_normals))
+ labels = np.concatenate(
+ (
+ labels,
+ np.full_like(unlabeled_labels, self.ignore_label),
+ )
+ )
+
+ if random() < self.color_drop:
+ color[:] = 255
+
+ # normalize color information
+ pseudo_image = color.astype(np.uint8)[np.newaxis, :, :]
+ color = np.squeeze(self.normalize_color(image=pseudo_image)["image"])
+
+ # prepare labels and map from 0 to 20(40)
+ labels = labels.astype(np.int32)
+ # if labels.size > 0:
+ # labels[:, 0] = self._remap_from_zero(labels[:, 0])
+ # if not self.add_instance:
+ # # taking only first column, which is segmentation label, not instance
+ # labels = labels[:, 0].flatten()[..., None]
+
+ labels = np.hstack((labels, segments[..., None].astype(np.int32)))
+
+ features = color
+ if self.add_normals:
+ features = np.hstack((features, normals))
+ if self.add_raw_coordinates:
+ if len(features.shape) == 1:
+ features = np.hstack((features[None, ...], coordinates))
+ else:
+ features = np.hstack((features, coordinates))
+
+ # if self.task != "semantic_segmentation":
+ if self.data[idx]["raw_filepath"].split("/")[-2] in [
+ "scene0636_00",
+ "scene0154_00",
+ ]:
+ return self.__getitem__(0)
+
+ if self.dataset_name == "s3dis":
+ return (
+ coordinates,
+ features,
+ labels,
+ self.data[idx]["area"] + "_" + self.data[idx]["scene"],
+ raw_color,
+ raw_normals,
+ raw_coordinates,
+ idx,
+ )
+ if self.dataset_name == "stpls3d":
+ if labels.shape[1] != 1: # only segments --> test set!
+ if np.unique(labels[:, -2]).shape[0] < 2:
+ print("NO INSTANCES")
+ return self.__getitem__(0)
+ return (
+ coordinates,
+ features,
+ labels,
+ self.data[idx]["scene"],
+ raw_color,
+ raw_normals,
+ raw_coordinates,
+ idx,
+ )
+ else:
+ return (
+ coordinates,
+ features,
+ labels,
+ self.data[idx]["raw_filepath"].split("/")[-2],
+ raw_color,
+ raw_normals,
+ raw_coordinates,
+ idx,
+ )
+
+ @property
+ def data(self):
+        """database file containing information about the preprocessed dataset"""
+ return self._data
+
+ @property
+ def label_info(self):
+        """database file containing information about the labels used by the dataset"""
+ return self._labels
+
+ @staticmethod
+ def _load_yaml(filepath):
+ with open(filepath) as f:
+            file = yaml.load(f, Loader=yaml.SafeLoader)
+ return file
+
+ def _select_correct_labels(self, labels, num_labels):
+ number_of_validation_labels = 0
+ number_of_all_labels = 0
+ for (
+ k,
+ v,
+ ) in labels.items():
+ number_of_all_labels += 1
+ if v["validation"]:
+ number_of_validation_labels += 1
+
+ if num_labels == number_of_all_labels:
+ return labels
+ elif num_labels == number_of_validation_labels:
+ valid_labels = dict()
+ for (
+ k,
+ v,
+ ) in labels.items():
+ if v["validation"]:
+ valid_labels.update({k: v})
+ return valid_labels
+ else:
+            msg = f"""unsupported number of labels; choose one of:
+            {number_of_validation_labels}, {number_of_all_labels}"""
+ raise ValueError(msg)
+
+ def _remap_from_zero(self, labels):
+ labels[
+ ~np.isin(labels, list(self.label_info.keys()))
+ ] = self.ignore_label
+ # remap to the range from 0
+ for i, k in enumerate(self.label_info.keys()):
+ labels[labels == k] = i
+ return labels
+
+ def _remap_model_output(self, output):
+ output = np.array(output)
+ output_remapped = output.copy()
+ for i, k in enumerate(self.label_info.keys()):
+ output_remapped[output == i] = k
+ return output_remapped
+
+ def augment_individual_instance(
+ self, coordinates, color, normals, labels, oversampling=1.0
+ ):
+ max_instance = int(len(np.unique(labels[:, 1])))
+        # paste int(max_instance * oversampling) additional instances into the scene
+ for instance in range(0, int(max_instance * oversampling)):
+ if self.place_around_existing:
+ center = choice(
+ coordinates[
+ labels[:, 1] == choice(np.unique(labels[:, 1]))
+ ]
+ )
+ else:
+ center = np.array(
+ [uniform(-5, 5), uniform(-5, 5), uniform(-0.5, 2)]
+ )
+ instance = choice(choice(self.instance_data))
+ instance = np.load(instance["instance_filepath"])
+            # center the sampled instance at the chosen location
+ instance[:, :3] = (
+ instance[:, :3] - instance[:, :3].mean(axis=0) + center
+ )
+ max_instance = max_instance + 1
+ instance[:, -1] = max_instance
+ aug = V.Compose(
+ [
+ V.Scale3d(),
+ V.RotateAroundAxis3d(
+ rotation_limit=np.pi / 24, axis=(1, 0, 0)
+ ),
+ V.RotateAroundAxis3d(
+ rotation_limit=np.pi / 24, axis=(0, 1, 0)
+ ),
+ V.RotateAroundAxis3d(rotation_limit=np.pi, axis=(0, 0, 1)),
+ ]
+ )(
+ points=instance[:, :3],
+ features=instance[:, 3:6],
+ normals=instance[:, 6:9],
+ labels=instance[:, 9:],
+ )
+ coordinates = np.concatenate((coordinates, aug["points"]))
+ color = np.concatenate((color, aug["features"]))
+ normals = np.concatenate((normals, aug["normals"]))
+ labels = np.concatenate((labels, aug["labels"]))
+
+ return coordinates, color, normals, labels
+
+
+def elastic_distortion(pointcloud, granularity, magnitude):
+ """Apply elastic distortion on sparse coordinate space.
+
+ pointcloud: numpy array of (number of points, at least 3 spatial dims)
+ granularity: size of the noise grid (in same scale[m/cm] as the voxel grid)
+ magnitude: noise multiplier
+ """
+ blurx = np.ones((3, 1, 1, 1)).astype("float32") / 3
+ blury = np.ones((1, 3, 1, 1)).astype("float32") / 3
+ blurz = np.ones((1, 1, 3, 1)).astype("float32") / 3
+ coords = pointcloud[:, :3]
+ coords_min = coords.min(0)
+
+ # Create Gaussian noise tensor of the size given by granularity.
+ noise_dim = ((coords - coords_min).max(0) // granularity).astype(int) + 3
+ noise = np.random.randn(*noise_dim, 3).astype(np.float32)
+
+ # Smoothing.
+ for _ in range(2):
+        noise = scipy.ndimage.convolve(noise, blurx, mode="constant", cval=0)
+        noise = scipy.ndimage.convolve(noise, blury, mode="constant", cval=0)
+        noise = scipy.ndimage.convolve(noise, blurz, mode="constant", cval=0)
+
+    # Trilinearly interpolate the smoothed noise grid at each point location.
+ ax = [
+ np.linspace(d_min, d_max, d)
+ for d_min, d_max, d in zip(
+ coords_min - granularity,
+ coords_min + granularity * (noise_dim - 2),
+ noise_dim,
+ )
+ ]
+    interp = scipy.interpolate.RegularGridInterpolator(
+        ax, noise, bounds_error=False, fill_value=0
+    )
+ pointcloud[:, :3] = coords + interp(coords) * magnitude
+ return pointcloud
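+
+
+# Usage sketch (illustrative): apply the same two-scale distortion used in the
+# training pipeline above to a synthetic cloud.
+#
+#   pc = np.random.rand(10000, 3).astype(np.float32) * 5.0
+#   for granularity, magnitude in ((0.2, 0.4), (0.8, 1.6)):
+#       pc = elastic_distortion(pc, granularity, magnitude)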
+
+
+def crop(points, x_min, y_min, z_min, x_max, y_max, z_max):
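+    """Boolean mask of the points inside the given axis-aligned box."""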
+ if x_max <= x_min or y_max <= y_min or z_max <= z_min:
+ raise ValueError(
+ "We should have x_min < x_max and y_min < y_max and z_min < z_max. But we got"
+ " (x_min = {x_min}, y_min = {y_min}, z_min = {z_min},"
+ " x_max = {x_max}, y_max = {y_max}, z_max = {z_max})".format(
+ x_min=x_min,
+ x_max=x_max,
+ y_min=y_min,
+ y_max=y_max,
+ z_min=z_min,
+ z_max=z_max,
+ )
+ )
+ inds = np.all(
+ [
+ (points[:, 0] >= x_min),
+ (points[:, 0] < x_max),
+ (points[:, 1] >= y_min),
+ (points[:, 1] < y_max),
+ (points[:, 2] >= z_min),
+ (points[:, 2] < z_max),
+ ],
+ axis=0,
+ )
+ return inds
+
+
+def flip_in_center(coordinates):
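+    """Flip each XY quadrant of the (centered) scene independently."""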
+ # moving coordinates to center
+ coordinates -= coordinates.mean(0)
+ aug = V.Compose(
+ [
+ V.Flip3d(axis=(0, 1, 0), always_apply=True),
+ V.Flip3d(axis=(1, 0, 0), always_apply=True),
+ ]
+ )
+
+ first_crop = coordinates[:, 0] > 0
+ first_crop &= coordinates[:, 1] > 0
+ # x -y
+ second_crop = coordinates[:, 0] > 0
+ second_crop &= coordinates[:, 1] < 0
+ # -x y
+ third_crop = coordinates[:, 0] < 0
+ third_crop &= coordinates[:, 1] > 0
+ # -x -y
+ fourth_crop = coordinates[:, 0] < 0
+ fourth_crop &= coordinates[:, 1] < 0
+
+ if first_crop.size > 1:
+ coordinates[first_crop] = aug(points=coordinates[first_crop])["points"]
+ if second_crop.size > 1:
+ minimum = coordinates[second_crop].min(0)
+ minimum[2] = 0
+ minimum[0] = 0
+ coordinates[second_crop] = aug(points=coordinates[second_crop])[
+ "points"
+ ]
+ coordinates[second_crop] += minimum
+ if third_crop.size > 1:
+ minimum = coordinates[third_crop].min(0)
+ minimum[2] = 0
+ minimum[1] = 0
+ coordinates[third_crop] = aug(points=coordinates[third_crop])["points"]
+ coordinates[third_crop] += minimum
+ if fourth_crop.size > 1:
+ minimum = coordinates[fourth_crop].min(0)
+ minimum[2] = 0
+ coordinates[fourth_crop] = aug(points=coordinates[fourth_crop])[
+ "points"
+ ]
+ coordinates[fourth_crop] += minimum
+
+ return coordinates
+
+
+def random_around_points(
+ coordinates,
+ color,
+ normals,
+ labels,
+ rate=0.2,
+ noise_rate=0,
+ ignore_label=255,
+):
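+    """Duplicate a fraction `rate` of the points with uniform jitter; when
+    noise_rate > 0 the copies get random colors/normals and ignore_label."""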
+ coord_indexes = sample(
+ list(range(len(coordinates))), k=int(len(coordinates) * rate)
+ )
+ noisy_coordinates = deepcopy(coordinates[coord_indexes])
+ noisy_coordinates += np.random.uniform(
+ -0.2 - noise_rate, 0.2 + noise_rate, size=noisy_coordinates.shape
+ )
+
+ if noise_rate > 0:
+ noisy_color = np.random.randint(0, 255, size=noisy_coordinates.shape)
+ noisy_normals = np.random.rand(*noisy_coordinates.shape) * 2 - 1
+ noisy_labels = np.full(labels[coord_indexes].shape, ignore_label)
+
+ coordinates = np.vstack((coordinates, noisy_coordinates))
+ color = np.vstack((color, noisy_color))
+ normals = np.vstack((normals, noisy_normals))
+ labels = np.vstack((labels, noisy_labels))
+ else:
+ noisy_color = deepcopy(color[coord_indexes])
+ noisy_normals = deepcopy(normals[coord_indexes])
+ noisy_labels = deepcopy(labels[coord_indexes])
+
+ coordinates = np.vstack((coordinates, noisy_coordinates))
+ color = np.vstack((color, noisy_color))
+ normals = np.vstack((normals, noisy_normals))
+ labels = np.vstack((labels, noisy_labels))
+
+ return coordinates, color, normals, labels
+
+
+def random_points(
+ coordinates, color, normals, labels, noise_rate=0.6, ignore_label=255
+):
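+    """Scatter a jittered regular grid of random points over the scene and
+    label them with ignore_label."""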
+ max_boundary = coordinates.max(0) + 0.1
+ min_boundary = coordinates.min(0) - 0.1
+
+ noisy_coordinates = int(
+ (max(max_boundary) - min(min_boundary)) / noise_rate
+ )
+
+ noisy_coordinates = np.array(
+ list(
+ product(
+ np.linspace(
+ min_boundary[0], max_boundary[0], noisy_coordinates
+ ),
+ np.linspace(
+ min_boundary[1], max_boundary[1], noisy_coordinates
+ ),
+ np.linspace(
+ min_boundary[2], max_boundary[2], noisy_coordinates
+ ),
+ )
+ )
+ )
+ noisy_coordinates += np.random.uniform(
+ -noise_rate, noise_rate, size=noisy_coordinates.shape
+ )
+
+ noisy_color = np.random.randint(0, 255, size=noisy_coordinates.shape)
+ noisy_normals = np.random.rand(*noisy_coordinates.shape) * 2 - 1
+ noisy_labels = np.full(
+ (noisy_coordinates.shape[0], labels.shape[1]), ignore_label
+ )
+
+ coordinates = np.vstack((coordinates, noisy_coordinates))
+ color = np.vstack((color, noisy_color))
+ normals = np.vstack((normals, noisy_normals))
+ labels = np.vstack((labels, noisy_labels))
+ return coordinates, color, normals, labels
diff --git a/models/Mask3D/mask3d/datasets/utils.py b/models/Mask3D/mask3d/datasets/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..c91fb68ed0058a264ce76f97a618bca6e7d35a70
--- /dev/null
+++ b/models/Mask3D/mask3d/datasets/utils.py
@@ -0,0 +1,639 @@
+import MinkowskiEngine as ME
+import numpy as np
+import torch
+from random import random
+
+
+class VoxelizeCollate:
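+    """Collate function: voxelize each sample with MinkowskiEngine and
+    assemble sparse-tensor batches together with instance targets."""
+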
+ def __init__(
+ self,
+ ignore_label=255,
+ voxel_size=1,
+ mode="test",
+ small_crops=False,
+ very_small_crops=False,
+ batch_instance=False,
+ probing=False,
+ task="instance_segmentation",
+ ignore_class_threshold=100,
+ filter_out_classes=[],
+ label_offset=0,
+ num_queries=None,
+ ):
+ assert task in [
+ "instance_segmentation",
+ "semantic_segmentation",
+ ], "task not known"
+ self.task = task
+ self.filter_out_classes = filter_out_classes
+ self.label_offset = label_offset
+ self.voxel_size = voxel_size
+ self.ignore_label = ignore_label
+ self.mode = mode
+ self.batch_instance = batch_instance
+ self.small_crops = small_crops
+ self.very_small_crops = very_small_crops
+ self.probing = probing
+ self.ignore_class_threshold = ignore_class_threshold
+
+ self.num_queries = num_queries
+
+ def __call__(self, batch):
+ if ("train" in self.mode) and (
+ self.small_crops or self.very_small_crops
+ ):
+ batch = make_crops(batch)
+ if ("train" in self.mode) and self.very_small_crops:
+ batch = make_crops(batch)
+ return voxelize(
+ batch,
+ self.ignore_label,
+ self.voxel_size,
+ self.probing,
+ self.mode,
+ task=self.task,
+ ignore_class_threshold=self.ignore_class_threshold,
+ filter_out_classes=self.filter_out_classes,
+ label_offset=self.label_offset,
+ num_queries=self.num_queries,
+ )
+
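+# Usage sketch (illustrative; the dataset and batch size are hypothetical):
+#
+#   collate_fn = VoxelizeCollate(voxel_size=0.02, mode="train")
+#   loader = torch.utils.data.DataLoader(
+#       train_dataset, batch_size=4, collate_fn=collate_fn
+#   )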
+
+class VoxelizeCollateMerge:
+ def __init__(
+ self,
+ ignore_label=255,
+ voxel_size=1,
+ mode="test",
+ scenes=2,
+ small_crops=False,
+ very_small_crops=False,
+ batch_instance=False,
+ make_one_pc_noise=False,
+ place_nearby=False,
+ place_far=False,
+ proba=1,
+ probing=False,
+ task="instance_segmentation",
+ ):
+ assert task in [
+ "instance_segmentation",
+ "semantic_segmentation",
+ ], "task not known"
+ self.task = task
+ self.mode = mode
+ self.scenes = scenes
+ self.small_crops = small_crops
+ self.very_small_crops = very_small_crops
+ self.ignore_label = ignore_label
+ self.voxel_size = voxel_size
+ self.batch_instance = batch_instance
+ self.make_one_pc_noise = make_one_pc_noise
+ self.place_nearby = place_nearby
+ self.place_far = place_far
+ self.proba = proba
+ self.probing = probing
+
+ def __call__(self, batch):
+ if (
+ ("train" in self.mode)
+ and (not self.make_one_pc_noise)
+ and (self.proba > random())
+ ):
+ if self.small_crops or self.very_small_crops:
+ batch = make_crops(batch)
+ if self.very_small_crops:
+ batch = make_crops(batch)
+ if self.batch_instance:
+ batch = batch_instances(batch)
+ new_batch = []
+ for i in range(0, len(batch), self.scenes):
+ batch_coordinates = []
+ batch_features = []
+ batch_labels = []
+
+ batch_filenames = ""
+ batch_raw_color = []
+ batch_raw_normals = []
+
+ offset_instance_id = 0
+ offset_segment_id = 0
+
+ for j in range(min(len(batch[i:]), self.scenes)):
+ batch_coordinates.append(batch[i + j][0])
+ batch_features.append(batch[i + j][1])
+
+ if j == 0:
+ batch_filenames = batch[i + j][3]
+ else:
+ batch_filenames = (
+ batch_filenames + f"+{batch[i + j][3]}"
+ )
+
+ batch_raw_color.append(batch[i + j][4])
+ batch_raw_normals.append(batch[i + j][5])
+
+ # make instance ids and segment ids unique
+ # take care that -1 instances stay at -1
+ batch_labels.append(
+ batch[i + j][2]
+ + [0, offset_instance_id, offset_segment_id]
+ )
+ batch_labels[-1][batch[i + j][2][:, 1] == -1, 1] = -1
+
+ max_instance_id, max_segment_id = batch[i + j][2].max(
+ axis=0
+ )[1:]
+ offset_segment_id = offset_segment_id + max_segment_id + 1
+ offset_instance_id = (
+ offset_instance_id + max_instance_id + 1
+ )
+
+ if (len(batch_coordinates) == 2) and self.place_nearby:
+ border = batch_coordinates[0][:, 0].max()
+ border -= batch_coordinates[1][:, 0].min()
+ batch_coordinates[1][:, 0] += border
+ elif (len(batch_coordinates) == 2) and self.place_far:
+ batch_coordinates[1] += (
+ np.random.uniform((-10, -10, -10), (10, 10, 10)) * 200
+ )
+ new_batch.append(
+ (
+ np.vstack(batch_coordinates),
+ np.vstack(batch_features),
+ np.concatenate(batch_labels),
+ batch_filenames,
+ np.vstack(batch_raw_color),
+ np.vstack(batch_raw_normals),
+ )
+ )
+ # TODO WHAT ABOUT POINT2SEGMENT AND SO ON ...
+ batch = new_batch
+ elif ("train" in self.mode) and self.make_one_pc_noise:
+ new_batch = []
+ for i in range(0, len(batch), 2):
+ if (i + 1) < len(batch):
+ new_batch.append(
+ [
+ np.vstack((batch[i][0], batch[i + 1][0])),
+ np.vstack((batch[i][1], batch[i + 1][1])),
+ np.concatenate(
+ (
+ batch[i][2],
+ np.full_like(
+ batch[i + 1][2], self.ignore_label
+ ),
+ )
+ ),
+ ]
+ )
+ new_batch.append(
+ [
+ np.vstack((batch[i][0], batch[i + 1][0])),
+ np.vstack((batch[i][1], batch[i + 1][1])),
+ np.concatenate(
+ (
+ np.full_like(
+ batch[i][2], self.ignore_label
+ ),
+ batch[i + 1][2],
+ )
+ ),
+ ]
+ )
+ else:
+ new_batch.append([batch[i][0], batch[i][1], batch[i][2]])
+ batch = new_batch
+ # return voxelize(batch, self.ignore_label, self.voxel_size, self.probing, self.mode)
+ return voxelize(
+ batch,
+ self.ignore_label,
+ self.voxel_size,
+ self.probing,
+ self.mode,
+ task=self.task,
+ )
+
+
+def batch_instances(batch):
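+    """Split every scene in the batch into one sample per instance id."""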
+ new_batch = []
+ for sample in batch:
+ for instance_id in np.unique(sample[2][:, 1]):
+ new_batch.append(
+ (
+ sample[0][sample[2][:, 1] == instance_id],
+ sample[1][sample[2][:, 1] == instance_id],
+ sample[2][sample[2][:, 1] == instance_id][:, 0],
+ ),
+ )
+ return new_batch
+
+
+def voxelize(
+ batch,
+ ignore_label,
+ voxel_size,
+ probing,
+ mode,
+ task,
+ ignore_class_threshold,
+ filter_out_classes,
+ label_offset,
+ num_queries,
+):
+ (
+ coordinates,
+ features,
+ labels,
+ original_labels,
+ inverse_maps,
+ original_colors,
+ original_normals,
+ original_coordinates,
+ idx,
+ ) = ([], [], [], [], [], [], [], [], [])
+ voxelization_dict = {
+ "ignore_label": ignore_label,
+ # "quantization_size": self.voxel_size,
+ "return_index": True,
+ "return_inverse": True,
+ }
+
+ full_res_coords = []
+
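+    # each sample is the tuple returned by the dataset's __getitem__:
+    # (coords, feats, labels, scene_name, raw_color, raw_normals, raw_coords, idx)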
+ for sample in batch:
+ idx.append(sample[7])
+ original_coordinates.append(sample[6])
+ original_labels.append(sample[2])
+ full_res_coords.append(sample[0])
+ original_colors.append(sample[4])
+ original_normals.append(sample[5])
+
+ coords = np.floor(sample[0] / voxel_size)
+ voxelization_dict.update(
+ {
+ "coordinates": torch.from_numpy(coords).to("cpu").contiguous(),
+ "features": sample[1],
+ }
+ )
+
+        # the first two returned tensors (quantized coords/feats) are unused;
+        # we only need the unique and inverse maps to voxelize ourselves
+ _, _, unique_map, inverse_map = ME.utils.sparse_quantize(
+ **voxelization_dict
+ )
+ inverse_maps.append(inverse_map)
+
+ sample_coordinates = coords[unique_map]
+ coordinates.append(torch.from_numpy(sample_coordinates).int())
+ sample_features = sample[1][unique_map]
+ features.append(torch.from_numpy(sample_features).float())
+ if len(sample[2]) > 0:
+ sample_labels = sample[2][unique_map]
+ labels.append(torch.from_numpy(sample_labels).long())
+
+ # Concatenate all lists
+ input_dict = {"coords": coordinates, "feats": features}
+ if len(labels) > 0:
+ input_dict["labels"] = labels
+ coordinates, features, labels = ME.utils.sparse_collate(**input_dict)
+ else:
+ coordinates, features = ME.utils.sparse_collate(**input_dict)
+ labels = torch.Tensor([])
+
+ if probing:
+ return (
+ NoGpu(
+ coordinates,
+ features,
+ original_labels,
+ inverse_maps,
+ ),
+ labels,
+ )
+
+ if mode == "test":
+ for i in range(len(input_dict["labels"])):
+ _, ret_index, ret_inv = np.unique(
+ input_dict["labels"][i][:, 0],
+ return_index=True,
+ return_inverse=True,
+ )
+ input_dict["labels"][i][:, 0] = torch.from_numpy(ret_inv)
+ # input_dict["segment2label"].append(input_dict["labels"][i][ret_index][:, :-1])
+ else:
+ input_dict["segment2label"] = []
+
+ if "labels" in input_dict:
+ for i in range(len(input_dict["labels"])):
+ # TODO BIGGER CHANGE CHECK!!!
+ _, ret_index, ret_inv = np.unique(
+ input_dict["labels"][i][:, -1],
+ return_index=True,
+ return_inverse=True,
+ )
+ input_dict["labels"][i][:, -1] = torch.from_numpy(ret_inv)
+ input_dict["segment2label"].append(
+ input_dict["labels"][i][ret_index][:, :-1]
+ )
+
+ if "labels" in input_dict:
+ list_labels = input_dict["labels"]
+
+ target = []
+ target_full = []
+
+ if len(list_labels[0].shape) == 1:
+ for batch_id in range(len(list_labels)):
+ label_ids = list_labels[batch_id].unique()
+ if 255 in label_ids:
+ label_ids = label_ids[:-1]
+
+ target.append(
+ {
+ "labels": label_ids,
+ "masks": list_labels[batch_id]
+ == label_ids.unsqueeze(1),
+ }
+ )
+ else:
+ if mode == "test":
+ for i in range(len(input_dict["labels"])):
+ target.append(
+ {"point2segment": input_dict["labels"][i][:, 0]}
+ )
+ target_full.append(
+ {
+ "point2segment": torch.from_numpy(
+ original_labels[i][:, 0]
+ ).long()
+ }
+ )
+ else:
+ target = get_instance_masks(
+ list_labels,
+ list_segments=input_dict["segment2label"],
+ task=task,
+ ignore_class_threshold=ignore_class_threshold,
+ filter_out_classes=filter_out_classes,
+ label_offset=label_offset,
+ )
+ for i in range(len(target)):
+ target[i]["point2segment"] = input_dict["labels"][i][:, 2]
+ if "train" not in mode:
+ target_full = get_instance_masks(
+ [torch.from_numpy(l) for l in original_labels],
+ task=task,
+ ignore_class_threshold=ignore_class_threshold,
+ filter_out_classes=filter_out_classes,
+ label_offset=label_offset,
+ )
+ for i in range(len(target_full)):
+ target_full[i]["point2segment"] = torch.from_numpy(
+ original_labels[i][:, 2]
+ ).long()
+ else:
+ target = []
+ target_full = []
+ coordinates = []
+ features = []
+
+ if "train" not in mode:
+ return (
+ NoGpu(
+ coordinates,
+ features,
+ original_labels,
+ inverse_maps,
+ full_res_coords,
+ target_full,
+ original_colors,
+ original_normals,
+ original_coordinates,
+ idx,
+ ),
+ target,
+ [sample[3] for sample in batch],
+ )
+ else:
+ return (
+ NoGpu(
+ coordinates,
+ features,
+ original_labels,
+ inverse_maps,
+ full_res_coords,
+ ),
+ target,
+ [sample[3] for sample in batch],
+ )
+
+
+def get_instance_masks(
+ list_labels,
+ task,
+ list_segments=None,
+ ignore_class_threshold=100,
+ filter_out_classes=[],
+ label_offset=0,
+):
+ target = []
+
+ for batch_id in range(len(list_labels)):
+ label_ids = []
+ masks = []
+ segment_masks = []
+ instance_ids = list_labels[batch_id][:, 1].unique()
+
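+        # one binary point mask (and optionally a segment mask) per instance,
+        # skipping filtered-out classes and tiny ignore-label instances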
+ for instance_id in instance_ids:
+ if instance_id == -1:
+ continue
+
+            # TODO is it possible that an ignore class (255) is an instance???
+ # instance == -1 ???
+ tmp = list_labels[batch_id][
+ list_labels[batch_id][:, 1] == instance_id
+ ]
+ label_id = tmp[0, 0]
+
+ if (
+ label_id in filter_out_classes
+ ): # floor, wall, undefined==255 is not included
+ continue
+
+ if (
+ 255 in filter_out_classes
+ and label_id.item() == 255
+ and tmp.shape[0] < ignore_class_threshold
+ ):
+ continue
+
+ label_ids.append(label_id)
+ masks.append(list_labels[batch_id][:, 1] == instance_id)
+
+ if list_segments:
+ segment_mask = torch.zeros(
+ list_segments[batch_id].shape[0]
+ ).bool()
+ segment_mask[
+ list_labels[batch_id][
+ list_labels[batch_id][:, 1] == instance_id
+ ][:, 2].unique()
+ ] = True
+ segment_masks.append(segment_mask)
+
+ if len(label_ids) == 0:
+ return list()
+
+ label_ids = torch.stack(label_ids)
+ masks = torch.stack(masks)
+ if list_segments:
+ segment_masks = torch.stack(segment_masks)
+
+ if task == "semantic_segmentation":
+ new_label_ids = []
+ new_masks = []
+ new_segment_masks = []
+ for label_id in label_ids.unique():
+ masking = label_ids == label_id
+
+ new_label_ids.append(label_id)
+ new_masks.append(masks[masking, :].sum(dim=0).bool())
+
+ if list_segments:
+ new_segment_masks.append(
+ segment_masks[masking, :].sum(dim=0).bool()
+ )
+
+ label_ids = torch.stack(new_label_ids)
+ masks = torch.stack(new_masks)
+
+ if list_segments:
+ segment_masks = torch.stack(new_segment_masks)
+
+ target.append(
+ {
+ "labels": label_ids,
+ "masks": masks,
+ "segment_mask": segment_masks,
+ }
+ )
+ else:
+ target.append({"labels": label_ids, "masks": masks})
+ else:
+ l = torch.clamp(label_ids - label_offset, min=0)
+
+ if list_segments:
+ target.append(
+ {
+ "labels": l,
+ "masks": masks,
+ "segment_mask": segment_masks,
+ }
+ )
+ else:
+ target.append({"labels": l, "masks": masks})
+ return target
+
+
+def make_crops(batch):
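+    """Split each scene into its four XY quadrants to form smaller crops."""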
+ new_batch = []
+ # detupling
+ for scene in batch:
+ new_batch.append([scene[0], scene[1], scene[2]])
+ batch = new_batch
+ new_batch = []
+ for scene in batch:
+ # move to center for better quadrant split
+ scene[0][:, :3] -= scene[0][:, :3].mean(0)
+
+ # BUGFIX - there always would be a point in every quadrant
+ scene[0] = np.vstack(
+ (
+ scene[0],
+ np.array(
+ [
+ [0.1, 0.1, 0.1],
+ [0.1, -0.1, 0.1],
+ [-0.1, 0.1, 0.1],
+ [-0.1, -0.1, 0.1],
+ ]
+ ),
+ )
+ )
+ scene[1] = np.vstack((scene[1], np.zeros((4, scene[1].shape[1]))))
+ scene[2] = np.concatenate(
+ (scene[2], np.full_like((scene[2]), 255)[:4])
+ )
+
+ crop = scene[0][:, 0] > 0
+ crop &= scene[0][:, 1] > 0
+ if crop.size > 1:
+ new_batch.append([scene[0][crop], scene[1][crop], scene[2][crop]])
+
+ crop = scene[0][:, 0] > 0
+ crop &= scene[0][:, 1] < 0
+ if crop.size > 1:
+ new_batch.append([scene[0][crop], scene[1][crop], scene[2][crop]])
+
+ crop = scene[0][:, 0] < 0
+ crop &= scene[0][:, 1] > 0
+ if crop.size > 1:
+ new_batch.append([scene[0][crop], scene[1][crop], scene[2][crop]])
+
+ crop = scene[0][:, 0] < 0
+ crop &= scene[0][:, 1] < 0
+ if crop.size > 1:
+ new_batch.append([scene[0][crop], scene[1][crop], scene[2][crop]])
+
+ # moving all of them to center
+ for i in range(len(new_batch)):
+ new_batch[i][0][:, :3] -= new_batch[i][0][:, :3].mean(0)
+ return new_batch
+
+
+class NoGpu:
+ def __init__(
+ self,
+ coordinates,
+ features,
+ original_labels=None,
+ inverse_maps=None,
+ full_res_coords=None,
+ target_full=None,
+ original_colors=None,
+ original_normals=None,
+ original_coordinates=None,
+ idx=None,
+ ):
+        """Helper that keeps collated tensors on the CPU so PyTorch Lightning does not move them to the GPU."""
+ self.coordinates = coordinates
+ self.features = features
+ self.original_labels = original_labels
+ self.inverse_maps = inverse_maps
+ self.full_res_coords = full_res_coords
+ self.target_full = target_full
+ self.original_colors = original_colors
+ self.original_normals = original_normals
+ self.original_coordinates = original_coordinates
+ self.idx = idx
+
+
+class NoGpuMask:
+ def __init__(
+ self,
+ coordinates,
+ features,
+ original_labels=None,
+ inverse_maps=None,
+ masks=None,
+ labels=None,
+ ):
+        """Helper that keeps collated tensors on the CPU so PyTorch Lightning does not move them to the GPU."""
+ self.coordinates = coordinates
+ self.features = features
+ self.original_labels = original_labels
+ self.inverse_maps = inverse_maps
+
+ self.masks = masks
+ self.labels = labels
diff --git a/models/Mask3D/mask3d/main_instance_segmentation.py b/models/Mask3D/mask3d/main_instance_segmentation.py
new file mode 100644
index 0000000000000000000000000000000000000000..c2664673cb3a1fa16191e7baa82a50bbb8f5f195
--- /dev/null
+++ b/models/Mask3D/mask3d/main_instance_segmentation.py
@@ -0,0 +1,114 @@
+import logging
+import os
+from hashlib import md5
+from uuid import uuid4
+import hydra
+from dotenv import load_dotenv
+from omegaconf import DictConfig, OmegaConf
+from trainer.trainer import InstanceSegmentation, RegularCheckpointing
+from pytorch_lightning.callbacks import ModelCheckpoint
+from utils.utils import (
+ flatten_dict,
+ load_baseline_model,
+ load_checkpoint_with_missing_or_exsessive_keys,
+ load_backbone_checkpoint_with_missing_or_exsessive_keys,
+)
+from pytorch_lightning import Trainer, seed_everything
+
+
+def get_parameters(cfg: DictConfig):
+ logger = logging.getLogger(__name__)
+ load_dotenv(".env")
+
+ # parsing input parameters
+ seed_everything(cfg.general.seed)
+
+ # getting basic configuration
+ if cfg.general.get("gpus", None) is None:
+ cfg.general.gpus = os.environ.get("CUDA_VISIBLE_DEVICES", None)
+ loggers = []
+
+ # cfg.general.experiment_id = "0" # str(Repo("./").commit())[:8]
+ # params = flatten_dict(OmegaConf.to_container(cfg, resolve=True))
+
+ # create unique id for experiments that are run locally
+ # unique_id = "_" + str(uuid4())[:4]
+ # cfg.general.version = md5(str(params).encode("utf-8")).hexdigest()[:8] + unique_id
+
+ if not os.path.exists(cfg.general.save_dir):
+ os.makedirs(cfg.general.save_dir)
+ else:
+        print("EXPERIMENT ALREADY EXISTS")
+ cfg["trainer"][
+ "resume_from_checkpoint"
+ ] = f"{cfg.general.save_dir}/last-epoch.ckpt"
+
+ for log in cfg.logging:
+ print(log)
+ # loggers.append(hydra.utils.instantiate(log))
+ # loggers[-1].log_hyperparams(
+ # flatten_dict(OmegaConf.to_container(cfg, resolve=True))
+ # )
+
+ model = InstanceSegmentation(cfg)
+ if cfg.general.backbone_checkpoint is not None:
+ cfg, model = load_backbone_checkpoint_with_missing_or_exsessive_keys(
+ cfg, model
+ )
+ if cfg.general.checkpoint is not None:
+ cfg, model = load_checkpoint_with_missing_or_exsessive_keys(cfg, model)
+
+ logger.info(flatten_dict(OmegaConf.to_container(cfg, resolve=True)))
+ return cfg, model, loggers
+
+
+@hydra.main(
+ config_path="conf", config_name="config_base_instance_segmentation.yaml"
+)
+def train(cfg: DictConfig):
+ os.chdir(hydra.utils.get_original_cwd())
+ cfg, model, loggers = get_parameters(cfg)
+ callbacks = []
+ for cb in cfg.callbacks:
+ callbacks.append(hydra.utils.instantiate(cb))
+
+ callbacks.append(RegularCheckpointing())
+
+ runner = Trainer(
+ logger=loggers,
+ gpus=cfg.general.gpus,
+ callbacks=callbacks,
+ weights_save_path=str(cfg.general.save_dir),
+ **cfg.trainer,
+ )
+ runner.fit(model)
+
+
+@hydra.main(
+ config_path="conf", config_name="config_base_instance_segmentation.yaml"
+)
+def test(cfg: DictConfig):
+ # because hydra wants to change dir for some reason
+ os.chdir(hydra.utils.get_original_cwd())
+ cfg, model, loggers = get_parameters(cfg)
+ runner = Trainer(
+ gpus=cfg.general.gpus,
+ logger=loggers,
+ weights_save_path=str(cfg.general.save_dir),
+ **cfg.trainer,
+ )
+ runner.test(model)
+
+
+@hydra.main(
+ config_path="conf", config_name="config_base_instance_segmentation.yaml"
+)
+def main(cfg: DictConfig):
+ if cfg["general"]["train_mode"]:
+ train(cfg)
+ else:
+ test(cfg)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/models/Mask3D/mask3d/models/__init__.py b/models/Mask3D/mask3d/models/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..b092c965bba4c734b49a7f4d2e3ab6fee8471d17
--- /dev/null
+++ b/models/Mask3D/mask3d/models/__init__.py
@@ -0,0 +1,44 @@
+import mask3d.models.resunet as resunet
+import mask3d.models.res16unet as res16unet
+from mask3d.models.res16unet import (
+ Res16UNet34C,
+ Res16UNet34A,
+ Res16UNet14A,
+ Res16UNet34D,
+ Res16UNet18D,
+ Res16UNet18B,
+ Custom30M,
+)
+import mask3d.models.mask3d as mask3d
+from mask3d.models.mask3d import Mask3D
+
+MODELS = []
+
+
+def add_models(module):
+ MODELS.extend([getattr(module, a) for a in dir(module) if "Net" in a])
+
+
+add_models(resunet)
+add_models(res16unet)
+add_models(mask3d)
+
+
+def get_models():
+ """Returns a tuple of sample models."""
+ return MODELS
+
+
+def load_model(name):
+ """Creates and returns an instance of the model given its class name."""
+ # Find the model class from its name
+ all_models = get_models()
+ mdict = {model.__name__: model for model in all_models}
+ if name not in mdict:
+        print("Invalid model name. Options are:")
+ # Display a list of valid model names
+ for model in all_models:
+ print(f"\t* {model.__name__}")
+ return None
+ NetClass = mdict[name]
+
+ return NetClass
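+
+
+# Usage sketch (illustrative; the constructor arguments are hypothetical and
+# depend on the backbone config):
+#
+#   NetClass = load_model("Res16UNet34C")
+#   if NetClass is not None:
+#       backbone = NetClass(in_channels=3, out_channels=20, config=None)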
diff --git a/models/Mask3D/mask3d/models/criterion.py b/models/Mask3D/mask3d/models/criterion.py
new file mode 100644
index 0000000000000000000000000000000000000000..19ce8bc8ecf4a0be08ce91e45857412a8d55efba
--- /dev/null
+++ b/models/Mask3D/mask3d/models/criterion.py
@@ -0,0 +1,343 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+# Modified by Bowen Cheng from https://github.com/facebookresearch/detr/blob/master/models/detr.py
+# Modified for Mask3D
+"""
+MaskFormer criterion.
+"""
+
+import torch
+import torch.nn.functional as F
+from torch import nn
+
+from detectron2.utils.comm import get_world_size
+from detectron2.projects.point_rend.point_features import (
+ get_uncertain_point_coords_with_randomness,
+ point_sample,
+)
+
+from mask3d.models.misc import (
+ is_dist_avail_and_initialized,
+ nested_tensor_from_tensor_list,
+)
+
+
+def dice_loss(
+ inputs: torch.Tensor,
+ targets: torch.Tensor,
+ num_masks: float,
+):
+ """
+ Compute the DICE loss, similar to generalized IOU for masks
+ Args:
+ inputs: A float tensor of arbitrary shape.
+ The predictions for each example.
+ targets: A float tensor with the same shape as inputs. Stores the binary
+ classification label for each element in inputs
+ (0 for the negative class and 1 for the positive class).
+ """
+ inputs = inputs.sigmoid()
+ inputs = inputs.flatten(1)
+ numerator = 2 * (inputs * targets).sum(-1)
+ denominator = inputs.sum(-1) + targets.sum(-1)
+ loss = 1 - (numerator + 1) / (denominator + 1)
+ return loss.sum() / num_masks
+
+
+dice_loss_jit = torch.jit.script(dice_loss) # type: torch.jit.ScriptModule
+
+
+def sigmoid_ce_loss(
+ inputs: torch.Tensor,
+ targets: torch.Tensor,
+ num_masks: float,
+):
+ """
+ Args:
+ inputs: A float tensor of arbitrary shape.
+ The predictions for each example.
+ targets: A float tensor with the same shape as inputs. Stores the binary
+ classification label for each element in inputs
+ (0 for the negative class and 1 for the positive class).
+ Returns:
+ Loss tensor
+ """
+ loss = F.binary_cross_entropy_with_logits(
+ inputs, targets, reduction="none"
+ )
+
+ return loss.mean(1).sum() / num_masks
+
+
+sigmoid_ce_loss_jit = torch.jit.script(
+ sigmoid_ce_loss
+) # type: torch.jit.ScriptModule
+
+
+def calculate_uncertainty(logits):
+ """
+    We estimate uncertainty as L1 distance between 0.0 and the logit prediction in 'logits' for the
+ foreground class in `classes`.
+ Args:
+ logits (Tensor): A tensor of shape (R, 1, ...) for class-specific or
+ class-agnostic, where R is the total number of predicted masks in all images and C is
+ the number of foreground classes. The values are logits.
+ Returns:
+ scores (Tensor): A tensor of shape (R, 1, ...) that contains uncertainty scores with
+ the most uncertain locations having the highest uncertainty score.
+ """
+ assert logits.shape[1] == 1
+ gt_class_logits = logits.clone()
+ return -(torch.abs(gt_class_logits))
+
+
+class SetCriterion(nn.Module):
+ """This class computes the loss for DETR.
+ The process happens in two steps:
+ 1) we compute hungarian assignment between ground truth boxes and the outputs of the model
+ 2) we supervise each pair of matched ground-truth / prediction (supervise class and box)
+ """
+
+ def __init__(
+ self,
+ num_classes,
+ matcher,
+ weight_dict,
+ eos_coef,
+ losses,
+ num_points,
+ oversample_ratio,
+ importance_sample_ratio,
+ class_weights,
+ ):
+ """Create the criterion.
+ Parameters:
+ num_classes: number of object categories, omitting the special no-object category
+ matcher: module able to compute a matching between targets and proposals
+ weight_dict: dict containing as key the names of the losses and as values their relative weight.
+ eos_coef: relative classification weight applied to the no-object category
+ losses: list of all the losses to be applied. See get_loss for list of available losses.
+ """
+ super().__init__()
+ self.num_classes = num_classes - 1
+ self.class_weights = class_weights
+ self.matcher = matcher
+ self.weight_dict = weight_dict
+ self.eos_coef = eos_coef
+ self.losses = losses
+ empty_weight = torch.ones(self.num_classes + 1)
+ empty_weight[-1] = self.eos_coef
+
+ if self.class_weights != -1:
+ assert (
+ len(self.class_weights) == self.num_classes
+ ), "CLASS WEIGHTS DO NOT MATCH"
+ empty_weight[:-1] = torch.tensor(self.class_weights)
+
+ self.register_buffer("empty_weight", empty_weight)
+
+ # pointwise mask loss parameters
+ self.num_points = num_points
+ self.oversample_ratio = oversample_ratio
+ self.importance_sample_ratio = importance_sample_ratio
+
+ def loss_labels(self, outputs, targets, indices, num_masks, mask_type):
+ """Classification loss (NLL)
+ targets dicts must contain the key "labels" containing a tensor of dim [nb_target_boxes]
+ """
+ assert "pred_logits" in outputs
+ src_logits = outputs["pred_logits"].float()
+
+ idx = self._get_src_permutation_idx(indices)
+ target_classes_o = torch.cat(
+ [t["labels"][J] for t, (_, J) in zip(targets, indices)]
+ )
+ target_classes = torch.full(
+ src_logits.shape[:2],
+ self.num_classes,
+ dtype=torch.int64,
+ device=src_logits.device,
+ )
+ target_classes[idx] = target_classes_o
+
+ loss_ce = F.cross_entropy(
+ src_logits.transpose(1, 2),
+ target_classes,
+ self.empty_weight,
+ ignore_index=253,
+ )
+ losses = {"loss_ce": loss_ce}
+ return losses
+
+ def loss_masks(self, outputs, targets, indices, num_masks, mask_type):
+ """Compute the losses related to the masks: the focal loss and the dice loss.
+ targets dicts must contain the key "masks" containing a tensor of dim [nb_target_boxes, h, w]
+ """
+ assert "pred_masks" in outputs
+
+ loss_masks = []
+ loss_dices = []
+
+ for batch_id, (map_id, target_id) in enumerate(indices):
+ map = outputs["pred_masks"][batch_id][:, map_id].T
+ target_mask = targets[batch_id][mask_type][target_id]
+
+ if self.num_points != -1:
+ point_idx = torch.randperm(
+ target_mask.shape[1], device=target_mask.device
+ )[: int(self.num_points * target_mask.shape[1])]
+ else:
+ # sample all points
+ point_idx = torch.arange(
+ target_mask.shape[1], device=target_mask.device
+ )
+
+ num_masks = target_mask.shape[0]
+ map = map[:, point_idx]
+ target_mask = target_mask[:, point_idx].float()
+
+ loss_masks.append(sigmoid_ce_loss_jit(map, target_mask, num_masks))
+ loss_dices.append(dice_loss_jit(map, target_mask, num_masks))
+ # del target_mask
+ return {
+ "loss_mask": torch.sum(torch.stack(loss_masks)),
+ "loss_dice": torch.sum(torch.stack(loss_dices)),
+ }
+
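+        # NOTE: everything below is unreachable because of the early return
+        # above; it appears to be a leftover of the 2D MaskFormer
+        # point-sampling loss and is kept here unchanged.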
+ src_idx = self._get_src_permutation_idx(indices)
+ tgt_idx = self._get_tgt_permutation_idx(indices)
+ src_masks = outputs["pred_masks"]
+ src_masks = src_masks[src_idx]
+ masks = [t[mask_type] for t in targets]
+ # TODO use valid to mask invalid areas due to padding in loss
+ target_masks, valid = nested_tensor_from_tensor_list(masks).decompose()
+ target_masks = target_masks.to(src_masks)
+ target_masks = target_masks[tgt_idx]
+
+ # No need to upsample predictions as we are using normalized coordinates :)
+ # N x 1 x H x W
+ src_masks = src_masks[:, None]
+ target_masks = target_masks[:, None]
+
+ with torch.no_grad():
+ # sample point_coords
+ point_coords = get_uncertain_point_coords_with_randomness(
+ src_masks,
+ lambda logits: calculate_uncertainty(logits),
+ self.num_points,
+ self.oversample_ratio,
+ self.importance_sample_ratio,
+ )
+ # get gt labels
+ point_labels = point_sample(
+ target_masks,
+ point_coords,
+ align_corners=False,
+ ).squeeze(1)
+
+ point_logits = point_sample(
+ src_masks,
+ point_coords,
+ align_corners=False,
+ ).squeeze(1)
+
+ losses = {
+ "loss_mask": sigmoid_ce_loss_jit(
+ point_logits, point_labels, num_masks, mask_type
+ ),
+ "loss_dice": dice_loss_jit(
+ point_logits, point_labels, num_masks, mask_type
+ ),
+ }
+
+ del src_masks
+ del target_masks
+ return losses
+
+ def _get_src_permutation_idx(self, indices):
+ # permute predictions following indices
+ batch_idx = torch.cat(
+ [torch.full_like(src, i) for i, (src, _) in enumerate(indices)]
+ )
+ src_idx = torch.cat([src for (src, _) in indices])
+ return batch_idx, src_idx
+
+ def _get_tgt_permutation_idx(self, indices):
+ # permute targets following indices
+ batch_idx = torch.cat(
+ [torch.full_like(tgt, i) for i, (_, tgt) in enumerate(indices)]
+ )
+ tgt_idx = torch.cat([tgt for (_, tgt) in indices])
+ return batch_idx, tgt_idx
+
+ def get_loss(self, loss, outputs, targets, indices, num_masks, mask_type):
+ loss_map = {"labels": self.loss_labels, "masks": self.loss_masks}
+ assert loss in loss_map, f"do you really want to compute {loss} loss?"
+ return loss_map[loss](outputs, targets, indices, num_masks, mask_type)
+
+ def forward(self, outputs, targets, mask_type):
+ """This performs the loss computation.
+ Parameters:
+ outputs: dict of tensors, see the output specification of the model for the format
+ targets: list of dicts, such that len(targets) == batch_size.
+ The expected keys in each dict depends on the losses applied, see each loss' doc
+ """
+ outputs_without_aux = {
+ k: v for k, v in outputs.items() if k != "aux_outputs"
+ }
+
+ # Retrieve the matching between the outputs of the last layer and the targets
+ indices = self.matcher(outputs_without_aux, targets, mask_type)
+
+        # Compute the average number of target masks across all nodes, for normalization purposes
+ num_masks = sum(len(t["labels"]) for t in targets)
+ num_masks = torch.as_tensor(
+ [num_masks],
+ dtype=torch.float,
+ device=next(iter(outputs.values())).device,
+ )
+ if is_dist_avail_and_initialized():
+ torch.distributed.all_reduce(num_masks)
+ num_masks = torch.clamp(num_masks / get_world_size(), min=1).item()
+
+ # Compute all the requested losses
+ losses = {}
+ for loss in self.losses:
+ losses.update(
+ self.get_loss(
+ loss, outputs, targets, indices, num_masks, mask_type
+ )
+ )
+
+ # In case of auxiliary losses, we repeat this process with the output of each intermediate layer.
+ if "aux_outputs" in outputs:
+ for i, aux_outputs in enumerate(outputs["aux_outputs"]):
+ indices = self.matcher(aux_outputs, targets, mask_type)
+ for loss in self.losses:
+ l_dict = self.get_loss(
+ loss,
+ aux_outputs,
+ targets,
+ indices,
+ num_masks,
+ mask_type,
+ )
+ l_dict = {k + f"_{i}": v for k, v in l_dict.items()}
+ losses.update(l_dict)
+
+ return losses
+
+ def __repr__(self):
+ head = "Criterion " + self.__class__.__name__
+ body = [
+ "matcher: {}".format(self.matcher.__repr__(_repr_indent=8)),
+ "losses: {}".format(self.losses),
+ "weight_dict: {}".format(self.weight_dict),
+ "num_classes: {}".format(self.num_classes),
+ "eos_coef: {}".format(self.eos_coef),
+ "num_points: {}".format(self.num_points),
+ "oversample_ratio: {}".format(self.oversample_ratio),
+ "importance_sample_ratio: {}".format(self.importance_sample_ratio),
+ ]
+ _repr_indent = 4
+ lines = [head] + [" " * _repr_indent + line for line in body]
+ return "\n".join(lines)
diff --git a/models/Mask3D/mask3d/models/mask3d.py b/models/Mask3D/mask3d/models/mask3d.py
new file mode 100644
index 0000000000000000000000000000000000000000..b7cd4c7a6a74b44df90bbd8d668c7def474f2b10
--- /dev/null
+++ b/models/Mask3D/mask3d/models/mask3d.py
@@ -0,0 +1,870 @@
+import torch
+import hydra
+import torch.nn as nn
+import MinkowskiEngine.MinkowskiOps as me
+from MinkowskiEngine.MinkowskiPooling import MinkowskiAvgPooling
+import numpy as np
+from torch.nn import functional as F
+from mask3d.models.modules.common import conv
+from mask3d.models.position_embedding import PositionEmbeddingCoordsSine
+from mask3d.models.modules.helpers_3detr import GenericMLP
+from torch_scatter import scatter_mean, scatter_max, scatter_min
+from torch.cuda.amp import autocast
+
+from pointnet2.pointnet2_utils import furthest_point_sample
+
+
+class Mask3D(nn.Module):
+ def __init__(
+ self,
+ config,
+ hidden_dim,
+ num_queries,
+ num_heads,
+ dim_feedforward,
+ sample_sizes,
+ shared_decoder,
+ num_classes,
+ num_decoders,
+ dropout,
+ pre_norm,
+ positional_encoding_type,
+ non_parametric_queries,
+ train_on_segments,
+ normalize_pos_enc,
+ use_level_embed,
+ scatter_type,
+ hlevels,
+ use_np_features,
+ voxel_size,
+ max_sample_size,
+ random_queries,
+ gauss_scale,
+ random_query_both,
+ random_normal,
+ ):
+ super().__init__()
+ self.random_normal = random_normal
+ self.random_query_both = random_query_both
+ self.random_queries = random_queries
+ self.max_sample_size = max_sample_size
+ self.gauss_scale = gauss_scale
+ self.voxel_size = voxel_size
+ self.scatter_type = scatter_type
+ self.hlevels = hlevels
+ self.use_level_embed = use_level_embed
+ self.train_on_segments = train_on_segments
+ self.normalize_pos_enc = normalize_pos_enc
+ self.num_decoders = num_decoders
+ self.num_classes = num_classes
+ self.dropout = dropout
+ self.pre_norm = pre_norm
+ self.shared_decoder = shared_decoder
+ self.sample_sizes = sample_sizes
+ self.non_parametric_queries = non_parametric_queries
+ self.use_np_features = use_np_features
+ self.mask_dim = hidden_dim
+ self.num_heads = num_heads
+ self.num_queries = num_queries
+ self.pos_enc_type = positional_encoding_type
+
+ self.backbone = hydra.utils.instantiate(config.backbone)
+ self.num_levels = len(self.hlevels)
+ sizes = self.backbone.PLANES[-5:]
+
+ self.mask_features_head = conv(
+ self.backbone.PLANES[7],
+ self.mask_dim,
+ kernel_size=1,
+ stride=1,
+ bias=True,
+ D=3,
+ )
+
+ if self.scatter_type == "mean":
+ self.scatter_fn = scatter_mean
+ elif self.scatter_type == "max":
+ self.scatter_fn = lambda mask, p2s, dim: scatter_max(
+ mask, p2s, dim=dim
+ )[0]
+ else:
+ assert False, "Scatter function not known"
+
+ assert (
+ not use_np_features
+ ) or non_parametric_queries, "np features only with np queries"
+
+ if self.non_parametric_queries:
+ self.query_projection = GenericMLP(
+ input_dim=self.mask_dim,
+ hidden_dims=[self.mask_dim],
+ output_dim=self.mask_dim,
+ use_conv=True,
+ output_use_activation=True,
+ hidden_use_bias=True,
+ )
+
+ if self.use_np_features:
+ self.np_feature_projection = nn.Sequential(
+ nn.Linear(sizes[-1], hidden_dim),
+ nn.ReLU(),
+ nn.Linear(hidden_dim, hidden_dim),
+ )
+ elif self.random_query_both:
+ self.query_projection = GenericMLP(
+ input_dim=2 * self.mask_dim,
+ hidden_dims=[2 * self.mask_dim],
+ output_dim=2 * self.mask_dim,
+ use_conv=True,
+ output_use_activation=True,
+ hidden_use_bias=True,
+ )
+ else:
+ # PARAMETRIC QUERIES
+ # learnable query features
+ self.query_feat = nn.Embedding(num_queries, hidden_dim)
+ # learnable query p.e.
+ self.query_pos = nn.Embedding(num_queries, hidden_dim)
+
+ if self.use_level_embed:
+ # learnable scale-level embedding
+ self.level_embed = nn.Embedding(self.num_levels, hidden_dim)
+
+ self.mask_embed_head = nn.Sequential(
+ nn.Linear(hidden_dim, hidden_dim),
+ nn.ReLU(),
+ nn.Linear(hidden_dim, hidden_dim),
+ )
+
+ self.class_embed_head = nn.Linear(hidden_dim, self.num_classes)
+
+ if self.pos_enc_type == "legacy":
+ self.pos_enc = PositionalEncoding3D(channels=self.mask_dim)
+ elif self.pos_enc_type == "fourier":
+ self.pos_enc = PositionEmbeddingCoordsSine(
+ pos_type="fourier",
+ d_pos=self.mask_dim,
+ gauss_scale=self.gauss_scale,
+ normalize=self.normalize_pos_enc,
+ )
+ elif self.pos_enc_type == "sine":
+ self.pos_enc = PositionEmbeddingCoordsSine(
+ pos_type="sine",
+ d_pos=self.mask_dim,
+ normalize=self.normalize_pos_enc,
+ )
+ else:
+ assert False, "pos enc type not known"
+
+ self.pooling = MinkowskiAvgPooling(
+ kernel_size=2, stride=2, dimension=3
+ )
+
+ self.masked_transformer_decoder = nn.ModuleList()
+ self.cross_attention = nn.ModuleList()
+ self.self_attention = nn.ModuleList()
+ self.ffn_attention = nn.ModuleList()
+ self.lin_squeeze = nn.ModuleList()
+
+ num_shared = self.num_decoders if not self.shared_decoder else 1
+
+ for _ in range(num_shared):
+ tmp_cross_attention = nn.ModuleList()
+ tmp_self_attention = nn.ModuleList()
+ tmp_ffn_attention = nn.ModuleList()
+ tmp_squeeze_attention = nn.ModuleList()
+ for i, hlevel in enumerate(self.hlevels):
+ tmp_cross_attention.append(
+ CrossAttentionLayer(
+ d_model=self.mask_dim,
+ nhead=self.num_heads,
+ dropout=self.dropout,
+ normalize_before=self.pre_norm,
+ )
+ )
+
+ tmp_squeeze_attention.append(
+ nn.Linear(sizes[hlevel], self.mask_dim)
+ )
+
+ tmp_self_attention.append(
+ SelfAttentionLayer(
+ d_model=self.mask_dim,
+ nhead=self.num_heads,
+ dropout=self.dropout,
+ normalize_before=self.pre_norm,
+ )
+ )
+
+ tmp_ffn_attention.append(
+ FFNLayer(
+ d_model=self.mask_dim,
+ dim_feedforward=dim_feedforward,
+ dropout=self.dropout,
+ normalize_before=self.pre_norm,
+ )
+ )
+
+ self.cross_attention.append(tmp_cross_attention)
+ self.self_attention.append(tmp_self_attention)
+ self.ffn_attention.append(tmp_ffn_attention)
+ self.lin_squeeze.append(tmp_squeeze_attention)
+
+ self.decoder_norm = nn.LayerNorm(hidden_dim)
+
+ def get_pos_encs(self, coords):
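+        """Positional encodings for every feature level, one list per scene."""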
+ pos_encodings_pcd = []
+
+ for i in range(len(coords)):
+ pos_encodings_pcd.append([[]])
+ for coords_batch in coords[i].decomposed_features:
+ scene_min = coords_batch.min(dim=0)[0][None, ...]
+ scene_max = coords_batch.max(dim=0)[0][None, ...]
+
+ with autocast(enabled=False):
+ tmp = self.pos_enc(
+ coords_batch[None, ...].float(),
+ input_range=[scene_min, scene_max],
+ )
+
+ pos_encodings_pcd[-1][0].append(tmp.squeeze(0).permute((1, 0)))
+
+ return pos_encodings_pcd
+
+ def forward(
+ self, x, point2segment=None, raw_coordinates=None, is_eval=False
+ ):
+ # print(x)
+ pcd_features, aux = self.backbone(x)
+
+ batch_size = len(x.decomposed_coordinates)
+
+ with torch.no_grad():
+ coordinates = me.SparseTensor(
+ features=raw_coordinates,
+ coordinate_manager=aux[-1].coordinate_manager,
+ coordinate_map_key=aux[-1].coordinate_map_key,
+ device=aux[-1].device,
+ )
+
+ coords = [coordinates]
+ for _ in reversed(range(len(aux) - 1)):
+ coords.append(self.pooling(coords[-1]))
+
+ coords.reverse()
+
+ pos_encodings_pcd = self.get_pos_encs(coords)
+ mask_features = self.mask_features_head(pcd_features)
+ if point2segment is not None:
+ mask_segments = []
+ for i, mask_feature in enumerate(
+ mask_features.decomposed_features
+ ):
+ mask_segments.append(
+ self.scatter_fn(mask_feature, point2segment[i], dim=0)
+ )
+
+ sampled_coords = None
+
+ if self.non_parametric_queries:
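+            # non-parametric queries: pick query positions by farthest point
+            # sampling on the input coordinates, num_queries per scene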
+ fps_idx = [
+ furthest_point_sample(
+ x.decomposed_coordinates[i][None, ...].float(),
+ self.num_queries,
+ )
+ .squeeze(0)
+ .long()
+ for i in range(len(x.decomposed_coordinates))
+ ]
+
+ sampled_coords = torch.stack(
+ [
+ coordinates.decomposed_features[i][fps_idx[i].long(), :]
+ for i in range(len(fps_idx))
+ ]
+ )
+
+ mins = torch.stack(
+ [
+ coordinates.decomposed_features[i].min(dim=0)[0]
+ for i in range(len(coordinates.decomposed_features))
+ ]
+ )
+ maxs = torch.stack(
+ [
+ coordinates.decomposed_features[i].max(dim=0)[0]
+ for i in range(len(coordinates.decomposed_features))
+ ]
+ )
+
+ query_pos = self.pos_enc(
+ sampled_coords.float(), input_range=[mins, maxs]
+ ) # Batch, Dim, queries
+ query_pos = self.query_projection(query_pos)
+
+ if not self.use_np_features:
+ queries = torch.zeros_like(query_pos).permute((0, 2, 1))
+ else:
+ queries = torch.stack(
+ [
+ pcd_features.decomposed_features[i][
+ fps_idx[i].long(), :
+ ]
+ for i in range(len(fps_idx))
+ ]
+ )
+ queries = self.np_feature_projection(queries)
+ query_pos = query_pos.permute((2, 0, 1))
+ elif self.random_queries:
+ query_pos = (
+ torch.rand(
+ batch_size,
+ self.mask_dim,
+ self.num_queries,
+ device=x.device,
+ )
+ - 0.5
+ )
+
+ queries = torch.zeros_like(query_pos).permute((0, 2, 1))
+ query_pos = query_pos.permute((2, 0, 1))
+ elif self.random_query_both:
+ if not self.random_normal:
+ query_pos_feat = (
+ torch.rand(
+ batch_size,
+ 2 * self.mask_dim,
+ self.num_queries,
+ device=x.device,
+ )
+ - 0.5
+ )
+ else:
+ query_pos_feat = torch.randn(
+ batch_size,
+ 2 * self.mask_dim,
+ self.num_queries,
+ device=x.device,
+ )
+
+ queries = query_pos_feat[:, : self.mask_dim, :].permute((0, 2, 1))
+ query_pos = query_pos_feat[:, self.mask_dim :, :].permute(
+ (2, 0, 1)
+ )
+ else:
+ # PARAMETRIC QUERIES
+ queries = self.query_feat.weight.unsqueeze(0).repeat(
+ batch_size, 1, 1
+ )
+ query_pos = self.query_pos.weight.unsqueeze(1).repeat(
+ 1, batch_size, 1
+ )
+
+ predictions_class = []
+ predictions_mask = []
+
+ for decoder_counter in range(self.num_decoders):
+ if self.shared_decoder:
+ decoder_counter = 0
+ for i, hlevel in enumerate(self.hlevels):
+ if point2segment is not None:
+ output_class, outputs_mask, attn_mask = self.mask_module(
+ queries,
+ mask_features,
+ mask_segments,
+ len(aux) - hlevel - 1,
+ ret_attn_mask=True,
+ point2segment=point2segment,
+ coords=coords,
+ )
+ else:
+ output_class, outputs_mask, attn_mask = self.mask_module(
+ queries,
+ mask_features,
+ None,
+ len(aux) - hlevel - 1,
+ ret_attn_mask=True,
+ point2segment=None,
+ coords=coords,
+ )
+
+ decomposed_aux = aux[hlevel].decomposed_features
+ decomposed_attn = attn_mask.decomposed_features
+
+ curr_sample_size = max(
+ [pcd.shape[0] for pcd in decomposed_aux]
+ )
+
+ if min([pcd.shape[0] for pcd in decomposed_aux]) == 1:
+ raise RuntimeError(
+ "only a single point gives nans in cross-attention"
+ )
+
+ if not (self.max_sample_size or is_eval):
+ curr_sample_size = min(
+ curr_sample_size, self.sample_sizes[hlevel]
+ )
+
+ rand_idx = []
+ mask_idx = []
+ for k in range(len(decomposed_aux)):
+ pcd_size = decomposed_aux[k].shape[0]
+ if pcd_size <= curr_sample_size:
+ # we do not need to sample
+ # take all points and pad the rest with zeroes and mask it
+ idx = torch.zeros(
+ curr_sample_size,
+ dtype=torch.long,
+ device=queries.device,
+ )
+
+ midx = torch.ones(
+ curr_sample_size,
+ dtype=torch.bool,
+ device=queries.device,
+ )
+
+ idx[:pcd_size] = torch.arange(
+ pcd_size, device=queries.device
+ )
+
+ midx[:pcd_size] = False # attend to first points
+ else:
+ # we have more points in pcd as we like to sample
+ # take a subset (no padding or masking needed)
+ idx = torch.randperm(
+ decomposed_aux[k].shape[0], device=queries.device
+ )[:curr_sample_size]
+ midx = torch.zeros(
+ curr_sample_size,
+ dtype=torch.bool,
+ device=queries.device,
+ ) # attend to all
+
+ rand_idx.append(idx)
+ mask_idx.append(midx)
+
+ batched_aux = torch.stack(
+ [
+ decomposed_aux[k][rand_idx[k], :]
+ for k in range(len(rand_idx))
+ ]
+ )
+
+ batched_attn = torch.stack(
+ [
+ decomposed_attn[k][rand_idx[k], :]
+ for k in range(len(rand_idx))
+ ]
+ )
+
+ batched_pos_enc = torch.stack(
+ [
+ pos_encodings_pcd[hlevel][0][k][rand_idx[k], :]
+ for k in range(len(rand_idx))
+ ]
+ )
+
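+                # if a query is masked out everywhere (it attends to no
+                # sampled point), unmask it completely to avoid NaNs in the
+                # attention softmax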
+ batched_attn.permute((0, 2, 1))[
+ batched_attn.sum(1) == rand_idx[0].shape[0]
+ ] = False
+
+ m = torch.stack(mask_idx)
+ batched_attn = torch.logical_or(batched_attn, m[..., None])
+
+ src_pcd = self.lin_squeeze[decoder_counter][i](
+ batched_aux.permute((1, 0, 2))
+ )
+ if self.use_level_embed:
+ src_pcd += self.level_embed.weight[i]
+
+ output = self.cross_attention[decoder_counter][i](
+ queries.permute((1, 0, 2)),
+ src_pcd,
+ memory_mask=batched_attn.repeat_interleave(
+ self.num_heads, dim=0
+ ).permute((0, 2, 1)),
+ memory_key_padding_mask=None, # here we do not apply masking on padded region
+ pos=batched_pos_enc.permute((1, 0, 2)),
+ query_pos=query_pos,
+ )
+
+ output = self.self_attention[decoder_counter][i](
+ output,
+ tgt_mask=None,
+ tgt_key_padding_mask=None,
+ query_pos=query_pos,
+ )
+
+ # FFN
+ queries = self.ffn_attention[decoder_counter][i](
+ output
+ ).permute((1, 0, 2))
+
+ predictions_class.append(output_class)
+ predictions_mask.append(outputs_mask)
+
+ if point2segment is not None:
+ output_class, outputs_mask = self.mask_module(
+ queries,
+ mask_features,
+ mask_segments,
+ 0,
+ ret_attn_mask=False,
+ point2segment=point2segment,
+ coords=coords,
+ )
+ else:
+ output_class, outputs_mask = self.mask_module(
+ queries,
+ mask_features,
+ None,
+ 0,
+ ret_attn_mask=False,
+ point2segment=None,
+ coords=coords,
+ )
+ predictions_class.append(output_class)
+ predictions_mask.append(outputs_mask)
+
+ return {
+ "pred_logits": predictions_class[-1],
+ "pred_masks": predictions_mask[-1],
+ "aux_outputs": self._set_aux_loss(
+ predictions_class, predictions_mask
+ ),
+ "sampled_coords": sampled_coords.detach().cpu().numpy()
+ if sampled_coords is not None
+ else None,
+ "backbone_features": pcd_features,
+ }
+
+ def mask_module(
+ self,
+ query_feat,
+ mask_features,
+ mask_segments,
+ num_pooling_steps,
+ ret_attn_mask=True,
+ point2segment=None,
+ coords=None,
+ ):
+ query_feat = self.decoder_norm(query_feat)
+ mask_embed = self.mask_embed_head(query_feat)
+ outputs_class = self.class_embed_head(query_feat)
+
+ output_masks = []
+
+ if point2segment is not None:
+ output_segments = []
+ for i in range(len(mask_segments)):
+ output_segments.append(mask_segments[i] @ mask_embed[i].T)
+ output_masks.append(output_segments[-1][point2segment[i]])
+ else:
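+ # mask_features.C[:, 0] stores batch indices, so C[-1, 0] + 1 is the batch size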
+ for i in range(mask_features.C[-1, 0] + 1):
+ output_masks.append(
+ mask_features.decomposed_features[i] @ mask_embed[i].T
+ )
+
+ output_masks = torch.cat(output_masks)
+ outputs_mask = me.SparseTensor(
+ features=output_masks,
+ coordinate_manager=mask_features.coordinate_manager,
+ coordinate_map_key=mask_features.coordinate_map_key,
+ )
+
+ if ret_attn_mask:
+ attn_mask = outputs_mask
+ for _ in range(num_pooling_steps):
+ attn_mask = self.pooling(attn_mask.float())
+
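+ # Binarize: True marks points a query may NOT attend to (foreground prob < 0.5)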
+ attn_mask = me.SparseTensor(
+ features=(attn_mask.F.detach().sigmoid() < 0.5),
+ coordinate_manager=attn_mask.coordinate_manager,
+ coordinate_map_key=attn_mask.coordinate_map_key,
+ )
+
+ if point2segment is not None:
+ return outputs_class, output_segments, attn_mask
+ else:
+ return (
+ outputs_class,
+ outputs_mask.decomposed_features,
+ attn_mask,
+ )
+
+ if point2segment is not None:
+ return outputs_class, output_segments
+ else:
+ return outputs_class, outputs_mask.decomposed_features
+
+ @torch.jit.unused
+ def _set_aux_loss(self, outputs_class, outputs_seg_masks):
+ # this is a workaround to make torchscript happy, as torchscript
+ # doesn't support dictionary with non-homogeneous values, such
+ # as a dict having both a Tensor and a list.
+ return [
+ {"pred_logits": a, "pred_masks": b}
+ for a, b in zip(outputs_class[:-1], outputs_seg_masks[:-1])
+ ]
+
+
+class PositionalEncoding3D(nn.Module):
+ def __init__(self, channels):
+ """
+ :param channels: The last dimension of the tensor you want to apply pos emb to.
+ """
+ super(PositionalEncoding3D, self).__init__()
+ self.orig_ch = channels
+ channels = int(np.ceil(channels / 6) * 2)
+ if channels % 2:
+ channels += 1
+ self.channels = channels
+ inv_freq = 1.0 / (
+ 10000 ** (torch.arange(0, channels, 2).float() / channels)
+ )
+ self.register_buffer("inv_freq", inv_freq)
+
+ def forward(self, tensor, input_range=None):
+ """
+ :param tensor: A 3d tensor of xyz coordinates, size (batch_size, num_points, 3)
+ :return: Positional encoding of size (batch_size, channels, num_points)
+ """
+ pos_x, pos_y, pos_z = tensor[:, :, 0], tensor[:, :, 1], tensor[:, :, 2]
+ sin_inp_x = torch.einsum("bi,j->bij", pos_x, self.inv_freq)
+ sin_inp_y = torch.einsum("bi,j->bij", pos_y, self.inv_freq)
+ sin_inp_z = torch.einsum("bi,j->bij", pos_z, self.inv_freq)
+ emb_x = torch.cat((sin_inp_x.sin(), sin_inp_x.cos()), dim=-1)
+
+ emb_y = torch.cat((sin_inp_y.sin(), sin_inp_y.cos()), dim=-1)
+ emb_z = torch.cat((sin_inp_z.sin(), sin_inp_z.cos()), dim=-1)
+
+ emb = torch.cat((emb_x, emb_y, emb_z), dim=-1)
+ return emb[:, :, : self.orig_ch].permute((0, 2, 1))
+
+
+class SelfAttentionLayer(nn.Module):
+ def __init__(
+ self,
+ d_model,
+ nhead,
+ dropout=0.0,
+ activation="relu",
+ normalize_before=False,
+ ):
+ super().__init__()
+ self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
+
+ self.norm = nn.LayerNorm(d_model)
+ self.dropout = nn.Dropout(dropout)
+
+ self.activation = _get_activation_fn(activation)
+ self.normalize_before = normalize_before
+
+ self._reset_parameters()
+
+ def _reset_parameters(self):
+ for p in self.parameters():
+ if p.dim() > 1:
+ nn.init.xavier_uniform_(p)
+
+ def with_pos_embed(self, tensor, pos):
+ return tensor if pos is None else tensor + pos
+
+ def forward_post(
+ self, tgt, tgt_mask=None, tgt_key_padding_mask=None, query_pos=None
+ ):
+ q = k = self.with_pos_embed(tgt, query_pos)
+ tgt2 = self.self_attn(
+ q,
+ k,
+ value=tgt,
+ attn_mask=tgt_mask,
+ key_padding_mask=tgt_key_padding_mask,
+ )[0]
+ tgt = tgt + self.dropout(tgt2)
+ tgt = self.norm(tgt)
+
+ return tgt
+
+ def forward_pre(
+ self, tgt, tgt_mask=None, tgt_key_padding_mask=None, query_pos=None
+ ):
+ tgt2 = self.norm(tgt)
+ q = k = self.with_pos_embed(tgt2, query_pos)
+ tgt2 = self.self_attn(
+ q,
+ k,
+ value=tgt2,
+ attn_mask=tgt_mask,
+ key_padding_mask=tgt_key_padding_mask,
+ )[0]
+ tgt = tgt + self.dropout(tgt2)
+
+ return tgt
+
+ def forward(
+ self, tgt, tgt_mask=None, tgt_key_padding_mask=None, query_pos=None
+ ):
+ if self.normalize_before:
+ return self.forward_pre(
+ tgt, tgt_mask, tgt_key_padding_mask, query_pos
+ )
+ return self.forward_post(
+ tgt, tgt_mask, tgt_key_padding_mask, query_pos
+ )
+
+
+class CrossAttentionLayer(nn.Module):
+ def __init__(
+ self,
+ d_model,
+ nhead,
+ dropout=0.0,
+ activation="relu",
+ normalize_before=False,
+ ):
+ super().__init__()
+ self.multihead_attn = nn.MultiheadAttention(
+ d_model, nhead, dropout=dropout
+ )
+
+ self.norm = nn.LayerNorm(d_model)
+ self.dropout = nn.Dropout(dropout)
+
+ self.activation = _get_activation_fn(activation)
+ self.normalize_before = normalize_before
+
+ self._reset_parameters()
+
+ def _reset_parameters(self):
+ for p in self.parameters():
+ if p.dim() > 1:
+ nn.init.xavier_uniform_(p)
+
+ def with_pos_embed(self, tensor, pos):
+ return tensor if pos is None else tensor + pos
+
+ def forward_post(
+ self,
+ tgt,
+ memory,
+ memory_mask=None,
+ memory_key_padding_mask=None,
+ pos=None,
+ query_pos=None,
+ ):
+ tgt2 = self.multihead_attn(
+ query=self.with_pos_embed(tgt, query_pos),
+ key=self.with_pos_embed(memory, pos),
+ value=memory,
+ attn_mask=memory_mask,
+ key_padding_mask=memory_key_padding_mask,
+ )[0]
+ tgt = tgt + self.dropout(tgt2)
+ tgt = self.norm(tgt)
+
+ return tgt
+
+ def forward_pre(
+ self,
+ tgt,
+ memory,
+ memory_mask=None,
+ memory_key_padding_mask=None,
+ pos=None,
+ query_pos=None,
+ ):
+ tgt2 = self.norm(tgt)
+
+ tgt2 = self.multihead_attn(
+ query=self.with_pos_embed(tgt2, query_pos),
+ key=self.with_pos_embed(memory, pos),
+ value=memory,
+ attn_mask=memory_mask,
+ key_padding_mask=memory_key_padding_mask,
+ )[0]
+ tgt = tgt + self.dropout(tgt2)
+
+ return tgt
+
+ def forward(
+ self,
+ tgt,
+ memory,
+ memory_mask=None,
+ memory_key_padding_mask=None,
+ pos=None,
+ query_pos=None,
+ ):
+ if self.normalize_before:
+ return self.forward_pre(
+ tgt,
+ memory,
+ memory_mask,
+ memory_key_padding_mask,
+ pos,
+ query_pos,
+ )
+ return self.forward_post(
+ tgt, memory, memory_mask, memory_key_padding_mask, pos, query_pos
+ )
+
+
+class FFNLayer(nn.Module):
+ def __init__(
+ self,
+ d_model,
+ dim_feedforward=2048,
+ dropout=0.0,
+ activation="relu",
+ normalize_before=False,
+ ):
+ super().__init__()
+ # Implementation of Feedforward model
+ self.linear1 = nn.Linear(d_model, dim_feedforward)
+ self.dropout = nn.Dropout(dropout)
+ self.linear2 = nn.Linear(dim_feedforward, d_model)
+
+ self.norm = nn.LayerNorm(d_model)
+
+ self.activation = _get_activation_fn(activation)
+ self.normalize_before = normalize_before
+
+ self._reset_parameters()
+
+ def _reset_parameters(self):
+ for p in self.parameters():
+ if p.dim() > 1:
+ nn.init.xavier_uniform_(p)
+
+ def with_pos_embed(self, tensor, pos):
+ return tensor if pos is None else tensor + pos
+
+ def forward_post(self, tgt):
+ tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt))))
+ tgt = tgt + self.dropout(tgt2)
+ tgt = self.norm(tgt)
+ return tgt
+
+ def forward_pre(self, tgt):
+ tgt2 = self.norm(tgt)
+ tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt2))))
+ tgt = tgt + self.dropout(tgt2)
+ return tgt
+
+ def forward(self, tgt):
+ if self.normalize_before:
+ return self.forward_pre(tgt)
+ return self.forward_post(tgt)
+
+
+def _get_activation_fn(activation):
+ """Return an activation function given a string"""
+ if activation == "relu":
+ return F.relu
+ if activation == "gelu":
+ return F.gelu
+ if activation == "glu":
+ return F.glu
+ raise RuntimeError(f"activation should be relu/gelu, not {activation}.")
diff --git a/models/Mask3D/mask3d/models/matcher.py b/models/Mask3D/mask3d/models/matcher.py
new file mode 100644
index 0000000000000000000000000000000000000000..fc0e7a05bb76a078b1c3c3b9c877054e439b584c
--- /dev/null
+++ b/models/Mask3D/mask3d/models/matcher.py
@@ -0,0 +1,226 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+# Modified by Bowen Cheng from https://github.com/facebookresearch/detr/blob/master/models/matcher.py
+"""
+Modules to compute the matching cost and solve the corresponding LSAP.
+"""
+import torch
+import torch.nn.functional as F
+from scipy.optimize import linear_sum_assignment
+from torch import nn
+from torch.cuda.amp import autocast
+
+from detectron2.projects.point_rend.point_features import point_sample
+
+
+def batch_dice_loss(inputs: torch.Tensor, targets: torch.Tensor):
+ """
+ Compute the DICE loss, similar to generalized IOU for masks
+ Args:
+ inputs: A float tensor of arbitrary shape.
+ The predictions for each example.
+ targets: A float tensor with the same shape as inputs. Stores the binary
+ classification label for each element in inputs
+ (0 for the negative class and 1 for the positive class).
+ """
+ inputs = inputs.sigmoid()
+ inputs = inputs.flatten(1)
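+ # numerator/denominator are formed pairwise for every (prediction n, target m)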
+ numerator = 2 * torch.einsum("nc,mc->nm", inputs, targets)
+ denominator = inputs.sum(-1)[:, None] + targets.sum(-1)[None, :]
+ loss = 1 - (numerator + 1) / (denominator + 1)
+ return loss
+
+
+batch_dice_loss_jit = torch.jit.script(
+ batch_dice_loss
+) # type: torch.jit.ScriptModule
+
+
+def batch_sigmoid_ce_loss(inputs: torch.Tensor, targets: torch.Tensor):
+ """
+ Args:
+ inputs: A float tensor of arbitrary shape.
+ The predictions for each example.
+ targets: A float tensor with the same shape as inputs. Stores the binary
+ classification label for each element in inputs
+ (0 for the negative class and 1 for the positive class).
+ Returns:
+ Loss tensor
+ """
+ hw = inputs.shape[1]
+
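+ # BCE against all-ones / all-zeros targets gives per-element positive and
+ # negative losses, which einsum combines pairwise into an [n, m] cost matrix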
+ pos = F.binary_cross_entropy_with_logits(
+ inputs, torch.ones_like(inputs), reduction="none"
+ )
+ neg = F.binary_cross_entropy_with_logits(
+ inputs, torch.zeros_like(inputs), reduction="none"
+ )
+
+ loss = torch.einsum("nc,mc->nm", pos, targets) + torch.einsum(
+ "nc,mc->nm", neg, (1 - targets)
+ )
+
+ return loss / hw
+
+
+batch_sigmoid_ce_loss_jit = torch.jit.script(
+ batch_sigmoid_ce_loss
+) # type: torch.jit.ScriptModule
+
+
+class HungarianMatcher(nn.Module):
+ """This class computes an assignment between the targets and the predictions of the network
+
+ For efficiency reasons, the targets don't include the no_object. Because of this, in general,
+ there are more predictions than targets. In this case, we do a 1-to-1 matching of the best predictions,
+ while the others are un-matched (and thus treated as non-objects).
+ """
+
+ def __init__(
+ self,
+ cost_class: float = 1,
+ cost_mask: float = 1,
+ cost_dice: float = 1,
+ num_points: int = 0,
+ ):
+ """Creates the matcher
+
+ Params:
+ cost_class: This is the relative weight of the classification error in the matching cost
+ cost_mask: This is the relative weight of the focal loss of the binary mask in the matching cost
+ cost_dice: This is the relative weight of the dice loss of the binary mask in the matching cost
+ """
+ super().__init__()
+ self.cost_class = cost_class
+ self.cost_mask = cost_mask
+ self.cost_dice = cost_dice
+
+ assert (
+ cost_class != 0 or cost_mask != 0 or cost_dice != 0
+ ), "all costs cant be 0"
+
+ self.num_points = num_points
+
+ @torch.no_grad()
+ def memory_efficient_forward(self, outputs, targets, mask_type):
+ """More memory-friendly matching"""
+ bs, num_queries = outputs["pred_logits"].shape[:2]
+
+ indices = []
+
+ # Iterate through batch size
+ for b in range(bs):
+
+ out_prob = outputs["pred_logits"][b].softmax(
+ -1
+ ) # [num_queries, num_classes]
+ tgt_ids = targets[b]["labels"].clone()
+
+ # Compute the classification cost. Contrary to the loss, we don't use the NLL,
+ # but approximate it by 1 - proba[target class].
+ # The 1 is a constant that doesn't change the matching, so it can be omitted.
+ filter_ignore = tgt_ids == 253
+ tgt_ids[filter_ignore] = 0
+ cost_class = -out_prob[:, tgt_ids]
+ # for ignored classes, pretend a perfect match (TODO: a better worst-class match?)
+ cost_class[:, filter_ignore] = -1.0
+
+ out_mask = outputs["pred_masks"][
+ b
+ ].T # [num_queries, H_pred, W_pred]
+ # gt masks are already padded when preparing target
+ tgt_mask = targets[b][mask_type].to(out_mask)
+
+ if self.num_points != -1:
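+ # num_points acts as a fraction here: keep int(num_points * T) of the T points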
+ point_idx = torch.randperm(
+ tgt_mask.shape[1], device=tgt_mask.device
+ )[: int(self.num_points * tgt_mask.shape[1])]
+ # point_idx = torch.randint(0, tgt_mask.shape[1], size=(self.num_points,), device=tgt_mask.device)
+ else:
+ # sample all points
+ point_idx = torch.arange(
+ tgt_mask.shape[1], device=tgt_mask.device
+ )
+
+ # out_mask = out_mask[:, None]
+ # tgt_mask = tgt_mask[:, None]
+ # all masks share the same set of points for efficient matching!
+ # point_coords = torch.rand(1, self.num_points, 2, device=out_mask.device)
+ # get gt labels
+ # tgt_mask = point_sample(
+ # tgt_mask,
+ # point_coords.repeat(tgt_mask.shape[0], 1, 1),
+ # align_corners=False,
+ # ).squeeze(1)
+
+ # out_mask = point_sample(
+ # out_mask,
+ # point_coords.repeat(out_mask.shape[0], 1, 1),
+ # align_corners=False,
+ # ).squeeze(1)
+
+ with autocast(enabled=False):
+ out_mask = out_mask.float()
+ tgt_mask = tgt_mask.float()
+ # Compute the focal loss between masks
+ cost_mask = batch_sigmoid_ce_loss_jit(
+ out_mask[:, point_idx], tgt_mask[:, point_idx]
+ )
+
+ # Compute the dice loss between masks
+ cost_dice = batch_dice_loss_jit(
+ out_mask[:, point_idx], tgt_mask[:, point_idx]
+ )
+
+ # Final cost matrix
+ C = (
+ self.cost_mask * cost_mask
+ + self.cost_class * cost_class
+ + self.cost_dice * cost_dice
+ )
+ C = C.reshape(num_queries, -1).cpu()
+
+ indices.append(linear_sum_assignment(C))
+
+ return [
+ (
+ torch.as_tensor(i, dtype=torch.int64),
+ torch.as_tensor(j, dtype=torch.int64),
+ )
+ for i, j in indices
+ ]
+
+ @torch.no_grad()
+ def forward(self, outputs, targets, mask_type):
+ """Performs the matching
+
+ Params:
+ outputs: This is a dict that contains at least these entries:
+ "pred_logits": Tensor of dim [batch_size, num_queries, num_classes] with the classification logits
+ "pred_masks": Tensor of dim [batch_size, num_queries, H_pred, W_pred] with the predicted masks
+
+ targets: This is a list of targets (len(targets) = batch_size), where each target is a dict containing:
+ "labels": Tensor of dim [num_target_boxes] (where num_target_boxes is the number of ground-truth
+ objects in the target) containing the class labels
+ "masks": Tensor of dim [num_target_boxes, H_gt, W_gt] containing the target masks
+
+ Returns:
+ A list of size batch_size, containing tuples of (index_i, index_j) where:
+ - index_i is the indices of the selected predictions (in order)
+ - index_j is the indices of the corresponding selected targets (in order)
+ For each batch element, it holds:
+ len(index_i) = len(index_j) = min(num_queries, num_target_boxes)
+ """
+ return self.memory_efficient_forward(outputs, targets, mask_type)
+
+ def __repr__(self, _repr_indent=4):
+ head = "Matcher " + self.__class__.__name__
+ body = [
+ "cost_class: {}".format(self.cost_class),
+ "cost_mask: {}".format(self.cost_mask),
+ "cost_dice: {}".format(self.cost_dice),
+ ]
+ lines = [head] + [" " * _repr_indent + line for line in body]
+ return "\n".join(lines)
diff --git a/models/Mask3D/mask3d/models/metrics/__init__.py b/models/Mask3D/mask3d/models/metrics/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..bd7538b5868b93e4192dbee9ca0da9e91323cf0f
--- /dev/null
+++ b/models/Mask3D/mask3d/models/metrics/__init__.py
@@ -0,0 +1,4 @@
+from .confusionmatrix import ConfusionMatrix
+from .metrics import IoU
+
+__all__ = ["ConfusionMatrix", "IoU"]
diff --git a/models/Mask3D/mask3d/models/metrics/confusionmatrix.py b/models/Mask3D/mask3d/models/metrics/confusionmatrix.py
new file mode 100644
index 0000000000000000000000000000000000000000..2d92f12595d26f76f3c26d18550b1b1486b837ff
--- /dev/null
+++ b/models/Mask3D/mask3d/models/metrics/confusionmatrix.py
@@ -0,0 +1,107 @@
+import numpy as np
+import torch
+
+
+class ConfusionMatrix:
+ """Constructs a confusion matrix for a multi-class classification problems.
+
+ Does not support multi-label, multi-class problems.
+
+ Keyword arguments:
+ - num_classes (int): number of classes in the classification problem.
+ - normalized (boolean, optional): Determines whether or not the confusion
+ matrix is normalized or not. Default: False.
+
+ Modified from: https://github.com/pytorch/tnt/blob/master/torchnet/meter/confusionmeter.py
+ """
+
+ def __init__(self, num_classes, ignore_label):
+ super().__init__()
+
+ self.conf = np.ndarray((num_classes, num_classes), dtype=np.int32)
+ self.ignore_label = ignore_label
+ self.num_classes = num_classes
+ self.reset()
+
+ def reset(self):
+ self.conf.fill(0)
+
+ def add(self, predicted, target):
+ """Computes the confusion matrix
+
+ The shape of the confusion matrix is K x K, where K is the number
+ of classes.
+
+ Keyword arguments:
+ - predicted (Tensor or numpy.ndarray): Can be an N x K tensor/array of
+ predicted scores obtained from the model for N examples and K classes,
+ or an N-tensor/array of integer values between 0 and K-1.
+ - target (Tensor or numpy.ndarray): Can be an N x K tensor/array of
+ ground-truth classes for N examples and K classes, or an N-tensor/array
+ of integer values between 0 and K-1.
+
+ """
+ # _, predicted = predicted.max(1)
+
+ # predicted = predicted.view(-1)
+ # target = target.view(-1)
+
+ # If target and/or predicted are tensors, convert them to numpy arrays
+ if torch.is_tensor(predicted):
+ predicted = predicted.cpu().numpy()
+ if torch.is_tensor(target):
+ target = target.cpu().numpy()
+ ind = ~np.isin(target, self.ignore_label)
+ predicted, target = predicted[ind], target[ind]
+
+ assert (
+ predicted.shape[0] == target.shape[0]
+ ), "number of targets and predicted outputs do not match"
+
+ if np.ndim(predicted) != 1:
+ assert (
+ predicted.shape[1] == self.num_classes
+ ), "number of predictions does not match size of confusion matrix"
+ predicted = np.argmax(predicted, 1)
+ else:
+ assert (predicted.max() < self.num_classes) and (
+ predicted.min() >= 0
+ ), "predicted values are not between 0 and k-1"
+
+ if np.ndim(target) != 1:
+ assert (
+ target.shape[1] == self.num_classes
+ ), "Onehot target does not match size of confusion matrix"
+ assert (target >= 0).all() and (
+ target <= 1
+ ).all(), "in one-hot encoding, target values should be 0 or 1"
+ assert (
+ target.sum(1) == 1
+ ).all(), "multi-label setting is not supported"
+ target = np.argmax(target, 1)
+ else:
+ assert (target.max() < self.num_classes) and (
+ target.min() >= 0
+ ), "target values are not between 0 and k-1"
+
+ # hack for bincounting 2 arrays together
+ x = predicted + self.num_classes * target
+ bincount_2d = np.bincount(
+ x.astype(np.int32), minlength=self.num_classes**2
+ )
+ assert bincount_2d.size == self.num_classes**2
+ conf = bincount_2d.reshape((self.num_classes, self.num_classes))
+
+ self.conf += conf
+
+ def value(self, normalized=False):
+ """
+ Returns:
+ Confusion matrix of K rows and K columns, where rows correspond
+ to ground-truth targets and columns correspond to predicted
+ targets.
+ """
+ if normalized:
+ conf = self.conf.astype(np.float32)
+ return conf / conf.sum(1).clip(min=1e-12)[:, None]
+ return self.conf
diff --git a/models/Mask3D/mask3d/models/metrics/metrics.py b/models/Mask3D/mask3d/models/metrics/metrics.py
new file mode 100644
index 0000000000000000000000000000000000000000..f3f4b0ca4f7b0c5224ea242f459374a28485539f
--- /dev/null
+++ b/models/Mask3D/mask3d/models/metrics/metrics.py
@@ -0,0 +1,48 @@
+import numpy as np
+
+
+class IoU:
+ """Computes the intersection over union (IoU) per class and corresponding
+ mean (mIoU).
+
+ Intersection over union (IoU) is a common evaluation metric for semantic
+ segmentation. The predictions are first accumulated in a confusion matrix
+ and the IoU is computed from it as follows:
+
+ IoU = true_positive / (true_positive + false_positive + false_negative).
+
+ The confusion matrix is accumulated externally (see ConfusionMatrix) and
+ passed to value().
+
+ Modified from: https://github.com/pytorch/tnt/blob/master/torchnet/meter
+
+ """
+
+ def __init__(self):
+ super().__init__()
+
+ def value(self, conf_matrix):
+ """Computes the IoU and mean IoU.
+
+ The mean computation ignores NaN elements of the IoU array.
+
+ Returns:
+ Tuple: (IoU, mIoU). The first output is the per class IoU,
+ for K classes it's numpy.ndarray with K elements. The second output,
+ is the mean IoU.
+ """
+ true_positive = np.diag(conf_matrix)
+ false_positive = np.sum(conf_matrix, 0) - true_positive
+ false_negative = np.sum(conf_matrix, 1) - true_positive
+
+ # Just in case we get a division by 0, ignore/hide the error
+ with np.errstate(divide="ignore", invalid="ignore"):
+ iou = true_positive / (
+ true_positive + false_positive + false_negative
+ )
+
+ return iou
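A short sketch of how the two metric helpers compose; the class count and label arrays are made up for illustration:

```
import numpy as np
from mask3d.models.metrics import ConfusionMatrix, IoU

cm = ConfusionMatrix(num_classes=3, ignore_label=255)
pred = np.array([0, 1, 2, 2, 1])
gt = np.array([0, 1, 2, 1, 255])  # the last point carries the ignore label
cm.add(pred, gt)

iou = IoU().value(cm.value())     # per-class IoU, NaN for classes never seen
miou = np.nanmean(iou)
```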
diff --git a/models/Mask3D/mask3d/models/misc.py b/models/Mask3D/mask3d/models/misc.py
new file mode 100644
index 0000000000000000000000000000000000000000..8416b62804fbc002bd02a457d896276bc307b070
--- /dev/null
+++ b/models/Mask3D/mask3d/models/misc.py
@@ -0,0 +1,119 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+# Modified by Bowen Cheng from https://github.com/facebookresearch/detr/blob/master/util/misc.py
+"""
+Misc functions, including distributed helpers.
+
+Mostly copy-paste from torchvision references.
+"""
+from typing import List, Optional
+
+import torch
+import torch.distributed as dist
+import torchvision
+from torch import Tensor
+
+
+def _max_by_axis(the_list):
+ # type: (List[List[int]]) -> List[int]
+ maxes = the_list[0]
+ for sublist in the_list[1:]:
+ for index, item in enumerate(sublist):
+ maxes[index] = max(maxes[index], item)
+ return maxes
+
+
+class NestedTensor(object):
+ def __init__(self, tensors, mask: Optional[Tensor]):
+ self.tensors = tensors
+ self.mask = mask
+
+ def to(self, device):
+ # type: (Device) -> NestedTensor # noqa
+ cast_tensor = self.tensors.to(device)
+ mask = self.mask
+ if mask is not None:
+ assert mask is not None
+ cast_mask = mask.to(device)
+ else:
+ cast_mask = None
+ return NestedTensor(cast_tensor, cast_mask)
+
+ def decompose(self):
+ return self.tensors, self.mask
+
+ def __repr__(self):
+ return str(self.tensors)
+
+
+def nested_tensor_from_tensor_list(tensor_list: List[Tensor]):
+ # TODO make this more general
+ if tensor_list[0].ndim == 3:
+ if torchvision._is_tracing():
+ # nested_tensor_from_tensor_list() does not export well to ONNX
+ # call _onnx_nested_tensor_from_tensor_list() instead
+ return _onnx_nested_tensor_from_tensor_list(tensor_list)
+
+ # TODO make it support different-sized images
+ max_size = _max_by_axis([list(img.shape) for img in tensor_list])
+ # min_size = tuple(min(s) for s in zip(*[img.shape for img in tensor_list]))
+ batch_shape = [len(tensor_list)] + max_size
+ b, c, h, w = batch_shape
+ dtype = tensor_list[0].dtype
+ device = tensor_list[0].device
+ tensor = torch.zeros(batch_shape, dtype=dtype, device=device)
+ mask = torch.ones((b, h, w), dtype=torch.bool, device=device)
+ for img, pad_img, m in zip(tensor_list, tensor, mask):
+ pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
+ m[: img.shape[1], : img.shape[2]] = False
+ else:
+ raise ValueError("not supported")
+ return NestedTensor(tensor, mask)
+
+
+# _onnx_nested_tensor_from_tensor_list() is an implementation of
+# nested_tensor_from_tensor_list() that is supported by ONNX tracing.
+@torch.jit.unused
+def _onnx_nested_tensor_from_tensor_list(
+ tensor_list: List[Tensor],
+) -> NestedTensor:
+ max_size = []
+ for i in range(tensor_list[0].dim()):
+ max_size_i = torch.max(
+ torch.stack([img.shape[i] for img in tensor_list]).to(
+ torch.float32
+ )
+ ).to(torch.int64)
+ max_size.append(max_size_i)
+ max_size = tuple(max_size)
+
+ # work around for
+ # pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
+ # m[: img.shape[1], :img.shape[2]] = False
+ # which is not yet supported in onnx
+ padded_imgs = []
+ padded_masks = []
+ for img in tensor_list:
+ padding = [(s1 - s2) for s1, s2 in zip(max_size, tuple(img.shape))]
+ padded_img = torch.nn.functional.pad(
+ img, (0, padding[2], 0, padding[1], 0, padding[0])
+ )
+ padded_imgs.append(padded_img)
+
+ m = torch.zeros_like(img[0], dtype=torch.int, device=img.device)
+ padded_mask = torch.nn.functional.pad(
+ m, (0, padding[2], 0, padding[1]), "constant", 1
+ )
+ padded_masks.append(padded_mask.to(torch.bool))
+
+ tensor = torch.stack(padded_imgs)
+ mask = torch.stack(padded_masks)
+
+ return NestedTensor(tensor, mask=mask)
+
+
+def is_dist_avail_and_initialized():
+ if not dist.is_available():
+ return False
+ if not dist.is_initialized():
+ return False
+ return True
diff --git a/models/Mask3D/mask3d/models/model.py b/models/Mask3D/mask3d/models/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..d167fa58358f2c1a7ca4a509e38c61906e9dd7ac
--- /dev/null
+++ b/models/Mask3D/mask3d/models/model.py
@@ -0,0 +1,27 @@
+from MinkowskiEngine import MinkowskiNetwork
+
+
+class Model(MinkowskiNetwork):
+ """
+ Base network for all sparse convnets.
+
+ By default, all networks are segmentation networks.
+ """
+
+ OUT_PIXEL_DIST = -1
+
+ def __init__(self, in_channels, out_channels, config, D, **kwargs):
+ super().__init__(D)
+ self.in_channels = in_channels
+ self.out_channels = out_channels
+ self.config = config
+
+
+class HighDimensionalModel(Model):
+ """
+ Base network for all spatio(-temporal)-chromatic sparse convnets.
+ """
+
+ def __init__(self, in_channels, out_channels, config, D, **kwargs):
+ assert D > 4, "Num dimension must be at least 5"
+ super().__init__(in_channels, out_channels, config, D, **kwargs)
diff --git a/models/Mask3D/mask3d/models/modules/__init__.py b/models/Mask3D/mask3d/models/modules/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/mask3d/models/modules/common.py b/models/Mask3D/mask3d/models/modules/common.py
new file mode 100644
index 0000000000000000000000000000000000000000..ae78b5b301cfd6ffcfc3417b543ebe2289602fb7
--- /dev/null
+++ b/models/Mask3D/mask3d/models/modules/common.py
@@ -0,0 +1,275 @@
+import sys
+
+if sys.version_info[:2] >= (3, 8):
+ from collections.abc import Sequence
+else:
+ from collections import Sequence
+
+from enum import Enum
+
+import torch.nn as nn
+import MinkowskiEngine as ME
+
+
+class NormType(Enum):
+ BATCH_NORM = 0
+ INSTANCE_NORM = 1
+ INSTANCE_BATCH_NORM = 2
+
+
+def get_norm(norm_type, n_channels, D, bn_momentum=0.1):
+ if norm_type == NormType.BATCH_NORM:
+ return ME.MinkowskiBatchNorm(n_channels, momentum=bn_momentum)
+ elif norm_type == NormType.INSTANCE_NORM:
+ return ME.MinkowskiInstanceNorm(n_channels)
+ elif norm_type == NormType.INSTANCE_BATCH_NORM:
+ return nn.Sequential(
+ ME.MinkowskiInstanceNorm(n_channels),
+ ME.MinkowskiBatchNorm(n_channels, momentum=bn_momentum),
+ )
+ else:
+ raise ValueError(f"Norm type: {norm_type} not supported")
+
+
+class ConvType(Enum):
+ """
+ Define the kernel region type
+ """
+
+ HYPERCUBE = 0, "HYPERCUBE"
+ SPATIAL_HYPERCUBE = 1, "SPATIAL_HYPERCUBE"
+ SPATIO_TEMPORAL_HYPERCUBE = 2, "SPATIO_TEMPORAL_HYPERCUBE"
+ HYPERCROSS = 3, "HYPERCROSS"
+ SPATIAL_HYPERCROSS = 4, "SPATIAL_HYPERCROSS"
+ SPATIO_TEMPORAL_HYPERCROSS = 5, "SPATIO_TEMPORAL_HYPERCROSS"
+ SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS = (
+ 6,
+ "SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS ",
+ )
+
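+ # Each member is declared as (value, fullname); __new__ unpacks the pair so
+ # that int(conv_type) and conv_type.fullname both work.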
+ def __new__(cls, value, name):
+ member = object.__new__(cls)
+ member._value_ = value
+ member.fullname = name
+ return member
+
+ def __int__(self):
+ return self.value
+
+
+# Convert the ConvType var to a RegionType var
+conv_to_region_type = {
+ # kernel_size = [k, k, k, 1]
+ ConvType.HYPERCUBE: ME.RegionType.HYPER_CUBE,
+ ConvType.SPATIAL_HYPERCUBE: ME.RegionType.HYPER_CUBE,
+ ConvType.SPATIO_TEMPORAL_HYPERCUBE: ME.RegionType.HYPER_CUBE,
+ ConvType.HYPERCROSS: ME.RegionType.HYPER_CROSS,
+ ConvType.SPATIAL_HYPERCROSS: ME.RegionType.HYPER_CROSS,
+ ConvType.SPATIO_TEMPORAL_HYPERCROSS: ME.RegionType.HYPER_CROSS,
+ ConvType.SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS: ME.RegionType.HYPER_CUBE, # JONAS CHANGE from HYBRID
+}
+
+# int_to_region_type = {m.value: m for m in ME.RegionType}
+int_to_region_type = {m: ME.RegionType(m) for m in range(3)}
+
+
+def convert_region_type(region_type):
+ """
+ Convert the integer region_type to the corresponding RegionType enum object.
+ """
+ return int_to_region_type[region_type]
+
+
+def convert_conv_type(conv_type, kernel_size, D):
+ assert isinstance(conv_type, ConvType), "conv_type must be of ConvType"
+ region_type = conv_to_region_type[conv_type]
+ axis_types = None
+ if conv_type == ConvType.SPATIAL_HYPERCUBE:
+ # No temporal convolution
+ if isinstance(kernel_size, Sequence):
+ kernel_size = kernel_size[:3]
+ else:
+ kernel_size = [
+ kernel_size,
+ ] * 3
+ if D == 4:
+ kernel_size.append(1)
+ elif conv_type == ConvType.SPATIO_TEMPORAL_HYPERCUBE:
+ # conv_type conversion already handled
+ assert D == 4
+ elif conv_type == ConvType.HYPERCUBE:
+ # conv_type conversion already handled
+ pass
+ elif conv_type == ConvType.SPATIAL_HYPERCROSS:
+ if isinstance(kernel_size, Sequence):
+ kernel_size = kernel_size[:3]
+ else:
+ kernel_size = [
+ kernel_size,
+ ] * 3
+ if D == 4:
+ kernel_size.append(1)
+ elif conv_type == ConvType.HYPERCROSS:
+ # conv_type conversion already handled
+ pass
+ elif conv_type == ConvType.SPATIO_TEMPORAL_HYPERCROSS:
+ # conv_type conversion already handled
+ assert D == 4
+ elif conv_type == ConvType.SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS:
+ # Define the CUBIC conv kernel for spatial dims and CROSS conv for temp dim
+ axis_types = [
+ ME.RegionType.HYPER_CUBE,
+ ] * 3
+ if D == 4:
+ axis_types.append(ME.RegionType.HYPER_CROSS)
+ return region_type, axis_types, kernel_size
+
+
+def conv(
+ in_planes,
+ out_planes,
+ kernel_size,
+ stride=1,
+ dilation=1,
+ bias=False,
+ conv_type=ConvType.HYPERCUBE,
+ D=-1,
+):
+ assert D > 0, "Dimension must be a positive integer"
+ region_type, axis_types, kernel_size = convert_conv_type(
+ conv_type, kernel_size, D
+ )
+ kernel_generator = ME.KernelGenerator(
+ kernel_size,
+ stride,
+ dilation,
+ region_type=region_type,
+ axis_types=None, # axis_types JONAS
+ dimension=D,
+ )
+
+ return ME.MinkowskiConvolution(
+ in_channels=in_planes,
+ out_channels=out_planes,
+ kernel_size=kernel_size,
+ stride=stride,
+ dilation=dilation,
+ bias=bias,
+ kernel_generator=kernel_generator,
+ dimension=D,
+ )
+
+
+def conv_tr(
+ in_planes,
+ out_planes,
+ kernel_size,
+ upsample_stride=1,
+ dilation=1,
+ bias=False,
+ conv_type=ConvType.HYPERCUBE,
+ D=-1,
+):
+ assert D > 0, "Dimension must be a positive integer"
+ region_type, axis_types, kernel_size = convert_conv_type(
+ conv_type, kernel_size, D
+ )
+ kernel_generator = ME.KernelGenerator(
+ kernel_size,
+ upsample_stride,
+ dilation,
+ region_type=region_type,
+ axis_types=axis_types,
+ dimension=D,
+ )
+
+ return ME.MinkowskiConvolutionTranspose(
+ in_channels=in_planes,
+ out_channels=out_planes,
+ kernel_size=kernel_size,
+ stride=upsample_stride,
+ dilation=dilation,
+ bias=bias,
+ kernel_generator=kernel_generator,
+ dimension=D,
+ )
+
+
+def avg_pool(
+ kernel_size,
+ stride=1,
+ dilation=1,
+ conv_type=ConvType.HYPERCUBE,
+ in_coords_key=None,
+ D=-1,
+):
+ assert D > 0, "Dimension must be a positive integer"
+ region_type, axis_types, kernel_size = convert_conv_type(
+ conv_type, kernel_size, D
+ )
+ kernel_generator = ME.KernelGenerator(
+ kernel_size,
+ stride,
+ dilation,
+ region_type=region_type,
+ axis_types=axis_types,
+ dimension=D,
+ )
+
+ return ME.MinkowskiAvgPooling(
+ kernel_size=kernel_size,
+ stride=stride,
+ dilation=dilation,
+ kernel_generator=kernel_generator,
+ dimension=D,
+ )
+
+
+def avg_unpool(
+ kernel_size, stride=1, dilation=1, conv_type=ConvType.HYPERCUBE, D=-1
+):
+ assert D > 0, "Dimension must be a positive integer"
+ region_type, axis_types, kernel_size = convert_conv_type(
+ conv_type, kernel_size, D
+ )
+ kernel_generator = ME.KernelGenerator(
+ kernel_size,
+ stride,
+ dilation,
+ region_type=region_type,
+ axis_types=axis_types,
+ dimension=D,
+ )
+
+ return ME.MinkowskiAvgUnpooling(
+ kernel_size=kernel_size,
+ stride=stride,
+ dilation=dilation,
+ kernel_generator=kernel_generator,
+ dimension=D,
+ )
+
+
+def sum_pool(
+ kernel_size, stride=1, dilation=1, conv_type=ConvType.HYPERCUBE, D=-1
+):
+ assert D > 0, "Dimension must be a positive integer"
+ region_type, axis_types, kernel_size = convert_conv_type(
+ conv_type, kernel_size, D
+ )
+ kernel_generator = ME.KernelGenerator(
+ kernel_size,
+ stride,
+ dilation,
+ region_type=region_type,
+ axis_types=axis_types,
+ dimension=D,
+ )
+
+ return ME.MinkowskiSumPooling(
+ kernel_size=kernel_size,
+ stride=stride,
+ dilation=dilation,
+ kernel_generator=kernel_generator,
+ dimension=D,
+ )
diff --git a/models/Mask3D/mask3d/models/modules/helpers_3detr.py b/models/Mask3D/mask3d/models/modules/helpers_3detr.py
new file mode 100644
index 0000000000000000000000000000000000000000..2c3f7ea57c0266a9781cdfec9f59896d15750a9d
--- /dev/null
+++ b/models/Mask3D/mask3d/models/modules/helpers_3detr.py
@@ -0,0 +1,116 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+import torch.nn as nn
+from functools import partial
+import copy
+
+
+class BatchNormDim1Swap(nn.BatchNorm1d):
+ """
+ Used for nn.Transformer that uses a HW x N x C rep
+ """
+
+ def forward(self, x):
+ """
+ x: HW x N x C
+ permute to N x C x HW
+ Apply BN on C
+ permute back
+ """
+ hw, n, c = x.shape
+ x = x.permute(1, 2, 0)
+ x = super(BatchNormDim1Swap, self).forward(x)
+ # x: n x c x hw -> hw x n x c
+ x = x.permute(2, 0, 1)
+ return x
+
+
+NORM_DICT = {
+ "bn": BatchNormDim1Swap,
+ "bn1d": nn.BatchNorm1d,
+ "id": nn.Identity,
+ "ln": nn.LayerNorm,
+}
+
+ACTIVATION_DICT = {
+ "relu": nn.ReLU,
+ "gelu": nn.GELU,
+ "leakyrelu": partial(nn.LeakyReLU, negative_slope=0.1),
+}
+
+WEIGHT_INIT_DICT = {
+ "xavier_uniform": nn.init.xavier_uniform_,
+}
+
+
+class GenericMLP(nn.Module):
+ def __init__(
+ self,
+ input_dim,
+ hidden_dims,
+ output_dim,
+ norm_fn_name=None,
+ activation="relu",
+ use_conv=False,
+ dropout=None,
+ hidden_use_bias=False,
+ output_use_bias=True,
+ output_use_activation=False,
+ output_use_norm=False,
+ weight_init_name=None,
+ ):
+ super().__init__()
+ activation = ACTIVATION_DICT[activation]
+ norm = None
+ if norm_fn_name is not None:
+ norm = NORM_DICT[norm_fn_name]
+ if norm_fn_name == "ln" and use_conv:
+ norm = lambda x: nn.GroupNorm(1, x) # easier way to use LayerNorm
+
+ if dropout is not None:
+ if not isinstance(dropout, list):
+ dropout = [dropout for _ in range(len(hidden_dims))]
+
+ layers = []
+ prev_dim = input_dim
+ for idx, x in enumerate(hidden_dims):
+ if use_conv:
+ layer = nn.Conv1d(prev_dim, x, 1, bias=hidden_use_bias)
+ else:
+ layer = nn.Linear(prev_dim, x, bias=hidden_use_bias)
+ layers.append(layer)
+ if norm:
+ layers.append(norm(x))
+ layers.append(activation())
+ if dropout is not None:
+ layers.append(nn.Dropout(p=dropout[idx]))
+ prev_dim = x
+ if use_conv:
+ layer = nn.Conv1d(prev_dim, output_dim, 1, bias=output_use_bias)
+ else:
+ layer = nn.Linear(prev_dim, output_dim, bias=output_use_bias)
+ layers.append(layer)
+
+ if output_use_norm:
+ layers.append(norm(output_dim))
+
+ if output_use_activation:
+ layers.append(activation())
+
+ self.layers = nn.Sequential(*layers)
+
+ if weight_init_name is not None:
+ self.do_weight_init(weight_init_name)
+
+ def do_weight_init(self, weight_init_name):
+ func = WEIGHT_INIT_DICT[weight_init_name]
+ for (_, param) in self.named_parameters():
+ if param.dim() > 1: # skips batchnorm/layernorm
+ func(param)
+
+ def forward(self, x):
+ output = self.layers(x)
+ return output
+
+
+def get_clones(module, N):
+ return nn.ModuleList([copy.deepcopy(module) for i in range(N)])
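A minimal sketch of the MLP helper in its conv mode; all dimensions below are illustrative assumptions:

```
import torch
from mask3d.models.modules.helpers_3detr import GenericMLP

# 256 -> 256 -> 128 point-wise MLP; with use_conv=True the "ln" norm is
# implemented as GroupNorm(1, C), i.e. LayerNorm over channels.
mlp = GenericMLP(input_dim=256, hidden_dims=[256], output_dim=128,
                 norm_fn_name="ln", activation="relu", use_conv=True)
x = torch.randn(4, 256, 1024)  # [batch, channels, num_points]
y = mlp(x)                     # [4, 128, 1024]
```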
diff --git a/models/Mask3D/mask3d/models/modules/resnet_block.py b/models/Mask3D/mask3d/models/modules/resnet_block.py
new file mode 100644
index 0000000000000000000000000000000000000000..ac16b72aa198964e343f57ad4f79193a22e830dc
--- /dev/null
+++ b/models/Mask3D/mask3d/models/modules/resnet_block.py
@@ -0,0 +1,157 @@
+import torch.nn as nn
+from MinkowskiEngine import MinkowskiReLU
+
+from mask3d.models.modules.common import ConvType, NormType, conv, get_norm
+
+
+class BasicBlockBase(nn.Module):
+ expansion = 1
+ NORM_TYPE = NormType.BATCH_NORM
+
+ def __init__(
+ self,
+ inplanes,
+ planes,
+ stride=1,
+ dilation=1,
+ downsample=None,
+ conv_type=ConvType.HYPERCUBE,
+ bn_momentum=0.1,
+ D=3,
+ ):
+ super().__init__()
+
+ self.conv1 = conv(
+ inplanes,
+ planes,
+ kernel_size=3,
+ stride=stride,
+ dilation=dilation,
+ conv_type=conv_type,
+ D=D,
+ )
+ self.norm1 = get_norm(
+ self.NORM_TYPE, planes, D, bn_momentum=bn_momentum
+ )
+ self.conv2 = conv(
+ planes,
+ planes,
+ kernel_size=3,
+ stride=1,
+ dilation=dilation,
+ bias=False,
+ conv_type=conv_type,
+ D=D,
+ )
+ self.norm2 = get_norm(
+ self.NORM_TYPE, planes, D, bn_momentum=bn_momentum
+ )
+ self.relu = MinkowskiReLU(inplace=True)
+ self.downsample = downsample
+
+ def forward(self, x):
+ residual = x
+
+ out = self.conv1(x)
+ out = self.norm1(out)
+ out = self.relu(out)
+
+ out = self.conv2(out)
+ out = self.norm2(out)
+
+ if self.downsample is not None:
+ residual = self.downsample(x)
+
+ out += residual
+ out = self.relu(out)
+
+ return out
+
+
+class BasicBlock(BasicBlockBase):
+ NORM_TYPE = NormType.BATCH_NORM
+
+
+class BasicBlockIN(BasicBlockBase):
+ NORM_TYPE = NormType.INSTANCE_NORM
+
+
+class BasicBlockINBN(BasicBlockBase):
+ NORM_TYPE = NormType.INSTANCE_BATCH_NORM
+
+
+class BottleneckBase(nn.Module):
+ expansion = 4
+ NORM_TYPE = NormType.BATCH_NORM
+
+ def __init__(
+ self,
+ inplanes,
+ planes,
+ stride=1,
+ dilation=1,
+ downsample=None,
+ conv_type=ConvType.HYPERCUBE,
+ bn_momentum=0.1,
+ D=3,
+ ):
+ super().__init__()
+ self.conv1 = conv(inplanes, planes, kernel_size=1, D=D)
+ self.norm1 = get_norm(
+ self.NORM_TYPE, planes, D, bn_momentum=bn_momentum
+ )
+
+ self.conv2 = conv(
+ planes,
+ planes,
+ kernel_size=3,
+ stride=stride,
+ dilation=dilation,
+ conv_type=conv_type,
+ D=D,
+ )
+ self.norm2 = get_norm(
+ self.NORM_TYPE, planes, D, bn_momentum=bn_momentum
+ )
+
+ self.conv3 = conv(planes, planes * self.expansion, kernel_size=1, D=D)
+ self.norm3 = get_norm(
+ self.NORM_TYPE, planes * self.expansion, D, bn_momentum=bn_momentum
+ )
+
+ self.relu = MinkowskiReLU(inplace=True)
+ self.downsample = downsample
+
+ def forward(self, x):
+ residual = x
+
+ out = self.conv1(x)
+ out = self.norm1(out)
+ out = self.relu(out)
+
+ out = self.conv2(out)
+ out = self.norm2(out)
+ out = self.relu(out)
+
+ out = self.conv3(out)
+ out = self.norm3(out)
+
+ if self.downsample is not None:
+ residual = self.downsample(x)
+
+ out += residual
+ out = self.relu(out)
+
+ return out
+
+
+class Bottleneck(BottleneckBase):
+ NORM_TYPE = NormType.BATCH_NORM
+
+
+class BottleneckIN(BottleneckBase):
+ NORM_TYPE = NormType.INSTANCE_NORM
+
+
+class BottleneckINBN(BottleneckBase):
+ NORM_TYPE = NormType.INSTANCE_BATCH_NORM
diff --git a/models/Mask3D/mask3d/models/modules/senet_block.py b/models/Mask3D/mask3d/models/modules/senet_block.py
new file mode 100644
index 0000000000000000000000000000000000000000..130082738505c79d5ecddb010595a5a66b9d8509
--- /dev/null
+++ b/models/Mask3D/mask3d/models/modules/senet_block.py
@@ -0,0 +1,138 @@
+import torch.nn as nn
+import MinkowskiEngine as ME
+
+from mix3d.models.modules.common import ConvType, NormType
+from mix3d.models.modules.resnet_block import BasicBlock, Bottleneck
+
+
+class SELayer(nn.Module):
+ def __init__(self, channel, reduction=16, D=-1):
+ # Global coords does not require coords_key
+ super().__init__()
+ self.fc = nn.Sequential(
+ ME.MinkowskiLinear(channel, channel // reduction),
+ ME.MinkowskiReLU(inplace=True),
+ ME.MinkowskiLinear(channel // reduction, channel),
+ ME.MinkowskiSigmoid(),
+ )
+ self.pooling = ME.MinkowskiGlobalPooling(dimension=D)
+ self.broadcast_mul = ME.MinkowskiBroadcastMultiplication(dimension=D)
+
+ def forward(self, x):
+ y = self.pooling(x)
+ y = self.fc(y)
+ return self.broadcast_mul(x, y)
+
+
+class SEBasicBlock(BasicBlock):
+ def __init__(
+ self,
+ inplanes,
+ planes,
+ stride=1,
+ dilation=1,
+ downsample=None,
+ conv_type=ConvType.HYPERCUBE,
+ reduction=16,
+ D=-1,
+ ):
+ super().__init__(
+ inplanes,
+ planes,
+ stride=stride,
+ dilation=dilation,
+ downsample=downsample,
+ conv_type=conv_type,
+ D=D,
+ )
+ self.se = SELayer(planes, reduction=reduction, D=D)
+
+ def forward(self, x):
+ residual = x
+
+ out = self.conv1(x)
+ out = self.norm1(out)
+ out = self.relu(out)
+
+ out = self.conv2(out)
+ out = self.norm2(out)
+ out = self.se(out)
+
+ if self.downsample is not None:
+ residual = self.downsample(x)
+
+ out += residual
+ out = self.relu(out)
+
+ return out
+
+
+class SEBasicBlockSN(SEBasicBlock):
+ NORM_TYPE = NormType.SPARSE_SWITCH_NORM
+
+
+class SEBasicBlockIN(SEBasicBlock):
+ NORM_TYPE = NormType.SPARSE_INSTANCE_NORM
+
+
+class SEBasicBlockLN(SEBasicBlock):
+ NORM_TYPE = NormType.SPARSE_LAYER_NORM
+
+
+class SEBottleneck(Bottleneck):
+ def __init__(
+ self,
+ inplanes,
+ planes,
+ stride=1,
+ dilation=1,
+ downsample=None,
+ conv_type=ConvType.HYPERCUBE,
+ D=3,
+ reduction=16,
+ ):
+ super().__init__(
+ inplanes,
+ planes,
+ stride=stride,
+ dilation=dilation,
+ downsample=downsample,
+ conv_type=conv_type,
+ D=D,
+ )
+ self.se = SELayer(planes * self.expansion, reduction=reduction, D=D)
+
+ def forward(self, x):
+ residual = x
+
+ out = self.conv1(x)
+ out = self.norm1(out)
+ out = self.relu(out)
+
+ out = self.conv2(out)
+ out = self.norm2(out)
+ out = self.relu(out)
+
+ out = self.conv3(out)
+ out = self.norm3(out)
+ out = self.se(out)
+
+ if self.downsample is not None:
+ residual = self.downsample(x)
+
+ out += residual
+ out = self.relu(out)
+
+ return out
+
+
+class SEBottleneckSN(SEBottleneck):
+ NORM_TYPE = NormType.SPARSE_SWITCH_NORM
+
+
+class SEBottleneckIN(SEBottleneck):
+ NORM_TYPE = NormType.SPARSE_INSTANCE_NORM
+
+
+class SEBottleneckLN(SEBottleneck):
+ NORM_TYPE = NormType.SPARSE_LAYER_NORM
diff --git a/models/Mask3D/mask3d/models/position_embedding.py b/models/Mask3D/mask3d/models/position_embedding.py
new file mode 100644
index 0000000000000000000000000000000000000000..70275f1610e1d3f5ec8d11d18d298b7877204b86
--- /dev/null
+++ b/models/Mask3D/mask3d/models/position_embedding.py
@@ -0,0 +1,179 @@
+# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
+"""
+Various positional encodings for the transformer.
+"""
+import math
+import torch
+from torch import nn
+import numpy as np
+
+# from utils.pc_util import shift_scale_points
+
+
+def shift_scale_points(pred_xyz, src_range, dst_range=None):
+ """
+ pred_xyz: B x N x 3
+ src_range: [[B x 3], [B x 3]] - min and max XYZ coords
+ dst_range: [[B x 3], [B x 3]] - min and max XYZ coords
+ """
+ if dst_range is None:
+ dst_range = [
+ torch.zeros(
+ (src_range[0].shape[0], 3), device=src_range[0].device
+ ),
+ torch.ones((src_range[0].shape[0], 3), device=src_range[0].device),
+ ]
+
+ if pred_xyz.ndim == 4:
+ src_range = [x[:, None] for x in src_range]
+ dst_range = [x[:, None] for x in dst_range]
+
+ assert src_range[0].shape[0] == pred_xyz.shape[0]
+ assert dst_range[0].shape[0] == pred_xyz.shape[0]
+ assert src_range[0].shape[-1] == pred_xyz.shape[-1]
+ assert src_range[0].shape == src_range[1].shape
+ assert dst_range[0].shape == dst_range[1].shape
+ assert src_range[0].shape == dst_range[1].shape
+
+ src_diff = src_range[1][:, None, :] - src_range[0][:, None, :]
+ dst_diff = dst_range[1][:, None, :] - dst_range[0][:, None, :]
+ prop_xyz = (
+ ((pred_xyz - src_range[0][:, None, :]) * dst_diff) / src_diff
+ ) + dst_range[0][:, None, :]
+ return prop_xyz
+
+
+class PositionEmbeddingCoordsSine(nn.Module):
+ def __init__(
+ self,
+ temperature=10000,
+ normalize=False,
+ scale=None,
+ pos_type="fourier",
+ d_pos=None,
+ d_in=3,
+ gauss_scale=1.0,
+ ):
+ super().__init__()
+ self.d_pos = d_pos
+ self.temperature = temperature
+ self.normalize = normalize
+ if scale is not None and normalize is False:
+ raise ValueError("normalize should be True if scale is passed")
+ if scale is None:
+ scale = 2 * math.pi
+ assert pos_type in ["sine", "fourier"]
+ self.pos_type = pos_type
+ self.scale = scale
+ if pos_type == "fourier":
+ assert d_pos is not None
+ assert d_pos % 2 == 0
+ # define a gaussian matrix input_ch -> output_ch
+ B = torch.empty((d_in, d_pos // 2)).normal_()
+ B *= gauss_scale
+ self.register_buffer("gauss_B", B)
+ self.d_pos = d_pos
+
+ def get_sine_embeddings(self, xyz, num_channels, input_range):
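+ # d_pos always overrides whatever num_channels the caller requested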
+ num_channels = self.d_pos
+ # clone coords so that shift/scale operations do not affect original tensor
+ orig_xyz = xyz
+ xyz = orig_xyz.clone()
+
+ ncoords = xyz.shape[1]
+ if self.normalize:
+ xyz = shift_scale_points(xyz, src_range=input_range)
+
+ ndim = num_channels // xyz.shape[2]
+ if ndim % 2 != 0:
+ ndim -= 1
+ # automatically handle remainder by assigning it to the first dims
+ rems = num_channels - (ndim * xyz.shape[2])
+
+ assert (
+ ndim % 2 == 0
+ ), f"Cannot handle odd sized ndim={ndim} where num_channels={num_channels} and xyz={xyz.shape}"
+
+ final_embeds = []
+ prev_dim = 0
+
+ for d in range(xyz.shape[2]):
+ cdim = ndim
+ if rems > 0:
+ # add remainder in increments of two to maintain even size
+ cdim += 2
+ rems -= 2
+
+ if cdim != prev_dim:
+ dim_t = torch.arange(
+ cdim, dtype=torch.float32, device=xyz.device
+ )
+ dim_t = self.temperature ** (2 * (dim_t // 2) / cdim)
+
+ # create batch x cdim x ncoords embedding
+ raw_pos = xyz[:, :, d]
+ if self.scale:
+ raw_pos *= self.scale
+ pos = raw_pos[:, :, None] / dim_t
+ pos = torch.stack(
+ (pos[:, :, 0::2].sin(), pos[:, :, 1::2].cos()), dim=3
+ ).flatten(2)
+ final_embeds.append(pos)
+ prev_dim = cdim
+
+ final_embeds = torch.cat(final_embeds, dim=2).permute(0, 2, 1)
+ return final_embeds
+
+ def get_fourier_embeddings(self, xyz, num_channels=None, input_range=None):
+ # Follows - https://people.eecs.berkeley.edu/~bmild/fourfeat/index.html
+
+ if num_channels is None:
+ num_channels = self.gauss_B.shape[1] * 2
+
+ bsize, npoints = xyz.shape[0], xyz.shape[1]
+ assert num_channels > 0 and num_channels % 2 == 0
+ d_in, max_d_out = self.gauss_B.shape[0], self.gauss_B.shape[1]
+ d_out = num_channels // 2
+ assert d_out <= max_d_out
+ assert d_in == xyz.shape[-1]
+
+ # clone coords so that shift/scale operations do not affect original tensor
+ orig_xyz = xyz
+ xyz = orig_xyz.clone()
+
+ ncoords = xyz.shape[1]
+ if self.normalize:
+ xyz = shift_scale_points(xyz, src_range=input_range)
+
+ xyz *= 2 * np.pi
+ xyz_proj = torch.mm(xyz.view(-1, d_in), self.gauss_B[:, :d_out]).view(
+ bsize, npoints, d_out
+ )
+ final_embeds = [xyz_proj.sin(), xyz_proj.cos()]
+
+ # return batch x d_pos x npoints embedding
+ final_embeds = torch.cat(final_embeds, dim=2).permute(0, 2, 1)
+ return final_embeds
+
+ def forward(self, xyz, num_channels=None, input_range=None):
+ assert isinstance(xyz, torch.Tensor)
+ assert xyz.ndim == 3
+ # xyz is batch x npoints x 3
+ if self.pos_type == "sine":
+ with torch.no_grad():
+ out = self.get_sine_embeddings(xyz, num_channels, input_range)
+ elif self.pos_type == "fourier":
+ with torch.no_grad():
+ out = self.get_fourier_embeddings(
+ xyz, num_channels, input_range
+ )
+ else:
+ raise ValueError(f"Unknown {self.pos_type}")
+
+ return out
+
+ def extra_repr(self):
+ st = f"type={self.pos_type}, scale={self.scale}, normalize={self.normalize}"
+ if hasattr(self, "gauss_B"):
+ st += f", gaussB={self.gauss_B.shape}, gaussBsum={self.gauss_B.sum().item()}"
+ return st
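A minimal sketch of the Fourier branch; the point count, `d_pos`, and the normalization range are illustrative assumptions:

```
import torch
from mask3d.models.position_embedding import PositionEmbeddingCoordsSine

pos_enc = PositionEmbeddingCoordsSine(pos_type="fourier", d_pos=128, normalize=True)
xyz = torch.rand(1, 2048, 3)                       # [batch, num_points, 3]
mins, maxs = xyz.min(dim=1)[0], xyz.max(dim=1)[0]  # each [batch, 3]
emb = pos_enc(xyz, input_range=[mins, maxs])       # [batch, 128, num_points]
```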
diff --git a/models/Mask3D/mask3d/models/res16unet.py b/models/Mask3D/mask3d/models/res16unet.py
new file mode 100644
index 0000000000000000000000000000000000000000..db771a6f12341b70d9e27e8f61efc2878b5d12c3
--- /dev/null
+++ b/models/Mask3D/mask3d/models/res16unet.py
@@ -0,0 +1,444 @@
+import MinkowskiEngine.MinkowskiOps as me
+from MinkowskiEngine import MinkowskiReLU
+
+from mask3d.models.resnet import ResNetBase, get_norm
+from mask3d.models.modules.common import ConvType, NormType, conv, conv_tr
+from mask3d.models.modules.resnet_block import BasicBlock, Bottleneck
+
+
+class Res16UNetBase(ResNetBase):
+ BLOCK = None
+ PLANES = (32, 64, 128, 256, 256, 256, 256, 256)
+ DILATIONS = (1, 1, 1, 1, 1, 1, 1, 1)
+ LAYERS = (2, 2, 2, 2, 2, 2, 2, 2)
+ INIT_DIM = 32
+ OUT_PIXEL_DIST = 1
+ NORM_TYPE = NormType.BATCH_NORM
+ NON_BLOCK_CONV_TYPE = ConvType.SPATIAL_HYPERCUBE
+ CONV_TYPE = ConvType.SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS
+
+ # To use the model, initialize_coords must be called before the forward pass.
+ # Once the data is processed, call clear to reset the model before calling initialize_coords again.
+ def __init__(
+ self, in_channels, out_channels, config, D=3, out_fpn=False, **kwargs
+ ):
+ super().__init__(in_channels, out_channels, config, D)
+ self.out_fpn = out_fpn
+
+ def network_initialization(self, in_channels, out_channels, config, D):
+ # Setup net_metadata
+ dilations = self.DILATIONS
+ bn_momentum = config.bn_momentum
+
+ def space_n_time_m(n, m):
+ return n if D == 3 else [n, n, n, m]
+
+ if D == 4:
+ self.OUT_PIXEL_DIST = space_n_time_m(self.OUT_PIXEL_DIST, 1)
+
+ # Output of the first conv is concatenated to conv6
+ self.inplanes = self.INIT_DIM
+ self.conv0p1s1 = conv(
+ in_channels,
+ self.inplanes,
+ kernel_size=space_n_time_m(config.conv1_kernel_size, 1),
+ stride=1,
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+
+ self.bn0 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+
+ self.conv1p1s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn1 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block1 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[0],
+ self.LAYERS[0],
+ dilation=dilations[0],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.conv2p2s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn2 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block2 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[1],
+ self.LAYERS[1],
+ dilation=dilations[1],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.conv3p4s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn3 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block3 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[2],
+ self.LAYERS[2],
+ dilation=dilations[2],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.conv4p8s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn4 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block4 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[3],
+ self.LAYERS[3],
+ dilation=dilations[3],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.convtr4p16s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[4],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr4 = get_norm(
+ self.NORM_TYPE, self.PLANES[4], D, bn_momentum=bn_momentum
+ )
+
+ self.inplanes = self.PLANES[4] + self.PLANES[2] * self.BLOCK.expansion
+ self.block5 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[4],
+ self.LAYERS[4],
+ dilation=dilations[4],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.convtr5p8s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[5],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr5 = get_norm(
+ self.NORM_TYPE, self.PLANES[5], D, bn_momentum=bn_momentum
+ )
+
+ self.inplanes = self.PLANES[5] + self.PLANES[1] * self.BLOCK.expansion
+ self.block6 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[5],
+ self.LAYERS[5],
+ dilation=dilations[5],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.convtr6p4s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[6],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr6 = get_norm(
+ self.NORM_TYPE, self.PLANES[6], D, bn_momentum=bn_momentum
+ )
+
+ self.inplanes = self.PLANES[6] + self.PLANES[0] * self.BLOCK.expansion
+ self.block7 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[6],
+ self.LAYERS[6],
+ dilation=dilations[6],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.convtr7p2s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[7],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr7 = get_norm(
+ self.NORM_TYPE, self.PLANES[7], D, bn_momentum=bn_momentum
+ )
+
+ self.inplanes = self.PLANES[7] + self.INIT_DIM
+ self.block8 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[7],
+ self.LAYERS[7],
+ dilation=dilations[7],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.final = conv(
+ self.PLANES[7],
+ out_channels,
+ kernel_size=1,
+ stride=1,
+ bias=True,
+ D=D,
+ )
+ self.relu = MinkowskiReLU(inplace=True)
+
+ def forward(self, x):
+ feature_maps = []
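+ # decoder features are collected at strides 16, 8, 4, 2 and 1 for the optional FPN-style output (out_fpn)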
+
+ out = self.conv0p1s1(x)
+ out = self.bn0(out)
+ out_p1 = self.relu(out)
+
+ out = self.conv1p1s2(out_p1)
+ out = self.bn1(out)
+ out = self.relu(out)
+ out_b1p2 = self.block1(out)
+
+ out = self.conv2p2s2(out_b1p2)
+ out = self.bn2(out)
+ out = self.relu(out)
+ out_b2p4 = self.block2(out)
+
+ out = self.conv3p4s2(out_b2p4)
+ out = self.bn3(out)
+ out = self.relu(out)
+ out_b3p8 = self.block3(out)
+
+ # pixel_dist=16
+ out = self.conv4p8s2(out_b3p8)
+ out = self.bn4(out)
+ out = self.relu(out)
+ out = self.block4(out)
+
+ feature_maps.append(out)
+
+ # pixel_dist=8
+ out = self.convtr4p16s2(out)
+ out = self.bntr4(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_b3p8)
+ out = self.block5(out)
+
+ feature_maps.append(out)
+
+ # pixel_dist=4
+ out = self.convtr5p8s2(out)
+ out = self.bntr5(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_b2p4)
+ out = self.block6(out)
+
+ feature_maps.append(out)
+
+ # pixel_dist=2
+ out = self.convtr6p4s2(out)
+ out = self.bntr6(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_b1p2)
+ out = self.block7(out)
+
+ feature_maps.append(out)
+
+ # pixel_dist=1
+ out = self.convtr7p2s2(out)
+ out = self.bntr7(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_p1)
+ out = self.block8(out)
+
+ feature_maps.append(out)
+
+ if not self.out_fpn:
+ return out
+ else:
+ return out, feature_maps
+
+
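+# numeric suffixes follow ResNet-style depth naming (set via LAYERS); letter suffixes select alternative channel widths (PLANES)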
+class Res16UNet14(Res16UNetBase):
+ BLOCK = BasicBlock
+ LAYERS = (1, 1, 1, 1, 1, 1, 1, 1)
+
+
+class Res16UNet18(Res16UNetBase):
+ BLOCK = BasicBlock
+ LAYERS = (2, 2, 2, 2, 2, 2, 2, 2)
+
+
+class Res16UNet34(Res16UNetBase):
+ BLOCK = BasicBlock
+ LAYERS = (2, 3, 4, 6, 2, 2, 2, 2)
+
+
+class Res16UNet50(Res16UNetBase):
+ BLOCK = Bottleneck
+ LAYERS = (2, 3, 4, 6, 2, 2, 2, 2)
+
+
+class Res16UNet101(Res16UNetBase):
+ BLOCK = Bottleneck
+ LAYERS = (2, 3, 4, 23, 2, 2, 2, 2)
+
+
+class Res16UNet14A(Res16UNet14):
+ PLANES = (32, 64, 128, 256, 128, 128, 96, 96)
+
+
+class Res16UNet14A2(Res16UNet14A):
+ LAYERS = (1, 1, 1, 1, 2, 2, 2, 2)
+
+
+class Res16UNet14B(Res16UNet14):
+ PLANES = (32, 64, 128, 256, 128, 128, 128, 128)
+
+
+class Res16UNet14B2(Res16UNet14B):
+ LAYERS = (1, 1, 1, 1, 2, 2, 2, 2)
+
+
+class Res16UNet14B3(Res16UNet14B):
+ LAYERS = (2, 2, 2, 2, 1, 1, 1, 1)
+
+
+class Res16UNet14C(Res16UNet14):
+ PLANES = (32, 64, 128, 256, 192, 192, 128, 128)
+
+
+class Res16UNet14D(Res16UNet14):
+ PLANES = (32, 64, 128, 256, 384, 384, 384, 384)
+
+
+class Res16UNet18A(Res16UNet18):
+ PLANES = (32, 64, 128, 256, 128, 128, 96, 96)
+
+
+class Res16UNet18B(Res16UNet18):
+ PLANES = (32, 64, 128, 256, 128, 128, 128, 128)
+
+
+class Res16UNet18D(Res16UNet18):
+ PLANES = (32, 64, 128, 256, 384, 384, 384, 384)
+
+
+class Res16UNet34A(Res16UNet34):
+ PLANES = (32, 64, 128, 256, 256, 128, 64, 64)
+
+
+class Res16UNet34B(Res16UNet34):
+ PLANES = (32, 64, 128, 256, 256, 128, 64, 32)
+
+
+class Res16UNet34C(Res16UNet34):
+ PLANES = (32, 64, 128, 256, 256, 128, 96, 96)
+
+
+class Custom30M(Res16UNet34):
+ PLANES = (32, 64, 128, 256, 128, 64, 64, 32)
+
+
+class Res16UNet34D(Res16UNet34):
+ PLANES = (32, 64, 128, 256, 256, 128, 96, 128)
+
+
+class STRes16UNetBase(Res16UNetBase):
+
+ CONV_TYPE = ConvType.SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS
+
+ def __init__(self, in_channels, out_channels, config, D=4, **kwargs):
+ super().__init__(in_channels, out_channels, config, D, **kwargs)
+
+
+class STRes16UNet14(STRes16UNetBase, Res16UNet14):
+ pass
+
+
+class STRes16UNet14A(STRes16UNetBase, Res16UNet14A):
+ pass
+
+
+class STRes16UNet18(STRes16UNetBase, Res16UNet18):
+ pass
+
+
+class STRes16UNet34(STRes16UNetBase, Res16UNet34):
+ pass
+
+
+class STRes16UNet50(STRes16UNetBase, Res16UNet50):
+ pass
+
+
+class STRes16UNet101(STRes16UNetBase, Res16UNet101):
+ pass
+
+
+class STRes16UNet18A(STRes16UNet18):
+ PLANES = (32, 64, 128, 256, 128, 128, 96, 96)
+
+
+class STResTesseract16UNetBase(STRes16UNetBase):
+ pass
+ # CONV_TYPE = ConvType.HYPERCUBE
+
+
+class STResTesseract16UNet18A(STRes16UNet18A, STResTesseract16UNetBase):
+ pass
diff --git a/models/Mask3D/mask3d/models/resnet.py b/models/Mask3D/mask3d/models/resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..f6ad622893d191fce0cf9db6edafbc83f684d218
--- /dev/null
+++ b/models/Mask3D/mask3d/models/resnet.py
@@ -0,0 +1,243 @@
+import torch.nn as nn
+import MinkowskiEngine as ME
+
+from mask3d.models.model import Model
+from mask3d.models.modules.common import ConvType, NormType, conv, get_norm, sum_pool
+from mask3d.models.modules.resnet_block import BasicBlock, Bottleneck
+
+
+class ResNetBase(Model):
+ BLOCK = None
+ LAYERS = ()
+ INIT_DIM = 64
+ PLANES = (64, 128, 256, 512)
+ OUT_PIXEL_DIST = 32
+ HAS_LAST_BLOCK = False
+ CONV_TYPE = ConvType.HYPERCUBE
+
+ def __init__(self, in_channels, out_channels, config, D=3, **kwargs):
+ assert self.BLOCK is not None
+ assert self.OUT_PIXEL_DIST > 0
+
+ super().__init__(in_channels, out_channels, config, D, **kwargs)
+
+ self.network_initialization(in_channels, out_channels, config, D)
+ self.weight_initialization()
+
+ def network_initialization(self, in_channels, out_channels, config, D):
+ def space_n_time_m(n, m):
+ return n if D == 3 else [n, n, n, m]
+
+ if D == 4:
+ self.OUT_PIXEL_DIST = space_n_time_m(self.OUT_PIXEL_DIST, 1)
+
+ dilations = config.dilations
+ bn_momentum = config.bn_momentum
+ self.inplanes = self.INIT_DIM
+ self.conv1 = conv(
+ in_channels,
+ self.inplanes,
+ kernel_size=space_n_time_m(config.conv1_kernel_size, 1),
+ stride=1,
+ D=D,
+ )
+
+ self.bn1 = get_norm(
+ NormType.BATCH_NORM,
+ self.inplanes,
+ D=self.D,
+ bn_momentum=bn_momentum,
+ )
+ self.relu = ME.MinkowskiReLU(inplace=True)
+ self.pool = sum_pool(
+ kernel_size=space_n_time_m(2, 1), stride=space_n_time_m(2, 1), D=D
+ )
+
+ self.layer1 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[0],
+ self.LAYERS[0],
+ stride=space_n_time_m(2, 1),
+ dilation=space_n_time_m(dilations[0], 1),
+ )
+ self.layer2 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[1],
+ self.LAYERS[1],
+ stride=space_n_time_m(2, 1),
+ dilation=space_n_time_m(dilations[1], 1),
+ )
+ self.layer3 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[2],
+ self.LAYERS[2],
+ stride=space_n_time_m(2, 1),
+ dilation=space_n_time_m(dilations[2], 1),
+ )
+ self.layer4 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[3],
+ self.LAYERS[3],
+ stride=space_n_time_m(2, 1),
+ dilation=space_n_time_m(dilations[3], 1),
+ )
+
+ self.final = conv(
+ self.PLANES[3] * self.BLOCK.expansion,
+ out_channels,
+ kernel_size=1,
+ bias=True,
+ D=D,
+ )
+
+ def weight_initialization(self):
+ for m in self.modules():
+ if isinstance(m, ME.MinkowskiBatchNorm):
+ nn.init.constant_(m.bn.weight, 1)
+ nn.init.constant_(m.bn.bias, 0)
+
+ def _make_layer(
+ self,
+ block,
+ planes,
+ blocks,
+ stride=1,
+ dilation=1,
+ norm_type=NormType.BATCH_NORM,
+ bn_momentum=0.1,
+ ):
+ downsample = None
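+ # standard ResNet shortcut: project the residual with a 1x1 conv + norm whenever the stride or channel count changes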
+ if stride != 1 or self.inplanes != planes * block.expansion:
+ downsample = nn.Sequential(
+ conv(
+ self.inplanes,
+ planes * block.expansion,
+ kernel_size=1,
+ stride=stride,
+ bias=False,
+ D=self.D,
+ ),
+ get_norm(
+ norm_type,
+ planes * block.expansion,
+ D=self.D,
+ bn_momentum=bn_momentum,
+ ),
+ )
+ layers = []
+ layers.append(
+ block(
+ self.inplanes,
+ planes,
+ stride=stride,
+ dilation=dilation,
+ downsample=downsample,
+ conv_type=self.CONV_TYPE,
+ D=self.D,
+ )
+ )
+ self.inplanes = planes * block.expansion
+ for i in range(1, blocks):
+ layers.append(
+ block(
+ self.inplanes,
+ planes,
+ stride=1,
+ dilation=dilation,
+ conv_type=self.CONV_TYPE,
+ D=self.D,
+ )
+ )
+
+ return nn.Sequential(*layers)
+
+ def forward(self, x):
+ x = self.conv1(x)
+ x = self.bn1(x)
+ x = self.relu(x)
+ x = self.pool(x)
+
+ x = self.layer1(x)
+ x = self.layer2(x)
+ x = self.layer3(x)
+ x = self.layer4(x)
+
+ x = self.final(x)
+ return x
+
+
+class ResNet14(ResNetBase):
+ BLOCK = BasicBlock
+ LAYERS = (1, 1, 1, 1)
+
+
+class ResNet18(ResNetBase):
+ BLOCK = BasicBlock
+ LAYERS = (2, 2, 2, 2)
+
+
+class ResNet34(ResNetBase):
+ BLOCK = BasicBlock
+ LAYERS = (3, 4, 6, 3)
+
+
+class ResNet50(ResNetBase):
+ BLOCK = Bottleneck
+ LAYERS = (3, 4, 6, 3)
+
+
+class ResNet101(ResNetBase):
+ BLOCK = Bottleneck
+ LAYERS = (3, 4, 23, 3)
+
+
+class STResNetBase(ResNetBase):
+
+ CONV_TYPE = ConvType.SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS
+
+ def __init__(self, in_channels, out_channels, config, D=4, **kwargs):
+ super().__init__(in_channels, out_channels, config, D, **kwargs)
+
+
+class STResNet14(STResNetBase, ResNet14):
+ pass
+
+
+class STResNet18(STResNetBase, ResNet18):
+ pass
+
+
+class STResNet34(STResNetBase, ResNet34):
+ pass
+
+
+class STResNet50(STResNetBase, ResNet50):
+ pass
+
+
+class STResNet101(STResNetBase, ResNet101):
+ pass
+
+
+class STResTesseractNetBase(STResNetBase):
+ CONV_TYPE = ConvType.HYPERCUBE
+
+
+class STResTesseractNet14(STResTesseractNetBase, STResNet14):
+ pass
+
+
+class STResTesseractNet18(STResTesseractNetBase, STResNet18):
+ pass
+
+
+class STResTesseractNet34(STResTesseractNetBase, STResNet34):
+ pass
+
+
+class STResTesseractNet50(STResTesseractNetBase, STResNet50):
+ pass
+
+
+class STResTesseractNet101(STResTesseractNetBase, STResNet101):
+ pass
diff --git a/models/Mask3D/mask3d/models/resunet.py b/models/Mask3D/mask3d/models/resunet.py
new file mode 100644
index 0000000000000000000000000000000000000000..98a3adc56f09d534256960c080594e5df3a41c7c
--- /dev/null
+++ b/models/Mask3D/mask3d/models/resunet.py
@@ -0,0 +1,617 @@
+import torch.nn as nn
+import MinkowskiEngine as ME
+import MinkowskiEngine.MinkowskiOps as me
+from MinkowskiEngine import MinkowskiReLU
+
+from mask3d.models.resnet import ResNetBase, get_norm
+from mask3d.models.modules.common import ConvType, NormType, conv, conv_tr
+from mask3d.models.modules.resnet_block import BasicBlock, Bottleneck, BasicBlockINBN
+
+
+class MinkUNetBase(ResNetBase):
+ BLOCK = None
+ PLANES = (64, 128, 256, 512, 256, 128, 128)
+ DILATIONS = (1, 1, 1, 1, 1, 1)
+ LAYERS = (2, 2, 2, 2, 2, 2)
+ INIT_DIM = 64
+ OUT_PIXEL_DIST = 1
+ NORM_TYPE = NormType.BATCH_NORM
+ NON_BLOCK_CONV_TYPE = ConvType.SPATIAL_HYPERCUBE
+ CONV_TYPE = ConvType.SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS
+
+ # To use the model, initialize_coords must be called before the forward pass.
+ # Once the data has been processed, call clear to reset the model before calling initialize_coords again.
+ def __init__(self, in_channels, out_channels, config, D=3, **kwargs):
+ super().__init__(in_channels, out_channels, config, D)
+
+ def network_initialization(self, in_channels, out_channels, config, D):
+ # Setup net_metadata
+ dilations = self.DILATIONS
+ bn_momentum = config.bn_momentum
+
+ def space_n_time_m(n, m):
+ return n if D == 3 else [n, n, n, m]
+
+ if D == 4:
+ self.OUT_PIXEL_DIST = space_n_time_m(self.OUT_PIXEL_DIST, 1)
+
+ # Output of the first conv is concatenated to conv6
+ self.inplanes = self.INIT_DIM
+ self.conv1p1s1 = conv(
+ in_channels,
+ self.inplanes,
+ kernel_size=space_n_time_m(config.conv1_kernel_size, 1),
+ stride=1,
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+
+ self.bn1 = get_norm(
+ self.NORM_TYPE, self.PLANES[0], D, bn_momentum=bn_momentum
+ )
+ self.block1 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[0],
+ self.LAYERS[0],
+ dilation=dilations[0],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.conv2p1s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn2 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block2 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[1],
+ self.LAYERS[1],
+ dilation=dilations[1],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.conv3p2s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn3 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block3 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[2],
+ self.LAYERS[2],
+ dilation=dilations[2],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.conv4p4s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn4 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block4 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[3],
+ self.LAYERS[3],
+ dilation=dilations[3],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.convtr4p8s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[4],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr4 = get_norm(
+ self.NORM_TYPE, self.PLANES[4], D, bn_momentum=bn_momentum
+ )
+
+ self.inplanes = self.PLANES[4] + self.PLANES[2] * self.BLOCK.expansion
+ self.block5 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[4],
+ self.LAYERS[4],
+ dilation=dilations[4],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.convtr5p4s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[5],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr5 = get_norm(
+ self.NORM_TYPE, self.PLANES[5], D, bn_momentum=bn_momentum
+ )
+
+ self.inplanes = self.PLANES[5] + self.PLANES[1] * self.BLOCK.expansion
+ self.block6 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[5],
+ self.LAYERS[5],
+ dilation=dilations[5],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.convtr6p2s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[6],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr6 = get_norm(
+ self.NORM_TYPE, self.PLANES[6], D, bn_momentum=bn_momentum
+ )
+ self.relu = MinkowskiReLU(inplace=True)
+
+ self.final = nn.Sequential(
+ conv(
+ self.PLANES[6] + self.PLANES[0] * self.BLOCK.expansion,
+ 512,
+ kernel_size=1,
+ stride=1,
+ dilation=1,
+ bias=False,
+ D=D,
+ ),
+ ME.MinkowskiBatchNorm(512),
+ ME.MinkowskiReLU(),
+ conv(
+ 512,
+ out_channels,
+ kernel_size=1,
+ stride=1,
+ dilation=1,
+ bias=True,
+ D=D,
+ ),
+ )
+
+ def forward(self, x):
+ out = self.conv1p1s1(x)
+ out = self.bn1(out)
+ out = self.relu(out)
+
+ out_b1p1 = self.block1(out)
+
+ out = self.conv2p1s2(out_b1p1)
+ out = self.bn2(out)
+ out = self.relu(out)
+
+ out_b2p2 = self.block2(out)
+
+ out = self.conv3p2s2(out_b2p2)
+ out = self.bn3(out)
+ out = self.relu(out)
+
+ out_b3p4 = self.block3(out)
+
+ out = self.conv4p4s2(out_b3p4)
+ out = self.bn4(out)
+ out = self.relu(out)
+
+ # pixel_dist=8
+ out = self.block4(out)
+
+ out = self.convtr4p8s2(out)
+ out = self.bntr4(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_b3p4)
+ out = self.block5(out)
+
+ out = self.convtr5p4s2(out)
+ out = self.bntr5(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_b2p2)
+ out = self.block6(out)
+
+ out = self.convtr6p2s2(out)
+ out = self.bntr6(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_b1p1)
+ return self.final(out)
+
+
+class ResUNet14(MinkUNetBase):
+ BLOCK = BasicBlock
+ LAYERS = (1, 1, 1, 1, 1, 1)
+
+
+class ResUNet18(MinkUNetBase):
+ BLOCK = BasicBlock
+ LAYERS = (2, 2, 2, 2, 2, 2)
+
+
+class ResUNet18INBN(ResUNet18):
+ NORM_TYPE = NormType.INSTANCE_BATCH_NORM
+ BLOCK = BasicBlockINBN
+
+
+class ResUNet34(MinkUNetBase):
+ BLOCK = BasicBlock
+ LAYERS = (3, 4, 6, 3, 2, 2)
+
+
+class ResUNet50(MinkUNetBase):
+ BLOCK = Bottleneck
+ LAYERS = (3, 4, 6, 3, 2, 2)
+
+
+class ResUNet101(MinkUNetBase):
+ BLOCK = Bottleneck
+ LAYERS = (3, 4, 23, 3, 2, 2)
+
+
+class ResUNet14D(ResUNet14):
+ PLANES = (64, 128, 256, 512, 512, 512, 512)
+
+
+class ResUNet18D(ResUNet18):
+ PLANES = (64, 128, 256, 512, 512, 512, 512)
+
+
+class ResUNet34D(ResUNet34):
+ PLANES = (64, 128, 256, 512, 512, 512, 512)
+
+
+class ResUNet34E(ResUNet34):
+ INIT_DIM = 32
+ PLANES = (32, 64, 128, 256, 128, 64, 64)
+
+
+class ResUNet34F(ResUNet34):
+ INIT_DIM = 32
+ PLANES = (32, 64, 128, 256, 128, 64, 32)
+
+
+class MinkUNetHyper(MinkUNetBase):
+ BLOCK = None
+ PLANES = (64, 128, 256, 512, 256, 128, 128)
+ DILATIONS = (1, 1, 1, 1, 1, 1)
+ LAYERS = (2, 2, 2, 2, 2, 2)
+ INIT_DIM = 64
+ OUT_PIXEL_DIST = 1
+ NORM_TYPE = NormType.BATCH_NORM
+ NON_BLOCK_CONV_TYPE = ConvType.SPATIAL_HYPERCUBE
+ CONV_TYPE = ConvType.SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS
+
+ # To use the model, initialize_coords must be called before the forward pass.
+ # Once the data has been processed, call clear to reset the model before calling initialize_coords again.
+ def __init__(self, in_channels, out_channels, config, D=3, **kwargs):
+ super(MinkUNetBase, self).__init__(
+ in_channels, out_channels, config, D
+ )
+
+ def network_initialization(self, in_channels, out_channels, config, D):
+ # Setup net_metadata
+ dilations = self.DILATIONS
+ bn_momentum = config.bn_momentum
+
+ def space_n_time_m(n, m):
+ return n if D == 3 else [n, n, n, m]
+
+ if D == 4:
+ self.OUT_PIXEL_DIST = space_n_time_m(self.OUT_PIXEL_DIST, 1)
+
+ # Output of the first conv is concatenated to conv6
+ self.inplanes = self.INIT_DIM
+ self.conv1p1s1 = conv(
+ in_channels,
+ self.inplanes,
+ kernel_size=space_n_time_m(config.conv1_kernel_size, 1),
+ stride=1,
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+
+ self.bn1 = get_norm(
+ self.NORM_TYPE, self.PLANES[0], D, bn_momentum=bn_momentum
+ )
+ self.block1 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[0],
+ self.LAYERS[0],
+ dilation=dilations[0],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.conv2p1s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn2 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block2 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[1],
+ self.LAYERS[1],
+ dilation=dilations[1],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.conv3p2s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn3 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block3 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[2],
+ self.LAYERS[2],
+ dilation=dilations[2],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+
+ self.conv4p4s2 = conv(
+ self.inplanes,
+ self.inplanes,
+ kernel_size=space_n_time_m(2, 1),
+ stride=space_n_time_m(2, 1),
+ dilation=1,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bn4 = get_norm(
+ self.NORM_TYPE, self.inplanes, D, bn_momentum=bn_momentum
+ )
+ self.block4 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[3],
+ self.LAYERS[3],
+ dilation=dilations[3],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.pool_tr4 = ME.MinkowskiPoolingTranspose(
+ kernel_size=8, stride=8, dimension=D
+ )
+ _ = self.inplanes # unused leftover; the pool_tr4 branch is never used in forward
+ self.convtr4p8s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[4],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr4 = get_norm(
+ self.NORM_TYPE, self.PLANES[4], D, bn_momentum=bn_momentum
+ )
+
+ self.inplanes = self.PLANES[4] + self.PLANES[2] * self.BLOCK.expansion
+ self.block5 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[4],
+ self.LAYERS[4],
+ dilation=dilations[4],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.pool_tr5 = ME.MinkowskiPoolingTranspose(
+ kernel_size=4, stride=4, dimension=D
+ )
+ out_pool5 = self.inplanes
+ self.convtr5p4s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[5],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr5 = get_norm(
+ self.NORM_TYPE, self.PLANES[5], D, bn_momentum=bn_momentum
+ )
+
+ self.inplanes = self.PLANES[5] + self.PLANES[1] * self.BLOCK.expansion
+ self.block6 = self._make_layer(
+ self.BLOCK,
+ self.PLANES[5],
+ self.LAYERS[5],
+ dilation=dilations[5],
+ norm_type=self.NORM_TYPE,
+ bn_momentum=bn_momentum,
+ )
+ self.pool_tr6 = ME.MinkowskiPoolingTranspose(
+ kernel_size=2, stride=2, dimension=D
+ )
+ out_pool6 = self.inplanes
+ self.convtr6p2s2 = conv_tr(
+ self.inplanes,
+ self.PLANES[6],
+ kernel_size=space_n_time_m(2, 1),
+ upsample_stride=space_n_time_m(2, 1),
+ dilation=1,
+ bias=False,
+ conv_type=self.NON_BLOCK_CONV_TYPE,
+ D=D,
+ )
+ self.bntr6 = get_norm(
+ self.NORM_TYPE, self.PLANES[6], D, bn_momentum=bn_momentum
+ )
+
+ self.relu = MinkowskiReLU(inplace=True)
+
+ self.final = nn.Sequential(
+ conv(
+ out_pool5
+ + out_pool6
+ + self.PLANES[6]
+ + self.PLANES[0] * self.BLOCK.expansion,
+ 512,
+ kernel_size=1,
+ bias=False,
+ D=D,
+ ),
+ ME.MinkowskiBatchNorm(512),
+ ME.MinkowskiReLU(),
+ conv(512, out_channels, kernel_size=1, bias=True, D=D),
+ )
+
+ def forward(self, x):
+ out = self.conv1p1s1(x)
+ out = self.bn1(out)
+ out = self.relu(out)
+
+ out_b1p1 = self.block1(out)
+
+ out = self.conv2p1s2(out_b1p1)
+ out = self.bn2(out)
+ out = self.relu(out)
+
+ out_b2p2 = self.block2(out)
+
+ out = self.conv3p2s2(out_b2p2)
+ out = self.bn3(out)
+ out = self.relu(out)
+
+ out_b3p4 = self.block3(out)
+
+ out = self.conv4p4s2(out_b3p4)
+ out = self.bn4(out)
+ out = self.relu(out)
+
+ # pixel_dist=8
+ out = self.block4(out)
+
+ out = self.convtr4p8s2(out)
+ out = self.bntr4(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_b3p4)
+ out = self.block5(out)
+ out_5 = self.pool_tr5(out)
+
+ out = self.convtr5p4s2(out)
+ out = self.bntr5(out)
+ out = self.relu(out)
+
+ out = me.cat(out, out_b2p2)
+ out = self.block6(out)
+ out_6 = self.pool_tr6(out)
+
+ out = self.convtr6p2s2(out)
+ out = self.bntr6(out)
+ out = self.relu(out)
+
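+ # hypercolumn-style head: concatenate the full-resolution features with the upsampled block5/block6 outputs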
+ out = me.cat(out, out_b1p1, out_6, out_5)
+ return self.final(out)
+
+
+class MinkUNetHyper14INBN(MinkUNetHyper):
+ NORM_TYPE = NormType.INSTANCE_BATCH_NORM
+ BLOCK = BasicBlockINBN
+
+
+class STMinkUNetBase(MinkUNetBase):
+
+ CONV_TYPE = ConvType.SPATIAL_HYPERCUBE_TEMPORAL_HYPERCROSS
+
+ def __init__(self, in_channels, out_channels, config, D=4, **kwargs):
+ super().__init__(in_channels, out_channels, config, D, **kwargs)
+
+
+class STResUNet14(STMinkUNetBase, ResUNet14):
+ pass
+
+
+class STResUNet18(STMinkUNetBase, ResUNet18):
+ pass
+
+
+class STResUNet34(STMinkUNetBase, ResUNet34):
+ pass
+
+
+class STResUNet50(STMinkUNetBase, ResUNet50):
+ pass
+
+
+class STResUNet101(STMinkUNetBase, ResUNet101):
+ pass
+
+
+class STResTesseractUNetBase(STMinkUNetBase):
+ CONV_TYPE = ConvType.HYPERCUBE
+
+
+class STResTesseractUNet14(STResTesseractUNetBase, ResUNet14):
+ pass
+
+
+class STResTesseractUNet18(STResTesseractUNetBase, ResUNet18):
+ pass
+
+
+class STResTesseractUNet34(STResTesseractUNetBase, ResUNet34):
+ pass
+
+
+class STResTesseractUNet50(STResTesseractUNetBase, ResUNet50):
+ pass
+
+
+class STResTesseractUNet101(STResTesseractUNetBase, ResUNet101):
+ pass
diff --git a/models/Mask3D/mask3d/models/wrapper.py b/models/Mask3D/mask3d/models/wrapper.py
new file mode 100644
index 0000000000000000000000000000000000000000..a6bf1678d2106049b8e6a2ac2f3a9aff37dcfc9c
--- /dev/null
+++ b/models/Mask3D/mask3d/models/wrapper.py
@@ -0,0 +1,32 @@
+import random
+
+from torch.nn import Module
+from MinkowskiEngine import SparseTensor
+
+
+class Wrapper(Module):
+ """
+ Wrapper for the segmentation networks.
+ """
+
+ OUT_PIXEL_DIST = -1
+
+ def __init__(self, NetClass, in_nchannel, out_nchannel, config):
+ super().__init__()
+ self.initialize_filter(NetClass, in_nchannel, out_nchannel, config)
+
+ def initialize_filter(self, NetClass, in_nchannel, out_nchannel, config):
+ raise NotImplementedError("Must initialize a model and a filter")
+
+ def forward(self, x, coords, colors=None):
+ soutput = self.model(x)
+
+ # During training, make the network invariant to the filter
+ if not self.training or random.random() < 0.5:
+ # Filter requires the model to finish the forward pass
+ wrapper_coords = self.filter.initialize_coords(
+ self.model, coords, colors
+ )
+ finput = SparseTensor(soutput.F, wrapper_coords)
+ soutput = self.filter(finput)
+ return soutput
diff --git a/models/Mask3D/mask3d/predict.py b/models/Mask3D/mask3d/predict.py
new file mode 100644
index 0000000000000000000000000000000000000000..4c085fd01897c13540da8eac9f941dcf0847ca6f
--- /dev/null
+++ b/models/Mask3D/mask3d/predict.py
@@ -0,0 +1,187 @@
+import hydra
+from omegaconf import DictConfig, OmegaConf
+from models.mask3d import Mask3D
+import os
+import torch
+
+import MinkowskiEngine as ME
+import open3d as o3d
+import numpy as np
+import albumentations as A
+
+from utils.utils import (
+ flatten_dict,
+ load_baseline_model,
+ load_checkpoint_with_missing_or_exsessive_keys,
+ load_backbone_checkpoint_with_missing_or_exsessive_keys,
+)
+
+from datasets.scannet200.scannet200_constants import (
+ SCANNET_COLOR_MAP_200,
+ SCANNET_COLOR_MAP_20,
+ VALID_CLASS_IDS_200,
+ VALID_CLASS_IDS_20,
+ CLASS_LABELS_200,
+ CLASS_LABELS_20,
+)
+
+root_dir = '/home/weders/scratch/scratch/scannetter/arkit/raw/Validation'
+
+class InstanceSegmentation(torch.nn.Module):
+ def __init__(self, cfg):
+ super().__init__()
+ self.model = hydra.utils.instantiate(cfg.model)
+
+
+ def forward(self, x, raw_coordinates=None):
+ return self.model(x, raw_coordinates=raw_coordinates)
+
+@hydra.main(
+ config_path="conf", config_name="config_base_instance_segmentation.yaml"
+)
+def main(cfg: DictConfig):
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ os.chdir(hydra.utils.get_original_cwd())
+ model = InstanceSegmentation(cfg)
+
+ if cfg.general.backbone_checkpoint is not None:
+ cfg, model = load_backbone_checkpoint_with_missing_or_exsessive_keys(
+ cfg, model
+ )
+ if cfg.general.checkpoint is not None:
+ cfg, model = load_checkpoint_with_missing_or_exsessive_keys(cfg, model)
+
+ model = model.to(device)
+ # model.eval()
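+ # NOTE: eval() is commented out in the original script; enabling it would freeze batch-norm statistics during inference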
+
+ color_mean = (0.47793125906962, 0.4303257521323044, 0.3749598901421883)
+ color_std = (0.2834475483823543, 0.27566157565723015, 0.27018971370874995)
+ normalize_color = A.Normalize(mean=color_mean, std=color_std)
+
+ # iterate over data
+ for sc in os.listdir(root_dir):
+
+
+ if not os.path.exists(os.path.join(root_dir, sc, 'mesh_tsdf.ply')):
+ continue
+
+ # save outputs
+ output_dir = os.path.join(root_dir, sc, 'pred_mask3d_ours')
+ if not os.path.exists(output_dir):
+ os.makedirs(output_dir)
+
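+ # NOTE: debug filter from the original script; only scene 42445991 is processed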
+ if sc != '42445991':
+ continue
+
+ # if os.path.exists(os.path.join(output_dir, 'mask3d_predictions.txt')):
+ # print('Skipping', sc)
+ # continue
+
+ print('Processing', sc)
+
+ mesh = o3d.io.read_triangle_mesh(os.path.join(root_dir, sc, 'mesh_tsdf.ply'))
+ mesh.compute_vertex_normals()
+
+ points = np.asarray(mesh.vertices)
+ colors = np.asarray(mesh.vertex_colors)
+
+
+ colors = colors * 255.
+ pseudo_image = colors.astype(np.uint8)[np.newaxis, :, :]
+ colors = np.squeeze(normalize_color(image=pseudo_image)["image"])
+
+ # voxelize data
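+ # (0.02 m voxels; this should match the voxel size the checkpoint was trained with)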
+ coords = np.floor(points / 0.02)
+
+ # NOTE: the quantized coordinates/features returned by sparse_quantize are unused here;
+ # unique_map is applied to the raw arrays below instead
+ _, _, unique_map, inverse_map = ME.utils.sparse_quantize(coordinates=coords, features=colors, return_index=True, return_inverse=True)
+
+ sample_coordinates = coords[unique_map]
+ coordinates = [torch.from_numpy(sample_coordinates).int()]
+ sample_features = colors[unique_map]
+ features = [torch.from_numpy(sample_features).float()]
+
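+ # collate the single-sample batch into MinkowskiEngine's batched coordinate format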
+ coordinates, _ = ME.utils.sparse_collate(coords=coordinates, feats=features)
+ features = torch.cat(features, dim=0)
+ data = ME.SparseTensor(
+ coordinates=coordinates,
+ features=features,
+ device=device,
+ )
+
+ # run model
+ with torch.no_grad():
+ outputs = model(data, raw_coordinates=features)
+
+ del data
+ torch.cuda.empty_cache()
+
+ # parse predictions
+ logits = outputs["pred_logits"]
+ masks = outputs["pred_masks"]
+
+
+ # reformat predictions
+ logits = logits[0].detach().cpu()
+ masks = masks[0].detach().cpu()
+
+ labels = []
+ confidences = []
+ masks_binary = []
+
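+ # per query: softmax over the class logits gives a label confidence, sigmoid over the
+ # mask logits gives per-point probabilities; the final score is the label confidence
+ # times the mean mask probability over the thresholded points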
+ for i in range(len(logits)):
+ p_labels = torch.softmax(logits[i], dim=-1)
+ p_masks = torch.sigmoid(masks[:, i])
+ l = torch.argmax(p_labels, dim=-1)
+ c_label = torch.max(p_labels)
+ m = p_masks > 0.5
+ c_m = p_masks[m].sum() / (m.sum() + 1e-8)
+ c = c_label * c_m
+ if l < 200 and c > 0.5:
+ labels.append(l.item())
+ confidences.append(c.item())
+ masks_binary.append(m[inverse_map]) # mapping the mask back to the original point cloud
+
+
+ # save labelled mesh
+ mesh_labelled = o3d.geometry.TriangleMesh()
+ mesh_labelled.vertices = mesh.vertices
+ mesh_labelled.triangles = mesh.triangles
+
+ labels_mapped = np.zeros((len(mesh.vertices), 1))
+ colors_mapped = np.zeros((len(mesh.vertices), 3))
+
+ confidences, labels, masks_binary = zip(*sorted(zip(confidences, labels, masks_binary), reverse=False))
+ for i, (l, c, m) in enumerate(zip(labels, confidences, masks_binary)):
+ labels_mapped[m == 1] = l
+ if l == 0:
+ l_ = -1 + 2 # label offset is 2 for ScanNet200; label 0 must first be mapped to -1 (see trainer.py in Mask3D)
+ else:
+ l_ = l + 2
+ # print(VALID_CLASS_IDS_200[l_], SCANNET_COLOR_MAP_200[VALID_CLASS_IDS_200[l_]], l_, CLASS_LABELS_200[l_])
+ colors_mapped[m == 1] = SCANNET_COLOR_MAP_200[VALID_CLASS_IDS_200[l_]]
+
+
+ mesh_labelled.vertex_colors = o3d.utility.Vector3dVector(colors_mapped.astype(np.float32) / 255.)
+ o3d.io.write_triangle_mesh(f'{output_dir}/mesh_tsdf_labelled.ply', mesh_labelled)
+
+ mask_path = os.path.join(output_dir, 'pred_mask')
+ if not os.path.exists(mask_path):
+ os.makedirs(mask_path)
+
+ # write per-instance predictions (sorted above by ascending confidence)
+ with open(os.path.join(output_dir, 'mask3d_predictions.txt'), 'w') as f:
+ for i, (l, c, m) in enumerate(zip(labels, confidences, masks_binary)):
+ mask_file = f'pred_mask/{str(i).zfill(3)}.txt'
+ f.write(f'{mask_file} {VALID_CLASS_IDS_200[l]} {c}\n')
+ np.savetxt(os.path.join(output_dir, mask_file), m.numpy(), fmt='%d')
+
+
+if __name__ == "__main__":
+ main()
\ No newline at end of file
diff --git a/models/Mask3D/mask3d/preprocess_arkitscenes.py b/models/Mask3D/mask3d/preprocess_arkitscenes.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/mask3d/scripts/arkitscenes/test.sh b/models/Mask3D/mask3d/scripts/arkitscenes/test.sh
new file mode 100644
index 0000000000000000000000000000000000000000..64cee20547d22a6502ade31c199f342121c59c4b
--- /dev/null
+++ b/models/Mask3D/mask3d/scripts/arkitscenes/test.sh
@@ -0,0 +1,23 @@
+#!/bin/bash
+export OMP_NUM_THREADS=3 # speeds up MinkowskiEngine
+
+CURR_DBSCAN=0.95
+CURR_TOPK=750
+CURR_QUERY=150
+CURR_T=0.001 # export threshold; undefined in the original script, 0.001 mirrors scannet200_benchmark.sh
+
+python predict.py \
+general.experiment_name="arkitscenes" \
+general.project_name="arktiscenes" \
+general.checkpoint="checkpoints/scannet200/scannet200_benchmark.ckpt" \
+data/datasets=scannet200 \
+general.num_targets=201 \
+data.num_labels=200 \
+general.eval_on_segments=false \
+general.train_on_segments=false \
+general.train_mode=false \
+model.num_queries=${CURR_QUERY} \
+general.topk_per_image=${CURR_TOPK} \
+general.use_dbscan=true \
+general.dbscan_eps=${CURR_DBSCAN} \
+general.export=true \
+data.test_mode=test \
+general.export_threshold=${CURR_T}
\ No newline at end of file
diff --git a/models/Mask3D/mask3d/scripts/s3dis/s3dis_from_scratch.sh b/models/Mask3D/mask3d/scripts/s3dis/s3dis_from_scratch.sh
new file mode 100644
index 0000000000000000000000000000000000000000..373e067d050bd30a904fa955d3ea26f9414c0f2a
--- /dev/null
+++ b/models/Mask3D/mask3d/scripts/s3dis/s3dis_from_scratch.sh
@@ -0,0 +1,33 @@
+#!/bin/bash
+export OMP_NUM_THREADS=3 # speeds up MinkowskiEngine
+
+CURR_AREA=1 # set the area number accordingly [1,6]
+CURR_DBSCAN=0.6
+CURR_TOPK=-1
+CURR_QUERY=100
+
+python main_instance_segmentation.py \
+ general.project_name="s3dis" \
+ general.experiment_name="area${CURR_AREA}_from_scratch" \
+ data.batch_size=4 \
+ data/datasets=s3dis \
+ general.num_targets=14 \
+ data.num_labels=13 \
+ trainer.max_epochs=1001 \
+ general.area=${CURR_AREA} \
+ trainer.check_val_every_n_epoch=10
+
+python main_instance_segmentation.py \
+general.project_name="s3dis_eval" \
+general.experiment_name="area${CURR_AREA}_from_scratch_eps_${CURR_DBSCAN}_topk_${CURR_TOPK}_q_${CURR_QUERY}" \
+general.checkpoint="checkpoints/s3dis/from_scratch/area${CURR_AREA}.ckpt" \
+general.train_mode=false \
+data.batch_size=4 \
+data/datasets=s3dis \
+general.num_targets=14 \
+data.num_labels=13 \
+general.area=${CURR_AREA} \
+model.num_queries=${CURR_QUERY} \
+general.topk_per_image=${CURR_TOPK} \
+general.use_dbscan=true \
+general.dbscan_eps=${CURR_DBSCAN}
diff --git a/models/Mask3D/mask3d/scripts/s3dis/s3dis_pretrained.sh b/models/Mask3D/mask3d/scripts/s3dis/s3dis_pretrained.sh
new file mode 100644
index 0000000000000000000000000000000000000000..f5a1d08d8a4a17f9d6aa2f88c5043d23bd9b1fed
--- /dev/null
+++ b/models/Mask3D/mask3d/scripts/s3dis/s3dis_pretrained.sh
@@ -0,0 +1,34 @@
+#!/bin/bash
+export OMP_NUM_THREADS=3 # speeds up MinkowskiEngine
+
+CURR_AREA=1 # set the area number accordingly [1,6]
+CURR_DBSCAN=0.6
+CURR_TOPK=-1
+CURR_QUERY=100
+
+python main_instance_segmentation.py \
+ general.project_name="s3dis" \
+ general.experiment_name="area${CURR_AREA}_pretrained" \
+ data.batch_size=4 \
+ data/datasets=s3dis \
+ general.num_targets=14 \
+ data.num_labels=13 \
+ general.area=${CURR_AREA} \
+ general.checkpoint="checkpoints/s3dis/scannet_pretrained/scannet_pretrained.ckpt" \
+ trainer.check_val_every_n_epoch=10 \
+ optimizer.lr=0.00001
+
+python main_instance_segmentation.py \
+general.project_name="s3dis_eval" \
+general.experiment_name="area${CURR_AREA}_pretrained_eps_${CURR_DBSCAN}_topk_${CURR_TOPK}_q_${CURR_QUERY}" \
+general.checkpoint="checkpoints/s3dis/scannet_pretrained/area${CURR_AREA}.ckpt" \
+general.train_mode=false \
+data.batch_size=4 \
+data/datasets=s3dis \
+general.num_targets=14 \
+data.num_labels=13 \
+general.area=${CURR_AREA} \
+model.num_queries=${CURR_QUERY} \
+general.topk_per_image=${CURR_TOPK} \
+general.use_dbscan=true \
+general.dbscan_eps=${CURR_DBSCAN}
diff --git a/models/Mask3D/mask3d/scripts/scannet/scannet_benchmark.sh b/models/Mask3D/mask3d/scripts/scannet/scannet_benchmark.sh
new file mode 100644
index 0000000000000000000000000000000000000000..d8a45ba9717a5488b3a387dc2f29028de6c1c5ae
--- /dev/null
+++ b/models/Mask3D/mask3d/scripts/scannet/scannet_benchmark.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+export OMP_NUM_THREADS=3 # speeds up MinkowskiEngine
+
+CURR_DBSCAN=0.95
+CURR_TOPK=300
+CURR_QUERY=150
+
+# TRAIN
+python main_instance_segmentation.py \
+general.experiment_name="benchmark" \
+general.eval_on_segments=true \
+general.train_on_segments=true \
+data.train_mode=train_validation
+
+# TEST
+python main_instance_segmentation.py \
+general.experiment_name="benchmark_query_${CURR_QUERY}_topk_${CURR_TOPK}_dbscan_${CURR_DBSCAN}" \
+general.project_name="scannet_eval" \
+general.checkpoint='checkpoints/scannet/scannet_benchmark.ckpt' \
+general.eval_on_segments=true \
+general.train_on_segments=true \
+general.train_mode=false \
+general.export=true \
+data.test_mode=test \
+model.num_queries=${CURR_QUERY} \
+general.topk_per_image=${CURR_TOPK} \
+general.use_dbscan=true \
+general.dbscan_eps=${CURR_DBSCAN}
diff --git a/models/Mask3D/mask3d/scripts/scannet/scannet_pretrain_for_s3dis.sh b/models/Mask3D/mask3d/scripts/scannet/scannet_pretrain_for_s3dis.sh
new file mode 100644
index 0000000000000000000000000000000000000000..cfb1c1312257a7a4415c528d4935f160796e4ecf
--- /dev/null
+++ b/models/Mask3D/mask3d/scripts/scannet/scannet_pretrain_for_s3dis.sh
@@ -0,0 +1,7 @@
+#!/bin/bash
+export OMP_NUM_THREADS=3 # speeds up MinkowskiEngine
+
+# TRAIN
+python main_instance_segmentation.py \
+general.experiment_name="pretrain_for_s3dis" \
+data.train_mode=train_validation
\ No newline at end of file
diff --git a/models/Mask3D/mask3d/scripts/scannet/scannet_val.sh b/models/Mask3D/mask3d/scripts/scannet/scannet_val.sh
new file mode 100644
index 0000000000000000000000000000000000000000..8c82a26204f145f6eb20bd9fa2a1f632cdaea77d
--- /dev/null
+++ b/models/Mask3D/mask3d/scripts/scannet/scannet_val.sh
@@ -0,0 +1,25 @@
+#!/bin/bash
+export OMP_NUM_THREADS=3 # speeds up MinkowskiEngine
+
+CURR_DBSCAN=0.95
+CURR_TOPK=500
+CURR_QUERY=150
+
+# TRAIN
+python main_instance_segmentation.py \
+general.experiment_name="validation" \
+general.eval_on_segments=true \
+general.train_on_segments=true
+
+# TEST
+python main_instance_segmentation.py \
+general.experiment_name="validation_query_${CURR_QUERY}_topk_${CURR_TOPK}_dbscan_${CURR_DBSCAN}" \
+general.project_name="scannet_eval" \
+general.checkpoint='checkpoints/scannet/scannet_val.ckpt' \
+general.train_mode=false \
+general.eval_on_segments=true \
+general.train_on_segments=true \
+model.num_queries=${CURR_QUERY} \
+general.topk_per_image=${CURR_TOPK} \
+general.use_dbscan=true \
+general.dbscan_eps=${CURR_DBSCAN}
diff --git a/models/Mask3D/mask3d/scripts/scannet200/scannet200_benchmark.sh b/models/Mask3D/mask3d/scripts/scannet200/scannet200_benchmark.sh
new file mode 100644
index 0000000000000000000000000000000000000000..7177d4a6742d485f63e5b878aeb292babf3364d5
--- /dev/null
+++ b/models/Mask3D/mask3d/scripts/scannet200/scannet200_benchmark.sh
@@ -0,0 +1,37 @@
+#!/bin/bash
+export OMP_NUM_THREADS=3 # speeds up MinkowskiEngine
+
+CURR_DBSCAN=0.95
+CURR_TOPK=300
+CURR_QUERY=150
+CURR_T=0.001
+
+# TRAIN
+python main_instance_segmentation.py \
+general.experiment_name="scannet200_benchmark" \
+general.project_name="scannet200" \
+data/datasets=scannet200 \
+general.num_targets=201 \
+data.num_labels=200 \
+general.eval_on_segments=true \
+general.train_on_segments=true \
+data.train_mode=train_validation
+
+# TEST
+python main_instance_segmentation.py \
+general.experiment_name="scannet200_benchmark_query_${CURR_QUERY}_topk_${CURR_TOPK}_dbscan_${CURR_DBSCAN}_export_${CURR_T}" \
+general.project_name="scannet200_eval" \
+general.checkpoint="checkpoints/scannet200/scannet200_benchmark.ckpt" \
+data/datasets=scannet200 \
+general.num_targets=201 \
+data.num_labels=200 \
+general.eval_on_segments=true \
+general.train_on_segments=true \
+general.train_mode=false \
+model.num_queries=${CURR_QUERY} \
+general.topk_per_image=${CURR_TOPK} \
+general.use_dbscan=true \
+general.dbscan_eps=${CURR_DBSCAN} \
+general.export=true \
+data.test_mode=test \
+general.export_threshold=${CURR_T}
diff --git a/models/Mask3D/mask3d/scripts/scannet200/scannet200_val.sh b/models/Mask3D/mask3d/scripts/scannet200/scannet200_val.sh
new file mode 100644
index 0000000000000000000000000000000000000000..80f030f575c6080e1f74316a6f126e66702e5b59
--- /dev/null
+++ b/models/Mask3D/mask3d/scripts/scannet200/scannet200_val.sh
@@ -0,0 +1,32 @@
+#!/bin/bash
+export OMP_NUM_THREADS=3 # speeds up MinkowskiEngine
+
+CURR_DBSCAN=0.95
+CURR_TOPK=750
+CURR_QUERY=150
+
+# TRAIN
+python main_instance_segmentation.py \
+general.experiment_name="scannet200_val" \
+general.project_name="scannet200" \
+data/datasets=scannet200 \
+general.num_targets=201 \
+data.num_labels=200 \
+general.eval_on_segments=true \
+general.train_on_segments=true
+
+# TEST
+python main_instance_segmentation.py \
+general.experiment_name="scannet200_val_query_${CURR_QUERY}_topk_${CURR_TOPK}_dbscan_${CURR_DBSCAN}" \
+general.project_name="scannet200_eval" \
+general.checkpoint="checkpoints/scannet200/scannet200_val.ckpt" \
+data/datasets=scannet200 \
+general.num_targets=201 \
+data.num_labels=200 \
+general.eval_on_segments=true \
+general.train_on_segments=true \
+general.train_mode=false \
+model.num_queries=${CURR_QUERY} \
+general.topk_per_image=${CURR_TOPK} \
+general.use_dbscan=true \
+general.dbscan_eps=${CURR_DBSCAN}
diff --git a/models/Mask3D/mask3d/scripts/stpls3d/merge_exports.py b/models/Mask3D/mask3d/scripts/stpls3d/merge_exports.py
new file mode 100644
index 0000000000000000000000000000000000000000..7a314a3b563d0f19cf1f0c6e0ce522d4df9c5bea
--- /dev/null
+++ b/models/Mask3D/mask3d/scripts/stpls3d/merge_exports.py
@@ -0,0 +1,55 @@
+import os
+import shutil
+from glob import glob
+from tqdm import tqdm
+
+base_path = "INSERT_WORKING_DIRECTORY"
+vs03 = f"{base_path}/benchmark_03"
+vs02 = f"{base_path}/benchmark_02"
+
+target_path = "INSERT_TARGET_DIRECTORY"
+
+print("COPY MASKS FILES 1/2 ...")
+shutil.copytree(f"{vs02}/pred_mask", f"{target_path}/pred_mask_02")
+print("COPY MASKS FILES 2/2 ...")
+shutil.copytree(f"{vs03}/pred_mask", f"{target_path}/pred_mask_03")
+
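+# keep large-category instances (class ids 1, 3, 4, 7, 8, 11, 12, 13) from the coarser
+# 0.333-voxel predictions; all remaining classes come from the finer 0.2-voxel predictions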
+for scene03 in tqdm(glob(f"{vs03}/*.txt")):
+ instances = []
+ with open(scene03, "r") as file03:
+ while line := file03.readline().rstrip():
+ mask_path, class_id, score = line.split(" ")
+
+ if int(class_id) in [1, 3, 4, 7, 8, 11, 12, 13]:
+ instances.append(
+ f'{mask_path.replace("pred_mask", "pred_mask_03")} {class_id} {score}'
+ )
+ print(instances[-1])
+ else:
+ print(
+ f'DELETE {target_path}/{mask_path.replace("pred_mask", "pred_mask_03")}'
+ )
+ os.remove(
+ f'{target_path}/{mask_path.replace("pred_mask", "pred_mask_03")}'
+ )
+
+ with open(f'{vs02}/{scene03.split("/")[-1]}', "r") as file02:
+ while line := file02.readline().rstrip():
+ mask_path, class_id, score = line.split(" ")
+
+ if int(class_id) not in [1, 3, 4, 7, 8, 11, 12, 13]:
+ instances.append(
+ f'{mask_path.replace("pred_mask", "pred_mask_02")} {class_id} {score}'
+ )
+ print(instances[-1])
+ else:
+ print(
+ f'DELETE {target_path}/{mask_path.replace("pred_mask", "pred_mask_02")}'
+ )
+ os.remove(
+ f'{target_path}/{mask_path.replace("pred_mask", "pred_mask_02")}'
+ )
+
+ with open(f'{target_path}/{scene03.split("/")[-1]}', "w") as fout:
+ for line in instances:
+ fout.write(f"{line}\n")
diff --git a/models/Mask3D/mask3d/scripts/stpls3d/stpls3d_benchmark.sh b/models/Mask3D/mask3d/scripts/stpls3d/stpls3d_benchmark.sh
new file mode 100644
index 0000000000000000000000000000000000000000..72443361774e05dc7a85c72754643a934b5891be
--- /dev/null
+++ b/models/Mask3D/mask3d/scripts/stpls3d/stpls3d_benchmark.sh
@@ -0,0 +1,99 @@
+#!/bin/bash
+export OMP_NUM_THREADS=3
+
+CURR_DBSCAN=12.5
+CURR_TOPK=200
+CURR_QUERY=160
+CURR_SIZE=54
+CURR_THRESHOLD=0.01
+
+# TRAIN network 1 with voxel size 0.333
+python main_instance_segmentation.py \
+general.experiment_name="benchmark_03" \
+general.project_name="stpls3d" \
+data/datasets=stpls3d \
+general.num_targets=15 \
+data.num_labels=15 \
+data.voxel_size=0.333 \
+data.num_workers=10 \
+data.cache_data=true \
+data.cropping_v1=false \
+general.reps_per_epoch=100 \
+model.num_queries=${CURR_QUERY} \
+general.on_crops=true \
+model.config.backbone._target_=models.Res16UNet18B \
+data.crop_length=${CURR_SIZE} \
+general.eval_inner_core=50.0 \
+data.train_mode=train_validation
+
+# TRAIN network 2 with voxel size 0.2 and larger backbone
+python main_instance_segmentation.py \
+general.experiment_name="benchmark_02" \
+general.project_name="stpls3d" \
+data/datasets=stpls3d \
+general.num_targets=15 \
+data.num_labels=15 \
+data.voxel_size=0.2 \
+data.num_workers=10 \
+data.cache_data=true \
+data.cropping_v1=false \
+general.reps_per_epoch=100 \
+model.num_queries=${CURR_QUERY} \
+general.on_crops=true \
+data.crop_length=${CURR_SIZE} \
+general.eval_inner_core=50.0 \
+data.train_mode=train_validation
+
+# TEST network 1
+python main_instance_segmentation.py \
+general.experiment_name="benchmark_03_query_${CURR_QUERY}_topk_${CURR_TOPK}_dbscan_${CURR_DBSCAN}_size_${CURR_SIZE}_T_${CURR_THRESHOLD}" \
+general.project_name="stpls3d_eval" \
+data/datasets=stpls3d \
+general.num_targets=15 \
+data.num_labels=15 \
+data.voxel_size=0.333 \
+data.num_workers=10 \
+data.cache_data=true \
+data.cropping_v1=false \
+general.reps_per_epoch=100 \
+model.num_queries=${CURR_QUERY} \
+general.on_crops=true \
+model.config.backbone._target_=models.Res16UNet18B \
+general.train_mode=false \
+general.checkpoint="checkpoints/stpls3d/stpls3d_benchmark_03.ckpt" \
+data.crop_length=${CURR_SIZE} \
+general.eval_inner_core=50.0 \
+general.topk_per_image=${CURR_TOPK} \
+general.use_dbscan=true \
+general.dbscan_eps=${CURR_DBSCAN} \
+data.test_mode=test \
+general.export=true
+
+# TEST network 2
+python main_instance_segmentation.py \
+general.experiment_name="benchmark_02_query_${CURR_QUERY}_topk_${CURR_TOPK}_dbscan_${CURR_DBSCAN}_size_${CURR_SIZE}_T_${CURR_THRESHOLD}" \
+general.project_name="stpls3d_eval" \
+data/datasets=stpls3d \
+general.num_targets=15 \
+data.num_labels=15 \
+data.voxel_size=0.2 \
+data.num_workers=10 \
+data.cache_data=true \
+data.cropping_v1=false \
+general.reps_per_epoch=100 \
+model.num_queries=${CURR_QUERY} \
+general.on_crops=true \
+general.train_mode=false \
+general.checkpoint="checkpoints/stpls3d/stpls3d_benchmark_02.ckpt" \
+data.crop_length=${CURR_SIZE} \
+general.eval_inner_core=50.0 \
+general.topk_per_image=${CURR_TOPK} \
+general.use_dbscan=true \
+general.dbscan_eps=${CURR_DBSCAN} \
+data.test_mode=test \
+general.export=true
+
+# COMBINE OUTPUTS OF ENSEMBLE
+# VOXEL SIZE 0.2 FOR OBJECTS OF SMALL CLASSES; VOXEL SIZE 0.333 FOR OBJECTS OF LARGE CLASS CATEGORIES
+# TODO FILL IN PATHS
+python merge_exports.py
diff --git a/models/Mask3D/mask3d/scripts/stpls3d/stpls3d_val.sh b/models/Mask3D/mask3d/scripts/stpls3d/stpls3d_val.sh
new file mode 100644
index 0000000000000000000000000000000000000000..4d5cdce1e34537c2d1d3940edb37f7693d55aba1
--- /dev/null
+++ b/models/Mask3D/mask3d/scripts/stpls3d/stpls3d_val.sh
@@ -0,0 +1,48 @@
+#!/bin/bash
+export OMP_NUM_THREADS=3
+
+CURR_DBSCAN=14.0
+CURR_TOPK=750
+CURR_QUERY=160
+CURR_SIZE=54
+
+# TRAIN
+python main_instance_segmentation.py \
+general.experiment_name="validation" \
+general.project_name="stpls3d" \
+data/datasets=stpls3d \
+general.num_targets=15 \
+data.num_labels=15 \
+data.voxel_size=0.333 \
+data.num_workers=10 \
+data.cache_data=true \
+data.cropping_v1=false \
+general.reps_per_epoch=100 \
+model.num_queries=${CURR_QUERY} \
+general.on_crops=true \
+model.config.backbone._target_=models.Res16UNet18B \
+data.crop_length=${CURR_SIZE} \
+general.eval_inner_core=50.0
+
+# TEST
+python main_instance_segmentation.py \
+general.experiment_name="validation_query_${CURR_QUERY}_topk_${CURR_TOPK}_dbscan_${CURR_DBSCAN}_size_${CURR_SIZE}" \
+general.project_name="stpls3d_eval" \
+data/datasets=stpls3d \
+general.num_targets=15 \
+data.num_labels=15 \
+data.voxel_size=0.333 \
+data.num_workers=10 \
+data.cache_data=true \
+data.cropping_v1=false \
+general.reps_per_epoch=100 \
+model.num_queries=${CURR_QUERY} \
+general.on_crops=true \
+model.config.backbone._target_=models.Res16UNet18B \
+general.train_mode=false \
+general.checkpoint="checkpoints/stpls3d/stpls3d_val.ckpt" \
+data.crop_length=${CURR_SIZE} \
+general.eval_inner_core=50.0 \
+general.topk_per_image=${CURR_TOPK} \
+general.use_dbscan=true \
+general.dbscan_eps=${CURR_DBSCAN}
diff --git a/models/Mask3D/mask3d/trainer/__init__.py b/models/Mask3D/mask3d/trainer/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/mask3d/trainer/trainer.py b/models/Mask3D/mask3d/trainer/trainer.py
new file mode 100644
index 0000000000000000000000000000000000000000..b794e38aa5b2cef7eb106f95ced43466768b3dba
--- /dev/null
+++ b/models/Mask3D/mask3d/trainer/trainer.py
@@ -0,0 +1,1302 @@
+import gc
+from contextlib import nullcontext
+from pathlib import Path
+import statistics
+import shutil
+import os
+import math
+import pyviz3d.visualizer as vis
+from torch_scatter import scatter_mean
+import matplotlib
+from benchmark.evaluate_semantic_instance import evaluate
+from collections import defaultdict
+from sklearn.cluster import DBSCAN
+from utils.votenet_utils.eval_det import eval_det
+from datasets.scannet200.scannet200_splits import (
+ HEAD_CATS_SCANNET_200,
+ TAIL_CATS_SCANNET_200,
+ COMMON_CATS_SCANNET_200,
+ VALID_CLASS_IDS_200_VALIDATION,
+)
+
+import hydra
+import MinkowskiEngine as ME
+import numpy as np
+import pytorch_lightning as pl
+import torch
+from models.metrics import IoU
+import random
+import colorsys
+from typing import List, Tuple
+import functools
+
+
+@functools.lru_cache(20)
+def get_evenly_distributed_colors(
+ count: int,
+) -> List[Tuple[np.uint8, np.uint8, np.uint8]]:
+ # lru cache caches color tuples
+ HSV_tuples = [(x / count, 1.0, 1.0) for x in range(count)]
+ random.shuffle(HSV_tuples)
+ return list(
+ map(
+ lambda x: (np.array(colorsys.hsv_to_rgb(*x)) * 255).astype(
+ np.uint8
+ ),
+ HSV_tuples,
+ )
+ )
+
+
+class RegularCheckpointing(pl.Callback):
+ def on_train_epoch_end(
+ self, trainer: "pl.Trainer", pl_module: "pl.LightningModule"
+ ):
+ general = pl_module.config.general
+ trainer.save_checkpoint(f"{general.save_dir}/last-epoch.ckpt")
+ print("Checkpoint created")
+
+
+class InstanceSegmentation(pl.LightningModule):
+ def __init__(self, config):
+ super().__init__()
+
+ self.decoder_id = config.general.decoder_id
+
+ if config.model.train_on_segments:
+ self.mask_type = "segment_mask"
+ else:
+ self.mask_type = "masks"
+
+ self.eval_on_segments = config.general.eval_on_segments
+
+ self.config = config
+ self.save_hyperparameters()
+ # model
+ self.model = hydra.utils.instantiate(config.model)
+ self.optional_freeze = nullcontext
+ if config.general.freeze_backbone:
+ self.optional_freeze = torch.no_grad
+ # loss
+ self.ignore_label = config.data.ignore_label
+
+ matcher = hydra.utils.instantiate(config.matcher)
+ weight_dict = {
+ "loss_ce": matcher.cost_class,
+ "loss_mask": matcher.cost_mask,
+ "loss_dice": matcher.cost_dice,
+ }
+
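+        # replicate the three loss weights for every auxiliary decoder level;
+        # levels listed in `ignore_mask_idx` contribute with zero weight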
+ aux_weight_dict = {}
+ for i in range(self.model.num_levels * self.model.num_decoders):
+ if i not in self.config.general.ignore_mask_idx:
+ aux_weight_dict.update(
+ {k + f"_{i}": v for k, v in weight_dict.items()}
+ )
+ else:
+ aux_weight_dict.update(
+ {k + f"_{i}": 0.0 for k, v in weight_dict.items()}
+ )
+ weight_dict.update(aux_weight_dict)
+
+ self.preds = dict()
+ self.bbox_preds = dict()
+ self.bbox_gt = dict()
+
+ self.criterion = hydra.utils.instantiate(
+ config.loss, matcher=matcher, weight_dict=weight_dict
+ )
+
+ # metrics
+ self.confusion = hydra.utils.instantiate(config.metrics)
+ self.iou = IoU()
+ # misc
+ self.labels_info = dict()
+
+ def forward(
+ self, x, point2segment=None, raw_coordinates=None, is_eval=False
+ ):
+ with self.optional_freeze():
+ x = self.model(
+ x,
+ point2segment,
+ raw_coordinates=raw_coordinates,
+ is_eval=is_eval,
+ )
+ return x
+
+ def training_step(self, batch, batch_idx):
+ data, target, file_names = batch
+
+ if data.features.shape[0] > self.config.general.max_batch_size:
+ print("data exceeds threshold")
+ raise RuntimeError("BATCH TOO BIG")
+
+ if len(target) == 0:
+ print("no targets")
+ return None
+
+ raw_coordinates = None
+ if self.config.data.add_raw_coordinates:
+ raw_coordinates = data.features[:, -3:]
+ data.features = data.features[:, :-3]
+
+ data = ME.SparseTensor(
+ coordinates=data.coordinates,
+ features=data.features,
+ device=self.device,
+ )
+
+ try:
+ output = self.forward(
+ data,
+ point2segment=[
+ target[i]["point2segment"] for i in range(len(target))
+ ],
+ raw_coordinates=raw_coordinates,
+ )
+ except RuntimeError as run_err:
+ print(run_err)
+ if (
+ "only a single point gives nans in cross-attention"
+ == run_err.args[0]
+ ):
+ return None
+ else:
+ raise run_err
+
+ try:
+ losses = self.criterion(output, target, mask_type=self.mask_type)
+ except ValueError as val_err:
+ print(f"ValueError: {val_err}")
+ print(f"data shape: {data.shape}")
+ print(f"data feat shape: {data.features.shape}")
+ print(f"data feat nans: {data.features.isnan().sum()}")
+ print(f"output: {output}")
+ print(f"target: {target}")
+ print(f"filenames: {file_names}")
+ raise val_err
+
+ for k in list(losses.keys()):
+ if k in self.criterion.weight_dict:
+ losses[k] *= self.criterion.weight_dict[k]
+ else:
+ # remove this loss if not specified in `weight_dict`
+ losses.pop(k)
+
+ logs = {
+ f"train_{k}": v.detach().cpu().item() for k, v in losses.items()
+ }
+
+        logs["train_mean_loss_ce"] = statistics.mean(
+            [v for k, v in logs.items() if "loss_ce" in k]
+        )
+
+        logs["train_mean_loss_mask"] = statistics.mean(
+            [v for k, v in logs.items() if "loss_mask" in k]
+        )
+
+        logs["train_mean_loss_dice"] = statistics.mean(
+            [v for k, v in logs.items() if "loss_dice" in k]
+        )
+
+ self.log_dict(logs)
+ return sum(losses.values())
+
+ def validation_step(self, batch, batch_idx):
+ return self.eval_step(batch, batch_idx)
+
+ def export(self, pred_masks, scores, pred_classes, file_names, decoder_id):
+ root_path = f"eval_output"
+ base_path = f"{root_path}/instance_evaluation_{self.config.general.experiment_name}_{self.current_epoch}/decoder_{decoder_id}"
+ pred_mask_path = f"{base_path}/pred_mask"
+
+ Path(pred_mask_path).mkdir(parents=True, exist_ok=True)
+
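+        # ScanNet-benchmark-style export: one binary mask .txt per instance,
+        # plus a summary file listing "relative_mask_path class_id score"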
+ file_name = file_names
+ with open(f"{base_path}/{file_name}.txt", "w") as fout:
+ real_id = -1
+ for instance_id in range(len(pred_classes)):
+ real_id += 1
+ pred_class = pred_classes[instance_id]
+ score = scores[instance_id]
+ mask = pred_masks[:, instance_id].astype("uint8")
+
+ if score > self.config.general.export_threshold:
+                    # export only sufficiently confident instances; this keeps
+                    # the output small and should not noticeably change scores
+ np.savetxt(
+ f"{pred_mask_path}/{file_name}_{real_id}.txt",
+ mask,
+ fmt="%d",
+ )
+ fout.write(
+ f"pred_mask/{file_name}_{real_id}.txt {pred_class} {score}\n"
+ )
+
+ def training_epoch_end(self, outputs):
+ train_loss = sum([out["loss"].cpu().item() for out in outputs]) / len(
+ outputs
+ )
+ results = {"train_loss_mean": train_loss}
+ self.log_dict(results)
+
+ def validation_epoch_end(self, outputs):
+ self.test_epoch_end(outputs)
+
+ def save_visualizations(
+ self,
+ target_full,
+ full_res_coords,
+ sorted_masks,
+ sort_classes,
+ file_name,
+ original_colors,
+ original_normals,
+ sort_scores_values,
+ point_size=20,
+ sorted_heatmaps=None,
+ query_pos=None,
+ backbone_features=None,
+ ):
+
+ full_res_coords -= full_res_coords.mean(axis=0)
+
+ gt_pcd_pos = []
+ gt_pcd_normals = []
+ gt_pcd_color = []
+ gt_inst_pcd_color = []
+ gt_boxes = []
+
+ if "labels" in target_full:
+ instances_colors = torch.from_numpy(
+ np.vstack(
+ get_evenly_distributed_colors(
+ target_full["labels"].shape[0]
+ )
+ )
+ )
+ for instance_counter, (label, mask) in enumerate(
+ zip(target_full["labels"], target_full["masks"])
+ ):
+ if label == 255:
+ continue
+
+ mask_tmp = mask.detach().cpu().numpy()
+ mask_coords = full_res_coords[mask_tmp.astype(bool), :]
+
+ if len(mask_coords) == 0:
+ continue
+
+ gt_pcd_pos.append(mask_coords)
+ mask_coords_min = full_res_coords[
+ mask_tmp.astype(bool), :
+ ].min(axis=0)
+ mask_coords_max = full_res_coords[
+ mask_tmp.astype(bool), :
+ ].max(axis=0)
+ size = mask_coords_max - mask_coords_min
+ mask_coords_middle = mask_coords_min + size / 2
+
+ gt_boxes.append(
+ {
+ "position": mask_coords_middle,
+ "size": size,
+ "color": self.validation_dataset.map2color([label])[0],
+ }
+ )
+
+ gt_pcd_color.append(
+ self.validation_dataset.map2color([label]).repeat(
+ gt_pcd_pos[-1].shape[0], 1
+ )
+ )
+ gt_inst_pcd_color.append(
+ instances_colors[instance_counter % len(instances_colors)]
+ .unsqueeze(0)
+ .repeat(gt_pcd_pos[-1].shape[0], 1)
+ )
+
+ gt_pcd_normals.append(
+ original_normals[mask_tmp.astype(bool), :]
+ )
+
+ gt_pcd_pos = np.concatenate(gt_pcd_pos)
+ gt_pcd_normals = np.concatenate(gt_pcd_normals)
+ gt_pcd_color = np.concatenate(gt_pcd_color)
+ gt_inst_pcd_color = np.concatenate(gt_inst_pcd_color)
+
+ v = vis.Visualizer()
+
+ v.add_points(
+ "RGB Input",
+ full_res_coords,
+ colors=original_colors,
+ normals=original_normals,
+ visible=True,
+ point_size=point_size,
+ )
+
+ if backbone_features is not None:
+ v.add_points(
+ "PCA",
+ full_res_coords,
+ colors=backbone_features,
+ normals=original_normals,
+ visible=False,
+ point_size=point_size,
+ )
+
+ if "labels" in target_full:
+ v.add_points(
+ "Semantics (GT)",
+ gt_pcd_pos,
+ colors=gt_pcd_color,
+ normals=gt_pcd_normals,
+ alpha=0.8,
+ visible=False,
+ point_size=point_size,
+ )
+ v.add_points(
+ "Instances (GT)",
+ gt_pcd_pos,
+ colors=gt_inst_pcd_color,
+ normals=gt_pcd_normals,
+ alpha=0.8,
+ visible=False,
+ point_size=point_size,
+ )
+
+ pred_coords = []
+ pred_normals = []
+ pred_sem_color = []
+ pred_inst_color = []
+
+ for did in range(len(sorted_masks)):
+ instances_colors = torch.from_numpy(
+ np.vstack(
+ get_evenly_distributed_colors(
+ max(1, sorted_masks[did].shape[1])
+ )
+ )
+ )
+
+ for i in reversed(range(sorted_masks[did].shape[1])):
+ coords = full_res_coords[
+ sorted_masks[did][:, i].astype(bool), :
+ ]
+
+ mask_coords = full_res_coords[
+ sorted_masks[did][:, i].astype(bool), :
+ ]
+ mask_normals = original_normals[
+ sorted_masks[did][:, i].astype(bool), :
+ ]
+
+ label = sort_classes[did][i]
+
+ if len(mask_coords) == 0:
+ continue
+
+ pred_coords.append(mask_coords)
+ pred_normals.append(mask_normals)
+
+ pred_sem_color.append(
+ self.validation_dataset.map2color([label]).repeat(
+ mask_coords.shape[0], 1
+ )
+ )
+
+ pred_inst_color.append(
+ instances_colors[i % len(instances_colors)]
+ .unsqueeze(0)
+ .repeat(mask_coords.shape[0], 1)
+ )
+
+ if len(pred_coords) > 0:
+ pred_coords = np.concatenate(pred_coords)
+ pred_normals = np.concatenate(pred_normals)
+ pred_sem_color = np.concatenate(pred_sem_color)
+ pred_inst_color = np.concatenate(pred_inst_color)
+
+ v.add_points(
+ "Semantics (Mask3D)",
+ pred_coords,
+ colors=pred_sem_color,
+ normals=pred_normals,
+ visible=False,
+ alpha=0.8,
+ point_size=point_size,
+ )
+ v.add_points(
+ "Instances (Mask3D)",
+ pred_coords,
+ colors=pred_inst_color,
+ normals=pred_normals,
+ visible=False,
+ alpha=0.8,
+ point_size=point_size,
+ )
+
+ v.save(
+ f"{self.config['general']['save_dir']}/visualizations/{file_name}"
+ )
+
+ def eval_step(self, batch, batch_idx):
+ data, target, file_names = batch
+ inverse_maps = data.inverse_maps
+ target_full = data.target_full
+ original_colors = data.original_colors
+ data_idx = data.idx
+ original_normals = data.original_normals
+ original_coordinates = data.original_coordinates
+
+ # if len(target) == 0 or len(target_full) == 0:
+ # print("no targets")
+ # return None
+
+ if len(data.coordinates) == 0:
+ return 0.0
+
+ raw_coordinates = None
+ if self.config.data.add_raw_coordinates:
+ raw_coordinates = data.features[:, -3:]
+ data.features = data.features[:, :-3]
+
+ if raw_coordinates.shape[0] == 0:
+ return 0.0
+
+ data = ME.SparseTensor(
+ coordinates=data.coordinates,
+ features=data.features,
+ device=self.device,
+ )
+
+ try:
+ output = self.forward(
+ data,
+ point2segment=[
+ target[i]["point2segment"] for i in range(len(target))
+ ],
+ raw_coordinates=raw_coordinates,
+ is_eval=True,
+ )
+ except RuntimeError as run_err:
+ print(run_err)
+ if (
+ "only a single point gives nans in cross-attention"
+ == run_err.args[0]
+ ):
+ return None
+ else:
+ raise run_err
+
+ if self.config.data.test_mode != "test":
+ if self.config.trainer.deterministic:
+ torch.use_deterministic_algorithms(False)
+
+ try:
+ losses = self.criterion(
+ output, target, mask_type=self.mask_type
+ )
+ except ValueError as val_err:
+ print(f"ValueError: {val_err}")
+ print(f"data shape: {data.shape}")
+ print(f"data feat shape: {data.features.shape}")
+ print(f"data feat nans: {data.features.isnan().sum()}")
+ print(f"output: {output}")
+ print(f"target: {target}")
+ print(f"filenames: {file_names}")
+ raise val_err
+
+ for k in list(losses.keys()):
+ if k in self.criterion.weight_dict:
+ losses[k] *= self.criterion.weight_dict[k]
+ else:
+ # remove this loss if not specified in `weight_dict`
+ losses.pop(k)
+ if self.config.trainer.deterministic:
+ torch.use_deterministic_algorithms(True)
+
+ if self.config.general.save_visualizations:
+ backbone_features = (
+ output["backbone_features"].F.detach().cpu().numpy()
+ )
+ from sklearn import decomposition
+
+ pca = decomposition.PCA(n_components=3)
+ pca.fit(backbone_features)
+ pca_features = pca.transform(backbone_features)
+ rescaled_pca = (
+ 255
+ * (pca_features - pca_features.min())
+ / (pca_features.max() - pca_features.min())
+ )
+
+ self.eval_instance_step(
+ output,
+ target,
+ target_full,
+ inverse_maps,
+ file_names,
+ original_coordinates,
+ original_colors,
+ original_normals,
+ raw_coordinates,
+ data_idx,
+ backbone_features=rescaled_pca
+ if self.config.general.save_visualizations
+ else None,
+ )
+
+ if self.config.data.test_mode != "test":
+ return {
+ f"val_{k}": v.detach().cpu().item() for k, v in losses.items()
+ }
+ else:
+ return 0.0
+
+ def test_step(self, batch, batch_idx):
+ return self.eval_step(batch, batch_idx)
+
+ def get_full_res_mask(
+ self, mask, inverse_map, point2segment_full, is_heatmap=False
+ ):
+ mask = mask.detach().cpu()[inverse_map] # full res
+
+        if self.eval_on_segments and not is_heatmap:
+ mask = scatter_mean(
+ mask, point2segment_full, dim=0
+ ) # full res segments
+ mask = (mask > 0.5).float()
+ mask = mask.detach().cpu()[
+ point2segment_full.cpu()
+ ] # full res points
+
+ return mask
+
+ def get_mask_and_scores(
+ self, mask_cls, mask_pred, num_queries=100, num_classes=18, device=None
+ ):
+ if device is None:
+ device = self.device
+ labels = (
+ torch.arange(num_classes, device=device)
+ .unsqueeze(0)
+ .repeat(num_queries, 1)
+ .flatten(0, 1)
+ )
+
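+        # rank all (query, class) pairs jointly: flatten the score grid,
+        # take the top-k, then recover each query index by integer division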
+ if self.config.general.topk_per_image != -1:
+ scores_per_query, topk_indices = mask_cls.flatten(0, 1).topk(
+ self.config.general.topk_per_image, sorted=True
+ )
+ else:
+ scores_per_query, topk_indices = mask_cls.flatten(0, 1).topk(
+ num_queries, sorted=True
+ )
+
+ labels_per_query = labels[topk_indices]
+ topk_indices = topk_indices // num_classes
+ mask_pred = mask_pred[:, topk_indices]
+
+ result_pred_mask = (mask_pred > 0).float()
+ heatmap = mask_pred.float().sigmoid()
+
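+        # final confidence = classification score times the mean sigmoid
+        # confidence of the points inside the binarized mask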
+ mask_scores_per_image = (heatmap * result_pred_mask).sum(0) / (
+ result_pred_mask.sum(0) + 1e-6
+ )
+ score = scores_per_query * mask_scores_per_image
+ classes = labels_per_query
+
+ return score, result_pred_mask, classes, heatmap
+
+ def eval_instance_step(
+ self,
+ output,
+ target_low_res,
+ target_full_res,
+ inverse_maps,
+ file_names,
+ full_res_coords,
+ original_colors,
+ original_normals,
+ raw_coords,
+ idx,
+ first_full_res=False,
+ backbone_features=None,
+ ):
+ label_offset = self.validation_dataset.label_offset
+ prediction = output["aux_outputs"]
+ prediction.append(
+ {
+ "pred_logits": output["pred_logits"],
+ "pred_masks": output["pred_masks"],
+ }
+ )
+
+ prediction[self.decoder_id][
+ "pred_logits"
+ ] = torch.functional.F.softmax(
+ prediction[self.decoder_id]["pred_logits"], dim=-1
+ )[
+ ..., :-1
+ ]
+
+ all_pred_classes = list()
+ all_pred_masks = list()
+ all_pred_scores = list()
+ all_heatmaps = list()
+ all_query_pos = list()
+
+ offset_coords_idx = 0
+ for bid in range(len(prediction[self.decoder_id]["pred_masks"])):
+ if not first_full_res:
+ if self.model.train_on_segments:
+ masks = (
+ prediction[self.decoder_id]["pred_masks"][bid]
+ .detach()
+ .cpu()[target_low_res[bid]["point2segment"].cpu()]
+ )
+ else:
+ masks = (
+ prediction[self.decoder_id]["pred_masks"][bid]
+ .detach()
+ .cpu()
+ )
+
+ if self.config.general.use_dbscan:
+ new_preds = {
+ "pred_masks": list(),
+ "pred_logits": list(),
+ }
+
+ curr_coords_idx = masks.shape[0]
+ curr_coords = raw_coords[
+ offset_coords_idx : curr_coords_idx + offset_coords_idx
+ ]
+ offset_coords_idx += curr_coords_idx
+
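+                    # split each query mask into spatially connected components:
+                    # DBSCAN labels noise as -1 and clusters as 0..n, so shift
+                    # by +1 and emit one proposal per cluster, reusing the
+                    # query's class logits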
+ for curr_query in range(masks.shape[1]):
+ curr_masks = masks[:, curr_query] > 0
+
+ if curr_coords[curr_masks].shape[0] > 0:
+ clusters = (
+ DBSCAN(
+ eps=self.config.general.dbscan_eps,
+ min_samples=self.config.general.dbscan_min_points,
+ n_jobs=-1,
+ )
+ .fit(curr_coords[curr_masks])
+ .labels_
+ )
+
+ new_mask = torch.zeros(curr_masks.shape, dtype=int)
+ new_mask[curr_masks] = (
+ torch.from_numpy(clusters) + 1
+ )
+
+ for cluster_id in np.unique(clusters):
+ original_pred_masks = masks[:, curr_query]
+ if cluster_id != -1:
+ new_preds["pred_masks"].append(
+ original_pred_masks
+ * (new_mask == cluster_id + 1)
+ )
+ new_preds["pred_logits"].append(
+ prediction[self.decoder_id][
+ "pred_logits"
+ ][bid, curr_query]
+ )
+
+ scores, masks, classes, heatmap = self.get_mask_and_scores(
+ torch.stack(new_preds["pred_logits"]).cpu(),
+ torch.stack(new_preds["pred_masks"]).T,
+ len(new_preds["pred_logits"]),
+ self.model.num_classes - 1,
+ )
+ else:
+ scores, masks, classes, heatmap = self.get_mask_and_scores(
+ prediction[self.decoder_id]["pred_logits"][bid]
+ .detach()
+ .cpu(),
+ masks,
+ prediction[self.decoder_id]["pred_logits"][bid].shape[
+ 0
+ ],
+ self.model.num_classes - 1,
+ )
+
+ masks = self.get_full_res_mask(
+ masks,
+ inverse_maps[bid],
+ target_full_res[bid]["point2segment"],
+ )
+
+ heatmap = self.get_full_res_mask(
+ heatmap,
+ inverse_maps[bid],
+ target_full_res[bid]["point2segment"],
+ is_heatmap=True,
+ )
+
+ if backbone_features is not None:
+ backbone_features = self.get_full_res_mask(
+ torch.from_numpy(backbone_features),
+ inverse_maps[bid],
+ target_full_res[bid]["point2segment"],
+ is_heatmap=True,
+ )
+ backbone_features = backbone_features.numpy()
+ else:
+ assert False, "not tested"
+ masks = self.get_full_res_mask(
+ prediction[self.decoder_id]["pred_masks"][bid].cpu(),
+ inverse_maps[bid],
+ target_full_res[bid]["point2segment"],
+ )
+
+ scores, masks, classes, heatmap = self.get_mask_and_scores(
+ prediction[self.decoder_id]["pred_logits"][bid].cpu(),
+ masks,
+ prediction[self.decoder_id]["pred_logits"][bid].shape[0],
+ self.model.num_classes - 1,
+ device="cpu",
+ )
+
+ masks = masks.numpy()
+ heatmap = heatmap.numpy()
+
+ sort_scores = scores.sort(descending=True)
+ sort_scores_index = sort_scores.indices.cpu().numpy()
+ sort_scores_values = sort_scores.values.cpu().numpy()
+ sort_classes = classes[sort_scores_index]
+
+ sorted_masks = masks[:, sort_scores_index]
+ sorted_heatmap = heatmap[:, sort_scores_index]
+
+ if self.config.general.filter_out_instances:
+ keep_instances = set()
+ pairwise_overlap = sorted_masks.T @ sorted_masks
+ normalization = pairwise_overlap.max(axis=0)
+ norm_overlaps = pairwise_overlap / normalization
+
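+                # greedy de-duplication: among mutually overlapping masks keep
+                # the lowest-index (highest-scoring) one, and drop masks that
+                # are empty or below the score threshold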
+ for instance_id in range(norm_overlaps.shape[0]):
+ # filter out unlikely masks and nearly empty masks
+ # if not(sort_scores_values[instance_id] < 0.3 or sorted_masks[:, instance_id].sum() < 500):
+ if not (
+ sort_scores_values[instance_id]
+ < self.config.general.scores_threshold
+ ):
+ # check if mask != empty
+ if not sorted_masks[:, instance_id].sum() == 0.0:
+ overlap_ids = set(
+ np.nonzero(
+ norm_overlaps[instance_id, :]
+ > self.config.general.iou_threshold
+ )[0]
+ )
+
+ if len(overlap_ids) == 0:
+ keep_instances.add(instance_id)
+ else:
+ if instance_id == min(overlap_ids):
+ keep_instances.add(instance_id)
+
+ keep_instances = sorted(list(keep_instances))
+ all_pred_classes.append(sort_classes[keep_instances])
+ all_pred_masks.append(sorted_masks[:, keep_instances])
+ all_pred_scores.append(sort_scores_values[keep_instances])
+ all_heatmaps.append(sorted_heatmap[:, keep_instances])
+ else:
+ all_pred_classes.append(sort_classes)
+ all_pred_masks.append(sorted_masks)
+ all_pred_scores.append(sort_scores_values)
+ all_heatmaps.append(sorted_heatmap)
+
+ if self.validation_dataset.dataset_name == "scannet200":
+ all_pred_classes[bid][all_pred_classes[bid] == 0] = -1
+ if self.config.data.test_mode != "test":
+ target_full_res[bid]["labels"][
+ target_full_res[bid]["labels"] == 0
+ ] = -1
+
+ for bid in range(len(prediction[self.decoder_id]["pred_masks"])):
+ all_pred_classes[
+ bid
+ ] = self.validation_dataset._remap_model_output(
+ all_pred_classes[bid].cpu() + label_offset
+ )
+
+ if (
+ self.config.data.test_mode != "test"
+ and len(target_full_res) != 0
+ ):
+ target_full_res[bid][
+ "labels"
+ ] = self.validation_dataset._remap_model_output(
+ target_full_res[bid]["labels"].cpu() + label_offset
+ )
+
+ # PREDICTION BOX
+ bbox_data = []
+ for query_id in range(
+ all_pred_masks[bid].shape[1]
+ ): # self.model.num_queries
+ obj_coords = full_res_coords[bid][
+ all_pred_masks[bid][:, query_id].astype(bool), :
+ ]
+ if obj_coords.shape[0] > 0:
+ obj_center = obj_coords.mean(axis=0)
+ obj_axis_length = obj_coords.max(
+ axis=0
+ ) - obj_coords.min(axis=0)
+
+ bbox = np.concatenate((obj_center, obj_axis_length))
+
+ bbox_data.append(
+ (
+ all_pred_classes[bid][query_id].item(),
+ bbox,
+ all_pred_scores[bid][query_id],
+ )
+ )
+ self.bbox_preds[file_names[bid]] = bbox_data
+
+ # GT BOX
+ bbox_data = []
+ for obj_id in range(target_full_res[bid]["masks"].shape[0]):
+ if target_full_res[bid]["labels"][obj_id].item() == 255:
+ continue
+
+ obj_coords = full_res_coords[bid][
+ target_full_res[bid]["masks"][obj_id, :]
+ .cpu()
+ .detach()
+ .numpy()
+ .astype(bool),
+ :,
+ ]
+ if obj_coords.shape[0] > 0:
+ obj_center = obj_coords.mean(axis=0)
+ obj_axis_length = obj_coords.max(
+ axis=0
+ ) - obj_coords.min(axis=0)
+
+ bbox = np.concatenate((obj_center, obj_axis_length))
+ bbox_data.append(
+ (
+ target_full_res[bid]["labels"][obj_id].item(),
+ bbox,
+ )
+ )
+
+ self.bbox_gt[file_names[bid]] = bbox_data
+
+ if self.config.general.eval_inner_core == -1:
+ self.preds[file_names[bid]] = {
+ "pred_masks": all_pred_masks[bid],
+ "pred_scores": all_pred_scores[bid],
+ "pred_classes": all_pred_classes[bid],
+ }
+ else:
+ # prev val_dataset
+ self.preds[file_names[bid]] = {
+ "pred_masks": all_pred_masks[bid][
+ self.test_dataset.data[idx[bid]]["cond_inner"]
+ ],
+ "pred_scores": all_pred_scores[bid],
+ "pred_classes": all_pred_classes[bid],
+ }
+
+ if self.config.general.save_visualizations:
+ if "cond_inner" in self.test_dataset.data[idx[bid]]:
+ target_full_res[bid]["masks"] = target_full_res[bid][
+ "masks"
+ ][:, self.test_dataset.data[idx[bid]]["cond_inner"]]
+ self.save_visualizations(
+ target_full_res[bid],
+ full_res_coords[bid][
+ self.test_dataset.data[idx[bid]]["cond_inner"]
+ ],
+ [self.preds[file_names[bid]]["pred_masks"]],
+ [self.preds[file_names[bid]]["pred_classes"]],
+ file_names[bid],
+ original_colors[bid][
+ self.test_dataset.data[idx[bid]]["cond_inner"]
+ ],
+ original_normals[bid][
+ self.test_dataset.data[idx[bid]]["cond_inner"]
+ ],
+ [self.preds[file_names[bid]]["pred_scores"]],
+ sorted_heatmaps=[
+ all_heatmaps[bid][
+ self.test_dataset.data[idx[bid]]["cond_inner"]
+ ]
+ ],
+ query_pos=all_query_pos[bid][
+ self.test_dataset.data[idx[bid]]["cond_inner"]
+ ]
+ if len(all_query_pos) > 0
+ else None,
+ backbone_features=backbone_features[
+ self.test_dataset.data[idx[bid]]["cond_inner"]
+ ],
+ point_size=self.config.general.visualization_point_size,
+ )
+ else:
+ self.save_visualizations(
+ target_full_res[bid],
+ full_res_coords[bid],
+ [self.preds[file_names[bid]]["pred_masks"]],
+ [self.preds[file_names[bid]]["pred_classes"]],
+ file_names[bid],
+ original_colors[bid],
+ original_normals[bid],
+ [self.preds[file_names[bid]]["pred_scores"]],
+ sorted_heatmaps=[all_heatmaps[bid]],
+ query_pos=all_query_pos[bid]
+ if len(all_query_pos) > 0
+ else None,
+ backbone_features=backbone_features,
+ point_size=self.config.general.visualization_point_size,
+ )
+
+ if self.config.general.export:
+ if self.validation_dataset.dataset_name == "stpls3d":
+ scan_id, _, _, crop_id = file_names[bid].split("_")
+ crop_id = int(crop_id.replace(".txt", ""))
+ file_name = (
+ f"{scan_id}_points_GTv3_0{crop_id}_inst_nostuff"
+ )
+
+ self.export(
+ self.preds[file_names[bid]]["pred_masks"],
+ self.preds[file_names[bid]]["pred_scores"],
+ self.preds[file_names[bid]]["pred_classes"],
+ file_name,
+ self.decoder_id,
+ )
+ else:
+ self.export(
+ self.preds[file_names[bid]]["pred_masks"],
+ self.preds[file_names[bid]]["pred_scores"],
+ self.preds[file_names[bid]]["pred_classes"],
+ file_names[bid],
+ self.decoder_id,
+ )
+
+ def eval_instance_epoch_end(self):
+ log_prefix = f"val"
+ ap_results = {}
+
+ head_results, tail_results, common_results = [], [], []
+
+ box_ap_50 = eval_det(
+ self.bbox_preds, self.bbox_gt, ovthresh=0.5, use_07_metric=False
+ )
+ box_ap_25 = eval_det(
+ self.bbox_preds, self.bbox_gt, ovthresh=0.25, use_07_metric=False
+ )
+ mean_box_ap_25 = sum([v for k, v in box_ap_25[-1].items()]) / len(
+ box_ap_25[-1].keys()
+ )
+ mean_box_ap_50 = sum([v for k, v in box_ap_50[-1].items()]) / len(
+ box_ap_50[-1].keys()
+ )
+
+ ap_results[f"{log_prefix}_mean_box_ap_25"] = mean_box_ap_25
+ ap_results[f"{log_prefix}_mean_box_ap_50"] = mean_box_ap_50
+
+ for class_id in box_ap_50[-1].keys():
+ class_name = self.train_dataset.label_info[class_id]["name"]
+ ap_results[f"{log_prefix}_{class_name}_val_box_ap_50"] = box_ap_50[
+ -1
+ ][class_id]
+
+ for class_id in box_ap_25[-1].keys():
+ class_name = self.train_dataset.label_info[class_id]["name"]
+ ap_results[f"{log_prefix}_{class_name}_val_box_ap_25"] = box_ap_25[
+ -1
+ ][class_id]
+
+ root_path = f"eval_output"
+ base_path = f"{root_path}/instance_evaluation_{self.config.general.experiment_name}_{self.current_epoch}"
+
+ if self.validation_dataset.dataset_name in [
+ "scannet",
+ "stpls3d",
+ "scannet200",
+ ]:
+ gt_data_path = f"{self.validation_dataset.data_dir[0]}/instance_gt/{self.validation_dataset.mode}"
+ else:
+ gt_data_path = f"{self.validation_dataset.data_dir[0]}/instance_gt/Area_{self.config.general.area}"
+
+ pred_path = f"{base_path}/tmp_output.txt"
+
+ log_prefix = f"val"
+
+ if not os.path.exists(base_path):
+ os.makedirs(base_path)
+
+ try:
+ if self.validation_dataset.dataset_name == "s3dis":
+ new_preds = {}
+ for key in self.preds.keys():
+ new_preds[
+ key.replace(f"Area_{self.config.general.area}_", "")
+ ] = {
+ "pred_classes": self.preds[key]["pred_classes"] + 1,
+ "pred_masks": self.preds[key]["pred_masks"],
+ "pred_scores": self.preds[key]["pred_scores"],
+ }
+ mprec, mrec = evaluate(
+ new_preds, gt_data_path, pred_path, dataset="s3dis"
+ )
+ ap_results[f"{log_prefix}_mean_precision"] = mprec
+ ap_results[f"{log_prefix}_mean_recall"] = mrec
+ elif self.validation_dataset.dataset_name == "stpls3d":
+ new_preds = {}
+ for key in self.preds.keys():
+ new_preds[key.replace(".txt", "")] = {
+ "pred_classes": self.preds[key]["pred_classes"],
+ "pred_masks": self.preds[key]["pred_masks"],
+ "pred_scores": self.preds[key]["pred_scores"],
+ }
+
+ evaluate(new_preds, gt_data_path, pred_path, dataset="stpls3d")
+ else:
+ evaluate(
+ self.preds,
+ gt_data_path,
+ pred_path,
+ dataset=self.validation_dataset.dataset_name,
+ )
+ with open(pred_path, "r") as fin:
+ for line_id, line in enumerate(fin):
+ if line_id == 0:
+ # ignore header
+ continue
+ class_name, _, ap, ap_50, ap_25 = line.strip().split(",")
+
+ if self.validation_dataset.dataset_name == "scannet200":
+ if class_name in VALID_CLASS_IDS_200_VALIDATION:
+ ap_results[
+ f"{log_prefix}_{class_name}_val_ap"
+ ] = float(ap)
+ ap_results[
+ f"{log_prefix}_{class_name}_val_ap_50"
+ ] = float(ap_50)
+ ap_results[
+ f"{log_prefix}_{class_name}_val_ap_25"
+ ] = float(ap_25)
+
+ if class_name in HEAD_CATS_SCANNET_200:
+ head_results.append(
+ np.array(
+ (float(ap), float(ap_50), float(ap_25))
+ )
+ )
+ elif class_name in COMMON_CATS_SCANNET_200:
+ common_results.append(
+ np.array(
+ (float(ap), float(ap_50), float(ap_25))
+ )
+ )
+ elif class_name in TAIL_CATS_SCANNET_200:
+ tail_results.append(
+ np.array(
+ (float(ap), float(ap_50), float(ap_25))
+ )
+ )
+ else:
+                                assert False, "class not known!"
+ else:
+ ap_results[
+ f"{log_prefix}_{class_name}_val_ap"
+ ] = float(ap)
+ ap_results[
+ f"{log_prefix}_{class_name}_val_ap_50"
+ ] = float(ap_50)
+ ap_results[
+ f"{log_prefix}_{class_name}_val_ap_25"
+ ] = float(ap_25)
+
+ if self.validation_dataset.dataset_name == "scannet200":
+ head_results = np.stack(head_results)
+ common_results = np.stack(common_results)
+ tail_results = np.stack(tail_results)
+
+ mean_tail_results = np.nanmean(tail_results, axis=0)
+ mean_common_results = np.nanmean(common_results, axis=0)
+ mean_head_results = np.nanmean(head_results, axis=0)
+
+                    ap_results[
+                        f"{log_prefix}_mean_tail_ap"
+                    ] = mean_tail_results[0]
+                    ap_results[
+                        f"{log_prefix}_mean_common_ap"
+                    ] = mean_common_results[0]
+                    ap_results[
+                        f"{log_prefix}_mean_head_ap"
+                    ] = mean_head_results[0]
+
+ ap_results[
+ f"{log_prefix}_mean_tail_ap_50"
+ ] = mean_tail_results[1]
+ ap_results[
+ f"{log_prefix}_mean_common_ap_50"
+ ] = mean_common_results[1]
+ ap_results[
+ f"{log_prefix}_mean_head_ap_50"
+ ] = mean_head_results[1]
+
+ ap_results[
+ f"{log_prefix}_mean_tail_ap_25"
+ ] = mean_tail_results[2]
+ ap_results[
+ f"{log_prefix}_mean_common_ap_25"
+ ] = mean_common_results[2]
+ ap_results[
+ f"{log_prefix}_mean_head_ap_25"
+ ] = mean_head_results[2]
+
+ overall_ap_results = np.nanmean(
+ np.vstack((head_results, common_results, tail_results)),
+ axis=0,
+ )
+
+ ap_results[f"{log_prefix}_mean_ap"] = overall_ap_results[0]
+ ap_results[f"{log_prefix}_mean_ap_50"] = overall_ap_results[1]
+ ap_results[f"{log_prefix}_mean_ap_25"] = overall_ap_results[2]
+
+ ap_results = {
+ key: 0.0 if math.isnan(score) else score
+ for key, score in ap_results.items()
+ }
+ else:
+ mean_ap = statistics.mean(
+ [
+ item
+ for key, item in ap_results.items()
+ if key.endswith("val_ap")
+ ]
+ )
+ mean_ap_50 = statistics.mean(
+ [
+ item
+ for key, item in ap_results.items()
+ if key.endswith("val_ap_50")
+ ]
+ )
+ mean_ap_25 = statistics.mean(
+ [
+ item
+ for key, item in ap_results.items()
+ if key.endswith("val_ap_25")
+ ]
+ )
+
+ ap_results[f"{log_prefix}_mean_ap"] = mean_ap
+ ap_results[f"{log_prefix}_mean_ap_50"] = mean_ap_50
+ ap_results[f"{log_prefix}_mean_ap_25"] = mean_ap_25
+
+ ap_results = {
+ key: 0.0 if math.isnan(score) else score
+ for key, score in ap_results.items()
+ }
+ except (IndexError, OSError) as e:
+ print("NO SCORES!!!")
+ ap_results[f"{log_prefix}_mean_ap"] = 0.0
+ ap_results[f"{log_prefix}_mean_ap_50"] = 0.0
+ ap_results[f"{log_prefix}_mean_ap_25"] = 0.0
+
+ self.log_dict(ap_results)
+
+ if not self.config.general.export:
+ shutil.rmtree(base_path)
+
+ del self.preds
+ del self.bbox_preds
+ del self.bbox_gt
+
+ gc.collect()
+
+ self.preds = dict()
+ self.bbox_preds = dict()
+ self.bbox_gt = dict()
+
+ def test_epoch_end(self, outputs):
+ if self.config.general.export:
+ return
+
+ self.eval_instance_epoch_end()
+
+ dd = defaultdict(list)
+ for output in outputs:
+            for key, val in output.items():
+ dd[key].append(val)
+
+ dd = {k: statistics.mean(v) for k, v in dd.items()}
+
+        dd["val_mean_loss_ce"] = statistics.mean(
+            [v for k, v in dd.items() if "loss_ce" in k]
+        )
+        dd["val_mean_loss_mask"] = statistics.mean(
+            [v for k, v in dd.items() if "loss_mask" in k]
+        )
+        dd["val_mean_loss_dice"] = statistics.mean(
+            [v for k, v in dd.items() if "loss_dice" in k]
+        )
+
+ self.log_dict(dd)
+
+ def configure_optimizers(self):
+ optimizer = hydra.utils.instantiate(
+ self.config.optimizer, params=self.parameters()
+ )
+ if "steps_per_epoch" in self.config.scheduler.scheduler.keys():
+ self.config.scheduler.scheduler.steps_per_epoch = len(
+ self.train_dataloader()
+ )
+ lr_scheduler = hydra.utils.instantiate(
+ self.config.scheduler.scheduler, optimizer=optimizer
+ )
+ scheduler_config = {"scheduler": lr_scheduler}
+ scheduler_config.update(self.config.scheduler.pytorch_lightning_params)
+ return [optimizer], [scheduler_config]
+
+ def prepare_data(self):
+ self.train_dataset = hydra.utils.instantiate(
+ self.config.data.train_dataset
+ )
+ self.validation_dataset = hydra.utils.instantiate(
+ self.config.data.validation_dataset
+ )
+ self.test_dataset = hydra.utils.instantiate(
+ self.config.data.test_dataset
+ )
+ self.labels_info = self.train_dataset.label_info
+
+ def train_dataloader(self):
+ c_fn = hydra.utils.instantiate(self.config.data.train_collation)
+ return hydra.utils.instantiate(
+ self.config.data.train_dataloader,
+ self.train_dataset,
+ collate_fn=c_fn,
+ )
+
+ def val_dataloader(self):
+ c_fn = hydra.utils.instantiate(self.config.data.validation_collation)
+ return hydra.utils.instantiate(
+ self.config.data.validation_dataloader,
+ self.validation_dataset,
+ collate_fn=c_fn,
+ )
+
+ def test_dataloader(self):
+ c_fn = hydra.utils.instantiate(self.config.data.test_collation)
+ return hydra.utils.instantiate(
+ self.config.data.test_dataloader,
+ self.test_dataset,
+ collate_fn=c_fn,
+ )
diff --git a/models/Mask3D/mask3d/utils/__init__.py b/models/Mask3D/mask3d/utils/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/models/Mask3D/mask3d/utils/gradflow_check.py b/models/Mask3D/mask3d/utils/gradflow_check.py
new file mode 100644
index 0000000000000000000000000000000000000000..2fedc91592d66d4e5bdef7531daafccc5b5f2e81
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/gradflow_check.py
@@ -0,0 +1,62 @@
+""" https://github.com/alwynmathew/gradflow-check """
+import matplotlib.pyplot as plt
+import numpy as np
+from matplotlib.lines import Line2D
+
+
+def plot_grad_flow(named_parameters):
+ ave_grads = []
+ layers = []
+ for n, p in named_parameters:
+ if (p.requires_grad) and ("bias" not in n):
+            if p.grad is not None:
+ layers.append(n)
+ ave_grads.append(p.grad.abs().mean())
+ else:
+ print(f"{n} - doesn't have gradient computed")
+
+ plt.plot(ave_grads, alpha=0.3, color="b")
+ plt.hlines(0, 0, len(ave_grads) + 1, linewidth=1, color="k")
+ plt.xticks(range(0, len(ave_grads), 1), layers, rotation="vertical")
+ plt.xlim(xmin=0, xmax=len(ave_grads))
+ plt.xlabel("Layers")
+ plt.ylabel("average gradient")
+ plt.title("Gradient flow")
+ plt.grid(True)
+
+
+def plot_grad_flow_v2(named_parameters):
+ """Plots the gradients flowing through different layers in the net during training.
+ Can be used for checking for possible gradient vanishing / exploding problems.
+
+ Usage: Plug this function in Trainer class after loss.backwards() as
+ "plot_grad_flow(self.model.named_parameters())" to visualize the gradient flow"""
+ ave_grads = []
+ max_grads = []
+ layers = []
+ for n, p in named_parameters:
+ if (p.requires_grad) and ("bias" not in n):
+ layers.append(n)
+            if p.grad is not None:
+ ave_grads.append(p.grad.abs().mean())
+ max_grads.append(p.grad.abs().max())
+ else:
+ print(f"{n} - doesn't have gradient computed")
+ plt.bar(np.arange(len(max_grads)), max_grads, alpha=0.1, lw=1, color="c")
+ plt.bar(np.arange(len(max_grads)), ave_grads, alpha=0.1, lw=1, color="b")
+ plt.hlines(0, 0, len(ave_grads) + 1, lw=2, color="k")
+ plt.xticks(range(0, len(ave_grads), 1), layers, rotation="vertical")
+ plt.xlim(left=0, right=len(ave_grads))
+ plt.ylim(bottom=-0.001, top=0.02) # zoom in on the lower gradient regions
+ plt.xlabel("Layers")
+ plt.ylabel("average gradient")
+ plt.title("Gradient flow")
+ plt.grid(True)
+ plt.legend(
+ [
+ Line2D([0], [0], color="c", lw=4),
+ Line2D([0], [0], color="b", lw=4),
+ Line2D([0], [0], color="k", lw=4),
+ ],
+ ["max-gradient", "mean-gradient", "zero-gradient"],
+ )
diff --git a/models/Mask3D/mask3d/utils/kfold.py b/models/Mask3D/mask3d/utils/kfold.py
new file mode 100644
index 0000000000000000000000000000000000000000..5bfeba130c890eec35530adeb23f1362041f7cdc
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/kfold.py
@@ -0,0 +1,89 @@
+""" Author: https://github.com/yk-szk/stratified_group_kfold """
+import random
+import numpy as np
+
+
+class StratifiedGroupKFold:
+ """
+    Stratified Group K-fold with sklearn.model_selection.KFold compatibility.
+
+    Splits the dataset into k folds with a balanced label distribution (stratified) and non-overlapping groups.
+
+    Args:
+        n_splits (int): # of splits
+        shuffle (bool): Shuffle
+        random_state (int): Seed value for the random number generator
+ """
+
+ def __init__(self, n_splits, shuffle=True, random_state=None):
+ self.n_splits = n_splits
+ self.shuffle = shuffle
+ self.seed = random_state
+
+ def split(self, X, labels, groups):
+ assert len(X) == len(labels) == len(groups), "Invalid input length"
+ assert (
+ len(set(groups)) >= self.n_splits
+ ), "The number of groups needs to be larger than n_splits"
+
+ def encode(v):
+ s = set(v)
+ d = {l: i for i, l in enumerate(s)}
+ return [d[e] for e in v]
+
+ labels, groups = encode(labels), encode(groups)
+ num_labels, num_groups = max(labels) + 1, max(groups) + 1
+ label_counts_per_group = np.zeros((num_groups, num_labels), dtype=int)
+ global_label_dist = np.bincount(labels)
+ for label, g in zip(labels, groups):
+ label_counts_per_group[g][label] += 1
+
+ label_counts_per_fold = np.zeros(
+ (self.n_splits, num_labels), dtype=int
+ )
+ groups_per_fold = [set() for _ in range(self.n_splits)]
+
+ def eval_label_counts_per_fold(y_counts, fold):
+ fold += y_counts
+ std_per_label = (
+ np.std(label_counts_per_fold, axis=0) / global_label_dist
+ )
+ fold -= y_counts
+ return np.mean(std_per_label)
+
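+        # greedily assign groups to folds, most label-skewed groups first,
+        # picking for each group the fold that minimizes the mean per-label
+        # std of label counts across folds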
+ groups_and_label_counts = list(enumerate(label_counts_per_group))
+ if self.shuffle:
+ rng = random.Random(self.seed)
+ mean_std = np.mean(np.std(label_counts_per_group, axis=1))
+ groups_and_label_counts.sort(
+ key=lambda g_counts: -np.std(g_counts[1])
+ + rng.gauss(0, mean_std)
+ ) # add rng.gauss to increase the randomness
+ else:
+ groups_and_label_counts.sort(
+ key=lambda g_counts: -np.std(g_counts[1])
+ )
+
+ for g, label_counts in groups_and_label_counts:
+ evals = [
+ eval_label_counts_per_fold(
+ label_counts, label_counts_per_fold[i]
+ )
+ for i in range(self.n_splits)
+ ]
+ best_fold = np.argmin(evals)
+ label_counts_per_fold[best_fold] += label_counts
+ groups_per_fold[best_fold].add(g)
+
+ all_groups = set(groups)
+ for test_groups in groups_per_fold:
+ train_groups = all_groups - test_groups
+
+ train_indices = [
+ i for i, g in enumerate(groups) if g in train_groups
+ ]
+ test_indices = [
+ i for i, g in enumerate(groups) if g in test_groups
+ ]
+
+ yield train_indices, test_indices
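+
+# Minimal usage sketch (hypothetical data; `X`, `labels`, `groups` are
+# parallel sequences of equal length):
+#   sgkf = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=0)
+#   for train_idx, test_idx in sgkf.split(X, labels, groups):
+#       ...  # folds are group-disjoint with approximately balanced labels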
diff --git a/models/Mask3D/mask3d/utils/pc_visualizations.py b/models/Mask3D/mask3d/utils/pc_visualizations.py
new file mode 100644
index 0000000000000000000000000000000000000000..26937b9f293f9cc2b87cc67d3c8742c80f770d60
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pc_visualizations.py
@@ -0,0 +1,202 @@
+from io import BytesIO
+from imageio import imread
+
+import open3d as o3d
+from PIL import Image
+import numpy as np
+import plotly.graph_objects as go
+from plotly.subplots import make_subplots
+from pandas import DataFrame
+import matplotlib
+import seaborn as sns
+import pyviz3d.visualizer as viz
+
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+
+
+def point_cloud_plotly(
+ coordinates,
+ label_color,
+ label_text,
+ prediction_color,
+ prediction_text,
+ normals,
+):
+ def draw_point_cloud(coords, colors=None, label_text=None):
+ marker = dict(size=1, opacity=0.8)
+ if colors is not None:
+ marker.update({"color": colors})
+ if (colors is None) and (label_text is not None):
+ marker.update({"color": label_text})
+ fig = go.Scatter3d(
+ x=coords[:, 0],
+ y=coords[:, 1],
+ z=coords[:, 2],
+ text=label_text,
+ mode="markers",
+ marker=marker,
+ )
+ return fig
+
+ fig = make_subplots(
+ rows=1,
+ cols=2,
+ specs=[[{"type": "scatter3d"}, {"type": "scatter3d"}]],
+ )
+ fig.add_trace(
+ draw_point_cloud(coordinates, prediction_color, prediction_text),
+ row=1,
+ col=1,
+ )
+ # adding image with prediction
+ fig.add_trace(
+ draw_point_cloud(coordinates, label_color, label_text), row=1, col=2
+ )
+ fig.show()
+ # data = fig.to_image(width=1080, height=720, format="png")
+ # image = Image.open(BytesIO(data))
+ # return image
+
+
+def point_cloud_pyviz3d(
+ name,
+ coordinates,
+ path,
+ color=None,
+ normals=None,
+ label_color=None,
+ prediction_color=None,
+ point_size=25,
+ voxel_size=0.01,
+):
+
+    # scale voxel-grid coordinates back to metric space for visualization
+    coordinates = coordinates * voxel_size
+ # First, we set up a visualizer
+ visualizer = viz.Visualizer()
+ if label_color is not None:
+ visualizer.add_points(
+ name=f"{name}_label",
+ positions=coordinates,
+ colors=label_color,
+ point_size=point_size,
+ visible=False,
+ )
+
+ if prediction_color is not None:
+ visualizer.add_points(
+ name=f"{name}_prediction",
+ positions=coordinates,
+ colors=prediction_color,
+ point_size=point_size,
+ visible=False,
+ )
+
+ visualizer.add_points(
+ name=name,
+ positions=coordinates,
+ colors=color,
+ normals=normals,
+ point_size=point_size,
+ visible=False,
+ )
+ # When we added everything we need to the visualizer, we save it.
+ visualizer.save(path, verbose=False)
+
+
+def point_cloud_open3d(coordinates):
+ points = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(coordinates))
+ o3d.visualization.draw_geometries([points])
+
+
+def _remap_model_output(output, labels):
+ output = np.array(output)
+ output_remapped = output.copy()
+ for i, k in enumerate(labels.keys()):
+ output_remapped[output == i] = k
+ return output_remapped
+
+
+def save_visualization(
+ coordinates,
+ name="none",
+ color=None,
+ normals=None,
+ target=None,
+ prediction=None,
+ target_info=None,
+ path="./saved",
+ backend="pyviz3d",
+ voxel_size=0.05,
+ color_mean=[0.47793125906962, 0.4303257521323044, 0.3749598901421883],
+ color_std=[0.2834475483823543, 0.27566157565723015, 0.27018971370874995],
+):
+ target = _remap_model_output(target, target_info)
+ prediction = _remap_model_output(prediction, target_info)
+ coordinates = coordinates[:, :3] - coordinates[:, :3].mean(axis=0)
+ coordinates = coordinates * voxel_size
+ if color is not None:
+ color = (color * color_std + color_mean) * 255
+
+ target_color = np.zeros((len(target), 3))
+ target_text = np.full((len(target)), "empty")
+ prediction_color = np.zeros((len(prediction), 3))
+ prediction_text = np.full((len(prediction)), "empty")
+ if target_info is not None:
+ for k, v in target_info.items():
+ target_color[target == k] = v["color"]
+ target_text[target == k] = v["name"]
+ prediction_color[prediction == k] = v["color"]
+ prediction_text[prediction == k] = v["name"]
+ if backend == "pyviz3d":
+ point_cloud_pyviz3d(
+ name=name,
+ coordinates=coordinates,
+ path=path,
+ color=color,
+ normals=normals,
+ label_color=target_color,
+ prediction_color=prediction_color,
+ voxel_size=1,
+ )
+ elif backend == "plotly":
+        point_cloud_plotly(
+ coordinates=coordinates,
+ normals=normals,
+ label_color=target_color,
+ label_text=target_text,
+ prediction_color=prediction_color,
+ prediction_text=prediction_text,
+ )
+ elif backend == "open3d":
+ point_cloud_open3d(coordinates)
+ else:
+ print("No such backend")
+
+
+def draw_confsion_matrix(confusion_matrix, label_db):
+ index = [i for i in range(confusion_matrix.shape[0])]
+ index = _remap_model_output(index, label_db)
+ column_names = np.full((len(index)), "empty")
+ for k, v in label_db.items():
+ column_names[index == k] = v["name"]
+ df_cm = DataFrame(
+ confusion_matrix, index=column_names, columns=column_names
+ )
+ # pretty_plot_confusion_matrix(df_cm, fz=9)
+ sns.heatmap(
+ df_cm,
+ annot=True,
+ fmt="d",
+ linewidths=0.25,
+ annot_kws={"size": 5},
+ vmax=10000,
+ )
+ buf = BytesIO()
+ plt.savefig(buf, format="jpg")
+ plt.close()
+ buf.seek(0)
+ image = imread(buf, format="jpg")
+ buf.close()
+ return image
diff --git a/models/Mask3D/mask3d/utils/point_cloud_utils.py b/models/Mask3D/mask3d/utils/point_cloud_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..7d2b5ec875da78d299c23afa70531cb0df04e278
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/point_cloud_utils.py
@@ -0,0 +1,83 @@
+from pathlib import Path
+from typing import List, Optional, Tuple
+
+import numpy as np
+import open3d
+from plyfile import PlyData, PlyElement
+
+
+def load_ply(filepath):
+ with open(filepath, "rb") as f:
+ plydata = PlyData.read(f)
+ data = plydata.elements[0].data
+ coords = np.array([data["x"], data["y"], data["z"]], dtype=np.float32).T
+ feats = None
+ labels = None
+    if {"red", "green", "blue"} <= set(data.dtype.names):
+ feats = np.array(
+ [data["red"], data["green"], data["blue"]], dtype=np.uint8
+ ).T
+ if "label" in data.dtype.names:
+ labels = np.array(data["label"], dtype=np.uint32)
+ return coords, feats, labels
+
+
+def load_ply_with_normals(filepath):
+ mesh = open3d.io.read_triangle_mesh(str(filepath))
+ if not mesh.has_vertex_normals():
+ mesh.compute_vertex_normals()
+ vertices = np.asarray(mesh.vertices)
+ normals = np.asarray(mesh.vertex_normals)
+
+ coords, feats, labels = load_ply(filepath)
+ assert np.allclose(coords, vertices), "different coordinates"
+ feats = np.hstack((feats, normals))
+
+ return coords, feats, labels
+
+
+def load_obj_with_normals(filepath):
+ mesh = open3d.io.read_triangle_mesh(str(filepath))
+ if not mesh.has_vertex_normals():
+ mesh.compute_vertex_normals()
+ coords = np.asarray(mesh.vertices)
+ normals = np.asarray(mesh.vertex_normals)
+ colors = np.asarray(mesh.vertex_colors)
+ feats = np.hstack((colors, normals))
+
+ return coords, feats
+
+
+def write_point_cloud_in_ply(
+ filepath: Path,
+ coords: np.ndarray,
+ feats: Optional[np.ndarray] = None,
+ labels: Optional[np.ndarray] = None,
+ dtypes: Optional[List[Tuple[str, str]]] = [
+        ("x", "<f4"),
+        ("y", "<f4"),
+        ("z", "<f4"),
+    ],
+):
+    ...
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/aggregation/aggregation_cuda.cpp b/models/Mask3D/mask3d/utils/pointops2/src/aggregation/aggregation_cuda.cpp
new file mode 100644
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/aggregation/aggregation_cuda.cpp
+#include <torch/serialize/tensor.h>
+#include <ATen/cuda/CUDAContext.h>
+#include <vector>
+#include "aggregation_cuda_kernel.h"
+
+
+void aggregation_forward_cuda(int n, int nsample, int c, int w_c, at::Tensor input_tensor, at::Tensor position_tensor, at::Tensor weight_tensor, at::Tensor idx_tensor, at::Tensor output_tensor)
+{
+    const float *input = input_tensor.data_ptr<float>();
+    const float *position = position_tensor.data_ptr<float>();
+    const float *weight = weight_tensor.data_ptr<float>();
+    const int *idx = idx_tensor.data_ptr<int>();
+    float *output = output_tensor.data_ptr<float>();
+ aggregation_forward_cuda_launcher(n, nsample, c, w_c, input, position, weight, idx, output);
+}
+
+void aggregation_backward_cuda(int n, int nsample, int c, int w_c, at::Tensor input_tensor, at::Tensor position_tensor, at::Tensor weight_tensor, at::Tensor idx_tensor, at::Tensor grad_output_tensor, at::Tensor grad_input_tensor, at::Tensor grad_position_tensor, at::Tensor grad_weight_tensor)
+{
+    const float *input = input_tensor.data_ptr<float>();
+    const float *position = position_tensor.data_ptr<float>();
+    const float *weight = weight_tensor.data_ptr<float>();
+    const int *idx = idx_tensor.data_ptr<int>();
+    const float *grad_output = grad_output_tensor.data_ptr<float>();
+    float *grad_input = grad_input_tensor.data_ptr<float>();
+    float *grad_position = grad_position_tensor.data_ptr<float>();
+    float *grad_weight = grad_weight_tensor.data_ptr<float>();
+ aggregation_backward_cuda_launcher(n, nsample, c, w_c, input, position, weight, idx, grad_output, grad_input, grad_position, grad_weight);
+}
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/aggregation/aggregation_cuda_kernel.cu b/models/Mask3D/mask3d/utils/pointops2/src/aggregation/aggregation_cuda_kernel.cu
new file mode 100644
index 0000000000000000000000000000000000000000..8339bb7e2088abffefba02c26b248edafed6cf47
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/aggregation/aggregation_cuda_kernel.cu
@@ -0,0 +1,53 @@
+#include "../cuda_utils.h"
+#include "aggregation_cuda_kernel.h"
+
+
+__global__ void aggregation_forward_cuda_kernel(int n, int nsample, int c, int w_c, const float *input, const float *position, const float *weight, const int *idx, float *output) {
+ // input: input: (n, c), position: (n, nsample, c), weight: (n, nsample, w_c), idx: (n, nsample), output: (n, c)
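+    // one thread per output element (n, c): accumulate over the nsample
+    // neighbors (input[neighbor] + position) * weight, where weight channels
+    // are shared in groups (w_c_idx = c_idx % w_c)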
+ int index = blockIdx.x * blockDim.x + threadIdx.x;
+ if (index >= n * c) return;
+ const int c_idx = index % c;
+ const int n_idx = index / c;
+ const int w_c_idx = c_idx % w_c;
+ for (int nsample_idx = 0; nsample_idx < nsample; nsample_idx++)
+ {
+ int idx_idx = n_idx * nsample + nsample_idx;
+ int input_idx = idx[idx_idx] * c + c_idx;
+ int position_idx = n_idx * nsample * c + nsample_idx * c + c_idx;
+ int weight_idx = n_idx * nsample * w_c + nsample_idx * w_c + w_c_idx;
+ output[index] += (input[input_idx] + position[position_idx]) * weight[weight_idx];
+ }
+}
+
+__global__ void aggregation_backward_cuda_kernel(int n, int nsample, int c, int w_c, const float *input, const float *position, const float *weight, const int *idx, const float *grad_output, float *grad_input, float *grad_position, float *grad_weight) {
+ // input: grad_output: (n, c), output: grad_input: (n, c), grad_position: (n, nsample, c), grad_weight: (n, nsample, w_c)
+ int index = blockIdx.x * blockDim.x + threadIdx.x;
+ if (index >= n * c) return;
+ const int c_idx = index % c;
+ const int n_idx = index / c;
+ const int w_c_idx = c_idx % w_c;
+ for (int nsample_idx = 0; nsample_idx < nsample; nsample_idx++)
+ {
+ int idx_idx = n_idx * nsample + nsample_idx;
+ int input_idx = idx[idx_idx] * c + c_idx;
+ int position_idx = n_idx * nsample * c + nsample_idx * c + c_idx;
+ int weight_idx = n_idx * nsample * w_c + nsample_idx * w_c + w_c_idx;
+ atomicAdd(grad_input + input_idx, grad_output[index] * weight[weight_idx]);
+ grad_position[position_idx] = grad_output[index] * weight[weight_idx];
+ atomicAdd(grad_weight + weight_idx, grad_output[index] * (input[input_idx] + position[position_idx]));
+ }
+}
+
+void aggregation_forward_cuda_launcher(int n, int nsample, int c, int w_c, const float *input, const float *position, const float *weight, const int *idx, float *output) {
+ // input: input: (n, c), position: (n, nsample, c), weight: (n, nsample, w_c), idx: (n, nsample), output: (n, c)
+ dim3 blocks(DIVUP(n * c, THREADS_PER_BLOCK));
+ dim3 threads(THREADS_PER_BLOCK);
+    aggregation_forward_cuda_kernel<<<blocks, threads, 0>>>(n, nsample, c, w_c, input, position, weight, idx, output);
+}
+
+void aggregation_backward_cuda_launcher(int n, int nsample, int c, int w_c, const float *input, const float *position, const float *weight, const int *idx, const float *grad_output, float *grad_input, float *grad_position, float *grad_weight) {
+ // input: grad_output: (n, c), output: grad_input: (n, c), grad_position: (n, nsample, c), grad_weight: (n, nsample, w_c)
+ dim3 blocks(DIVUP(n * c, THREADS_PER_BLOCK));
+ dim3 threads(THREADS_PER_BLOCK);
+    aggregation_backward_cuda_kernel<<<blocks, threads, 0>>>(n, nsample, c, w_c, input, position, weight, idx, grad_output, grad_input, grad_position, grad_weight);
+}
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/aggregation/aggregation_cuda_kernel.h b/models/Mask3D/mask3d/utils/pointops2/src/aggregation/aggregation_cuda_kernel.h
new file mode 100644
index 0000000000000000000000000000000000000000..5211a96aa2acbe0d9baf32bddc9ab4be87703072
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/aggregation/aggregation_cuda_kernel.h
@@ -0,0 +1,20 @@
+#ifndef _AGGREGATION_CUDA_KERNEL
+#define _AGGREGATION_CUDA_KERNEL
+#include <torch/serialize/tensor.h>
+#include <vector>
+#include <ATen/cuda/CUDAContext.h>
+
+void aggregation_forward_cuda(int n, int nsample, int c, int w_c, at::Tensor input_tensor, at::Tensor position_tensor, at::Tensor weight_tensor, at::Tensor idx_tensor, at::Tensor output_tensor);
+void aggregation_backward_cuda(int n, int nsample, int c, int w_c, at::Tensor input_tensor, at::Tensor position_tensor, at::Tensor weight_tensor, at::Tensor idx_tensor, at::Tensor grad_output_tensor, at::Tensor grad_input_tensor, at::Tensor grad_position_tensor, at::Tensor grad_weight_tensor);
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void aggregation_forward_cuda_launcher(int n, int nsample, int c, int w_c, const float *input, const float *position, const float *weight, const int *idx, float *output);
+void aggregation_backward_cuda_launcher(int n, int nsample, int c, int w_c, const float *input, const float *position, const float *weight, const int *idx, const float *grad_output, float *grad_input, float *grad_position, float *grad_weight);
+
+#ifdef __cplusplus
+}
+#endif
+#endif
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/attention/attention_cuda.cpp b/models/Mask3D/mask3d/utils/pointops2/src/attention/attention_cuda.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..8d2c725ae0ed70c884a8643aa74ba0c0f6660d30
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/attention/attention_cuda.cpp
@@ -0,0 +1,56 @@
+#include <vector>
+#include <torch/serialize/tensor.h>
+#include <ATen/cuda/CUDAContext.h>
+#include <ATen/cuda/CUDAEvent.h>
+#include "attention_cuda_kernel.h"
+
+void attention_step1_forward_cuda(int N, int M, int h, int C, at::Tensor q_tensor, at::Tensor k_tensor,
+ at::Tensor index0_tensor, at::Tensor index1_tensor, at::Tensor attn_tensor)
+{
+    const float *q = q_tensor.data_ptr<float>();
+    const float *k = k_tensor.data_ptr<float>();
+    const int *index0 = index0_tensor.data_ptr<int>();
+    const int *index1 = index1_tensor.data_ptr<int>();
+    float *attn = attn_tensor.data_ptr<float>();
+ attention_step1_forward_cuda_launcher(N, M, h, C, q, k, index0, index1, attn);
+}
+
+void attention_step1_backward_cuda(int N, int M, int h, int C, at::Tensor grad_out_tensor,
+ at::Tensor index0_tensor, at::Tensor index1_tensor, at::Tensor q_tensor, at::Tensor k_tensor,
+ at::Tensor grad_q_tensor, at::Tensor grad_k_tensor)
+{
+    const float *grad_out = grad_out_tensor.data_ptr<float>();
+    const int *index0 = index0_tensor.data_ptr<int>();
+    const int *index1 = index1_tensor.data_ptr<int>();
+    const float *q = q_tensor.data_ptr<float>();
+    const float *k = k_tensor.data_ptr<float>();
+    float *grad_q = grad_q_tensor.data_ptr<float>();
+    float *grad_k = grad_k_tensor.data_ptr<float>();
+ attention_step1_backward_cuda_launcher(N, M, h, C, grad_out, index0, index1, q, k, grad_q, grad_k);
+}
+
+void attention_step2_forward_cuda(int N, int M, int h, int C, at::Tensor attn_tensor, at::Tensor v_tensor,
+ at::Tensor index0_tensor, at::Tensor index1_tensor, at::Tensor output_tensor)
+{
+    const float *attn = attn_tensor.data_ptr<float>();
+    const float *v = v_tensor.data_ptr<float>();
+    const int *index0 = index0_tensor.data_ptr<int>();
+    const int *index1 = index1_tensor.data_ptr<int>();
+    float *output = output_tensor.data_ptr<float>();
+ attention_step2_forward_cuda_launcher(N, M, h, C, attn, v, index0, index1, output);
+}
+
+
+void attention_step2_backward_cuda(int N, int M, int h, int C, at::Tensor grad_out_tensor,
+ at::Tensor index0_tensor, at::Tensor index1_tensor, at::Tensor attn_tensor, at::Tensor v_tensor,
+ at::Tensor grad_attn_tensor, at::Tensor grad_v_tensor)
+{
+    const float *grad_out = grad_out_tensor.data_ptr<float>();
+    const int *index0 = index0_tensor.data_ptr<int>();
+    const int *index1 = index1_tensor.data_ptr<int>();
+    const float *attn = attn_tensor.data_ptr<float>();
+    const float *v = v_tensor.data_ptr<float>();
+    float *grad_attn = grad_attn_tensor.data_ptr<float>();
+    float *grad_v = grad_v_tensor.data_ptr<float>();
+ attention_step2_backward_cuda_launcher(N, M, h, C, grad_out, index0, index1, attn, v, grad_attn, grad_v);
+}
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/attention/attention_cuda_kernel.cu b/models/Mask3D/mask3d/utils/pointops2/src/attention/attention_cuda_kernel.cu
new file mode 100644
index 0000000000000000000000000000000000000000..f71ad62987233229fcb547b30cfb7b9191683050
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/attention/attention_cuda_kernel.cu
@@ -0,0 +1,103 @@
+#include "../cuda_utils.h"
+#include "attention_cuda_kernel.h"
+
+
+__global__ void attention_step1_forward_cuda_kernel( // M, h, C//h
+ int N, int M, int h, int C, const float *q, const float *k,
+ const int *index0, const int *index1, float *attn) {
+
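+    // one thread per (pair m, head h, channel c): atomically accumulate the
+    // per-head dot product q[index0[m]] . k[index1[m]] into attn[m, h]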
+ int c_idx = blockIdx.z;
+ int h_idx = blockIdx.y;
+ int m_idx = blockIdx.x * blockDim.x + threadIdx.x;
+ if (m_idx >= M || h_idx >= h || c_idx >= C / h) return;
+
+ int idx0 = index0[m_idx];
+ int idx1 = index1[m_idx];
+ float val = q[idx0*C+h_idx*C/h+c_idx] * k[idx1*C+h_idx*C/h+c_idx];
+ atomicAdd(attn+m_idx*h+h_idx, val);
+}
+
+__global__ void attention_step1_backward_cuda_kernel( // M, h, C//h
+ int N, int M, int h, int C, const float *grad_out, const int *index0, const int *index1, const float *q, const float *k,
+ float *grad_q, float *grad_k) {
+
+ int c_idx = blockIdx.z;
+ int h_idx = blockIdx.y;
+ int m_idx = blockIdx.x * blockDim.x + threadIdx.x;
+ if (m_idx >= M || h_idx >= h || c_idx >= C / h) return;
+
+ int idx0 = index0[m_idx];
+ int idx1 = index1[m_idx];
+ int grad_out_idx = m_idx*h+h_idx;
+ int q_idx = idx0*C+h_idx*C/h+c_idx;
+ int k_idx = idx1*C+h_idx*C/h+c_idx;
+ atomicAdd(grad_q+q_idx, grad_out[grad_out_idx] * k[k_idx]);
+ atomicAdd(grad_k+k_idx, grad_out[grad_out_idx] * q[q_idx]);
+}
+
+void attention_step1_forward_cuda_launcher(int N, int M, int h, int C, const float *q, const float *k,
+ const int *index0, const int *index1, float *attn) {
+    // input: q: (N, h, C/h), k: (N, h, C/h), index0: (M, ), index1: (M, ); output: attn: (M, h)
+ //dim3 blocks(DIVUP(C/h, THREADS_PER_BLOCK), h, M);
+ dim3 blocks(DIVUP(M, THREADS_PER_BLOCK), h, C/h);
+ dim3 threads(THREADS_PER_BLOCK);
+ attention_step1_forward_cuda_kernel<<<blocks, threads, 0, 0>>>(N, M, h, C, q, k, index0, index1, attn);
+}
+
+void attention_step1_backward_cuda_launcher(int N, int M, int h, int C, const float *grad_out, const int *index0, const int *index1,
+ const float *q, const float *k, float *grad_q, float *grad_k) {
+ // input: grad_out: (M, h); output: grad_q: (N, h, C/h), grad_k: (N, h, C/h)
+ //dim3 blocks(DIVUP(C/h, THREADS_PER_BLOCK), h, M);
+ dim3 blocks(DIVUP(M, THREADS_PER_BLOCK), h, C/h);
+ dim3 threads(THREADS_PER_BLOCK);
+ attention_step1_backward_cuda_kernel<<<blocks, threads, 0, 0>>>(N, M, h, C, grad_out, index0, index1, q, k, grad_q, grad_k);
+}
+
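+// step2: weighted aggregation. Each pair m scatters attn[m, h] * v[index1[m], h, c]
+// into its query row output[index0[m], h, c] via atomicAdd.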
+__global__ void attention_step2_forward_cuda_kernel( // M, h, C//h
+ int N, int M, int h, int C, const float *attn, const float *v,
+ const int *index0, const int *index1, float *output) {
+
+ int c_idx = blockIdx.z;
+ int h_idx = blockIdx.y;
+ int m_idx = blockIdx.x * blockDim.x + threadIdx.x;
+ if (m_idx >= M || h_idx >= h || c_idx >= C / h) return;
+
+ int idx1 = index1[m_idx];
+ float val = attn[m_idx*h+h_idx] * v[idx1*C+h_idx*C/h+c_idx];
+ int idx0 = index0[m_idx];
+ atomicAdd(output+idx0*C+h_idx*C/h+c_idx, val);
+}
+
+__global__ void attention_step2_backward_cuda_kernel( // M, h, C//h
+ int N, int M, int h, int C, const float *grad_out, const int *index0, const int *index1, const float *attn, const float *v,
+ float *grad_attn, float *grad_v) {
+
+ int c_idx = blockIdx.z;
+ int h_idx = blockIdx.y;
+ int m_idx = blockIdx.x * blockDim.x + threadIdx.x;
+ if (m_idx >= M || h_idx >= h || c_idx >= C / h) return;
+
+ int idx0 = index0[m_idx];
+ int idx1 = index1[m_idx];
+ int grad_out_idx = idx0*C+h_idx*C/h+c_idx;
+ atomicAdd(grad_attn+m_idx*h+h_idx, grad_out[grad_out_idx] * v[idx1*C+h_idx*C/h+c_idx]);
+ atomicAdd(grad_v+idx1*C+h_idx*C/h+c_idx, grad_out[grad_out_idx] * attn[m_idx*h+h_idx]);
+}
+
+void attention_step2_forward_cuda_launcher(int N, int M, int h, int C, const float *attn, const float *v,
+ const int *index0, const int *index1, float *output) {
+ // input: attn: (M, h), v: (N, h, C/h), index0: (M, ), index1: (M, )
+ //dim3 blocks(DIVUP(C/h, THREADS_PER_BLOCK), h, M);
+ dim3 blocks(DIVUP(M, THREADS_PER_BLOCK), h, C/h);
+ dim3 threads(THREADS_PER_BLOCK);
+ attention_step2_forward_cuda_kernel<<<blocks, threads, 0, 0>>>(N, M, h, C, attn, v, index0, index1, output);
+}
+
+void attention_step2_backward_cuda_launcher(int N, int M, int h, int C, const float *grad_out, const int *index0, const int *index1,
+ const float *attn, const float *v, float *grad_attn, float *grad_v) {
+ // input: grad_out: (N, h, C/h); output: grad_attn: (M, h), grad_v: (N, h, C/h)
+ //dim3 blocks(DIVUP(C/h, THREADS_PER_BLOCK), h, M);
+ dim3 blocks(DIVUP(M, THREADS_PER_BLOCK), h, C/h);
+ dim3 threads(THREADS_PER_BLOCK);
+ attention_step2_backward_cuda_kernel<<<blocks, threads, 0, 0>>>(N, M, h, C, grad_out, index0, index1, attn, v, grad_attn, grad_v);
+}
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/attention/attention_cuda_kernel.h b/models/Mask3D/mask3d/utils/pointops2/src/attention/attention_cuda_kernel.h
new file mode 100644
index 0000000000000000000000000000000000000000..cbd99b9b6a9c65af76aa95d00fff6306446114cd
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/attention/attention_cuda_kernel.h
@@ -0,0 +1,26 @@
+#ifndef _ATTENTION_CUDA_KERNEL
+#define _ATTENTION_CUDA_KERNEL
+#include <vector>
+#include <torch/serialize/tensor.h>
+#include <ATen/cuda/CUDAContext.h>
+
+void attention_step1_forward_cuda(int N, int M, int h, int C, at::Tensor q_tensor, at::Tensor k_tensor, at::Tensor index0_tensor, at::Tensor index1_tensor, at::Tensor attn_tensor);
+void attention_step1_backward_cuda(int N, int M, int h, int C, at::Tensor grad_out_tensor, at::Tensor index0_tensor, at::Tensor index1_tensor, at::Tensor q_tensor, at::Tensor k_tensor, at::Tensor grad_q_tensor, at::Tensor grad_k_tensor);
+
+void attention_step2_forward_cuda(int N, int M, int h, int C, at::Tensor attn_tensor, at::Tensor v_tensor, at::Tensor index0_tensor, at::Tensor index1_tensor, at::Tensor output_tensor);
+void attention_step2_backward_cuda(int N, int M, int h, int C, at::Tensor grad_out_tensor, at::Tensor index0_tensor, at::Tensor index1_tensor, at::Tensor attn_tensor, at::Tensor v_tensor, at::Tensor grad_attn_tensor, at::Tensor grad_v_tensor);
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void attention_step1_forward_cuda_launcher(int N, int M, int h, int C, const float *q, const float *k, const int *index0, const int *index1, float *attn);
+void attention_step1_backward_cuda_launcher(int N, int M, int h, int C, const float *grad_out, const int *index0, const int *index1, const float *q, const float *k, float *grad_q, float *grad_k);
+
+void attention_step2_forward_cuda_launcher(int N, int M, int h, int C, const float *attn, const float *v, const int *index0, const int *index1, float *output);
+void attention_step2_backward_cuda_launcher(int N, int M, int h, int C, const float *grad_out, const int *index0, const int *index1, const float *attn, const float *v, float *grad_attn, float *grad_v);
+
+#ifdef __cplusplus
+}
+#endif
+#endif
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/attention_v2/attention_cuda_kernel_v2.cu b/models/Mask3D/mask3d/utils/pointops2/src/attention_v2/attention_cuda_kernel_v2.cu
new file mode 100644
index 0000000000000000000000000000000000000000..2e5343f5a3a0ad52aae7d06d22989f04390b68f6
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/attention_v2/attention_cuda_kernel_v2.cu
@@ -0,0 +1,193 @@
+#include "../cuda_utils.h"
+#include "attention_cuda_kernel_v2.h"
+
+
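+// v2 kernels assume pairs are grouped by query: index0_offsets is a CSR-style prefix,
+// so [index0_offsets[q], index0_offsets[q+1]) is the neighbor range of query q. One
+// block handles one (query, head); the query vector is staged once in shared memory.
+// The head dimension d = C / h is a template parameter so shared arrays have static size.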
+template <unsigned int d>
+__global__ void attention_step1_forward_cuda_kernel_v2( // M, h, C//h
+ int N, int M, int h, const float *q, const float *k,
+ const int *index0_offsets, const int *index1, float *attn) {
+
+ int h_idx = blockIdx.y;
+ int q_idx = blockIdx.x;
+ int n_idx = threadIdx.x;
+ int C = h * d;
+ // if (m_idx >= M || h_idx >= h || c_idx >= C / h) return;
+
+ __shared__ float query_vec[d];
+ __shared__ int start, end;
+
+ // if(n_idx == 0){
+ // printf("blockDim.x: %d\n", blockDim.x);
+ // }
+
+ if (n_idx == 0){
+ start = index0_offsets[q_idx];
+ end = index0_offsets[q_idx+1];
+ // printf("start: %d, end: %d, blockDim.x: %d\n", start, end, blockDim.x);
+ }
+ for(int i = n_idx; i < d; i += blockDim.x)
+ query_vec[i] = q[q_idx*C + h_idx*d + i];
+
+ __syncthreads();
+
+ int m_idx = start + n_idx;
+ if(m_idx >= end)
+ return;
+
+ float sum = 0;
+ for(int i = 0; i < d; i++){
+ int k_idx = index1[m_idx];
+ float key = k[k_idx * C + h_idx * d + i];
+ sum += query_vec[i] * key;
+ }
+ attn[m_idx*h + h_idx] = sum;
+ // int idx0 = index0[m_idx];
+ // int idx1 = index1[m_idx];
+ // float val = q[idx0*C+h_idx*C/h+c_idx] * k[idx1*C+h_idx*C/h+c_idx];
+ // atomicAdd(attn+m_idx*h+h_idx, val);
+}
+
+template <unsigned int d>
+__global__ void attention_step1_backward_cuda_kernel_v2( // M, h, C//h
+ int N, int M, int h, const float *grad_out, const int *index0_offsets, const int *index1, const float *q, const float *k,
+ float *grad_q, float *grad_k) {
+
+ int h_idx = blockIdx.y;
+ int q_idx = blockIdx.x;
+ int n_idx = threadIdx.x;
+ int C = d * h;
+
+ __shared__ float query_vec[d];
+ __shared__ int start, end;
+
+ if (n_idx == 0){
+ start = index0_offsets[q_idx];
+ end = index0_offsets[q_idx+1];
+ }
+ for(int i = n_idx; i < d; i += blockDim.x)
+ query_vec[i] = q[q_idx*C + h_idx*d + i];
+
+ __shared__ float gradient_new[d];
+ for(int i = n_idx; i < d; i += blockDim.x)
+ gradient_new[i] = 0;
+
+ __syncthreads();
+
+ int m_idx = start + n_idx;
+ if(m_idx < end){
+ float gradient = grad_out[m_idx*h + h_idx];
+ for(int i = 0; i < d; i++){
+ int k_idx = index1[m_idx];
+ atomicAdd(&gradient_new[i], gradient * k[k_idx*C + h_idx*d + i]);
+ atomicAdd(grad_k + k_idx*C + h_idx*d + i, gradient * query_vec[i]);
+ }
+ }
+ __syncthreads();
+
+ for(int i = n_idx; i < d; i += blockDim.x)
+ grad_q[q_idx*C + h_idx*d + i] = gradient_new[i];
+}
+
+void attention_step1_forward_cuda_launcher_v2(int N, int M, int h, int C, const unsigned int n_max,
+ const float *q, const float *k, const int *index0_offsets, const int *index1, float *attn) {
+ // input: q: (N, h, C/h), k: (N, h, C/h), index0_offsets: (N+1, ), index1: (M, ); output: attn: (M, h)
+ //dim3 blocks(DIVUP(C/h, THREADS_PER_BLOCK), h, M);
+ dim3 blocks(N, h);
+ unsigned int n_threads = opt_n_threads(n_max);
+
+ n_threads = n_threads == n_max ? n_threads : n_threads * 2;
+ // n_threads = n_threads > 1024 ? 512 : n_threads;
+
+ // printf("n_max: %d, n_threads: %d\n", n_max, n_threads);
+
+ // dim3 threads(THREADS_PER_BLOCK);
+ // attention_step1_forward_cuda_kernel_v2<<>>(N, M, h, C, q, k, index0, index1, attn);
+
+ switch (C / h) {
+ case 16:
+ attention_step1_forward_cuda_kernel_v2<16><<<blocks, n_threads, 0, 0>>>(N, M, h, q, k, index0_offsets, index1, attn);
+ break;
+ case 32:
+ attention_step1_forward_cuda_kernel_v2<32><<<blocks, n_threads, 0, 0>>>(N, M, h, q, k, index0_offsets, index1, attn);
+ break;
+ default:
+ throw "d != 16 and d != 32";
+ }
+}
+
+void attention_step1_backward_cuda_launcher_v2(int N, int M, int h, int C, const unsigned int n_max,
+ const float *grad_out, const int *index0_offsets, const int *index1, const float *q, const float *k, float *grad_q, float *grad_k) {
+ // input: grad_out: (M, h); output: grad_q: (N, h, C/h), grad_k: (N, h, C/h)
+ //dim3 blocks(DIVUP(C/h, THREADS_PER_BLOCK), h, M);
+ // dim3 blocks(DIVUP(M, THREADS_PER_BLOCK), h, C/h);
+ // dim3 threads(THREADS_PER_BLOCK);
+ dim3 blocks(N, h);
+ unsigned int n_threads = opt_n_threads(n_max);
+ // attention_step1_backward_cuda_kernel_v2<<>>(N, M, h, C/h, grad_out, index0_offsets, index1, q, k, grad_q, grad_k);
+
+ n_threads = n_threads == n_max ? n_threads : n_threads * 2;
+ // n_threads = n_threads > 1024 ? 512 : n_threads;
+
+ // printf("n_max: %d, n_threads: %d\n", n_max, n_threads);
+
+ switch (C / h) {
+ case 16:
+ attention_step1_backward_cuda_kernel_v2<16><<<blocks, n_threads, 0, 0>>>(N, M, h, grad_out, index0_offsets, index1, q, k, grad_q, grad_k);
+ break;
+ case 32:
+ attention_step1_backward_cuda_kernel_v2<32><<<blocks, n_threads, 0, 0>>>(N, M, h, grad_out, index0_offsets, index1, q, k, grad_q, grad_k);
+ break;
+ default:
+ throw "d != 16 and d != 32";
+ }
+
+}
+
+__global__ void attention_step2_forward_cuda_kernel_v2( // M, h, C//h
+ int N, int M, int h, int C, const float *attn, const float *v,
+ const int *index0, const int *index1, float *output) {
+
+ int c_idx = blockIdx.z;
+ int h_idx = blockIdx.y;
+ int m_idx = blockIdx.x * blockDim.x + threadIdx.x;
+ if (m_idx >= M || h_idx >= h || c_idx >= C / h) return;
+
+ int idx1 = index1[m_idx];
+ float val = attn[m_idx*h+h_idx] * v[idx1*C+h_idx*C/h+c_idx];
+ int idx0 = index0[m_idx];
+ atomicAdd(output+idx0*C+h_idx*C/h+c_idx, val);
+}
+
+__global__ void attention_step2_backward_cuda_kernel_v2( // M, h, C//h
+ int N, int M, int h, int C, const float *grad_out, const int *index0, const int *index1, const float *attn, const float *v,
+ float *grad_attn, float *grad_v) {
+
+ int c_idx = blockIdx.z;
+ int h_idx = blockIdx.y;
+ int m_idx = blockIdx.x * blockDim.x + threadIdx.x;
+ if (m_idx >= M || h_idx >= h || c_idx >= C / h) return;
+
+ int idx0 = index0[m_idx];
+ int idx1 = index1[m_idx];
+ int grad_out_idx = idx0*C+h_idx*C/h+c_idx;
+ atomicAdd(grad_attn+m_idx*h+h_idx, grad_out[grad_out_idx] * v[idx1*C+h_idx*C/h+c_idx]);
+ atomicAdd(grad_v+idx1*C+h_idx*C/h+c_idx, grad_out[grad_out_idx] * attn[m_idx*h+h_idx]);
+}
+
+void attention_step2_forward_cuda_launcher_v2(int N, int M, int h, int C, const float *attn, const float *v,
+ const int *index0, const int *index1, float *output) {
+ // input: attn: (M, h), v: (N, h, C/h), index0: (M, ), index1: (M, )
+ //dim3 blocks(DIVUP(C/h, THREADS_PER_BLOCK), h, M);
+ dim3 blocks(DIVUP(M, THREADS_PER_BLOCK), h, C/h);
+ dim3 threads(THREADS_PER_BLOCK);
+ attention_step2_forward_cuda_kernel_v2<<<blocks, threads, 0, 0>>>(N, M, h, C, attn, v, index0, index1, output);
+}
+
+void attention_step2_backward_cuda_launcher_v2(int N, int M, int h, int C, const float *grad_out, const int *index0, const int *index1,
+ const float *attn, const float *v, float *grad_attn, float *grad_v) {
+ // input: grad_out: (N, h, C/h); output: grad_attn: (M, h), grad_v: (N, h, C/h)
+ //dim3 blocks(DIVUP(C/h, THREADS_PER_BLOCK), h, M);
+ dim3 blocks(DIVUP(M, THREADS_PER_BLOCK), h, C/h);
+ dim3 threads(THREADS_PER_BLOCK);
+ attention_step2_backward_cuda_kernel_v2<<<blocks, threads, 0, 0>>>(N, M, h, C, grad_out, index0, index1, attn, v, grad_attn, grad_v);
+}
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/attention_v2/attention_cuda_kernel_v2.h b/models/Mask3D/mask3d/utils/pointops2/src/attention_v2/attention_cuda_kernel_v2.h
new file mode 100644
index 0000000000000000000000000000000000000000..d7e7f047bc318928ddb9402acbcdf20204596450
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/attention_v2/attention_cuda_kernel_v2.h
@@ -0,0 +1,26 @@
+#ifndef _ATTENTION_V2_CUDA_KERNEL
+#define _ATTENTION_V2_CUDA_KERNEL
+#include <vector>
+#include <torch/serialize/tensor.h>
+#include <ATen/cuda/CUDAContext.h>
+
+void attention_step1_forward_cuda_v2(int N, int M, int h, int C, const unsigned int n_max, at::Tensor q_tensor, at::Tensor k_tensor, at::Tensor index0_tensor_offsets, at::Tensor index1_tensor, at::Tensor attn_tensor);
+void attention_step1_backward_cuda_v2(int N, int M, int h, int C, const unsigned int n_max, at::Tensor grad_out_tensor, at::Tensor index0_tensor_offsets, at::Tensor index1_tensor, at::Tensor q_tensor, at::Tensor k_tensor, at::Tensor grad_q_tensor, at::Tensor grad_k_tensor);
+
+void attention_step2_forward_cuda_v2(int N, int M, int h, int C, at::Tensor attn_tensor, at::Tensor v_tensor, at::Tensor index0_tensor, at::Tensor index1_tensor, at::Tensor output_tensor);
+void attention_step2_backward_cuda_v2(int N, int M, int h, int C, at::Tensor grad_out_tensor, at::Tensor index0_tensor, at::Tensor index1_tensor, at::Tensor attn_tensor, at::Tensor v_tensor, at::Tensor grad_attn_tensor, at::Tensor grad_v_tensor);
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void attention_step1_forward_cuda_launcher_v2(int N, int M, int h, int C, const unsigned int n_max, const float *q, const float *k, const int *index0_offsets, const int *index1, float *attn);
+void attention_step1_backward_cuda_launcher_v2(int N, int M, int h, int C, const unsigned int n_max, const float *grad_out, const int *index0_offsets, const int *index1, const float *q, const float *k, float *grad_q, float *grad_k);
+
+void attention_step2_forward_cuda_launcher_v2(int N, int M, int h, int C, const float *attn, const float *v, const int *index0, const int *index1, float *output);
+void attention_step2_backward_cuda_launcher_v2(int N, int M, int h, int C, const float *grad_out, const int *index0, const int *index1, const float *attn, const float *v, float *grad_attn, float *grad_v);
+
+#ifdef __cplusplus
+}
+#endif
+#endif
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/attention_v2/attention_cuda_v2.cpp b/models/Mask3D/mask3d/utils/pointops2/src/attention_v2/attention_cuda_v2.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..311adaf223928f83f3f238268fe0f189b5479657
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/attention_v2/attention_cuda_v2.cpp
@@ -0,0 +1,56 @@
+#include <vector>
+#include <torch/serialize/tensor.h>
+#include <torch/extension.h>
+#include <ATen/cuda/CUDAContext.h>
+#include "attention_cuda_kernel_v2.h"
+
+void attention_step1_forward_cuda_v2(int N, int M, int h, int C, const unsigned int n_max, at::Tensor q_tensor, at::Tensor k_tensor,
+ at::Tensor index0_tensor_offsets, at::Tensor index1_tensor, at::Tensor attn_tensor)
+{
+ const float *q = q_tensor.data_ptr<float>();
+ const float *k = k_tensor.data_ptr<float>();
+ const int *index0_offsets = index0_tensor_offsets.data_ptr<int>();
+ const int *index1 = index1_tensor.data_ptr<int>();
+ float *attn = attn_tensor.data_ptr<float>();
+ attention_step1_forward_cuda_launcher_v2(N, M, h, C, n_max, q, k, index0_offsets, index1, attn);
+}
+
+void attention_step1_backward_cuda_v2(int N, int M, int h, int C, const unsigned int n_max, at::Tensor grad_out_tensor,
+ at::Tensor index0_tensor_offsets, at::Tensor index1_tensor, at::Tensor q_tensor, at::Tensor k_tensor,
+ at::Tensor grad_q_tensor, at::Tensor grad_k_tensor)
+{
+ const float *grad_out = grad_out_tensor.data_ptr<float>();
+ const int *index0_offsets = index0_tensor_offsets.data_ptr<int>();
+ const int *index1 = index1_tensor.data_ptr<int>();
+ const float *q = q_tensor.data_ptr<float>();
+ const float *k = k_tensor.data_ptr<float>();
+ float *grad_q = grad_q_tensor.data_ptr<float>();
+ float *grad_k = grad_k_tensor.data_ptr<float>();
+ attention_step1_backward_cuda_launcher_v2(N, M, h, C, n_max, grad_out, index0_offsets, index1, q, k, grad_q, grad_k);
+}
+
+void attention_step2_forward_cuda_v2(int N, int M, int h, int C, at::Tensor attn_tensor, at::Tensor v_tensor,
+ at::Tensor index0_tensor, at::Tensor index1_tensor, at::Tensor output_tensor)
+{
+ const float *attn = attn_tensor.data_ptr<float>();
+ const float *v = v_tensor.data_ptr<float>();
+ const int *index0 = index0_tensor.data_ptr<int>();
+ const int *index1 = index1_tensor.data_ptr<int>();
+ float *output = output_tensor.data_ptr<float>();
+ attention_step2_forward_cuda_launcher_v2(N, M, h, C, attn, v, index0, index1, output);
+}
+
+
+void attention_step2_backward_cuda_v2(int N, int M, int h, int C, at::Tensor grad_out_tensor,
+ at::Tensor index0_tensor, at::Tensor index1_tensor, at::Tensor attn_tensor, at::Tensor v_tensor,
+ at::Tensor grad_attn_tensor, at::Tensor grad_v_tensor)
+{
+ const float *grad_out = grad_out_tensor.data_ptr<float>();
+ const int *index0 = index0_tensor.data_ptr<int>();
+ const int *index1 = index1_tensor.data_ptr<int>();
+ const float *attn = attn_tensor.data_ptr<float>();
+ const float *v = v_tensor.data_ptr<float>();
+ float *grad_attn = grad_attn_tensor.data_ptr<float>();
+ float *grad_v = grad_v_tensor.data_ptr<float>();
+ attention_step2_backward_cuda_launcher_v2(N, M, h, C, grad_out, index0, index1, attn, v, grad_attn, grad_v);
+}
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/cuda_utils.h b/models/Mask3D/mask3d/utils/pointops2/src/cuda_utils.h
new file mode 100644
index 0000000000000000000000000000000000000000..e67749c4f5f8964ffb5916c13f5260cf8df45f52
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/cuda_utils.h
@@ -0,0 +1,23 @@
+#ifndef _CUDA_UTILS_H
+#define _CUDA_UTILS_H
+
+#include <cmath>
+#include <algorithm>
+
+#define TOTAL_THREADS 1024
+#define THREADS_PER_BLOCK 256
+#define DIVUP(m, n) ((m) / (n) + ((m) % (n) > 0))
+
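+// Largest power of two not exceeding work_size, clamped to [1, TOTAL_THREADS];
+// used by the v2 launchers to pick a block size from the longest neighbor list.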
+inline int opt_n_threads(int work_size) {
+ const int pow_2 = std::log(static_cast<double>(work_size)) / std::log(2.0);
+ return std::max(std::min(1 << pow_2, TOTAL_THREADS), 1);
+}
+
+inline dim3 opt_block_config(int x, int y) {
+ const int x_threads = opt_n_threads(x);
+ const int y_threads = std::max(std::min(opt_n_threads(y), TOTAL_THREADS / x_threads), 1);
+ dim3 block_config(x_threads, y_threads, 1);
+ return block_config;
+}
+
+#endif
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/grouping/grouping_cuda.cpp b/models/Mask3D/mask3d/utils/pointops2/src/grouping/grouping_cuda.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..a00d3139db5a3b58261c825c4a9e46e168fea8ce
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/grouping/grouping_cuda.cpp
@@ -0,0 +1,22 @@
+#include <vector>
+#include <torch/serialize/tensor.h>
+#include <torch/extension.h>
+#include <ATen/cuda/CUDAContext.h>
+#include "grouping_cuda_kernel.h"
+
+
+void grouping_forward_cuda(int m, int nsample, int c, at::Tensor input_tensor, at::Tensor idx_tensor, at::Tensor output_tensor)
+{
+ const float *input = input_tensor.data_ptr<float>();
+ const int *idx = idx_tensor.data_ptr<int>();
+ float *output = output_tensor.data_ptr<float>();
+ grouping_forward_cuda_launcher(m, nsample, c, input, idx, output);
+}
+
+void grouping_backward_cuda(int m, int nsample, int c, at::Tensor grad_output_tensor, at::Tensor idx_tensor, at::Tensor grad_input_tensor)
+{
+ const float *grad_output = grad_output_tensor.data_ptr<float>();
+ const int *idx = idx_tensor.data_ptr<int>();
+ float *grad_input = grad_input_tensor.data_ptr<float>();
+ grouping_backward_cuda_launcher(m, nsample, c, grad_output, idx, grad_input);
+}
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/grouping/grouping_cuda_kernel.cu b/models/Mask3D/mask3d/utils/pointops2/src/grouping/grouping_cuda_kernel.cu
new file mode 100644
index 0000000000000000000000000000000000000000..58ec0a21a2949f9f82504ccd24597c544c50af40
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/grouping/grouping_cuda_kernel.cu
@@ -0,0 +1,40 @@
+#include "../cuda_utils.h"
+#include "grouping_cuda_kernel.h"
+
+
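+// Gather: output[m, j, c] = input[idx[m, j], c], one thread per output element.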
+__global__ void grouping_forward_cuda_kernel(int m, int nsample, int c, const float *__restrict__ input, const int *__restrict__ idx, float *__restrict__ output) {
+ // input: input: (n, c), idx: (m, nsample), output: (m, nsample, c)
+ int index = blockIdx.x * blockDim.x + threadIdx.x;
+ if (index >= m * nsample * c) return;
+ const int c_idx = index % c;
+ const int nsample_idx = (index / c) % nsample;
+ const int m_idx = index / nsample / c;
+ const int input_idx = idx[m_idx * nsample + nsample_idx] * c + c_idx;
+ output[index] = input[input_idx];
+}
+
+__global__ void grouping_backward_cuda_kernel(int m, int nsample, int c, const float *__restrict__ grad_output, const int *__restrict__ idx, float *__restrict__ grad_input) {
+ // input: grad_output: (m, nsample, c), idx: (m, nsample), output: grad_input: (n, c)
+ int index = blockIdx.x * blockDim.x + threadIdx.x;
+ if (index >= m * nsample * c) return;
+ const int c_idx = index % c;
+ const int nsample_idx = (index / c) % nsample;
+ const int m_idx = index / nsample / c;
+ const int input_idx = idx[m_idx * nsample + nsample_idx] * c + c_idx;
+ atomicAdd(grad_input + input_idx, grad_output[index]);
+}
+
+void grouping_forward_cuda_launcher(int m, int nsample, int c, const float *input, const int *idx, float *output) {
+ // input: input: (n, c), idx: (m, nsample), output: (m, nsample, c)
+ dim3 blocks(DIVUP(m * nsample * c, THREADS_PER_BLOCK));
+ dim3 threads(THREADS_PER_BLOCK);
+ grouping_forward_cuda_kernel<<<blocks, threads, 0, 0>>>(m, nsample, c, input, idx, output);
+}
+
+void grouping_backward_cuda_launcher(int m, int nsample, int c, const float *grad_output, const int *idx, float *grad_input)
+{
+ // input: grad_output: (m, nsample, c), idx: (m, nsample), output: grad_input: (n, c)
+ dim3 blocks(DIVUP(m * nsample * c, THREADS_PER_BLOCK));
+ dim3 threads(THREADS_PER_BLOCK);
+ grouping_backward_cuda_kernel<<<blocks, threads, 0, 0>>>(m, nsample, c, grad_output, idx, grad_input);
+}
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/grouping/grouping_cuda_kernel.h b/models/Mask3D/mask3d/utils/pointops2/src/grouping/grouping_cuda_kernel.h
new file mode 100644
index 0000000000000000000000000000000000000000..3db4aaa9fad5811d559d47c500e4b00f0165d9b4
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/grouping/grouping_cuda_kernel.h
@@ -0,0 +1,20 @@
+#ifndef _GROUPING_CUDA_KERNEL
+#define _GROUPING_CUDA_KERNEL
+#include <vector>
+#include <torch/serialize/tensor.h>
+#include <ATen/cuda/CUDAContext.h>
+
+void grouping_forward_cuda(int m, int nsample, int c, at::Tensor input_tensor, at::Tensor idx_tensor, at::Tensor output_tensor);
+void grouping_backward_cuda(int m, int nsample, int c, at::Tensor grad_output_tensor, at::Tensor idx_tensor, at::Tensor grad_input_tensor);
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void grouping_forward_cuda_launcher(int m, int nsample, int c, const float *input, const int *idx, float *output);
+void grouping_backward_cuda_launcher(int m, int nsample, int c, const float *grad_output, const int *idx, float *grad_input);
+
+#ifdef __cplusplus
+}
+#endif
+#endif
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/interpolation/interpolation_cuda.cpp b/models/Mask3D/mask3d/utils/pointops2/src/interpolation/interpolation_cuda.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..a73c02b1193330af8e0bc66093749126561700b3
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/interpolation/interpolation_cuda.cpp
@@ -0,0 +1,24 @@
+#include <vector>
+#include <torch/serialize/tensor.h>
+#include <torch/extension.h>
+#include <ATen/cuda/CUDAContext.h>
+#include "interpolation_cuda_kernel.h"
+
+
+void interpolation_forward_cuda(int n, int c, int k, at::Tensor input_tensor, at::Tensor idx_tensor, at::Tensor weight_tensor, at::Tensor output_tensor)
+{
+ const float *input = input_tensor.data_ptr<float>();
+ const int *idx = idx_tensor.data_ptr<int>();
+ const float *weight = weight_tensor.data_ptr<float>();
+ float *output = output_tensor.data_ptr<float>();
+ interpolation_forward_cuda_launcher(n, c, k, input, idx, weight, output);
+}
+
+void interpolation_backward_cuda(int n, int c, int k, at::Tensor grad_output_tensor, at::Tensor idx_tensor, at::Tensor weight_tensor, at::Tensor grad_input_tensor)
+{
+ const float *grad_output = grad_output_tensor.data_ptr<float>();
+ const int *idx = idx_tensor.data_ptr<int>();
+ const float *weight = weight_tensor.data_ptr<float>();
+ float *grad_input = grad_input_tensor.data_ptr<float>();
+ interpolation_backward_cuda_launcher(n, c, k, grad_output, idx, weight, grad_input);
+}
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/interpolation/interpolation_cuda_kernel.cu b/models/Mask3D/mask3d/utils/pointops2/src/interpolation/interpolation_cuda_kernel.cu
new file mode 100644
index 0000000000000000000000000000000000000000..f560d8c92c6eac865b8c1e1dc27140fe3fcc2250
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/interpolation/interpolation_cuda_kernel.cu
@@ -0,0 +1,47 @@
+#include "../cuda_utils.h"
+#include "interpolation_cuda_kernel.h"
+
+
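+// Weighted k-NN interpolation: output[n, c] += input[idx[n, i], c] * weight[n, i] over
+// the k neighbors; output must be zero-initialized by the caller since it is accumulated.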
+__global__ void interpolation_forward_cuda_kernel(int n, int c, int k, const float *input, const int *idx, const float *weight, float *output)
+{
+ // input: input: (m, c), idx: (n, k), weight: (n, k), output: output (n, c)
+ int index = blockIdx.x * blockDim.x + threadIdx.x;
+ if (index >= n * c) return;
+ int c_idx = index % c;
+ int n_idx = index / c;
+ for (int i = 0; i < k; i++)
+ {
+ int idx_idx = n_idx * k + i;
+ int input_idx = idx[idx_idx] * c + c_idx;
+ output[index] += input[input_idx] * weight[idx_idx];
+ }
+}
+
+__global__ void interpolation_backward_cuda_kernel(int n, int c, int k, const float *grad_output, const int *idx, const float *weight, float *grad_input)
+{
+ // input: grad_output: (n, c), idx: (n, k), weight: (n, k), output: grad_input (m, c)
+ int index = blockIdx.x * blockDim.x + threadIdx.x;
+ if (index >= n * c) return;
+ int c_idx = index % c;
+ int n_idx = index / c;
+ for (int i = 0; i < k; i++)
+ {
+ int idx_idx = n_idx * k + i;
+ int input_idx = idx[idx_idx] * c + c_idx;
+ atomicAdd(grad_input + input_idx, grad_output[index] * weight[idx_idx]);
+ }
+}
+
+void interpolation_forward_cuda_launcher(int n, int c, int k, const float *input, const int *idx, const float *weight, float *output) {
+ // input: input: (m, c), idx: (n, k), weight: (n, k), output: output (n, c)
+ dim3 blocks(DIVUP(n * c, THREADS_PER_BLOCK));
+ dim3 threads(THREADS_PER_BLOCK);
+ interpolation_forward_cuda_kernel<<<blocks, threads, 0, 0>>>(n, c, k, input, idx, weight, output);
+}
+
+void interpolation_backward_cuda_launcher(int n, int c, int k, const float *grad_output, const int *idx, const float *weight, float *grad_input) {
+ // input: grad_output: (n, c), idx: (n, k), weight: (n, k), output: grad_input (m, c)
+ dim3 blocks(DIVUP(n * c, THREADS_PER_BLOCK));
+ dim3 threads(THREADS_PER_BLOCK);
+ interpolation_backward_cuda_kernel<<<blocks, threads, 0, 0>>>(n, c, k, grad_output, idx, weight, grad_input);
+}
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/interpolation/interpolation_cuda_kernel.h b/models/Mask3D/mask3d/utils/pointops2/src/interpolation/interpolation_cuda_kernel.h
new file mode 100644
index 0000000000000000000000000000000000000000..309e5dd0a34ccb58807bbf32389ba65e7ee6961b
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/interpolation/interpolation_cuda_kernel.h
@@ -0,0 +1,20 @@
+#ifndef _INTERPOLATION_CUDA_KERNEL
+#define _INTERPOLATION_CUDA_KERNEL
+#include <vector>
+#include <torch/serialize/tensor.h>
+#include <ATen/cuda/CUDAContext.h>
+
+void interpolation_forward_cuda(int n, int c, int k, at::Tensor input_tensor, at::Tensor idx_tensor, at::Tensor weight_tensor, at::Tensor output_tensor);
+void interpolation_backward_cuda(int n, int c, int k, at::Tensor grad_output_tensor, at::Tensor idx_tensor, at::Tensor weight_tensor, at::Tensor grad_input_tensor);
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void interpolation_forward_cuda_launcher(int n, int c, int k, const float *input, const int *idx, const float *weight, float *output);
+void interpolation_backward_cuda_launcher(int n, int c, int k, const float *grad_output, const int *idx, const float *weight, float *grad_input);
+
+#ifdef __cplusplus
+}
+#endif
+#endif
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/knnquery/knnquery_cuda.cpp b/models/Mask3D/mask3d/utils/pointops2/src/knnquery/knnquery_cuda.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..568f1366f65dda9f57f037212a46d2552806e79f
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/knnquery/knnquery_cuda.cpp
@@ -0,0 +1,17 @@
+#include <vector>
+#include <torch/serialize/tensor.h>
+#include <torch/extension.h>
+#include <ATen/cuda/CUDAContext.h>
+#include "knnquery_cuda_kernel.h"
+
+
+void knnquery_cuda(int m, int nsample, at::Tensor xyz_tensor, at::Tensor new_xyz_tensor, at::Tensor offset_tensor, at::Tensor new_offset_tensor, at::Tensor idx_tensor, at::Tensor dist2_tensor)
+{
+ const float *xyz = xyz_tensor.data_ptr<float>();
+ const float *new_xyz = new_xyz_tensor.data_ptr<float>();
+ const int *offset = offset_tensor.data_ptr<int>();
+ const int *new_offset = new_offset_tensor.data_ptr<int>();
+ int *idx = idx_tensor.data_ptr<int>();
+ float *dist2 = dist2_tensor.data_ptr<float>();
+ knnquery_cuda_launcher(m, nsample, xyz, new_xyz, offset, new_offset, idx, dist2);
+}
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/knnquery/knnquery_cuda_kernel.cu b/models/Mask3D/mask3d/utils/pointops2/src/knnquery/knnquery_cuda_kernel.cu
new file mode 100644
index 0000000000000000000000000000000000000000..83762bc0110e38c7b5fa8adf0ef4ce255bc9d0b9
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/knnquery/knnquery_cuda_kernel.cu
@@ -0,0 +1,116 @@
+#include "../cuda_utils.h"
+#include "knnquery_cuda_kernel.h"
+
+
+__device__ void swap_float(float *x, float *y)
+{
+ float tmp = *x;
+ *x = *y;
+ *y = tmp;
+}
+
+
+__device__ void swap_int(int *x, int *y)
+{
+ int tmp = *x;
+ *x = *y;
+ *y = tmp;
+}
+
+
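+// dist/idx form a max-heap of the current nsample nearest candidates, keyed on
+// distance; reheap sifts the root down after it has been replaced by a closer point.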
+__device__ void reheap(float *dist, int *idx, int k)
+{
+ int root = 0;
+ int child = root * 2 + 1;
+ while (child < k)
+ {
+ if(child + 1 < k && dist[child+1] > dist[child])
+ child++;
+ if(dist[root] > dist[child])
+ return;
+ swap_float(&dist[root], &dist[child]);
+ swap_int(&idx[root], &idx[child]);
+ root = child;
+ child = root * 2 + 1;
+ }
+}
+
+
+__device__ void heap_sort(float *dist, int *idx, int k)
+{
+ int i;
+ for (i = k - 1; i > 0; i--)
+ {
+ swap_float(&dist[0], &dist[i]);
+ swap_int(&idx[0], &idx[i]);
+ reheap(dist, idx, i);
+ }
+}
+
+
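+// offset[i] is the exclusive end of batch i; returns the batch containing point idx.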
+__device__ int get_bt_idx(int idx, const int *offset)
+{
+ int i = 0;
+ while (1)
+ {
+ if (idx < offset[i])
+ break;
+ else
+ i++;
+ }
+ return i;
+}
+
+
+__global__ void knnquery_cuda_kernel(int m, int nsample, const float *__restrict__ xyz, const float *__restrict__ new_xyz, const int *__restrict__ offset, const int *__restrict__ new_offset, int *__restrict__ idx, float *__restrict__ dist2) {
+ // input: xyz (n, 3) new_xyz (m, 3)
+ // output: idx (m, nsample) dist2 (m, nsample)
+ int pt_idx = blockIdx.x * blockDim.x + threadIdx.x;
+ if (pt_idx >= m) return;
+
+ new_xyz += pt_idx * 3;
+ idx += pt_idx * nsample;
+ dist2 += pt_idx * nsample;
+ int bt_idx = get_bt_idx(pt_idx, new_offset);
+ int start;
+ if (bt_idx == 0)
+ start = 0;
+ else
+ start = offset[bt_idx - 1];
+ int end = offset[bt_idx];
+
+ float new_x = new_xyz[0];
+ float new_y = new_xyz[1];
+ float new_z = new_xyz[2];
+
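+ // fixed-size local heaps: assumes nsample <= 100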
+ float best_dist[100];
+ int best_idx[100];
+ for(int i = 0; i < nsample; i++){
+ best_dist[i] = 1e10;
+ best_idx[i] = start;
+ }
+ for(int i = start; i < end; i++){
+ float x = xyz[i * 3 + 0];
+ float y = xyz[i * 3 + 1];
+ float z = xyz[i * 3 + 2];
+ float d2 = (new_x - x) * (new_x - x) + (new_y - y) * (new_y - y) + (new_z - z) * (new_z - z);
+ if (d2 < best_dist[0]){
+ best_dist[0] = d2;
+ best_idx[0] = i;
+ reheap(best_dist, best_idx, nsample);
+ }
+ }
+ heap_sort(best_dist, best_idx, nsample);
+ for(int i = 0; i < nsample; i++){
+ idx[i] = best_idx[i];
+ dist2[i] = best_dist[i];
+ }
+}
+
+
+void knnquery_cuda_launcher(int m, int nsample, const float *xyz, const float *new_xyz, const int *offset, const int *new_offset, int *idx, float *dist2) {
+ // input: new_xyz: (m, 3), xyz: (n, 3), idx: (m, nsample)
+ dim3 blocks(DIVUP(m, THREADS_PER_BLOCK));
+ dim3 threads(THREADS_PER_BLOCK);
+ knnquery_cuda_kernel<<<blocks, threads, 0, 0>>>(m, nsample, xyz, new_xyz, offset, new_offset, idx, dist2);
+}
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/knnquery/knnquery_cuda_kernel.h b/models/Mask3D/mask3d/utils/pointops2/src/knnquery/knnquery_cuda_kernel.h
new file mode 100644
index 0000000000000000000000000000000000000000..3c0aedfe8fbe6c427ee15bb550c2c1829e9f4b97
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/knnquery/knnquery_cuda_kernel.h
@@ -0,0 +1,18 @@
+#ifndef _KNNQUERY_CUDA_KERNEL
+#define _KNNQUERY_CUDA_KERNEL
+#include <vector>
+#include <torch/serialize/tensor.h>
+#include <ATen/cuda/CUDAContext.h>
+
+void knnquery_cuda(int m, int nsample, at::Tensor xyz_tensor, at::Tensor new_xyz_tensor, at::Tensor offset_tensor, at::Tensor new_offset_tensor, at::Tensor idx_tensor, at::Tensor dist2_tensor);
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void knnquery_cuda_launcher(int m, int nsample, const float *xyz, const float *new_xyz, const int *offset, const int *new_offset, int *idx, float *dist2);
+
+#ifdef __cplusplus
+}
+#endif
+#endif
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/pointops_api.cpp b/models/Mask3D/mask3d/utils/pointops2/src/pointops_api.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..812789f7d4fdf961b960641ba6c2fd660c16a654
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/pointops_api.cpp
@@ -0,0 +1,45 @@
+#include <torch/serialize/tensor.h>
+#include <torch/extension.h>
+
+#include "knnquery/knnquery_cuda_kernel.h"
+#include "sampling/sampling_cuda_kernel.h"
+#include "grouping/grouping_cuda_kernel.h"
+#include "interpolation/interpolation_cuda_kernel.h"
+#include "aggregation/aggregation_cuda_kernel.h"
+#include "subtraction/subtraction_cuda_kernel.h"
+#include "attention/attention_cuda_kernel.h"
+#include "rpe/relative_pos_encoding_cuda_kernel.h"
+#include "attention_v2/attention_cuda_kernel_v2.h"
+#include "rpe_v2/relative_pos_encoding_cuda_kernel_v2.h"
+
+
+PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
+ m.def("knnquery_cuda", &knnquery_cuda, "knnquery_cuda");
+ m.def("furthestsampling_cuda", &furthestsampling_cuda, "furthestsampling_cuda");
+ m.def("grouping_forward_cuda", &grouping_forward_cuda, "grouping_forward_cuda");
+ m.def("grouping_backward_cuda", &grouping_backward_cuda, "grouping_backward_cuda");
+ m.def("interpolation_forward_cuda", &interpolation_forward_cuda, "interpolation_forward_cuda");
+ m.def("interpolation_backward_cuda", &interpolation_backward_cuda, "interpolation_backward_cuda");
+ m.def("subtraction_forward_cuda", &subtraction_forward_cuda, "subtraction_forward_cuda");
+ m.def("subtraction_backward_cuda", &subtraction_backward_cuda, "subtraction_backward_cuda");
+ m.def("aggregation_forward_cuda", &aggregation_forward_cuda, "aggregation_forward_cuda");
+ m.def("aggregation_backward_cuda", &aggregation_backward_cuda, "aggregation_backward_cuda");
+ m.def("attention_step1_forward_cuda", &attention_step1_forward_cuda, "attention_step1_forward_cuda");
+ m.def("attention_step1_backward_cuda", &attention_step1_backward_cuda, "attention_step1_backward_cuda");
+ m.def("attention_step2_forward_cuda", &attention_step2_forward_cuda, "attention_step2_forward_cuda");
+ m.def("attention_step2_backward_cuda", &attention_step2_backward_cuda, "attention_step2_backward_cuda");
+ m.def("dot_prod_with_idx_forward_cuda", &dot_prod_with_idx_forward_cuda, "dot_prod_with_idx_forward_cuda");
+ m.def("dot_prod_with_idx_backward_cuda", &dot_prod_with_idx_backward_cuda, "dot_prod_with_idx_backward_cuda");
+ m.def("attention_step2_with_rel_pos_value_forward_cuda", &attention_step2_with_rel_pos_value_forward_cuda, "attention_step2_with_rel_pos_value_forward_cuda");
+ m.def("attention_step2_with_rel_pos_value_backward_cuda", &attention_step2_with_rel_pos_value_backward_cuda, "attention_step2_with_rel_pos_value_backward_cuda");
+ m.def("attention_step1_forward_cuda_v2", &attention_step1_forward_cuda_v2, "attention_step1_forward_cuda_v2");
+ m.def("attention_step1_backward_cuda_v2", &attention_step1_backward_cuda_v2, "attention_step1_backward_cuda_v2");
+ m.def("attention_step2_forward_cuda_v2", &attention_step2_forward_cuda_v2, "attention_step2_forward_cuda_v2");
+ m.def("attention_step2_backward_cuda_v2", &attention_step2_backward_cuda_v2, "attention_step2_backward_cuda_v2");
+ m.def("dot_prod_with_idx_forward_cuda_v2", &dot_prod_with_idx_forward_cuda_v2, "dot_prod_with_idx_forward_cuda_v2");
+ m.def("dot_prod_with_idx_backward_cuda_v2", &dot_prod_with_idx_backward_cuda_v2, "dot_prod_with_idx_backward_cuda_v2");
+ m.def("attention_step2_with_rel_pos_value_forward_cuda_v2", &attention_step2_with_rel_pos_value_forward_cuda_v2, "attention_step2_with_rel_pos_value_forward_cuda_v2");
+ m.def("attention_step2_with_rel_pos_value_backward_cuda_v2", &attention_step2_with_rel_pos_value_backward_cuda_v2, "attention_step2_with_rel_pos_value_backward_cuda_v2");
+ m.def("dot_prod_with_idx_forward_cuda_v3", &dot_prod_with_idx_forward_cuda_v3, "dot_prod_with_idx_forward_cuda_v3");
+ m.def("dot_prod_with_idx_backward_cuda_v3", &dot_prod_with_idx_backward_cuda_v3, "dot_prod_with_idx_backward_cuda_v3");
+ }
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/rpe/relative_pos_encoding_cuda.cpp b/models/Mask3D/mask3d/utils/pointops2/src/rpe/relative_pos_encoding_cuda.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..634ebb07520a0bd6fbcdf856679cc908eb2bec40
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/rpe/relative_pos_encoding_cuda.cpp
@@ -0,0 +1,60 @@
+#include <vector>
+#include <torch/serialize/tensor.h>
+#include <torch/extension.h>
+#include <ATen/cuda/CUDAContext.h>
+#include "relative_pos_encoding_cuda_kernel.h"
+
+void dot_prod_with_idx_forward_cuda(int N, int M, int h, int hdim, at::Tensor q_tensor, at::Tensor index_tensor,
+ at::Tensor table_tensor, at::Tensor rel_idx_tensor, at::Tensor output_tensor)
+{
+ const float *q = q_tensor.data_ptr<float>();
+ const float *table = table_tensor.data_ptr<float>();
+ const int *index = index_tensor.data_ptr<int>();
+ const int *rel_idx = rel_idx_tensor.data_ptr<int>();
+ float *output = output_tensor.data_ptr<float>();
+ dot_prod_with_idx_forward_cuda_launcher(N, M, h, hdim, q, index, table, rel_idx, output);
+}
+
+void dot_prod_with_idx_backward_cuda(int N, int M, int h, int hdim, at::Tensor grad_out_tensor,
+ at::Tensor q_tensor, at::Tensor index_tensor, at::Tensor table_tensor, at::Tensor rel_idx_tensor,
+ at::Tensor grad_q_tensor, at::Tensor grad_table_tensor)
+{
+ const float *grad_out = grad_out_tensor.data_ptr<float>();
+ const float *q = q_tensor.data_ptr<float>();
+ const int *index = index_tensor.data_ptr<int>();
+ const float *table = table_tensor.data_ptr<float>();
+ const int *rel_idx = rel_idx_tensor.data_ptr<int>();
+ float *grad_q = grad_q_tensor.data_ptr<float>();
+ float *grad_table = grad_table_tensor.data_ptr<float>();
+ dot_prod_with_idx_backward_cuda_launcher(N, M, h, hdim, grad_out, q, index, table, rel_idx, grad_q, grad_table);
+}
+
+void attention_step2_with_rel_pos_value_forward_cuda(int N, int M, int h, int hdim, at::Tensor attn_tensor, at::Tensor v_tensor,
+ at::Tensor index0_tensor, at::Tensor index1_tensor, at::Tensor table_tensor, at::Tensor rel_idx_tensor, at::Tensor output_tensor)
+{
+ const float *attn = attn_tensor.data_ptr<float>();
+ const float *v = v_tensor.data_ptr<float>();
+ const int *index0 = index0_tensor.data_ptr<int>();
+ const int *index1 = index1_tensor.data_ptr<int>();
+ const float *table = table_tensor.data_ptr<float>();
+ const int *rel_idx = rel_idx_tensor.data_ptr<int>();
+ float *output = output_tensor.data_ptr<float>();
+ attention_step2_with_rel_pos_value_forward_cuda_launcher(N, M, h, hdim, attn, v, index0, index1, table, rel_idx, output);
+}
+
+void attention_step2_with_rel_pos_value_backward_cuda(int N, int M, int h, int hdim, at::Tensor grad_out_tensor,
+ at::Tensor index0_tensor, at::Tensor index1_tensor, at::Tensor attn_tensor, at::Tensor v_tensor, at::Tensor table_tensor,
+ at::Tensor rel_idx_tensor, at::Tensor grad_attn_tensor, at::Tensor grad_v_tensor, at::Tensor grad_table_tensor)
+{
+ const float *grad_out = grad_out_tensor.data_ptr<float>();
+ const int *index0 = index0_tensor.data_ptr<int>();
+ const int *index1 = index1_tensor.data_ptr<int>();
+ const float *attn = attn_tensor.data_ptr<float>();
+ const float *v = v_tensor.data_ptr<float>();
+ const float *table = table_tensor.data_ptr<float>();
+ const int *rel_idx = rel_idx_tensor.data_ptr<int>();
+ float *grad_attn = grad_attn_tensor.data_ptr<float>();
+ float *grad_v = grad_v_tensor.data_ptr<float>();
+ float *grad_table = grad_table_tensor.data_ptr<float>();
+ attention_step2_with_rel_pos_value_backward_cuda_launcher(N, M, h, hdim, grad_out, index0, index1, attn, v, table, rel_idx, grad_attn, grad_v, grad_table);
+}
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/rpe/relative_pos_encoding_cuda_kernel.cu b/models/Mask3D/mask3d/utils/pointops2/src/rpe/relative_pos_encoding_cuda_kernel.cu
new file mode 100644
index 0000000000000000000000000000000000000000..b8fd8f42116ae0487c741c9b856c10c491f215f9
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/rpe/relative_pos_encoding_cuda_kernel.cu
@@ -0,0 +1,134 @@
+#include "../cuda_utils.h"
+#include "relative_pos_encoding_cuda_kernel.h"
+
+
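+// Relative position encoding: each pair m is handled by three threads (one per spatial
+// axis); each looks up table[rel_idx[m, dim], h, c, dim] and accumulates its dot product
+// with q[index[m], h, c] into output[m, h] via atomicAdd.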
+__global__ void dot_prod_with_idx_forward_cuda_kernel( // M, h, hdim
+ int N, int M, int h, int hdim, const float *q, const int *index,
+ const float *table, const int *rel_idx, float *output) {
+ // input: q: (N, h, hdim), index: (M), table: (L, h, hdim, 3), rel_idx: (M, 3), output: (M, h)
+
+ int c_idx = blockIdx.z;
+ int h_idx = blockIdx.y;
+ int thread_idx = blockIdx.x * blockDim.x + threadIdx.x;
+ if (thread_idx >= M*3 || h_idx >= h || c_idx >= hdim) return;
+
+ int dim = thread_idx % 3;
+ int m_idx = thread_idx / 3;
+
+ int q_idx = index[m_idx];
+ int rel_idx_dim = rel_idx[thread_idx];
+ float rel_table_val = table[rel_idx_dim*h*hdim*3+h_idx*hdim*3+c_idx*3+dim];
+ float val = q[q_idx*h*hdim+h_idx*hdim+c_idx] * rel_table_val;
+ atomicAdd(output+m_idx*h+h_idx, val);
+}
+
+__global__ void dot_prod_with_idx_backward_cuda_kernel( // M, h, hdim
+ int N, int M, int h, int hdim, const float *grad_out, const float *q, const int *index,
+ const float *table, const int *rel_idx, float *grad_q, float *grad_table) {
+
+ int c_idx = blockIdx.z;
+ int h_idx = blockIdx.y;
+ int thread_idx = blockIdx.x * blockDim.x + threadIdx.x;
+ if (thread_idx >= M*3 || h_idx >= h || c_idx >= hdim) return;
+
+ int dim = thread_idx % 3;
+ int m_idx = thread_idx / 3;
+
+ int q_idx = index[m_idx];
+ int rel_idx_dim = rel_idx[thread_idx];
+ int grad_out_idx = m_idx*h+h_idx;
+ float grad_out_value = grad_out[grad_out_idx];
+
+ float rel_table_val = table[rel_idx_dim*h*hdim*3+h_idx*hdim*3+c_idx*3+dim];
+ atomicAdd(grad_q+q_idx*h*hdim+h_idx*hdim+c_idx, grad_out_value * rel_table_val);
+
+ float q_value = q[q_idx*h*hdim+h_idx*hdim+c_idx];
+ atomicAdd(grad_table+rel_idx_dim*h*hdim*3+h_idx*hdim*3+c_idx*3+dim, grad_out_value * q_value);
+}
+
+void dot_prod_with_idx_forward_cuda_launcher(int N, int M, int h, int hdim, const float *q, const int *index,
+ const float *table, const int *rel_idx, float *output) {
+ // input: q: (N, h, hdim), index: (M), table: (L, h, hdim, 3), rel_idx: (M, 3)
+ //dim3 blocks(DIVUP(hdim, THREADS_PER_BLOCK), h, M);
+ dim3 blocks(DIVUP(M*3, THREADS_PER_BLOCK), h, hdim);
+ dim3 threads(THREADS_PER_BLOCK);
+ dot_prod_with_idx_forward_cuda_kernel<<<blocks, threads, 0, 0>>>(N, M, h, hdim, q, index, table, rel_idx, output);
+}
+
+void dot_prod_with_idx_backward_cuda_launcher(int N, int M, int h, int hdim, const float *grad_out,
+ const float *q, const int *index, const float *table, const int *rel_idx, float *grad_q, float *grad_table) {
+ // input: grad_out: (M, h), output: grad_q: (N, h, hdim), grad_table: (L, h, hdim, 3)
+ //dim3 blocks(DIVUP(hdim, THREADS_PER_BLOCK), h, M);
+ dim3 blocks(DIVUP(M*3, THREADS_PER_BLOCK), h, hdim);
+ dim3 threads(THREADS_PER_BLOCK);
+ dot_prod_with_idx_backward_cuda_kernel<<<blocks, threads, 0, 0>>>(N, M, h, hdim, grad_out, q, index, table, rel_idx, grad_q, grad_table);
+}
+
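+// The value term below is divided by 3 because each pair is processed by three threads
+// (one per spatial axis of rel_idx); the per-axis table contributions differ, but
+// attn * v must only be added once in total.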
+__global__ void attention_step2_with_rel_pos_value_forward_cuda_kernel( // M, h, hdim
+ int N, int M, int h, int hdim, const float *attn, const float *v,
+ const int *index0, const int *index1, const float *table, const int *rel_idx, float *output) {
+ // input: attn: (M, h), v: (N, h, hdim), index0: (M, ), index1: (M, ), table: (L, h, hdim, 3), rel_idx: (M, 3)
+
+ int c_idx = blockIdx.z;
+ int h_idx = blockIdx.y;
+ int thread_idx = blockIdx.x * blockDim.x + threadIdx.x;
+ if (thread_idx >= M*3 || h_idx >= h || c_idx >= hdim) return;
+
+ int dim = thread_idx % 3;
+ int m_idx = thread_idx / 3;
+
+ int idx1 = index1[m_idx];
+
+ int rel_idx_dim = rel_idx[thread_idx];
+ float table_val = table[rel_idx_dim*h*hdim*3+h_idx*hdim*3+c_idx*3+dim];
+
+ float val = attn[m_idx*h+h_idx] * (v[idx1*h*hdim+h_idx*hdim+c_idx] / 3.0 + table_val);
+
+ int idx0 = index0[m_idx];
+ atomicAdd(output+idx0*h*hdim+h_idx*hdim+c_idx, val);
+}
+
+
+__global__ void attention_step2_with_rel_pos_value_backward_cuda_kernel( // M, h, hdim
+ int N, int M, int h, int hdim, const float *grad_out, const int *index0, const int *index1, const float *attn, const float *v, const float *table,
+ const int *rel_idx, float *grad_attn, float *grad_v, float *grad_table) {
+ // input: attn: (M, h), v: (N, h, hdim), index0: (M, ), index1: (M, ), table: (L, h, hdim, 3), rel_idx: (M, 3)
+
+ int c_idx = blockIdx.z;
+ int h_idx = blockIdx.y;
+ int thread_idx = blockIdx.x * blockDim.x + threadIdx.x;
+ if (thread_idx >= M*3 || h_idx >= h || c_idx >= hdim) return;
+
+ int dim = thread_idx % 3;
+ int m_idx = thread_idx / 3;
+
+ int idx0 = index0[m_idx];
+ int idx1 = index1[m_idx];
+ int grad_out_idx = idx0*h*hdim+h_idx*hdim+c_idx;
+
+ int rel_idx_dim = rel_idx[thread_idx];
+ float table_val = table[rel_idx_dim*h*hdim*3+h_idx*hdim*3+c_idx*3+dim];
+ float grad_out_value = grad_out[grad_out_idx];
+
+ atomicAdd(grad_attn+m_idx*h+h_idx, grad_out_value * (v[idx1*h*hdim+h_idx*hdim+c_idx]/3 + table_val));
+ atomicAdd(grad_v+idx1*h*hdim+h_idx*hdim+c_idx, grad_out_value * attn[m_idx*h+h_idx]/3);
+ atomicAdd(grad_table+rel_idx_dim*h*hdim*3+h_idx*hdim*3+c_idx*3+dim, grad_out_value * attn[m_idx*h+h_idx]);
+}
+
+void attention_step2_with_rel_pos_value_forward_cuda_launcher(int N, int M, int h, int hdim, const float *attn, const float *v, const int *index0,
+ const int *index1, const float *table, const int *rel_idx, float *output) {
+ // input: attn: (M, h), v: (N, h, hdim), index0: (M, ), index1: (M, ), table: (L, h, hdim, 3), rel_idx: (M, 3)
+ //dim3 blocks(DIVUP(hdim, THREADS_PER_BLOCK), h, M);
+ dim3 blocks(DIVUP(M*3, THREADS_PER_BLOCK), h, hdim);
+ dim3 threads(THREADS_PER_BLOCK);
+ attention_step2_with_rel_pos_value_forward_cuda_kernel<<<blocks, threads, 0, 0>>>(N, M, h, hdim, attn, v, index0, index1, table, rel_idx, output);
+}
+
+void attention_step2_with_rel_pos_value_backward_cuda_launcher(int N, int M, int h, int hdim, const float *grad_out, const int *index0,
+ const int *index1, const float *attn, const float *v, const float *table, const int *rel_idx, float *grad_attn, float *grad_v, float *grad_table) {
+ // input: grad_out: (N, h, hdim); output: grad_attn: (M, h), grad_v: (N, h, hdim), grad_table: (L, h, hdim, 3)
+ //dim3 blocks(DIVUP(hdim, THREADS_PER_BLOCK), h, M);
+ dim3 blocks(DIVUP(M*3, THREADS_PER_BLOCK), h, hdim);
+ dim3 threads(THREADS_PER_BLOCK);
+ attention_step2_with_rel_pos_value_backward_cuda_kernel<<<blocks, threads, 0, 0>>>(N, M, h, hdim, grad_out, index0, index1, attn, v, table, rel_idx, grad_attn, grad_v, grad_table);
+}
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/rpe/relative_pos_encoding_cuda_kernel.h b/models/Mask3D/mask3d/utils/pointops2/src/rpe/relative_pos_encoding_cuda_kernel.h
new file mode 100644
index 0000000000000000000000000000000000000000..cafc7b69152fff9c0c440a093346fb6005923db0
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/rpe/relative_pos_encoding_cuda_kernel.h
@@ -0,0 +1,26 @@
+#ifndef _RPE_CUDA_KERNEL
+#define _RPE_CUDA_KERNEL
+#include <vector>
+#include <torch/serialize/tensor.h>
+#include <ATen/cuda/CUDAContext.h>
+
+void dot_prod_with_idx_forward_cuda(int N, int M, int h, int hdim, at::Tensor q_tensor, at::Tensor index_tensor, at::Tensor table_tensor, at::Tensor rel_idx_tensor, at::Tensor output_tensor);
+void dot_prod_with_idx_backward_cuda(int N, int M, int h, int hdim, at::Tensor grad_out_tensor, at::Tensor q_tensor, at::Tensor index_tensor, at::Tensor table_tensor, at::Tensor rel_idx_tensor, at::Tensor grad_q_tensor, at::Tensor grad_table_tensor);
+
+void attention_step2_with_rel_pos_value_forward_cuda(int N, int M, int h, int hdim, at::Tensor attn_tensor, at::Tensor v_tensor, at::Tensor index0_tensor, at::Tensor index1_tensor, at::Tensor table_tensor, at::Tensor rel_idx_tensor, at::Tensor output_tensor);
+void attention_step2_with_rel_pos_value_backward_cuda(int N, int M, int h, int hdim, at::Tensor grad_out_tensor, at::Tensor index0_tensor, at::Tensor index1_tensor, at::Tensor attn_tensor, at::Tensor v_tensor, at::Tensor table_tensor, at::Tensor rel_idx_tensor, at::Tensor grad_attn_tensor, at::Tensor grad_v_tensor, at::Tensor grad_table_tensor);
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void dot_prod_with_idx_forward_cuda_launcher(int N, int M, int h, int hdim, const float *q, const int *index, const float *table, const int *rel_idx, float *output);
+void dot_prod_with_idx_backward_cuda_launcher(int N, int M, int h, int hdim, const float *grad_out, const float *q, const int *index, const float *table, const int *rel_idx, float *grad_q, float *grad_table);
+
+void attention_step2_with_rel_pos_value_forward_cuda_launcher(int N, int M, int h, int hdim, const float *attn, const float *v, const int *index0, const int *index1, const float *table, const int *rel_idx, float *output);
+void attention_step2_with_rel_pos_value_backward_cuda_launcher(int N, int M, int h, int hdim, const float *grad_out, const int *index0, const int *index1, const float *attn, const float *v, const float *table, const int *rel_idx, float *grad_attn, float *grad_v, float *grad_table);
+
+#ifdef __cplusplus
+}
+#endif
+#endif
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/rpe_v2/relative_pos_encoding_cuda_kernel_v2.cu b/models/Mask3D/mask3d/utils/pointops2/src/rpe_v2/relative_pos_encoding_cuda_kernel_v2.cu
new file mode 100644
index 0000000000000000000000000000000000000000..628d8e3ab9679ac14fc89872595927c6f997198f
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/rpe_v2/relative_pos_encoding_cuda_kernel_v2.cu
@@ -0,0 +1,525 @@
+#include "../cuda_utils.h"
+#include "relative_pos_encoding_cuda_kernel_v2.h"
+
+
+// N, M, h, q, index_q, k, index_k, table_q, table_k, rel_idx, rel_idx_offsets, output
+
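+// v2: pairs are pre-sorted into buckets of identical rel_idx (sort_indices gives the
+// permutation, rel_idx_offsets the bucket boundaries). One block handles one (bucket,
+// head): the summed table entries are computed once in shared memory and reused for
+// every pair in the bucket.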
+template <unsigned int d>
+__global__ void dot_prod_with_idx_forward_cuda_kernel_v2( // M, h, hdim
+ int N, int M, int h, const float *q, const int *index_q, const float *k, const int *index_k,
+ const float *table_q, const float *table_k, const int *rel_idx, const int *rel_idx_offsets,
+ const int *sort_indices, float *output) {
+ // input: q: (N, h, hdim), index: (M), table: (L, h, hdim, 3), rel_idx: (M, 3), output: (M, h)
+
+ int h_idx = blockIdx.y;
+ int t_idx = blockIdx.x;
+ int n_idx = threadIdx.x;
+ int C = h*d;
+
+ __shared__ int start, end;
+ if(n_idx == 0){
+ start = rel_idx_offsets[t_idx];
+ end = rel_idx_offsets[t_idx+1];
+ // printf("e2: start: %d, end: %d\n", start, end);
+ }
+
+ __syncthreads();
+
+ int m_idx_prev = start + n_idx;
+ // if(m_idx_prev >= end)
+ // return;
+
+ __shared__ int m_idx;
+ if(n_idx == 0)
+ m_idx = sort_indices[m_idx_prev];
+
+ __syncthreads();
+
+ __shared__ int rel_idx_vec[3];
+ if(n_idx < 3)
+ rel_idx_vec[n_idx] = rel_idx[m_idx*3 + n_idx];
+
+ __syncthreads();
+
+ __shared__ float table_q_vec[d];
+ __shared__ float table_k_vec[d];
+
+ for(int i = n_idx; i < 2*d; i += blockDim.x){
+ if (i < d){
+ int ind0 = rel_idx_vec[0] * C * 3 + h_idx * d * 3 + i * 3 + 0;
+ int ind1 = rel_idx_vec[1] * C * 3 + h_idx * d * 3 + i * 3 + 1;
+ int ind2 = rel_idx_vec[2] * C * 3 + h_idx * d * 3 + i * 3 + 2;
+ table_q_vec[i] = table_q[ind0] + table_q[ind1] + table_q[ind2];
+ } else{
+ int ind0 = rel_idx_vec[0] * C * 3 + h_idx * d * 3 + (i-d) * 3 + 0;
+ int ind1 = rel_idx_vec[1] * C * 3 + h_idx * d * 3 + (i-d) * 3 + 1;
+ int ind2 = rel_idx_vec[2] * C * 3 + h_idx * d * 3 + (i-d) * 3 + 2;
+ table_k_vec[i-d] = table_k[ind0] + table_k[ind1] + table_k[ind2];
+ }
+ }
+
+ __syncthreads();
+
+ for(int i = m_idx_prev; i < end; i += blockDim.x){
+ float sum = 0;
+ int m_idx_i = sort_indices[i];
+ int q_idx = index_q[m_idx_i];
+ int k_idx = index_k[m_idx_i];
+ for(int j = 0; j < d; j++){
+ sum += q[q_idx*C + h_idx*d + j] * table_q_vec[j];
+ sum += k[k_idx*C + h_idx*d + j] * table_k_vec[j];
+ }
+ output[m_idx_i*h + h_idx] = sum;
+ }
+}
+
+// N, M, h, hdim, grad_out, q, index_q, k, index_k, table_q, table_k, rel_idx, rel_idx_offsets, sort_indices, grad_q, grad_k, grad_table_q, grad_table_k
+
+template <unsigned int d>
+__global__ void dot_prod_with_idx_backward_cuda_kernel_v2( // M, h, hdim
+ int N, int M, int h, const float *grad_out, const float *q, const int *index_q,
+ const float *k, const int *index_k, const float *table_q, const float *table_k,
+ const int *rel_idx, const int *rel_idx_offsets, const int *sort_indices, float *grad_q,
+ float *grad_k, float *grad_table_q, float *grad_table_k) {
+
+ int h_idx = blockIdx.y;
+ int t_idx = blockIdx.x;
+ int n_idx = threadIdx.x;
+ int C = h*d;
+
+ __shared__ int start, end;
+ if(n_idx == 0){
+ start = rel_idx_offsets[t_idx];
+ end = rel_idx_offsets[t_idx+1];
+ }
+
+ __syncthreads();
+
+ int m_idx_prev = start + n_idx;
+ // if(m_idx_prev >= end)
+ // return;
+
+ __shared__ int m_idx;
+ if(n_idx == 0)
+ m_idx = sort_indices[m_idx_prev];
+
+ __syncthreads();
+
+ __shared__ int rel_idx_vec[3];
+ if(n_idx < 3)
+ rel_idx_vec[n_idx] = rel_idx[m_idx*3 + n_idx];
+
+ __syncthreads();
+
+ __shared__ float table_q_vec[d];
+ __shared__ float table_k_vec[d];
+
+ for(int i = n_idx; i < 2*d; i += blockDim.x){
+ if (i < d){
+ int ind0 = rel_idx_vec[0] * C * 3 + h_idx * d * 3 + i * 3 + 0;
+ int ind1 = rel_idx_vec[1] * C * 3 + h_idx * d * 3 + i * 3 + 1;
+ int ind2 = rel_idx_vec[2] * C * 3 + h_idx * d * 3 + i * 3 + 2;
+ table_q_vec[i] = table_q[ind0] + table_q[ind1] + table_q[ind2];
+ } else{
+ int ind0 = rel_idx_vec[0] * C * 3 + h_idx * d * 3 + (i-d) * 3 + 0;
+ int ind1 = rel_idx_vec[1] * C * 3 + h_idx * d * 3 + (i-d) * 3 + 1;
+ int ind2 = rel_idx_vec[2] * C * 3 + h_idx * d * 3 + (i-d) * 3 + 2;
+ table_k_vec[i-d] = table_k[ind0] + table_k[ind1] + table_k[ind2];
+ }
+ }
+
+ __shared__ float gradient_q[d];
+ __shared__ float gradient_k[d];
+ for(int i = n_idx; i < d; i += blockDim.x){
+ gradient_q[i] = 0;
+ gradient_k[i] = 0;
+ }
+
+ __syncthreads();
+
+ for(int i = m_idx_prev; i < end; i += blockDim.x){
+ int m_idx_i = sort_indices[i];
+ int q_idx = index_q[m_idx_i];
+ int k_idx = index_k[m_idx_i];
+ float grad_out_i = grad_out[m_idx_i*h+h_idx];
+ for(int j = 0; j < d; j++){
+ atomicAdd(&gradient_q[j], q[q_idx*C + h_idx*d + j] * grad_out_i);
+ atomicAdd(&gradient_k[j], k[k_idx*C + h_idx*d + j] * grad_out_i);
+ atomicAdd(grad_q + q_idx*C + h_idx*d + j, table_q_vec[j] * grad_out_i);
+ atomicAdd(grad_k + k_idx*C + h_idx*d + j, table_k_vec[j] * grad_out_i);
+ }
+ }
+
+ __syncthreads();
+
+ for(int i = n_idx; i < d*2; i += blockDim.x){
+ if(i < d){
+ atomicAdd(grad_table_q + rel_idx_vec[0] * C * 3 + h_idx * d * 3 + i * 3, gradient_q[i]);
+ atomicAdd(grad_table_q + rel_idx_vec[1] * C * 3 + h_idx * d * 3 + i * 3 + 1, gradient_q[i]);
+ atomicAdd(grad_table_q + rel_idx_vec[2] * C * 3 + h_idx * d * 3 + i * 3 + 2, gradient_q[i]);
+ }else{
+ atomicAdd(grad_table_k + rel_idx_vec[0] * C * 3 + h_idx * d * 3 + (i-d) * 3, gradient_k[i-d]);
+ atomicAdd(grad_table_k + rel_idx_vec[1] * C * 3 + h_idx * d * 3 + (i-d) * 3 + 1, gradient_k[i-d]);
+ atomicAdd(grad_table_k + rel_idx_vec[2] * C * 3 + h_idx * d * 3 + (i-d) * 3 + 2, gradient_k[i-d]);
+ }
+ }
+
+ // int c_idx = blockIdx.z;
+ // int h_idx = blockIdx.y;
+ // int thread_idx = blockIdx.x * blockDim.x + threadIdx.x;
+ // if (thread_idx >= M*3 || h_idx >= h || c_idx >= hdim) return;
+
+ // int dim = thread_idx % 3;
+ // int m_idx = thread_idx / 3;
+
+ // int q_idx = index[m_idx];
+ // int rel_idx_dim = rel_idx[thread_idx];
+ // int grad_out_idx = m_idx*h+h_idx;
+ // float grad_out_value = grad_out[grad_out_idx];
+
+ // float rel_table_val = table[rel_idx_dim*h*hdim*3+h_idx*hdim*3+c_idx*3+dim];
+ // atomicAdd(grad_q+q_idx*h*hdim+h_idx*hdim+c_idx, grad_out_value * rel_table_val);
+
+ // float q_value = q[q_idx*h*hdim+h_idx*hdim+c_idx];
+ // atomicAdd(grad_table+rel_idx_dim*h*hdim*3+h_idx*hdim*3+c_idx*3+dim, grad_out_value * q_value);
+}
+
+void dot_prod_with_idx_forward_cuda_launcher_v2(int N, int M, int h, int hdim, int n_max, int T, const float *q,
+ const int *index_q, const float *k, const int *index_k, const float *table_q, const float *table_k,
+ const int *rel_idx, const int *rel_idx_offsets, const int *sort_indices, float *output)
+{
+ // input: q: (N, h, hdim), index: (M), table: (L, h, hdim, 3), rel_idx: (M, 3)
+ //dim3 blocks(DIVUP(hdim, THREADS_PER_BLOCK), h, M);
+ dim3 blocks(T, h);
+ // dim3 threads(THREADS_PER_BLOCK);
+
+ unsigned int n_threads = opt_n_threads(n_max);
+ n_threads = n_threads == n_max ? n_threads : n_threads * 2;
+ n_threads = n_threads > 1024 ? 512 : n_threads;
+
+ // printf("e1: T: %d, h: %d, n_threads: %d\n", T, h, n_threads);
+
+ switch (hdim) {
+ case 16:
+ dot_prod_with_idx_forward_cuda_kernel_v2<16><<<blocks, n_threads, 0>>>(N, M, h, q, index_q, k, index_k, table_q, table_k, rel_idx, rel_idx_offsets, sort_indices, output);
+ break;
+ case 32:
+ dot_prod_with_idx_forward_cuda_kernel_v2<32><<<blocks, n_threads, 0>>>(N, M, h, q, index_q, k, index_k, table_q, table_k, rel_idx, rel_idx_offsets, sort_indices, output);
+ break;
+ default:
+ throw "d != 16 and d != 32";
+ }
+}
+
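+// The backward pass reuses the same grid and block-size heuristic as the forward launcher.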
+void dot_prod_with_idx_backward_cuda_launcher_v2(int N, int M, int h, int hdim, int n_max, int T,
+ const float *grad_out, const float *q, const int *index_q, const float *k, const int *index_k,
+ const float *table_q, const float *table_k, const int *rel_idx, const int *rel_idx_offsets, const int *sort_indices,
+ float *grad_q, float *grad_k, float *grad_table_q, float *grad_table_k)
+{
+ // input: grad_out: (M, h), output: grad_q: (N, h, hdim), grad_table: (L, h, hdim, 3)
+ //dim3 blocks(DIVUP(hdim, THREADS_PER_BLOCK), h, M);
+ // dim3 blocks(DIVUP(M*3, THREADS_PER_BLOCK), h, hdim);
+ // dim3 threads(THREADS_PER_BLOCK);
+
+ dim3 blocks(T, h);
+ // dim3 threads(THREADS_PER_BLOCK);
+
+ unsigned int n_threads = opt_n_threads(n_max);
+ n_threads = n_threads == n_max ? n_threads : n_threads * 2;
+ n_threads = n_threads > 1024 ? 512 : n_threads;
+
+ switch (hdim) {
+ case 16:
+ dot_prod_with_idx_backward_cuda_kernel_v2<16><<<blocks, n_threads, 0>>>(N, M, h, grad_out, q, index_q, k, index_k, table_q, table_k, rel_idx, rel_idx_offsets, sort_indices, grad_q, grad_k, grad_table_q, grad_table_k);
+ break;
+ case 32:
+ dot_prod_with_idx_backward_cuda_kernel_v2<32><<<blocks, n_threads, 0>>>(N, M, h, grad_out, q, index_q, k, index_k, table_q, table_k, rel_idx, rel_idx_offsets, sort_indices, grad_q, grad_k, grad_table_q, grad_table_k);
+ break;
+ default:
+ throw "d != 16 and d != 32";
+ }
+}
+
+
+
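+// v3 kernels assume queries are contiguous: index_q_offsets is a CSR-style prefix
+// over the M attention entries, so block (q_idx, h_idx) owns one query point and
+// one head, and thread n handles the n-th entry of that query's neighborhood.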
+template <unsigned int d>
+__global__ void dot_prod_with_idx_forward_cuda_kernel_v3( // M, h, hdim
+ int N, int M, int h, const float *q, const int *index_q_offsets, const float *k, const int *index_k,
+ const float *table_q, const float *table_k, const int *rel_idx, float *output) {
+ // input: q: (N, h, hdim), index: (M), table: (L, h, hdim, 3), rel_idx: (M, 3), output: (M, h)
+ int q_idx = blockIdx.x;
+ int h_idx = blockIdx.y;
+ int n_idx = threadIdx.x;
+ int C = h*d;
+
+ __shared__ float query_vec[d];
+ __shared__ int start, end;
+ if (n_idx == 0){
+ start = index_q_offsets[q_idx];
+ end = index_q_offsets[q_idx+1];
+ }
+ for(int i = n_idx; i < d; i += blockDim.x)
+ query_vec[i] = q[q_idx*C + h_idx*d + i];
+
+ __syncthreads();
+
+ int m_idx = start + n_idx;
+ if(m_idx >= end)
+ return;
+
+ int k_idx = index_k[m_idx];
+ int r_idx1 = rel_idx[m_idx*3], r_idx2 = rel_idx[m_idx*3+1], r_idx3 = rel_idx[m_idx*3+2];
+ float sum = 0;
+ for(int i = 0; i < d; i++){
+ float table_q_scalar_i = table_q[r_idx1*C*3+h_idx*d*3+i*3] + table_q[r_idx2*C*3+h_idx*d*3+i*3+1] + table_q[r_idx3*C*3+h_idx*d*3+i*3+2];
+ sum += query_vec[i] * table_q_scalar_i;
+ float table_k_scalar_i = table_k[r_idx1*C*3+h_idx*d*3+i*3] + table_k[r_idx2*C*3+h_idx*d*3+i*3+1] + table_k[r_idx3*C*3+h_idx*d*3+i*3+2];
+ sum += k[k_idx*C+h_idx*d+i] * table_k_scalar_i;
+ }
+ output[m_idx*h + h_idx] = sum;
+
+}
+
+// N, M, h, hdim, grad_out, q, index_q, k, index_k, table_q, table_k, rel_idx, rel_idx_offsets, sort_indices, grad_q, grad_k, grad_table_q, grad_table_k
+
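+// Backward of the v3 dot product. grad_q is accumulated in shared memory and
+// written back without global atomics (each block owns a disjoint (q_idx, h_idx)
+// slice), while grad_k and the table gradients are scattered with atomicAdd.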
+template <unsigned int d>
+__global__ void dot_prod_with_idx_backward_cuda_kernel_v3( // M, h, hdim
+ int N, int M, int h, const float *grad_out, const float *q, const int *index_q_offsets,
+ const float *k, const int *index_k, const float *table_q, const float *table_k,
+ const int *rel_idx, float *grad_q, float *grad_k, float *grad_table_q, float *grad_table_k) {
+
+ int q_idx = blockIdx.x;
+ int h_idx = blockIdx.y;
+ int n_idx = threadIdx.x;
+ int C = h*d;
+
+ __shared__ float query_vec[d];
+ __shared__ int start, end;
+ if (n_idx == 0){
+ start = index_q_offsets[q_idx];
+ end = index_q_offsets[q_idx+1];
+ }
+ for(int i = n_idx; i < d; i += blockDim.x)
+ query_vec[i] = q[q_idx*C + h_idx*d + i];
+
+ __shared__ float gradients_q[d];
+ for(int i = n_idx; i < d; i += blockDim.x){
+ gradients_q[i] = 0;
+ }
+
+ __syncthreads();
+
+ int m_idx = start + n_idx;
+
+ if(m_idx < end){
+ int k_idx = index_k[m_idx];
+ int r_idx1 = rel_idx[m_idx*3], r_idx2 = rel_idx[m_idx*3+1], r_idx3 = rel_idx[m_idx*3+2];
+ float gradient = grad_out[m_idx*h + h_idx];
+ for(int i = 0; i < d; i++){
+ float table_q_scalar_i = table_q[r_idx1*C*3+h_idx*d*3+i*3] + table_q[r_idx2*C*3+h_idx*d*3+i*3+1] + table_q[r_idx3*C*3+h_idx*d*3+i*3+2];
+ float table_k_scalar_i = table_k[r_idx1*C*3+h_idx*d*3+i*3] + table_k[r_idx2*C*3+h_idx*d*3+i*3+1] + table_k[r_idx3*C*3+h_idx*d*3+i*3+2];
+ float q_scalar_i = query_vec[i];
+ float k_scalar_i = k[k_idx*C+h_idx*d+i];
+ atomicAdd(&gradients_q[i], table_q_scalar_i * gradient);
+ atomicAdd(grad_k+k_idx*C+h_idx*d+i, table_k_scalar_i * gradient);
+ atomicAdd(grad_table_q+r_idx1*C*3+h_idx*d*3+i*3, q_scalar_i * gradient);
+ atomicAdd(grad_table_q+r_idx2*C*3+h_idx*d*3+i*3+1, q_scalar_i * gradient);
+ atomicAdd(grad_table_q+r_idx3*C*3+h_idx*d*3+i*3+2, q_scalar_i * gradient);
+ atomicAdd(grad_table_k+r_idx1*C*3+h_idx*d*3+i*3, k_scalar_i * gradient);
+ atomicAdd(grad_table_k+r_idx2*C*3+h_idx*d*3+i*3+1, k_scalar_i * gradient);
+ atomicAdd(grad_table_k+r_idx3*C*3+h_idx*d*3+i*3+2, k_scalar_i * gradient);
+ }
+ }
+ __syncthreads();
+
+ for(int i = n_idx; i < d; i += blockDim.x){
+ grad_q[q_idx*C+h_idx*d+i] = gradients_q[i];
+ }
+}
+
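+// v3 launchers: one block per (query point, head); the block size is
+// opt_n_threads(n_max), doubled when that does not already equal n_max, so that
+// a single block spans the largest neighborhood.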
+void dot_prod_with_idx_forward_cuda_launcher_v3(int N, int M, int h, int hdim, int n_max, const float *q,
+ const int *index_q_offsets, const float *k, const int *index_k, const float *table_q, const float *table_k,
+ const int *rel_idx, float *output)
+{
+ // input: q: (N, h, hdim), index: (M), table: (L, h, hdim, 3), rel_idx: (M, 3)
+ //dim3 blocks(DIVUP(hdim, THREADS_PER_BLOCK), h, M);
+ dim3 blocks(N, h);
+ // dim3 threads(THREADS_PER_BLOCK);
+
+ unsigned int n_threads = opt_n_threads(n_max);
+ n_threads = n_threads == n_max ? n_threads : n_threads * 2;
+
+ // printf("e1: h: %d, n_max: %d, n_threads: %d\n", h, n_max, n_threads);
+
+ switch (hdim) {
+ case 16:
+ dot_prod_with_idx_forward_cuda_kernel_v3<16><<<blocks, n_threads, 0>>>(N, M, h, q, index_q_offsets, k, index_k, table_q, table_k, rel_idx, output);
+ break;
+ case 32:
+ dot_prod_with_idx_forward_cuda_kernel_v3<32><<<blocks, n_threads, 0>>>(N, M, h, q, index_q_offsets, k, index_k, table_q, table_k, rel_idx, output);
+ break;
+ default:
+ throw "d != 16 and d != 32";
+ }
+}
+
+void dot_prod_with_idx_backward_cuda_launcher_v3(int N, int M, int h, int hdim, int n_max,
+ const float *grad_out, const float *q, const int *index_q_offsets, const float *k, const int *index_k,
+ const float *table_q, const float *table_k, const int *rel_idx,
+ float *grad_q, float *grad_k, float *grad_table_q, float *grad_table_k)
+{
+ // input: grad_out: (M, h), output: grad_q: (N, h, hdim), grad_table: (L, h, hdim, 3)
+ //dim3 blocks(DIVUP(hdim, THREADS_PER_BLOCK), h, M);
+ // dim3 blocks(DIVUP(M*3, THREADS_PER_BLOCK), h, hdim);
+ // dim3 threads(THREADS_PER_BLOCK);
+
+ dim3 blocks(N, h);
+ // dim3 threads(THREADS_PER_BLOCK);
+
+ unsigned int n_threads = opt_n_threads(n_max);
+ n_threads = n_threads == n_max ? n_threads : n_threads * 2;
+
+ switch (hdim) {
+ case 16:
+ dot_prod_with_idx_backward_cuda_kernel_v3<16><<<blocks, n_threads, 0>>>(N, M, h, grad_out, q, index_q_offsets, k, index_k, table_q, table_k, rel_idx, grad_q, grad_k, grad_table_q, grad_table_k);
+ break;
+ case 32:
+ dot_prod_with_idx_backward_cuda_kernel_v3<32><<<blocks, n_threads, 0>>>(N, M, h, grad_out, q, index_q_offsets, k, index_k, table_q, table_k, rel_idx, grad_q, grad_k, grad_table_q, grad_table_k);
+ break;
+ default:
+ throw "d != 16 and d != 32";
+ }
+}
+
+
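+// Aggregation kernel: for one (query, head) pair, thread n takes attention entry
+// m = start + n and accumulates attn[m, h] * (v[index1[m]] + positional bias) into
+// a shared per-channel result, which is then written out for this query.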
+template <unsigned int d>
+__global__ void attention_step2_with_rel_pos_value_forward_cuda_kernel_v2( // M, h, hdim
+ int N, int M, int h, const float *attn, const float *v,
+ const int *index0_offsets, const int *index1, const float *table, const int *rel_idx, float *output) {
+ // input: attn: (M, h), v: (N, h, hdim), index0: (M, ), index1: (M, ), table: (L, h, hdim, 3), rel_idx: (M, 3)
+
+ int q_idx = blockIdx.x;
+ int h_idx = blockIdx.y;
+ int n_idx = threadIdx.x;
+
+ int C = h*d;
+
+ __shared__ int start, end;
+ __shared__ float result[d];
+
+ if (n_idx == 0){
+ start = index0_offsets[q_idx];
+ end = index0_offsets[q_idx+1];
+ }
+ for (int i = n_idx; i < d; i += blockDim.x){
+ result[i] = 0;
+ }
+
+ __syncthreads();
+
+ int m_idx = start + n_idx;
+ if (m_idx < end){
+ float attn_scalar = attn[m_idx*h + h_idx];
+ int r_idx1 = rel_idx[m_idx*3], r_idx2 = rel_idx[m_idx*3+1], r_idx3 = rel_idx[m_idx*3+2];
+ for(int i = 0; i < d; i ++){
+ int v_idx = index1[m_idx];
+ float table_scaler_i = table[r_idx1*C*3+h_idx*d*3+i*3] + table[r_idx2*C*3+h_idx*d*3+i*3+1] + table[r_idx3*C*3+h_idx*d*3+i*3+2];
+ float value_scaler_i = v[v_idx*C + h_idx*d + i];
+ atomicAdd(&result[i], (table_scaler_i + value_scaler_i) * attn_scalar);
+ }
+ }
+
+ __syncthreads();
+
+ for (int i = n_idx; i < d; i += blockDim.x)
+ output[q_idx*C + h_idx*d + i] = result[i];
+}
+
+
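+// Backward of the aggregation: grad_attn is a private per-entry reduction over
+// channels, while grad_v and grad_table are scattered with atomicAdd because
+// multiple entries may reference the same value row or table cell.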
+template <unsigned int d>
+__global__ void attention_step2_with_rel_pos_value_backward_cuda_kernel_v2( // M, h, hdim
+ int N, int M, int h, const float *grad_out, const int *index0_offsets, const int *index1, const float *attn, const float *v, const float *table,
+ const int *rel_idx, float *grad_attn, float *grad_v, float *grad_table) {
+ // input: attn: (M, h), v: (N, h, hdim), index0: (M, ), index1: (M, ), table: (L, h, hdim, 3), rel_idx: (M, 3)
+
+ int q_idx = blockIdx.x;
+ int h_idx = blockIdx.y;
+ int n_idx = threadIdx.x;
+
+ int C = h*d;
+
+ __shared__ int start, end;
+ __shared__ float gradients[d];
+
+ if (n_idx == 0){
+ start = index0_offsets[q_idx];
+ end = index0_offsets[q_idx+1];
+ }
+ for (int i = n_idx; i < d; i += blockDim.x){
+ gradients[i] = grad_out[q_idx*C + h_idx*d + i];
+ }
+
+ __syncthreads();
+
+ int m_idx = start + n_idx;
+ if (m_idx < end){
+ int v_idx = index1[m_idx];
+ int r_idx1 = rel_idx[m_idx*3], r_idx2 = rel_idx[m_idx*3+1], r_idx3 = rel_idx[m_idx*3+2];
+ float attn_scalar = attn[m_idx*h + h_idx];
+ float grad_attn_sum = 0;
+ for (int i = 0; i < d; i++){
+ float grad_out_scaler_i = gradients[i];
+ float table_scaler_i = table[r_idx1*C*3+h_idx*d*3+i*3] + table[r_idx2*C*3+h_idx*d*3+i*3+1] + table[r_idx3*C*3+h_idx*d*3+i*3+2];
+ float value_scaler_i = v[v_idx*C + h_idx*d + i];
+ grad_attn_sum += (table_scaler_i + value_scaler_i) * grad_out_scaler_i;
+ atomicAdd(grad_v + v_idx*C + h_idx*d + i, attn_scalar * grad_out_scaler_i);
+ atomicAdd(grad_table + r_idx1*C*3 + h_idx*d*3 + i*3, attn_scalar * grad_out_scaler_i);
+ atomicAdd(grad_table + r_idx2*C*3 + h_idx*d*3 + i*3 + 1, attn_scalar * grad_out_scaler_i);
+ atomicAdd(grad_table + r_idx3*C*3 + h_idx*d*3 + i*3 + 2, attn_scalar * grad_out_scaler_i);
+ }
+ grad_attn[m_idx*h + h_idx] = grad_attn_sum;
+ }
+}
+
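+// Same launch pattern as the v3 launchers above: one block per (query, head),
+// sized to cover the largest neighborhood n_max.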
+void attention_step2_with_rel_pos_value_forward_cuda_launcher_v2(int N, int M, int h, int hdim, int n_max, const float *attn, const float *v, const int *index0_offsets,
+ const int *index1, const float *table, const int *rel_idx, float *output) {
+ // input: attn: (M, h), v: (N, h, hdim), index0: (M, ), index1: (M, ), table: (L, h, hdim, 3), rel_idx: (M, 3)
+ //dim3 blocks(DIVUP(hdim, THREADS_PER_BLOCK), h, M);
+ // dim3 blocks(DIVUP(M*3, THREADS_PER_BLOCK), h, hdim);
+ // dim3 threads(THREADS_PER_BLOCK);
+ dim3 blocks(N, h);
+ unsigned int n_threads = opt_n_threads(n_max);
+ n_threads = n_threads == n_max ? n_threads : n_threads * 2;
+
+ switch (hdim) {
+ case 16:
+ attention_step2_with_rel_pos_value_forward_cuda_kernel_v2<16><<<blocks, n_threads, 0>>>(N, M, h, attn, v, index0_offsets, index1, table, rel_idx, output);
+ break;
+ case 32:
+ attention_step2_with_rel_pos_value_forward_cuda_kernel_v2<32><<<blocks, n_threads, 0>>>(N, M, h, attn, v, index0_offsets, index1, table, rel_idx, output);
+ break;
+ default:
+ throw "d != 16 and d != 32";
+ }
+}
+
+void attention_step2_with_rel_pos_value_backward_cuda_launcher_v2(int N, int M, int h, int hdim, int n_max, const float *grad_out, const int *index0_offsets,
+ const int *index1, const float *attn, const float *v, const float *table, const int *rel_idx, float *grad_attn, float *grad_v, float *grad_table) {
+ // input: grad_out: (N, h, hdim); output: grad_attn: (M, h), grad_v: (N, h, hdim), grad_table: (L, h, hdim, 3)
+ //dim3 blocks(DIVUP(hdim, THREADS_PER_BLOCK), h, M);
+
+ dim3 blocks(N, h);
+ unsigned int n_threads = opt_n_threads(n_max);
+ n_threads = n_threads == n_max ? n_threads : n_threads * 2;
+
+ switch (hdim) {
+ case 16:
+ attention_step2_with_rel_pos_value_backward_cuda_kernel_v2<16><<<blocks, n_threads, 0>>>(N, M, h, grad_out, index0_offsets, index1, attn, v, table, rel_idx, grad_attn, grad_v, grad_table);
+ break;
+ case 32:
+ attention_step2_with_rel_pos_value_backward_cuda_kernel_v2<32><<<blocks, n_threads, 0>>>(N, M, h, grad_out, index0_offsets, index1, attn, v, table, rel_idx, grad_attn, grad_v, grad_table);
+ break;
+ default:
+ throw "d != 16 and d != 32";
+ }
+}
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/rpe_v2/relative_pos_encoding_cuda_kernel_v2.h b/models/Mask3D/mask3d/utils/pointops2/src/rpe_v2/relative_pos_encoding_cuda_kernel_v2.h
new file mode 100644
index 0000000000000000000000000000000000000000..648b152afe16d3011b62ff141a4e20b2a83579b4
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/rpe_v2/relative_pos_encoding_cuda_kernel_v2.h
@@ -0,0 +1,32 @@
+#ifndef _RPE_V2_CUDA_KERNEL
+#define _RPE_V2_CUDA_KERNEL
+#include <vector>
+#include <torch/serialize/tensor.h>
+#include <ATen/cuda/CUDAContext.h>
+
+void dot_prod_with_idx_forward_cuda_v2(int N, int M, int h, int hdim, int n_max, int T, at::Tensor q_tensor, at::Tensor index_q_tensor, at::Tensor k_tensor, at::Tensor index_k_tensor, at::Tensor table_q_tensor, at::Tensor table_k_tensor, at::Tensor rel_idx_tensor, at::Tensor rel_idx_offsets_tensor, at::Tensor sort_indices_tensor, at::Tensor output_tensor);
+void dot_prod_with_idx_backward_cuda_v2(int N, int M, int h, int hdim, int n_max, int T, at::Tensor grad_out_tensor, at::Tensor q_tensor, at::Tensor index_q_tensor, at::Tensor k_tensor, at::Tensor index_k_tensor, at::Tensor table_q_tensor, at::Tensor table_k_tensor, at::Tensor rel_idx_tensor, at::Tensor rel_idx_offsets_tensor, at::Tensor sort_indices_tensor, at::Tensor grad_q_tensor, at::Tensor grad_k_tensor, at::Tensor grad_table_q_tensor, at::Tensor grad_table_k_tensor);
+
+void dot_prod_with_idx_forward_cuda_v3(int N, int M, int h, int hdim, int n_max, at::Tensor q_tensor, at::Tensor index_q_offsets_tensor, at::Tensor k_tensor, at::Tensor index_k_tensor, at::Tensor table_q_tensor, at::Tensor table_k_tensor, at::Tensor rel_idx_tensor, at::Tensor output_tensor);
+void dot_prod_with_idx_backward_cuda_v3(int N, int M, int h, int hdim, int n_max, at::Tensor grad_out_tensor, at::Tensor q_tensor, at::Tensor index_q_offsets_tensor, at::Tensor k_tensor, at::Tensor index_k_tensor, at::Tensor table_q_tensor, at::Tensor table_k_tensor, at::Tensor rel_idx_tensor, at::Tensor grad_q_tensor, at::Tensor grad_k_tensor, at::Tensor grad_table_q_tensor, at::Tensor grad_table_k_tensor);
+
+void attention_step2_with_rel_pos_value_forward_cuda_v2(int N, int M, int h, int hdim, int n_max, at::Tensor attn_tensor, at::Tensor v_tensor, at::Tensor index0_offsets_tensor, at::Tensor index1_tensor, at::Tensor table_tensor, at::Tensor rel_idx_tensor, at::Tensor output_tensor);
+void attention_step2_with_rel_pos_value_backward_cuda_v2(int N, int M, int h, int hdim, int n_max, at::Tensor grad_out_tensor, at::Tensor index0_offsets_tensor, at::Tensor index1_tensor, at::Tensor attn_tensor, at::Tensor v_tensor, at::Tensor table_tensor, at::Tensor rel_idx_tensor, at::Tensor grad_attn_tensor, at::Tensor grad_v_tensor, at::Tensor grad_table_tensor);
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void dot_prod_with_idx_forward_cuda_launcher_v2(int N, int M, int h, int hdim, int n_max, int T, const float *q, const int *index_q, const float *k, const int *index_k, const float *table_q, const float *table_k, const int *rel_idx, const int *rel_idx_offsets, const int *sort_indices, float *output);
+void dot_prod_with_idx_backward_cuda_launcher_v2(int N, int M, int h, int hdim, int n_max, int T, const float *grad_out, const float *q, const int *index_q, const float *k, const int *index_k, const float *table_q, const float *table_k, const int *rel_idx, const int *rel_idx_offsets, const int *sort_indices, float *grad_q, float *grad_k, float *grad_table_q, float *grad_table_k);
+
+void dot_prod_with_idx_forward_cuda_launcher_v3(int N, int M, int h, int hdim, int n_max, const float *q, const int *index_q_offsets, const float *k, const int *index_k, const float *table_q, const float *table_k, const int *rel_idx, float *output);
+void dot_prod_with_idx_backward_cuda_launcher_v3(int N, int M, int h, int hdim, int n_max, const float *grad_out, const float *q, const int *index_q_offsets, const float *k, const int *index_k, const float *table_q, const float *table_k, const int *rel_idx, float *grad_q, float *grad_k, float *grad_table_q, float *grad_table_k);
+
+void attention_step2_with_rel_pos_value_forward_cuda_launcher_v2(int N, int M, int h, int hdim, int n_max, const float *attn, const float *v, const int *index0_offsets, const int *index1, const float *table, const int *rel_idx, float *output);
+void attention_step2_with_rel_pos_value_backward_cuda_launcher_v2(int N, int M, int h, int hdim, int n_max, const float *grad_out, const int *index0_offsets, const int *index1, const float *attn, const float *v, const float *table, const int *rel_idx, float *grad_attn, float *grad_v, float *grad_table);
+
+#ifdef __cplusplus
+}
+#endif
+#endif
diff --git a/models/Mask3D/mask3d/utils/pointops2/src/rpe_v2/relative_pos_encoding_cuda_v2.cpp b/models/Mask3D/mask3d/utils/pointops2/src/rpe_v2/relative_pos_encoding_cuda_v2.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..0a4c96a8688536d19611a57a2017ae1ba44f12bf
--- /dev/null
+++ b/models/Mask3D/mask3d/utils/pointops2/src/rpe_v2/relative_pos_encoding_cuda_v2.cpp
@@ -0,0 +1,111 @@
+#include <vector>
+#include <torch/serialize/tensor.h>
+#include <torch/extension.h>
+#include <ATen/cuda/CUDAContext.h>
+#include "relative_pos_encoding_cuda_kernel_v2.h"
+
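+// Thin ATen wrappers: each unpacks raw device pointers via Tensor::data_ptr<T>()
+// and forwards to the corresponding CUDA launcher declared in the header.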
+void dot_prod_with_idx_forward_cuda_v2(int N, int M, int h, int hdim, int n_max, int T, at::Tensor q_tensor,
+ at::Tensor index_q_tensor, at::Tensor k_tensor, at::Tensor index_k_tensor, at::Tensor table_q_tensor,
+ at::Tensor table_k_tensor, at::Tensor rel_idx_tensor, at::Tensor rel_idx_offsets_tensor, at::Tensor sort_indices_tensor, at::Tensor output_tensor)
+{
+ const float *q = q_tensor.data_ptr<float>();
+ const int *index_q = index_q_tensor.data_ptr<int>();
+ const float *k = k_tensor.data_ptr<float>();
+ const int *index_k = index_k_tensor.data_ptr<int>();
+ const float *table_q = table_q_tensor.data_ptr<float>();
+ const float *table_k = table_k_tensor.data_ptr<float>();
+ const int *rel_idx = rel_idx_tensor.data_ptr<int>();
+ const int *rel_idx_offsets = rel_idx_offsets_tensor.data_ptr<int>();
+ const int *sort_indices = sort_indices_tensor.data_ptr<int>();
+ float *output = output_tensor.data_ptr<float>();
+ dot_prod_with_idx_forward_cuda_launcher_v2(N, M, h, hdim, n_max, T, q, index_q, k, index_k, table_q, table_k, rel_idx, rel_idx_offsets, sort_indices, output);
+}
+
+void dot_prod_with_idx_backward_cuda_v2(int N, int M, int h, int hdim, int n_max, int T, at::Tensor grad_out_tensor,
+ at::Tensor q_tensor, at::Tensor index_q_tensor, at::Tensor k_tensor, at::Tensor index_k_tensor,
+ at::Tensor table_q_tensor, at::Tensor table_k_tensor, at::Tensor rel_idx_tensor, at::Tensor rel_idx_offsets_tensor,
+ at::Tensor sort_indices_tensor, at::Tensor grad_q_tensor, at::Tensor grad_k_tensor, at::Tensor grad_table_q_tensor, at::Tensor grad_table_k_tensor)
+{
+ const float *grad_out = grad_out_tensor.data_ptr<float>();
+ const float *q = q_tensor.data_ptr<float>();
+ const int *index_q = index_q_tensor.data_ptr<int>();
+ const float *k = k_tensor.data_ptr<float>();
+ const int *index_k = index_k_tensor.data_ptr<int>();
+ const float *table_q = table_q_tensor.data_ptr<float>();
+ const float *table_k = table_k_tensor.data_ptr<float>();
+ const int *rel_idx = rel_idx_tensor.data_ptr<int>();
+ const int *rel_idx_offsets = rel_idx_offsets_tensor.data_ptr<int>();
+ const int *sort_indices = sort_indices_tensor.data_ptr<int>();
+ float *grad_q = grad_q_tensor.data_ptr<float>();
+ float *grad_k = grad_k_tensor.data_ptr<float>();
+ float *grad_table_q = grad_table_q_tensor.data_ptr<float>();
+ float *grad_table_k = grad_table_k_tensor.data_ptr<float>();
+ dot_prod_with_idx_backward_cuda_launcher_v2(N, M, h, hdim, n_max, T, grad_out, q, index_q, k, index_k, table_q, table_k, rel_idx, rel_idx_offsets, sort_indices, grad_q, grad_k, grad_table_q, grad_table_k);
+}
+
+
+void dot_prod_with_idx_forward_cuda_v3(int N, int M, int h, int hdim, int n_max, at::Tensor q_tensor,
+ at::Tensor index_q_offsets_tensor, at::Tensor k_tensor, at::Tensor index_k_tensor, at::Tensor table_q_tensor,
+ at::Tensor table_k_tensor, at::Tensor rel_idx_tensor, at::Tensor output_tensor)
+{
+ const float *q = q_tensor.data_ptr<float>();
+ const int *index_q_offsets = index_q_offsets_tensor.data_ptr<int>();
+ const float *k = k_tensor.data_ptr<float>();
+ const int *index_k = index_k_tensor.data_ptr<int>();
+ const float *table_q = table_q_tensor.data_ptr<float>();
+ const float *table_k = table_k_tensor.data_ptr<float>();
+ const int *rel_idx = rel_idx_tensor.data_ptr<int>();
+ float *output = output_tensor.data_ptr<float>();
+ dot_prod_with_idx_forward_cuda_launcher_v3(N, M, h, hdim, n_max, q, index_q_offsets, k, index_k, table_q, table_k, rel_idx, output);
+}
+
+void dot_prod_with_idx_backward_cuda_v3(int N, int M, int h, int hdim, int n_max, at::Tensor grad_out_tensor,
+ at::Tensor q_tensor, at::Tensor index_q_offsets_tensor, at::Tensor k_tensor, at::Tensor index_k_tensor,
+ at::Tensor table_q_tensor, at::Tensor table_k_tensor, at::Tensor rel_idx_tensor, at::Tensor grad_q_tensor,
+ at::Tensor grad_k_tensor, at::Tensor grad_table_q_tensor, at::Tensor grad_table_k_tensor)
+{
+ const float *grad_out = grad_out_tensor.data_ptr<float>();
+ const float *q = q_tensor.data_ptr<float>();
+ const int *index_q_offsets = index_q_offsets_tensor.data_ptr<int>();
+ const float *k = k_tensor.data_ptr<float>();
+ const int *index_k = index_k_tensor.data_ptr<int>();
+ const float *table_q = table_q_tensor.data_ptr<float>();
+ const float *table_k = table_k_tensor.data_ptr<float>();