1. Introduction
I2D-LocX is an efficient, precise, and robust camera localization method for LiDAR maps. Given a camera image and a LiDAR map, the method estimates dense image-to-depth flow and recovers the relative 6-DoF transformation through PnP. This repository provides the inference code, the updated CUDA visibility extension, a pretrained KITTI checkpoint, and a small sample dataset for quick reproduction.
This project is based on I2D-Loc. The visibility package has been updated for I2D-LocX, so please use the implementation included in this repository instead of the original I2D-Loc visibility package.
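The flow-then-PnP idea described above can be illustrated with a minimal sketch. It is not the repository's implementation: the helper names `backproject` and `flow_to_correspondences` are hypothetical, and the flow direction (each valid depth pixel is displaced onto its matching image pixel) is an assumption. The sketch only shows how a sparse LiDAR depth image and a dense flow field could yield the 2D-3D matches that a PnP solver needs.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Lift depth-image pixel (u, v) to a 3D point with a pinhole camera model."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def flow_to_correspondences(
    depth: Sequence[Sequence[Optional[float]]],
    flow: Sequence[Sequence[Tuple[float, float]]],
    fx: float, fy: float, cx: float, cy: float,
) -> Tuple[List[Point3D], List[Pixel]]:
    """Build 2D-3D matches from a sparse depth image and a dense flow field.

    depth[v][u] is a metric depth, or 0/None where no LiDAR point projects.
    flow[v][u] is (du, dv); ASSUMPTION: it moves a depth pixel onto the
    camera-image pixel that observes the same 3D point.
    """
    points_3d: List[Point3D] = []
    pixels_2d: List[Pixel] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if not d or d <= 0:
                continue  # skip pixels without a projected LiDAR point
            du, dv = flow[v][u]
            points_3d.append(backproject(u, v, d, fx, fy, cx, cy))
            pixels_2d.append((u + du, v + dv))
    return points_3d, pixels_2d


if __name__ == "__main__":
    # Toy 2x2 example with a single valid depth pixel and unit intrinsics.
    toy_depth = [[0.0, 2.0], [0.0, 0.0]]
    toy_flow = [[(0.0, 0.0), (0.5, -0.25)], [(0.0, 0.0), (0.0, 0.0)]]
    pts, pix = flow_to_correspondences(toy_depth, toy_flow, 1.0, 1.0, 0.0, 0.0)
    print(pts, pix)
```

The resulting lists could be passed to any PnP solver (for example OpenCV's `cv2.solvePnPRansac`) to estimate the 6-DoF correction; which solver this repository uses is not shown here.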
2. News
- 2025.6: I2D-LocX was published in IEEE Robotics and Automation Letters (RA-L).
- 2025.9: I2D-LocX was transferred to ICRA 2026 for presentation.
3. Requirements and Installation
3.1 Tested Environment
The project was developed and evaluated on Ubuntu 20.04 with Python 3.11, PyTorch 2.7.0, and CUDA 11.8. Inference requires an NVIDIA GPU, a working CUDA toolkit, and a C++ compiler compatible with the installed PyTorch build.
3.2 Create the Environment
conda create -n i2d-locx python=3.11 -y
conda activate i2d-locx
3.3 Install PyTorch
Install the PyTorch build that matches your CUDA environment. The following command installs the CUDA 11.8 build:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
For other CUDA versions, refer to the official PyTorch installation guide.
3.4 Install Dependencies
pip install -r requirements.txt
3.5 Build the Visibility Package
cd pkg/visibility_package
python setup.py install
cd ../..
Verify the installation:
python -c "import visibility; print('visibility extension is available')"
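For a slightly broader check than the one-liner above, the following sketch reports whether `torch` and `visibility` can be located in the active environment without importing them. The helper name `check_modules` is hypothetical and not part of this repository.

```python
import importlib.util
import platform
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python", platform.python_version())
    for name, found in check_modules(["torch", "visibility"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Locating a module does not prove that the compiled extension loads correctly, so the import command above remains the definitive test.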
4. Pretrained Checkpoint
The sample configuration uses the KITTI checkpoint kitti_100epoch.pth. Download it from Hugging Face, then place it in the following location:
checkpoints/kitti_100epoch.pth
The expected directory structure is:
i2d-locX-open/
└── checkpoints/
    └── kitti_100epoch.pth
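A small pre-flight check can catch a misplaced checkpoint before inference starts. This is a sketch only; `resolve_checkpoint` is a hypothetical helper, and the default path simply mirrors the layout shown above.

```python
from pathlib import Path


def resolve_checkpoint(repo_root: str,
                       relative: str = "checkpoints/kitti_100epoch.pth") -> Path:
    """Return the checkpoint path under repo_root, or raise if the file is absent."""
    path = Path(repo_root) / relative
    if not path.is_file():
        raise FileNotFoundError(
            f"Checkpoint not found: {path}. Place the file there or pass its "
            "actual location through --checkpoint."
        )
    return path


if __name__ == "__main__":
    try:
        print("Using checkpoint:", resolve_checkpoint("."))
    except FileNotFoundError as err:
        print(err)
```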
5. Data Preparation
5.1 Original Datasets
The full training and evaluation datasets follow the preparation procedure of the original I2D-Loc repository. I2D-Loc uses the KITTI Odometry Dataset and aggregates LiDAR scans at their ground-truth poses to construct complete maps. The maps are downsampled at a resolution of 0.1 m and stored as HDF5 files. Please refer to the original repository for the full KITTI preprocessing scripts and directory layout.
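To make the 0.1 m downsampling step concrete, here is a minimal voxel-grid sketch in plain Python. It assumes that "resolution" means the edge length of a cubic voxel and that each occupied voxel is replaced by the centroid of its points; the original I2D-Loc preprocessing may use a different library or rule, so treat this as an illustration of the general technique rather than the actual script.

```python
import math
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]


def voxel_downsample(points: Iterable[Point], voxel_size: float = 0.1) -> List[Point]:
    """Replace all points that fall in the same voxel with their centroid."""
    # voxel index -> [sum_x, sum_y, sum_z, count]
    sums: Dict[Tuple[int, int, int], List[float]] = {}
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        acc = sums.setdefault(key, [0.0, 0.0, 0.0, 0])
        acc[0] += x
        acc[1] += y
        acc[2] += z
        acc[3] += 1
    return [(sx / n, sy / n, sz / n) for sx, sy, sz, n in sums.values()]


if __name__ == "__main__":
    cloud = [(0.01, 0.01, 0.01), (0.03, 0.03, 0.03), (0.25, 0.0, 0.0)]
    # The first two points share a voxel; the third lies in another one.
    print(voxel_downsample(cloud))
```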
5.2 Sample Dataset
This repository includes a small sample dataset containing four image and LiDAR-map pairs from KITTI odometry sequence 00, allowing the inference pipeline to be tested without preparing the complete dataset.
sample/
└── 0/
    ├── image/
    │   ├── 000000.png
    │   ├── 000100.png
    │   ├── 000200.png
    │   └── 000300.png
    └── lidar/
        ├── 000000.h5
        ├── 000100.h5
        ├── 000200.h5
        └── 000300.h5
Images and LiDAR maps are paired by filename. The sample camera intrinsics are defined in core/dataset.py.
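The filename-based pairing can be sketched as follows. The helper `pair_by_stem` is hypothetical and only mirrors the `image/` and `lidar/` layout shown above; the repository's own loader in core/dataset.py may work differently.

```python
from pathlib import Path
from typing import List, Tuple


def pair_by_stem(sequence_dir: str) -> List[Tuple[Path, Path]]:
    """Pair image/<name>.png with lidar/<name>.h5 when both files exist."""
    seq = Path(sequence_dir)
    images = {p.stem: p for p in (seq / "image").glob("*.png")}
    maps = {p.stem: p for p in (seq / "lidar").glob("*.h5")}
    # Only stems present in both folders form a valid pair.
    return [(images[s], maps[s]) for s in sorted(images.keys() & maps.keys())]


if __name__ == "__main__":
    for image_path, map_path in pair_by_stem("sample/0"):
        print(image_path.name, "<->", map_path.name)
```

Files without a counterpart are silently skipped in this sketch; a stricter variant could raise an error instead.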
6. Quick Start
Run the sample from the repository root:
bash cmd/sample.sh
The script executes:
python sample.py --cfg cfg/sample.toml --checkpoint checkpoints/kitti_100epoch.pth
The default configuration uses GPU 0, generates deterministic initial pose perturbations with seed 3407, and evaluates all four sample pairs.
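The effect of a fixed seed can be illustrated with a small sketch. It assumes, purely for illustration, that each rotation and translation component is drawn uniformly within its bound and that the standard-library generator is used; the repository's actual sampling scheme and random source may differ, and the bounds of 10 degrees and 2 units are hypothetical values.

```python
import random
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def sample_perturbations(count: int, max_r_deg: float, max_t: float,
                         seed: int = 3407) -> List[Tuple[Vec3, Vec3]]:
    """Draw `count` (rotation, translation) perturbations reproducibly."""
    rng = random.Random(seed)  # private generator: global state is untouched
    samples = []
    for _ in range(count):
        rot = tuple(rng.uniform(-max_r_deg, max_r_deg) for _ in range(3))
        trans = tuple(rng.uniform(-max_t, max_t) for _ in range(3))
        samples.append((rot, trans))
    return samples


if __name__ == "__main__":
    first = sample_perturbations(4, max_r_deg=10.0, max_t=2.0)
    second = sample_perturbations(4, max_r_deg=10.0, max_t=2.0)
    print("identical across runs:", first == second)
```

Because the generator is re-created from the same seed on every call, repeated runs start from the same initial poses, which keeps the reported errors comparable.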
7. Results
Each run creates a timestamped output directory:
i2d_locX_sample/test/test_<YYYYMMDD_HHMMSS>/
├── logs/
│   └── test.log
└── result/
    ├── iter_1/
    ├── iter_2/
    ├── iter_3/
    └── iter_4/
Each iteration directory contains vision_image_with_initial.png for the initial LiDAR projection, vision_image_with_initial_gt.png for the ground-truth alignment, and vision_image_with_predict.png for the alignment after pose correction. The terminal and test.log report the initial and predicted rotation and translation errors.
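To inspect the newest run programmatically, a sketch like the one below could be used. The helpers `latest_run` and `missing_images` are hypothetical; they rely only on the directory and file names listed above and on the fact that `YYYYMMDD_HHMMSS` timestamps sort chronologically as strings.

```python
from pathlib import Path
from typing import List, Optional

EXPECTED_IMAGES = (
    "vision_image_with_initial.png",
    "vision_image_with_initial_gt.png",
    "vision_image_with_predict.png",
)


def latest_run(test_root: str) -> Optional[Path]:
    """Return the most recent test_<timestamp> directory, or None if there is none."""
    runs = sorted(p for p in Path(test_root).glob("test_*") if p.is_dir())
    return runs[-1] if runs else None


def missing_images(run_dir: Path) -> List[Path]:
    """List expected visualization files that are absent from result/iter_*."""
    missing = []
    for iter_dir in sorted((Path(run_dir) / "result").glob("iter_*")):
        for name in EXPECTED_IMAGES:
            if not (iter_dir / name).is_file():
                missing.append(iter_dir / name)
    return missing


if __name__ == "__main__":
    run = latest_run("i2d_locX_sample/test")
    if run is None:
        print("No runs found yet.")
    else:
        print("Latest run:", run)
        print("Missing files:", [str(p) for p in missing_images(run)])
```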
8. Configuration
The sample configuration is located at cfg/sample.toml.
| Option | Description |
|---|---|
| `gpus` | CUDA device list; the sample uses `[0]` |
| `dataset.root_folder` | Root directory of the input data |
| `dataset.test_sequence` | Sequence used for evaluation |
| `dataset.max_r` | Maximum sampled rotation perturbation in degrees |
| `dataset.max_t` | Maximum sampled translation perturbation |
| `dataset.batch_size` | Evaluation batch size |
| `model.iters` | Number of iterative flow-refinement steps |
9. Troubleshooting
- No CUDA devices available: Run `nvidia-smi` and `python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"` to verify the NVIDIA driver and PyTorch CUDA build.
- `ModuleNotFoundError: No module named 'visibility'`: Rebuild the extension with `cd pkg/visibility_package && python setup.py install`.
- Checkpoint not found: Place the model at `checkpoints/kitti_100epoch.pth` or pass its actual location through `--checkpoint`.
- CUDA out of memory: Keep `dataset.batch_size = 1` and close other GPU workloads.
10. Citation
If you find this project useful, please cite our IEEE RA-L paper:
@article{yu2025i2dlocx,
title={I2D-LocX: An Efficient, Precise and Robust Method for Camera Localization in LiDAR Maps},
author={Yu, Huai and Zhu, Xubo and Han, Shu and Yang, Wen and Xia, Gui-Song},
journal={IEEE Robotics and Automation Letters},
volume={10},
number={8},
pages={7899--7906},
year={2025},
doi={10.1109/LRA.2025.3581122}
}
11. Acknowledgments
This repository is developed from I2D-Loc. The original I2D-Loc implementation builds upon CMRNet, RAFT, and BPnP. I2D-LocX also benefits from SEA-RAFT. We thank the authors for making their work publicly available.