---
license: apache-2.0
datasets:
- 0jl/NYUv2
language:
- en
metrics:
- r2
- mae
- mse
pipeline_tag: depth-estimation
tags:
- xgboost
- python
- depth-estimation
- resnet50
---
# Depth Estimation Using ResNet50 and XGBoost
## Author
- **Vishal Adithya.A**
## Overview
This project provides an XGBoost regressor for depth estimation that predicts the average depth of an image from features extracted with a pre-trained ResNet50 model. The model was trained on the **NYUv2 dataset** ([0jl/NYUv2](https://huggingface.co/datasets/0jl/NYUv2)). The trained regressor is saved with Python's `pickle` library for easy deployment and reuse.
### Loading the Model
The model is saved as `model.pkl` using `pickle`. You can load and use it as follows:
```python
with open("model.pkl", "rb") as f:
model = pickle.load(f)
features = extract_features("path/to/image.jpg")
predicted_depth = model.predict([features])
print(predicted_depth[0])
```
**NOTE:** `extract_features()` is a helper defined in the original code that uses ResNet50 to extract features from the image; its definition is sketched in the Training Pipeline section below.
## Key Features
- **Model Architecture**:
- Feature extraction: ResNet50 (pre-trained on ImageNet, with the top layers removed and global average pooling).
- Regression: XGBoost, optimized for structured data prediction.
- **Training GPU**: NVIDIA RTX 4060 Ti.
- **Target**: Predict the average depth of images based on the depth maps from the dataset.
## Dataset
- Dataset: **NYUv2** ([0jl/NYUv2](https://huggingface.co/datasets/0jl/NYUv2))
- Format: The dataset includes RGB images and corresponding depth maps.
- Preprocessing:
- Images were resized to 224x224 pixels to match the input requirements of ResNet50.
- Depth maps were converted into single average depth values (see the sketch after this list).
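A minimal sketch of this preprocessing for a single dataset row, assuming the loader exposes `image` (an RGB PIL image) and `depth` (a per-pixel depth map) fields; these field names are assumptions, not confirmed by the original code:
```python
import numpy as np

def preprocess_example(example):
    # Resize the RGB image to ResNet50's expected 224x224 input ("image" is an assumed field name)
    image = example["image"].convert("RGB").resize((224, 224))
    # Collapse the per-pixel depth map into one average-depth target ("depth" is an assumed field name)
    target = float(np.mean(np.array(example["depth"])))
    return image, target
```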
## Model Training
1. **Feature Extraction**:
- ResNet50 was used to extract a fixed-length feature vector from each image.
- Preprocessing: Images were normalized using the `preprocess_input` function from TensorFlow's ResNet50 module.
2. **Regression**:
- XGBoost regressor was trained on the extracted features to predict average depth values.
- Hyperparameters were tuned with cross-validation for optimal performance; a minimal sketch is shown below.
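The exact tuning procedure is not part of this card; the following is a minimal sketch of cross-validated tuning with scikit-learn's `GridSearchCV`, where the parameter grid is a hypothetical example and `X_train`/`y_train` are assumed to be built as in the Training Pipeline section:
```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

# Hypothetical search space; the grid used for the released model is not documented here
param_grid = {
    "n_estimators": [200, 500],
    "max_depth": [4, 6, 8],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    XGBRegressor(objective="reg:squarederror"),
    param_grid,
    cv=5,
    scoring="r2",
)
search.fit(X_train, y_train)  # features and average-depth targets from the Training Pipeline
print(search.best_params_, search.best_score_)
```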
## Results
- **R² Score**: 0.841
- Performance is reasonable for a first implementation and can be further improved with additional tuning or better feature extraction; a sketch for reproducing these metrics is shown below.
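The reported metrics can be recomputed with scikit-learn; a minimal sketch, assuming `regressor` is the trained XGBoost model and `X_test`/`y_test` are a held-out split prepared the same way as the training data:
```python
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# X_test / y_test: held-out ResNet50 features and average-depth targets (assumed to exist)
y_pred = regressor.predict(X_test)
print("R2 :", r2_score(y_test, y_pred))
print("MAE:", mean_absolute_error(y_test, y_pred))
print("MSE:", mean_squared_error(y_test, y_pred))
```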
## How to Use
### Requirements
1. Python 3.10+
2. Required libraries:
- `numpy`
- `pillow`
- `pickle` (part of the Python standard library)
- `xgboost`
- `datasets`
- `tensorflow`
- `scikit-learn`
Install the dependencies using pip:
```bash
pip install numpy pillow tensorflow xgboost datasets scikit-learn
```
### Training Pipeline
If you want to retrain the model, follow these steps:
1. Download the **NYUv2 dataset** from Hugging Face:
```python
from datasets import load_dataset
dataset = load_dataset("0jl/NYUv2")
```
2. Extract features using ResNet50:
```python
import numpy as np
from PIL import Image
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def extract_features(image_path):
    # Load the image, resize it to ResNet50's 224x224 input, and preprocess it
    image = Image.open(image_path).convert("RGB").resize((224, 224))
    image_array = preprocess_input(np.expand_dims(np.array(image, dtype="float32"), axis=0))
    features = model.predict(image_array)
    return features.flatten()
```
3. Train the XGBoost regressor on the extracted features and save the model:
```python
import pickle
from xgboost import XGBRegressor

# X_train: ResNet50 feature vectors, y_train: average-depth targets
regressor = XGBRegressor()
regressor.fit(X_train, y_train)
with open("model.pkl", "wb") as f:
    pickle.dump(regressor, f)
```
**NOTE:** This pipeline shows only the fundamental code; additional hyperparameter tuning and preprocessing steps were applied when training the original model.
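To tie the steps together, here is a hypothetical sketch of assembling `X_train` and `y_train` from the loaded dataset. The `train` split name and the `image`/`depth` field names are assumptions about the NYUv2 loader, and `features_from_pil` is a hypothetical variant of `extract_features()` that accepts a PIL image (dataset rows yield images rather than file paths):
```python
import numpy as np
from tensorflow.keras.applications.resnet50 import preprocess_input

def features_from_pil(pil_image):
    # Hypothetical helper mirroring extract_features(), but taking a PIL image directly
    arr = np.array(pil_image.convert("RGB").resize((224, 224)), dtype="float32")
    arr = preprocess_input(np.expand_dims(arr, axis=0))
    return model.predict(arr, verbose=0).flatten()  # "model" is the ResNet50 from step 2

train_split = dataset["train"]  # assumed split name
X_train = np.array([features_from_pil(ex["image"]) for ex in train_split])          # assumed field name
y_train = np.array([float(np.mean(np.array(ex["depth"]))) for ex in train_split])   # assumed field name
```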
## License
This project is licensed under the Apache License 2.0.
## Acknowledgments
- Hugging Face for hosting the NYUv2 dataset.
- NVIDIA RTX 4060 Ti for providing efficient GPU acceleration.
- TensorFlow and XGBoost for robust machine learning frameworks. |