---
license: apache-2.0
datasets:
  - 0jl/NYUv2
language:
  - en
metrics:
  - r2
  - mae
  - mse
pipeline_tag: depth-estimation
tags:
  - xgboost
  - python
  - depth-estimation
  - resnet50
---

# Depth Estimation Using ResNet50 and XGBoost
## Author
 - **Vishal Adithya.A**
## Overview
This project demonstrates a depth estimation model: an XGBoost regressor that predicts the average depth of an image from features extracted by a pre-trained ResNet50. The model was trained on the **NYUv2 dataset** ([0jl/NYUv2](https://huggingface.co/datasets/0jl/NYUv2)) and is saved with Python's `pickle` library for easy deployment and reuse.

### Loading the Model
The model is saved as `model.pkl` using `pickle`. You can load and use it as follows:

```python
import pickle

with open("model.pkl", "rb") as f:
    model = pickle.load(f)

features = extract_features("path/to/image.jpg")  # ResNet50 feature vector
predicted_depth = model.predict([features])
print(predicted_depth[0])
```
**NOTE:** `extract_features()` is a helper function from the original code that uses ResNet50 to extract a feature vector from the image (see the Training Pipeline section below).

## Key Features
- **Model Architecture**:
  - Feature extraction: ResNet50 (pre-trained on ImageNet, with the top layers removed and global average pooling).
  - Regression: XGBoost, optimized for structured data prediction.
- **Training GPU**: NVIDIA RTX 4060 Ti, ensuring efficient computation.
- **Target**: Predict the average depth of images based on the depth maps from the dataset.

## Dataset
- Dataset: **NYUv2** ([0jl/NYUv2](https://huggingface.co/datasets/0jl/NYUv2))
- Format: The dataset includes RGB images and corresponding depth maps.
- Preprocessing:
  - Images were resized to 224x224 pixels to match the input requirements of ResNet50.
  - Depth maps were converted into single average depth values.
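The averaging step described above can be sketched in a few lines; `depth_map` here is a placeholder for one raw NYUv2 depth array (the exact field names in the dataset loader may differ):

```python
import numpy as np

def average_depth(depth_map):
    # Collapse a per-pixel depth map into a single scalar regression target
    return float(np.mean(depth_map))

# Toy example: a 2x2 stand-in for a depth map
toy_map = np.array([[1.0, 2.0], [3.0, 4.0]])
print(average_depth(toy_map))  # 2.5
```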

## Model Training
1. **Feature Extraction**:
   - ResNet50 was used to extract a fixed-length feature vector from each image.
   - Preprocessing: Images were normalized using the `preprocess_input` function from TensorFlow's ResNet50 module.
2. **Regression**:
   - XGBoost regressor was trained on the extracted features to predict average depth values.
   - Hyperparameters were tuned using cross-validation techniques for optimal performance.

## Results
- **R² Score**: 0.841
- Performance is reasonable for a first implementation and can be further improved with additional tuning or better feature extraction methods.
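The metrics declared in this card's front matter (R², MAE, MSE) can be computed with scikit-learn; a minimal sketch on toy predictions:

```python
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# Toy ground-truth average depths and model predictions (not real results)
y_true = [2.5, 3.0, 1.8, 4.2]
y_pred = [2.4, 3.1, 2.0, 4.0]

print(r2_score(y_true, y_pred))             # coefficient of determination
print(mean_absolute_error(y_true, y_pred))  # MAE
print(mean_squared_error(y_true, y_pred))   # MSE
```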

## How to Use
### Requirements
1. Python 3.10+
2. Required libraries:
   - `numpy`
   - `tensorflow`
   - `xgboost`
   - `datasets`
   - `scikit-learn`
   - `pickle` (Python standard library, no installation needed)

Install the dependencies using pip:
```bash
pip install numpy tensorflow xgboost datasets scikit-learn
```

### Training Pipeline
If you want to retrain the model, follow these steps:
 
1. Download the **NYUv2 dataset** from Hugging Face:
   ```python
   from datasets import load_dataset
   dataset = load_dataset("0jl/NYUv2")
   ```
2. Extract features using ResNet50:
   ```python
   import numpy as np
   from PIL import Image
   from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

   model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

   def extract_features(image_path):
       # Load the image and resize it to ResNet50's expected 224x224 input
       img = Image.open(image_path).convert("RGB").resize((224, 224))
       image_array = np.expand_dims(np.asarray(img, dtype="float32"), axis=0)
       image_array = preprocess_input(image_array)
       features = model.predict(image_array)
       return features.flatten()
   ```
3. Train the XGBoost regressor on the extracted features and save the model:
   ```python
   import pickle
   from xgboost import XGBRegressor

   # X_train: extracted ResNet50 feature vectors, y_train: average depth values
   regressor = XGBRegressor()
   regressor.fit(X_train, y_train)

   with open("model.pkl", "wb") as f:
       pickle.dump(regressor, f)
   ```
**NOTE:** This pipeline shows only the fundamental code; additional parameter tuning and preprocessing steps were carried out during the training of the original model.


## License
This project is licensed under the Apache License 2.0.

## Acknowledgments
- Hugging Face for hosting the NYUv2 dataset.
- NVIDIA RTX 4060 Ti for providing efficient GPU acceleration.
- TensorFlow and XGBoost for robust machine learning frameworks.