PMDEVS committed on
Commit ca1596c · verified · 1 Parent(s): 491d219

Create README.md

Files changed (1)
  1. README.md +138 -0
README.md ADDED
@@ -0,0 +1,138 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ base_model:
6
+ - PMDEVS/explorers_emit_model
7
+ pipeline_tag: tabular-classification
8
+ ---
10
+
11
+ ---
12
+
13
+ ## EMIT Model - Environmental Monitoring and Intelligence Tool
14
+
15
+ ### Title
16
+ **EMIT Model** - Environmental Monitoring and Intelligence Tool (CatBoost Classifier)
17
+
18
+ ---
19
+
20
+ ### Overview
21
+ The **EMIT Model** (Environmental Monitoring and Intelligence Tool) is a **CatBoost Classifier** that predicts potential mining areas from environmental data. It is part of the **EMiTAL** (Environmental Monitoring and Intelligence Tool Algorithm) framework and uses **Remote Sensing**, **RayCasting**, and **Polygon Gridding** to identify viable mining zones.
22
+
23
+ #### Goal
24
+ To support decision-making in mining by providing a robust predictive model that identifies areas with high mining potential based on environmental characteristics. This model benefits regulatory bodies, mining companies, and environmental agencies aiming to balance resource extraction with sustainability.
25
+
26
+ ---
27
+
28
+ ### Framework: EMiTAL
29
+ The **EMiTAL framework** integrates several innovative approaches to enhance prediction accuracy:
30
+ - **Remote Sensing**: Captures large-scale environmental data (e.g., vegetation, soil, and air quality).
31
+ - **RayCasting and Polygon Gridding**: Segments geographic regions into grids, enabling precise targeting.
32
+ - **Environmental Indicators**:
33
+ - **NDVI (Normalized Difference Vegetation Index)**: Measures vegetation health.
34
+ - **NDWI (Normalized Difference Water Index)**: Evaluates water content.
35
+ - **NDTI (Normalized Difference Tillage Index)**: Assesses soil disturbance.
36
+ - **Land Elevation**: Provides terrain insights.
37
+ - **Air Quality Metrics**: NO2, PM10, and CO to gauge environmental impact.
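
The card does not describe how RayCasting and Polygon Gridding are implemented. The following is a minimal sketch of one common approach, assuming a square grid laid over the polygon's bounding box and an even-odd ray-casting test that keeps a cell when its center falls inside the polygon. The function names, the `(longitude, latitude)` ordering, and the planar treatment of coordinates are all illustrative assumptions, not the authors' code.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude); ordering is an assumption


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Even-odd ray casting: cast a ray in +x and count edge crossings."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def grid_polygon(polygon: List[Point], cell_size: float) -> List[Point]:
    """Return centers of square cells whose center lies inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    # Integer counters avoid accumulating floating-point drift.
    nx = int((max_x - min_x) / cell_size)
    ny = int((max_y - min_y) / cell_size)
    centers = []
    for j in range(ny + 1):
        for i in range(nx + 1):
            center = (min_x + (i + 0.5) * cell_size,
                      min_y + (j + 0.5) * cell_size)
            if point_in_polygon(center, polygon):
                centers.append(center)
    return centers


# Hypothetical 4 x 4 square region split into unit cells.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
cells = grid_polygon(square, cell_size=1.0)
```

Each returned cell center could then be paired with the environmental indicators sampled at that location to form one input record.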
38
+
39
+ ---
40
+
41
+ ### Model Pipeline
42
+ The pipeline prepares the environmental data for classification. Because CatBoost handles categorical features natively, the pipeline needs little preprocessing.
43
+
44
+ - **Model Type**: CatBoost Classifier
45
+ - **Objective**: Binary classification that predicts whether a region is suitable for mining (`True` for viable, `False` for non-viable).
46
+ - **Cross-Validation Results**:
47
+ - Mean Accuracy: **78.32%**
48
+ - Standard Deviation: **4.25%**
49
+ - **Final Accuracy on Test Data**: **90.32%** (28 of 31 test records)
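
The test accuracy sits well above the cross-validation mean, and the test set is small (31 records, 28 of them correct according to the confusion matrix in this card). One way to gauge how much weight the single test figure can bear is a confidence interval on the accuracy. The sketch below uses the Wilson score interval at an assumed 95% level (`z = 1.96`); it is an illustration added here, not part of the authors' evaluation.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    p = correct / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half_width, center + half_width


# 28 correct predictions out of 31 test records.
low, high = wilson_interval(28, 31)
print(f"95% interval for test accuracy: {low:.3f} to {high:.3f}")
```

The interval spans roughly 75% to 97%, so the 78.32% cross-validation mean and the 90.32% test accuracy are not necessarily in conflict on a sample this small.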
50
+
51
+ ---
52
+
53
+ ### Dataset and Features
54
+ #### Input Features:
55
+ - **Latitude** and **Longitude**: Geospatial coordinates.
56
+ - **NDVI, NDWI, NDTI**: Remote-sensing indices for vegetation health, water content, and soil disturbance.
57
+ - **Land Elevation**: Topographic information.
58
+ - **Vegetation Index**: Encoded categories (Null, Sparse, Moderate, Healthy).
59
+ - **Air Quality Metrics**: NO2, PM10, and CO levels.
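
NDVI, NDWI, and NDTI are all normalized-difference indices of the form (a - b) / (a + b). The card does not state which spectral bands were used, so the band pairings below follow commonly used definitions and should be read as assumptions: NDVI from near-infrared and red, NDWI from green and near-infrared (the McFeeters form), and NDTI from two shortwave-infrared bands. Returning `0.0` when both bands are zero is also a choice made for this sketch.

```python
def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    if total == 0:
        return 0.0
    return (a - b) / total


def ndvi(nir: float, red: float) -> float:
    """Vegetation index: (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def ndwi(green: float, nir: float) -> float:
    """Water index, McFeeters form (assumed): (Green - NIR) / (Green + NIR)."""
    return normalized_difference(green, nir)


def ndti(swir1: float, swir2: float) -> float:
    """Tillage index (assumed bands): (SWIR1 - SWIR2) / (SWIR1 + SWIR2)."""
    return normalized_difference(swir1, swir2)


# Hypothetical reflectance values for a single grid cell.
print(ndvi(nir=0.5, red=0.1), ndwi(green=0.2, nir=0.5), ndti(swir1=0.3, swir2=0.2))
```

All three indices fall in the range -1 to 1 for non-negative reflectances, which keeps them on a comparable scale as model inputs.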
60
+
61
+ #### Initial Dataset:
62
+ - **Total Records**: 152
63
+ - **Data Types**: Numerical, categorical, and boolean.
64
+ - **Categorical Features**: Vegetation Index, handled natively by CatBoost.
65
+
66
+ ---
67
+
68
+ ### Model Performance
69
+ #### Key Metrics:
70
+ - **Accuracy**: **90.32%**
71
+ - **Precision, Recall, F1-Score**:
72
+ | **Class** | **Precision** | **Recall** | **F1-Score** | **Support** |
73
+ |------------|---------------|------------|--------------|-------------|
74
+ | **False** | 0.86 | 0.75 | 0.80 | 8 |
75
+ | **True** | 0.92 | 0.96 | 0.94 | 23 |
76
+
77
+ - **Overall Accuracy**: **90%**
78
+ - **Macro Average**: Precision = 0.89, Recall = 0.85, F1-Score = 0.87
79
+ - **Weighted Average**: Precision = 0.90, Recall = 0.90, F1-Score = 0.90
80
+
81
+ #### Confusion Matrix:
82
+ | | Predicted False | Predicted True |
83
+ |---------------|-----------------|----------------|
84
+ | **Actual False** | 6 | 2 |
85
+ | **Actual True** | 1 | 22 |
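
The per-class metrics reported above follow directly from this confusion matrix. As a check, the short sketch below recomputes them from the four cell counts; the helper name `precision_recall_f1` is illustrative.

```python
# Confusion matrix from the card: rows are actual classes, columns are predictions.
tn, fp = 6, 2    # actual False: predicted False, predicted True
fn, tp = 1, 22   # actual True:  predicted False, predicted True


def precision_recall_f1(hits: int, false_alarms: int, misses: int):
    """Precision, recall, and F1 for one class treated as the positive class."""
    precision = hits / (hits + false_alarms)
    recall = hits / (hits + misses)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# For the True class, false alarms are fp and misses are fn.
true_metrics = precision_recall_f1(tp, fp, fn)
# For the False class the roles swap: false alarms are fn and misses are fp.
false_metrics = precision_recall_f1(tn, fn, fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print("True :", [round(m, 2) for m in true_metrics])
print("False:", [round(m, 2) for m in false_metrics])
print("Accuracy:", round(100 * accuracy, 2))
```

The rounded values match the table: 0.92 / 0.96 / 0.94 for `True`, 0.86 / 0.75 / 0.80 for `False`, and 90.32% accuracy.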
86
+
87
+ ---
88
+
89
+ ### Feature Importance
90
+ Feature importances reported by the model, in descending order:
91
+ | **Feature** | **Importance (%)** |
92
+ |-------------------------------|--------------------|
93
+ | Longitude | 40.50 |
94
+ | NO2 | 25.81 |
95
+ | Latitude | 19.43 |
96
+ | NDWI | 4.85 |
97
+ | NDVI | 4.60 |
98
+ | NDTI | 4.41 |
99
+ | Vegetation Index (Encoded) | 0.30 |
100
+ | Land Elevation | 0.10 |
101
+ | PM10 | 0.00 |
102
+ | CO | 0.00 |
103
+
104
+ ---
105
+
106
+ ### Usage Instructions
107
+ To use this model:
108
+ 1. Prepare your dataset with the specified input features.
109
+ 2. Ensure feature names match the training dataset.
110
+ 3. Run predictions using the following script:
111
+
112
+ ```python
113
+ import joblib
114
+ import pandas as pd
115
+
116
+ # Load the model
117
+ model = joblib.load("emit_model_catboost.joblib")
118
+
119
+ # Load your data; column names must match the training features
120
+ data = pd.read_csv("path/to/your/data.csv")
121
+ predictions = model.predict(data)
122
+ ```
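
Step 2 above asks that feature names match the training dataset. A small guard before calling `predict` can catch mismatches early. The sketch below is an assumption-laden illustration: the card lists the features but not their exact column spellings, so `EXPECTED_FEATURES` holds hypothetical names that would need to be replaced with the real ones.

```python
from typing import Iterable, List

# Hypothetical column names; replace them with the names used in training.
EXPECTED_FEATURES = [
    "Latitude", "Longitude", "NDVI", "NDWI", "NDTI",
    "Land Elevation", "Vegetation Index", "NO2", "PM10", "CO",
]


def check_features(columns: Iterable[str],
                   expected: List[str] = EXPECTED_FEATURES) -> List[str]:
    """Raise if any expected column is absent; return the expected order."""
    present = set(columns)
    missing = [name for name in expected if name not in present]
    if missing:
        raise ValueError(f"Missing feature columns: {missing}")
    return list(expected)


# Extra columns (for example an ID or label) are tolerated and dropped by selection.
ordered = check_features(EXPECTED_FEATURES + ["record_id"])
```

With a pandas DataFrame this could be used as `data = data[check_features(data.columns)]` before `model.predict(data)`. If the loaded object is a CatBoost model, its `feature_names_` attribute may be a more reliable source for the expected names than a hand-written list.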
123
+
124
+ ---
125
+
126
+ ### Authors
127
+ - Joseph Ackon
128
+ - Felix Kudjo Mlagada
129
+ - Aristotle Mbroh
130
+ - Prince Mawuko Dzorkpe
131
+ - Manford Ehuntem
132
+
133
+ **Acknowledgments**:
134
+ Thanks to **Takoradi Technical University**, **Data Hackathon Ghana Statistical Service (2024)**, and **StatsBank** for their support.
135
+
136
+ ---
137
+
138
+ This version of the EMIT model uses CatBoost, which handles mixed-type (numerical and categorical) datasets natively.