---
license: apache-2.0
---
# EDL(VGG) - Pakistan Flood Detection
This model classifies aerial photos of a location as flooded or not flooded. First responders need to act quickly in a crisis and do not have time to collect large datasets of damage at a location. They are also likely to want a non-technical interface for creating and using such models. Our solution, Easy Deep Learning (EDL), lets first responders and others with similar use cases rapidly create and use their own deep learning models.
## How it works
Sourcing a large enough training dataset is often very time-consuming and, in critical use cases, impossible. The process also often requires oversight from data scientists and machine learning specialists, which is expensive. Stable Diffusion has broad knowledge of what different objects and situations look like, but without proper guidance it can hallucinate and produce images that don't represent a user's intended classes. We found that an image-to-image model, given a prompt that states the intended class, generates additional training data that resembles the real training data and that classifiers can learn from. With this additional data, users can train advanced image classifiers such as ResNet and Vision Transformers to classify images with better accuracy.
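
The class-conditioned augmentation described above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the prompt template, the function names, and the `generate` callable are all assumptions. In practice `generate` would wrap an image-to-image diffusion model; here it is injected so the sketch stays independent of any specific library.

```python
from itertools import cycle, islice
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical prompt template; the prompts actually used are not documented here.
PROMPT_TEMPLATE = "an aerial photo of a {label} area"


def build_prompt(label: str) -> str:
    """Build a prompt that tells the generator which class to depict."""
    return PROMPT_TEMPLATE.format(label=label)


def augment_class(
    seeds: Sequence[Any],
    label: str,
    n_new: int,
    generate: Callable[[Any, str], Any],
) -> List[Tuple[Any, str]]:
    """Create n_new synthetic (image, label) pairs from a few real seed images.

    `generate(seed_image, prompt)` is an assumed interface standing in for an
    image-to-image model call.
    """
    if not seeds:
        raise ValueError("at least one real seed image is required")
    prompt = build_prompt(label)
    # Cycle through the seeds so each real image contributes about equally.
    return [(generate(seed, prompt), label) for seed in islice(cycle(seeds), n_new)]
```

Conditioning every generation on both a real seed image and a class-specific prompt is what keeps the synthetic images close to the intended class; the labels of the new samples come directly from the prompt's class.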
## Training Process
This classifier was trained on our platform with only 6 real examples of non-flooded areas and 6 real examples of flooded areas. From these we automatically generated 200 additional data points in about 5 minutes using IPEX-optimized code on an XPU compute device. Without the additional training data, the Vision Transformer overfit catastrophically, even after hyperparameter tuning: in each of 5 runs it simply marked every region as flooded or every region as not flooded. Synthetic augmentation took us from random guessing to 75%-100% accuracy on 8 test points per run, across 5 runs with test points selected at random for each run.
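
The evaluation protocol above (5 runs, 8 randomly selected test points per run) can be sketched as below. The function name, the `predict` callable, and the fixed seed are assumptions for illustration, not the platform's actual evaluation code. Note that with 8 test points accuracy moves in steps of 12.5%, so 75% corresponds to 6 of 8 correct.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple


def evaluate_runs(
    samples: Sequence[Tuple[Any, Any]],
    predict: Callable[[Any], Any],
    n_runs: int = 5,
    n_test: int = 8,
    seed: int = 0,
) -> List[Tuple[float, bool]]:
    """Return (accuracy, single_class) for each run.

    `samples` holds (input, label) pairs and `predict` is an assumed
    classifier interface. `single_class` is True when every prediction in
    the run is the same class, the symptom described for the model trained
    without synthetic data.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        # Draw a fresh random test set for every run.
        test = rng.sample(list(samples), n_test)
        preds = [predict(x) for x, _ in test]
        correct = sum(p == y for p, (_, y) in zip(preds, test))
        results.append((correct / n_test, len(set(preds)) == 1))
    return results
```

Reporting the single-class flag next to accuracy helps separate a model that has collapsed to one label from one that is merely inaccurate; on a small test set the flag can also fire by chance if all sampled points share a class.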