Model description
Autoencoder trained to compress Sentinel-2 satellite images, using a Vision Transformer (ViT) as the encoder backbone for feature extraction. The latent space has 1024 neurons, which can be used to generate embeddings from Sentinel-2 images.
The model was trained on bands 1-12 of Sentinel-2 imagery covering the 10 municipalities in Colombia with the most dengue cases.
The input shape of the model is (224, 224, 12). To extract features, remove the last layer.
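As a minimal sketch of preparing an input of that shape, the snippet below center-crops a 12-band patch to 224 x 224 and adds a batch axis. The function name `to_model_input` is hypothetical, and the channels-last layout is assumed from the stated shape. No pixel scaling is applied, because this card does not say how the training images were normalized.

```python
import numpy as np

INPUT_SIZE = 224   # height and width stated in this model card
NUM_BANDS = 12     # number of Sentinel-2 bands stated in this model card


def to_model_input(patch):
    """Center-crop an (H, W, 12) patch and return a (1, 224, 224, 12) float32 array."""
    patch = np.asarray(patch, dtype=np.float32)
    if patch.ndim != 3 or patch.shape[-1] != NUM_BANDS:
        raise ValueError(f"expected shape (H, W, {NUM_BANDS}), got {patch.shape}")
    height, width = patch.shape[:2]
    if height < INPUT_SIZE or width < INPUT_SIZE:
        raise ValueError("patch is smaller than 224 x 224; resize it first")
    # Offsets that keep the crop centered in the original patch.
    top = (height - INPUT_SIZE) // 2
    left = (width - INPUT_SIZE) // 2
    cropped = patch[top:top + INPUT_SIZE, left:left + INPUT_SIZE, :]
    return cropped[np.newaxis, ...]  # add the batch dimension
```

If the training pipeline rescaled reflectance values, the same rescaling would have to be applied before calling the model.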
The model can be loaded as follows (example in a Jupyter notebook):
!git lfs install
!git clone https://huggingface.co/MITCriticalData/Sentinel-2_ViT_Autoencoder_12Bands
import tensorflow as tf
from transformers import TFViTModel
model = tf.keras.models.load_model('Sentinel-2_ViT_Autoencoder_12Bands', custom_objects={"TFViTModel": TFViTModel})
You can extract the embeddings by removing the last layer:
import tensorflow as tf
backbone = tf.keras.Sequential()
for layer in model.layers[:-1]:  # copy every layer except the last one
    backbone.add(layer)
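Once a backbone like the one above exists, embeddings can be computed in batches. The helper below is only a sketch: `extract_embeddings` is a hypothetical name, it relies on the standard Keras `predict(x, batch_size=...)` method, and it assumes the truncated model ends at the 1024-neuron latent layer described in this card, which has not been verified here.

```python
import numpy as np


def extract_embeddings(backbone, images, batch_size=32):
    """Run a batch of (224, 224, 12) images through `backbone`.

    `backbone` is any object with a Keras-style `predict` method.
    Returns a 2-D array with one embedding row per image.
    """
    images = np.asarray(images, dtype=np.float32)
    if images.ndim != 4 or images.shape[1:] != (224, 224, 12):
        raise ValueError(f"expected (N, 224, 224, 12), got {images.shape}")
    outputs = np.asarray(backbone.predict(images, batch_size=batch_size))
    # Flatten any trailing dimensions so each image maps to one vector.
    return outputs.reshape(len(images), -1)
```

If the returned width is not 1024, the last layer removed was probably not the only layer after the latent space, and a different cut point would be needed.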
Intended uses & limitations
The model was trained on images of 10 cities in Colombia, so it may require fine-tuning or retraining before it is applied to other contexts, such as other countries or continents.
Training and evaluation data
The model, an asymmetric autoencoder, was trained on 12-band Sentinel-2 satellite images of 10 cities in Colombia. Images likely to introduce noise, such as black images, were filtered out before training.
The dataset was split into 80% for training and 20% for testing.
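A filter for black images could look like the sketch below. The exact criterion used for this model is not documented, so the function names and the 0.5 threshold are illustrative assumptions: a pixel counts as black when all 12 bands are zero, and an image is dropped when too many of its pixels are black.

```python
import numpy as np


def black_fraction(image):
    """Fraction of pixels whose value is zero in every band."""
    image = np.asarray(image)
    return float(np.all(image == 0, axis=-1).mean())


def keep_image(image, max_black_fraction=0.5):
    """Return True if the image has few enough black pixels to keep.

    The 0.5 default is a hypothetical threshold, not the one used in training.
    """
    return black_fraction(image) <= max_black_fraction
```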
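An 80/20 split can be sketched with the standard library alone. The card does not say whether the split was random, stratified, or grouped by city, so the seeded random shuffle below is an assumption, and `split_80_20` is a hypothetical helper name.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle `items` reproducibly and return (train, test) lists."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    cut = len(indices) * 80 // 100  # first 80% of the shuffled order
    train = [items[i] for i in indices[:cut]]
    test = [items[i] for i in indices[cut:]]
    return train, test
```

Splitting by city rather than by image would give a stricter test of how well the embeddings transfer to unseen locations.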
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
Hyperparameters | Value |
---|---|
name | Adam |
learning_rate | 0.0010000000474974513 |
decay | 0.0 |
beta_1 | 0.8999999761581421 |
beta_2 | 0.9990000128746033 |
epsilon | 1e-07 |
amsgrad | False |
training_precision | float32 |
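The long decimals in the table appear to be float32 roundings of the usual Adam values 0.001, 0.9, and 0.999. This is an inference from the numbers and the float32 training precision, not something the card states. The sketch below, with the hypothetical helper `as_float32`, rounds each nominal value to float32 so it can be compared with the table.

```python
import struct


def as_float32(value):
    """Round a Python float to the nearest float32 and return it as a float."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Nominal values assumed to correspond to the table entries.
nominal = {"learning_rate": 0.001, "beta_1": 0.9, "beta_2": 0.999}
for name, value in nominal.items():
    print(name, value, "->", as_float32(value))
```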